S3 Storage Backend Guide for Readur

Overview

Starting with version 2.5.4, Readur supports Amazon S3 and S3-compatible storage services as an alternative to local filesystem storage. The implementation fully supports AWS S3, MinIO, Wasabi, Backblaze B2, and other S3-compatible services, and provides automatic multipart upload for files larger than 100MB, structured storage paths organized by year and month, and automatic retries with exponential backoff.

This guide provides comprehensive instructions for configuring, deploying, and managing Readur with S3 storage.

Key Benefits

  • Scalability: Unlimited storage capacity without local disk constraints
  • Durability: 99.999999999% (11 9's) durability with AWS S3
  • Cost-Effective: Pay only for what you use with various storage tiers
  • Global Access: Access documents from anywhere with proper credentials
  • Backup: Built-in versioning and cross-region replication capabilities

Table of Contents

  1. Prerequisites
  2. Configuration
  3. Migration from Local Storage
  4. Storage Structure
  5. Performance Optimization
  6. Troubleshooting
  7. Best Practices

Prerequisites

Before configuring S3 storage, ensure you have:

  1. S3 Bucket Access

     • An AWS S3 bucket or S3-compatible service (MinIO, Wasabi, Backblaze B2, etc.)
     • Access Key ID and Secret Access Key with appropriate permissions
     • Bucket name and region information

  2. Required S3 Permissions

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": [
            "s3:PutObject",
            "s3:GetObject",
            "s3:DeleteObject",
            "s3:ListBucket",
            "s3:HeadObject",
            "s3:HeadBucket",
            "s3:AbortMultipartUpload",
            "s3:CreateMultipartUpload",
            "s3:UploadPart",
            "s3:CompleteMultipartUpload"
          ],
          "Resource": [
            "arn:aws:s3:::your-bucket-name/*",
            "arn:aws:s3:::your-bucket-name"
          ]
        }
      ]
    }
    

  3. Readur Build Requirements

     • Readur must be compiled with the s3 feature flag enabled
     • Build command: cargo build --release --features s3
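
To attach the permissions policy from step 2 to a dedicated IAM user, one option is the AWS CLI. The user and policy names below match those used in the Troubleshooting section, and the policy document is assumed to be saved as readur-s3-policy.json:

# Attach the S3 policy to the IAM user Readur authenticates as
aws iam put-user-policy \
    --user-name readur-user \
    --policy-name ReadurS3Policy \
    --policy-document file://readur-s3-policy.json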

Configuration

Environment Variables

Configure S3 storage by setting the following environment variables:

# Enable S3 storage backend
S3_ENABLED=true

# Required S3 credentials
S3_BUCKET_NAME=readur-documents
S3_ACCESS_KEY_ID=your-access-key-id
S3_SECRET_ACCESS_KEY=your-secret-access-key
S3_REGION=us-east-1

# Optional: For S3-compatible services (MinIO, Wasabi, etc.)
S3_ENDPOINT=https://s3-compatible-endpoint.com
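
Before starting Readur, the credentials can be sanity-checked with the AWS CLI (append --endpoint-url "$S3_ENDPOINT" when using an S3-compatible service):

# List the bucket with the same credentials Readur will use
AWS_ACCESS_KEY_ID="$S3_ACCESS_KEY_ID" \
AWS_SECRET_ACCESS_KEY="$S3_SECRET_ACCESS_KEY" \
aws s3 ls "s3://$S3_BUCKET_NAME" --region "$S3_REGION"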

Configuration File Example (.env)

# Database Configuration
DATABASE_URL=postgresql://readur:password@localhost/readur

# Server Configuration
SERVER_ADDRESS=0.0.0.0:8000
JWT_SECRET=your-secure-jwt-secret

# S3 Storage Configuration
S3_ENABLED=true
S3_BUCKET_NAME=readur-production
S3_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
S3_SECRET_ACCESS_KEY=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
S3_REGION=us-west-2

# Optional S3 endpoint for compatible services
# S3_ENDPOINT=https://minio.example.com

# Upload Configuration
UPLOAD_PATH=./temp_uploads
MAX_FILE_SIZE_MB=500

S3-Compatible Services Configuration

MinIO

S3_ENABLED=true
S3_BUCKET_NAME=readur-bucket
S3_ACCESS_KEY_ID=minioadmin
S3_SECRET_ACCESS_KEY=minioadmin
S3_REGION=us-east-1
S3_ENDPOINT=http://localhost:9000
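
If the bucket does not exist yet, it can be created with the MinIO client (mc) before starting Readur, using the same values as the configuration above:

# Point mc at the local MinIO instance and create the bucket
mc alias set local http://localhost:9000 minioadmin minioadmin
mc mb local/readur-bucket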

Wasabi

S3_ENABLED=true
S3_BUCKET_NAME=readur-bucket
S3_ACCESS_KEY_ID=your-wasabi-key
S3_SECRET_ACCESS_KEY=your-wasabi-secret
S3_REGION=us-east-1
S3_ENDPOINT=https://s3.wasabisys.com

Backblaze B2

S3_ENABLED=true
S3_BUCKET_NAME=readur-bucket
S3_ACCESS_KEY_ID=your-b2-key-id
S3_SECRET_ACCESS_KEY=your-b2-application-key
S3_REGION=us-west-002
S3_ENDPOINT=https://s3.us-west-002.backblazeb2.com

Migration from Local Storage

Using the Migration Tool

Readur includes a migration utility to transfer existing local files to S3:

  1. Prepare for Migration

    # Backup your database first
    pg_dump readur > readur_backup.sql
    
    # Set S3 configuration
    export S3_ENABLED=true
    export S3_BUCKET_NAME=readur-production
    export S3_ACCESS_KEY_ID=your-key
    export S3_SECRET_ACCESS_KEY=your-secret
    export S3_REGION=us-east-1
    

  2. Run Dry Run First

    # Preview what will be migrated
    cargo run --bin migrate_to_s3 --features s3 -- --dry-run
    

  3. Execute Migration

    # Migrate all files
    cargo run --bin migrate_to_s3 --features s3
    
    # Migrate with options:
    #   --delete-local     delete local files after successful upload
    #   --limit 100        limit to 100 files (for testing)
    #   --enable-rollback  enable automatic rollback on failure
    cargo run --bin migrate_to_s3 --features s3 -- \
      --delete-local \
      --limit 100 \
      --enable-rollback
    

  4. Migrate Specific User's Files

    cargo run --bin migrate_to_s3 --features s3 -- \
      --user-id 550e8400-e29b-41d4-a716-446655440000
    

  5. Resume Failed Migration

    # Resume from specific document ID
    cargo run --bin migrate_to_s3 --features s3 -- \
      --resume-from 550e8400-e29b-41d4-a716-446655440001
    

Migration Process Details

The migration tool performs the following steps:

  1. Connects to database and S3
  2. Identifies all documents with local file paths
  3. For each document:
     • Reads the local file
     • Uploads to S3 with structured path
     • Updates database with S3 path
     • Migrates associated thumbnails and processed images
     • Optionally deletes local files
  4. Tracks migration state for recovery
  5. Supports rollback on failure

Post-Migration Verification

-- Check migrated documents
SELECT 
    COUNT(*) FILTER (WHERE file_path LIKE 's3://%') as s3_documents,
    COUNT(*) FILTER (WHERE file_path NOT LIKE 's3://%') as local_documents
FROM documents;

-- Find any remaining local files
SELECT id, filename, file_path 
FROM documents 
WHERE file_path NOT LIKE 's3://%'
LIMIT 10;
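
As an additional check, the database counts can be compared against the number of objects actually present in the bucket (bucket name as in the earlier examples):

# Count document objects stored in the bucket
aws s3 ls s3://readur-production/documents/ --recursive | wc -l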

Storage Structure

S3 Path Organization

Readur uses a structured path format in S3:

bucket-name/
├── documents/
│   └── {user_id}/
│       └── {year}/
│           └── {month}/
│               └── {document_id}.{extension}
├── thumbnails/
│   └── {user_id}/
│       └── {document_id}_thumb.jpg
└── processed_images/
    └── {user_id}/
        └── {document_id}_processed.png

Example Paths

readur-production/
├── documents/
│   └── 550e8400-e29b-41d4-a716-446655440000/
│       └── 2024/
│           └── 03/
│               ├── 123e4567-e89b-12d3-a456-426614174000.pdf
│               └── 987fcdeb-51a2-43f1-b321-123456789abc.docx
├── thumbnails/
│   └── 550e8400-e29b-41d4-a716-446655440000/
│       ├── 123e4567-e89b-12d3-a456-426614174000_thumb.jpg
│       └── 987fcdeb-51a2-43f1-b321-123456789abc_thumb.jpg
└── processed_images/
    └── 550e8400-e29b-41d4-a716-446655440000/
        ├── 123e4567-e89b-12d3-a456-426614174000_processed.png
        └── 987fcdeb-51a2-43f1-b321-123456789abc_processed.png
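
Because keys are grouped by user and date, prefix listings map directly onto those dimensions. For example, to list the March 2024 documents of the user shown above:

aws s3 ls s3://readur-production/documents/550e8400-e29b-41d4-a716-446655440000/2024/03/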

Performance Optimization

Multipart Upload

Readur automatically uses multipart upload for files larger than 100MB:

  • Chunk Size: 16MB per part
  • Automatic Retry: Exponential backoff with up to 3 retries
  • Progress Tracking: Real-time upload progress via WebSocket
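
As a rough illustration of the chunking, a 500MB file would be split into 32 parts: 31 full 16MB chunks plus a final 4MB remainder.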

Network Optimization

  1. Region Selection: Choose S3 region closest to your Readur server
  2. Transfer Acceleration: Enable S3 Transfer Acceleration for global users
  3. CloudFront CDN: Use CloudFront for serving frequently accessed documents
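
Transfer Acceleration, for example, is enabled per bucket:

# Enable S3 Transfer Acceleration (clients must then use the accelerated endpoint)
aws s3api put-bucket-accelerate-configuration \
    --bucket readur-production \
    --accelerate-configuration Status=Enabled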

Caching Strategy

# Nginx caching configuration for S3-backed documents
# Assumes a cache zone has been defined elsewhere, e.g.:
#   proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=readur_cache:10m;
# and that this location block already proxies to the Readur backend
location /api/documents/ {
    proxy_cache readur_cache;
    proxy_cache_valid 200 1h;
    proxy_cache_valid 404 1m;
    proxy_cache_bypass $http_authorization;
    add_header X-Cache-Status $upstream_cache_status;
}

Troubleshooting

Common Issues and Solutions

1. S3 Connection Errors

Error: "Failed to access S3 bucket"

Solution:

# Verify credentials
aws s3 ls s3://your-bucket-name --profile readur

# Check IAM permissions
aws iam get-user-policy --user-name readur-user --policy-name ReadurS3Policy

# Test connectivity
curl -I https://s3.amazonaws.com/your-bucket-name

2. Upload Failures

Error: "Failed to store file: RequestTimeout"

Solution:

  • Check network connectivity
  • Verify S3 endpoint configuration
  • Increase timeout values if using an S3-compatible service
  • Monitor S3 request metrics in AWS CloudWatch

3. Permission Denied

Error: "AccessDenied: Access Denied"

Solution:

# Verify bucket policy
aws s3api get-bucket-policy --bucket your-bucket-name

# Check object ACLs
aws s3api get-object-acl --bucket your-bucket-name --key test-object

# Ensure CORS configuration for web access
aws s3api put-bucket-cors --bucket your-bucket-name --cors-configuration file://cors.json
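
The cors.json referenced above is not shown in this guide; a minimal, hypothetical version might look like the following (adjust AllowedOrigins to the origin your Readur frontend is served from):

{
  "CORSRules": [
    {
      "AllowedOrigins": ["https://readur.example.com"],
      "AllowedMethods": ["GET", "PUT", "POST", "HEAD"],
      "AllowedHeaders": ["*"],
      "MaxAgeSeconds": 3000
    }
  ]
}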

4. Migration Stuck

Problem: Migration process hangs or fails repeatedly

Solution:

# Check migration state
cat migration_state.json | jq '.failed_migrations'

# Resume from last successful migration
LAST_SUCCESS=$(cat migration_state.json | jq -r '.completed_migrations[-1].document_id')
cargo run --bin migrate_to_s3 --features s3 -- --resume-from $LAST_SUCCESS

# Force rollback if needed
cargo run --bin migrate_to_s3 --features s3 -- --rollback

Debugging S3 Operations

Enable detailed S3 logging:

# Set environment variables for debugging
export RUST_LOG=readur=debug,aws_sdk_s3=debug
export AWS_SDK_LOAD_CONFIG=true

# Run Readur with debug logging
cargo run --features s3

Performance Monitoring

Monitor S3 performance metrics:

-- Query document upload times
SELECT 
    DATE(created_at) as upload_date,
    AVG(file_size / 1024.0 / 1024.0) as avg_size_mb,
    COUNT(*) as documents_uploaded,
    AVG(EXTRACT(EPOCH FROM (updated_at - created_at))) as avg_processing_time_seconds
FROM documents
WHERE file_path LIKE 's3://%'
GROUP BY DATE(created_at)
ORDER BY upload_date DESC;

Best Practices

1. Security

  • Encryption: Enable S3 server-side encryption (SSE-S3 or SSE-KMS)
  • Access Control: Use IAM roles instead of access keys when possible
  • Bucket Policies: Implement least-privilege bucket policies
  • VPC Endpoints: Use VPC endpoints for private S3 access
# Enable default encryption on bucket
aws s3api put-bucket-encryption \
    --bucket readur-production \
    --server-side-encryption-configuration '{
        "Rules": [{
            "ApplyServerSideEncryptionByDefault": {
                "SSEAlgorithm": "AES256"
            }
        }]
    }'

2. Cost Optimization

  • Lifecycle Policies: Archive old documents to Glacier
  • Intelligent-Tiering: Enable for automatic cost optimization
  • Request Metrics: Monitor and optimize S3 request patterns
{
  "Rules": [{
    "ID": "ArchiveOldDocuments",
    "Status": "Enabled",
    "Filter": {"Prefix": ""},
    "Transitions": [{
      "Days": 90,
      "StorageClass": "GLACIER"
    }],
    "NoncurrentVersionTransitions": [{
      "NoncurrentDays": 30,
      "StorageClass": "GLACIER"
    }]
  }]
}
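
Saved as lifecycle.json (the filename is arbitrary), the policy can be applied with:

aws s3api put-bucket-lifecycle-configuration \
    --bucket readur-production \
    --lifecycle-configuration file://lifecycle.json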

3. Reliability

  • Versioning: Enable S3 versioning for document recovery
  • Cross-Region Replication: Set up for disaster recovery
  • Backup Strategy: Regular backups to separate bucket or region
# Enable versioning
aws s3api put-bucket-versioning \
    --bucket readur-production \
    --versioning-configuration Status=Enabled

# Set up replication (both buckets need versioning enabled, and replication.json
# must reference an IAM role that S3 can assume)
aws s3api put-bucket-replication \
    --bucket readur-production \
    --replication-configuration file://replication.json

4. Monitoring

Set up CloudWatch alarms for:

  • High error rates
  • Unusual request patterns
  • Storage quota approaching
  • Failed multipart uploads

# Create CloudWatch alarm for S3 errors
# Requires S3 request metrics to be enabled on the bucket; "EntireBucket" is the
# conventional filter name when metrics cover the whole bucket
aws cloudwatch put-metric-alarm \
    --alarm-name readur-s3-errors \
    --alarm-description "Alert on S3 4xx errors" \
    --metric-name 4xxErrors \
    --namespace AWS/S3 \
    --dimensions Name=BucketName,Value=readur-production Name=FilterId,Value=EntireBucket \
    --statistic Sum \
    --period 300 \
    --evaluation-periods 1 \
    --threshold 10 \
    --comparison-operator GreaterThanThreshold
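
Incomplete multipart uploads are easy to overlook and continue to accrue storage charges; they can be listed (and subsequently aborted) with:

# List multipart uploads that were started but never completed or aborted
aws s3api list-multipart-uploads --bucket readur-production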

5. Compliance

  • Data Residency: Ensure S3 region meets data residency requirements
  • Audit Logging: Enable S3 access logging and AWS CloudTrail
  • Retention Policies: Implement compliant data retention policies
  • GDPR Compliance: Implement proper data deletion procedures
# Enable access logging
aws s3api put-bucket-logging \
    --bucket readur-production \
    --bucket-logging-status '{
        "LoggingEnabled": {
            "TargetBucket": "readur-logs",
            "TargetPrefix": "s3-access/"
        }
    }'

Next Steps