# Storage Migration Guide

## Overview
Readur supports migrating documents between storage backends (Local ↔ S3) using a built-in migration tool. This enterprise-grade utility ensures safe, reliable data migration with comprehensive rollback capabilities.
## When You Need This
- Moving from local filesystem to S3 cloud storage
- Switching between S3 buckets or regions
- Disaster recovery scenarios
- Infrastructure upgrades or server migrations
- Scaling to cloud-based storage
## Migration Tool Features

- ✅ **Dry-run mode**: Test the migration without making any changes
- ✅ **Progress tracking**: Resume interrupted migrations from saved state
- ✅ **Rollback capability**: Complete undo functionality if needed
- ✅ **Batch processing**: Efficiently handle large datasets
- ✅ **Associated files**: Automatically migrates thumbnails and processed images
- ✅ **Data integrity**: Verifies successful uploads before cleanup
- ✅ **Selective migration**: Migrate specific users or document sets
## Prerequisites

### System Requirements
- Admin access to your Readur deployment
- Ability to run commands on the server (Docker exec or direct access)
- Sufficient disk space for temporary files during migration
- Network connectivity to target storage (S3)
### Before You Start

- [ ] Complete database backup
- [ ] File system backup (if migrating from local storage)
- [ ] S3 credentials configured (for S3 migrations):
    - [ ] Verify bucket access and permissions
    - [ ] Test connectivity with the AWS CLI (see the sketch after this list)
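A minimal pre-flight sketch covering the backup and connectivity checks, assuming the database is reachable from the `readur-app` container (as in the `psql` commands later in this guide) and that the AWS CLI is installed on the host; the bucket name is illustrative:

```bash
# Back up the database before touching storage
docker exec readur-app pg_dump -d readur > readur_backup_$(date +%Y%m%d).sql

# Confirm the target bucket exists and your credentials can reach it
aws s3api head-bucket --bucket your-readur-bucket

# Round-trip a test object to verify read/write permissions
echo "readur migration test" > /tmp/readur_test.txt
aws s3 cp /tmp/readur_test.txt s3://your-readur-bucket/migration_test.txt
aws s3 rm s3://your-readur-bucket/migration_test.txt
```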
## Step-by-Step Migration Process

### Step 1: Configure Target Storage
For S3 migrations, ensure environment variables are set:
```bash
# Required S3 configuration
export S3_BUCKET_NAME="your-readur-bucket"
export S3_ACCESS_KEY_ID="your-access-key"
export S3_SECRET_ACCESS_KEY="your-secret-key"
export S3_REGION="us-east-1"

# Optional: Custom endpoint for S3-compatible services
export S3_ENDPOINT="https://s3.amazonaws.com"
```
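Note that variables exported in a host shell are not automatically visible inside a running container; if Readur runs in Docker, a quick sanity check (assuming the container is named `readur-app`) is:

```bash
# Verify the container actually sees the S3 settings
docker exec readur-app env | grep '^S3_'
```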
### Step 2: Test with Dry Run
Always start with a dry run to validate the migration plan:
```bash
# Docker deployment
docker exec readur-app cargo run --bin migrate_to_s3 -- --dry-run

# Direct deployment
./target/release/migrate_to_s3 --dry-run

# Dry run for a specific user
docker exec readur-app cargo run --bin migrate_to_s3 -- --dry-run --user-id "uuid-here"
```
The dry run will show:

- Number of documents to migrate
- Estimated data transfer size
- Potential issues or conflicts
- Expected migration time
### Step 3: Run the Migration

Once the dry run looks good, execute the actual migration:
```bash
# Full migration with rollback enabled (recommended)
docker exec readur-app cargo run --bin migrate_to_s3 -- --enable-rollback

# Migration with progress tracking
docker exec readur-app cargo run --bin migrate_to_s3 -- --enable-rollback --verbose

# User-specific migration
docker exec readur-app cargo run --bin migrate_to_s3 -- --enable-rollback --user-id "uuid-here"
```
### Step 4: Monitor Progress
The migration tool provides real-time progress updates:
```text
📊 Migration Progress:
┌─────────────────────────────────────────────────────┐
│ Documents: 1,247 / 2,500 (49.9%)                    │
│ Data Transferred: 2.3 GB / 4.7 GB                   │
│ Time Elapsed: 00:15:32                              │
│ ETA: 00:16:12                                       │
│ Current: uploading user_documents/report_2024.pdf   │
└─────────────────────────────────────────────────────┘
```
### Step 5: Verify Migration
After completion, verify the migration was successful:
```bash
# Check migration status
docker exec readur-app cargo run --bin migrate_to_s3 -- --status

# Verify document count matches
docker exec readur-app psql -d readur -c "SELECT COUNT(*) FROM documents;"

# Test document access through the API
curl -H "Authorization: Bearer YOUR_TOKEN" \
  "https://your-readur-instance.com/api/documents/sample-uuid/download"
```
### Step 6: Update Configuration
Update your deployment configuration to use the new storage backend:
```yaml
# docker-compose.yml
environment:
  - STORAGE_BACKEND=s3
  - S3_BUCKET_NAME=your-readur-bucket
  - S3_ACCESS_KEY_ID=your-access-key
  - S3_SECRET_ACCESS_KEY=your-secret-key
  - S3_REGION=us-east-1
```
Restart the application to use the new storage configuration.
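With Docker Compose, that typically means recreating the service so it picks up the new environment (the service name `readur-app` is an assumption; use the name from your compose file):

```bash
# Recreate the container with the updated environment
docker compose up -d readur-app
```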
## Advanced Usage

### Resuming Interrupted Migrations
If a migration is interrupted, you can resume from the saved state:
```bash
# Resume from the automatically saved state
docker exec readur-app cargo run --bin migrate_to_s3 -- --resume-from /tmp/migration_state.json

# Check which migrations are available to resume
ls /tmp/migration_state_*.json
```
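Since the state files use a `.json` extension, you can inspect one before resuming to see where the run stopped; for example, pretty-printing it (this assumes `jq` is available in the container, and makes no claims about the file's exact schema):

```bash
# Pretty-print a saved state file before resuming
docker exec readur-app jq '.' /tmp/migration_state.json
```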
### Rolling Back a Migration
If you need to undo a migration:
```bash
# Rollback using a saved state file
docker exec readur-app cargo run --bin migrate_to_s3 -- --rollback /tmp/migration_state.json

# Verify rollback completion
docker exec readur-app cargo run --bin migrate_to_s3 -- --rollback-status
```
### Batch Processing Large Datasets
For very large document collections:
```bash
# Process in smaller batches
docker exec readur-app cargo run --bin migrate_to_s3 -- \
  --enable-rollback \
  --batch-size 1000 \
  --parallel-uploads 5
```
## Migration Scenarios

### Scenario 1: Local to S3 (Most Common)
```bash
# 1. Configure S3 credentials
export S3_BUCKET_NAME="company-readur-docs"
export S3_ACCESS_KEY_ID="AKIA..."
export S3_SECRET_ACCESS_KEY="..."

# 2. Test the migration
docker exec readur-app cargo run --bin migrate_to_s3 -- --dry-run

# 3. Run migration with safety features
docker exec readur-app cargo run --bin migrate_to_s3 -- --enable-rollback

# 4. Update docker-compose.yml to use S3
# 5. Restart application
```
### Scenario 2: S3 to Different S3 Bucket
```bash
# 1. Configure new bucket credentials
export S3_BUCKET_NAME="new-bucket-name"

# 2. Migrate to the new bucket
docker exec readur-app cargo run --bin migrate_to_s3 -- --enable-rollback

# 3. Update configuration
```
### Scenario 3: Migrating Specific Users
```bash
# Get the IDs of users that need migration (-t -A: plain, unaligned output)
user_ids=$(docker exec readur-app psql -d readur -t -A -c \
  "SELECT id FROM users WHERE created_at > '2024-01-01';")

# Migrate each user individually
for user_id in $user_ids; do
  docker exec readur-app cargo run --bin migrate_to_s3 -- \
    --enable-rollback --user-id "$user_id"
done
```
## Performance Considerations

### Optimization Tips
- Network Bandwidth: Migration speed depends on upload bandwidth to S3 (see the rough estimate after this list)
- Parallel Processing: The tool automatically optimizes concurrent uploads
- Large Files: Files over 100MB use multipart uploads for better performance
- Memory Usage: Migration is designed to use minimal memory regardless of file sizes
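As a back-of-the-envelope sanity check, assuming a sustained 50 Mbps upload link and the 4.7 GB dataset from the progress example above:

```bash
# 4.7 GB ≈ 4,700 MB ≈ 37,600 megabits; at 50 Mbps that is roughly 12-13 minutes
echo "$(( 4700 * 8 / 50 / 60 )) minutes"
```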
### Expected Performance

| Document Count | Typical Time | Network Impact |
|---|---|---|
| < 1,000 | 5-15 minutes | Low |
| 1,000-10,000 | 30-90 minutes | Medium |
| 10,000+ | 2-8 hours | High |
## Security Considerations

### Data Protection
- All transfers use HTTPS/TLS encryption
- Original files remain until migration is verified
- Database transactions ensure consistency
- Rollback preserves original state
### Access Control
- Migration tool respects existing file permissions
- S3 bucket policies should match security requirements
- Consider enabling S3 server-side encryption (see the example after this list)
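As an example, default encryption can be enabled on the bucket with the AWS CLI (the bucket name is illustrative):

```bash
# Enable SSE-S3 (AES-256) default encryption on the target bucket
aws s3api put-bucket-encryption \
  --bucket your-readur-bucket \
  --server-side-encryption-configuration \
  '{"Rules":[{"ApplyServerSideEncryptionByDefault":{"SSEAlgorithm":"AES256"}}]}'
```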
### Audit Trail
- All migration operations are logged
- State files contain complete operation history
- Failed operations are tracked for debugging
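To review the application side of that audit trail, the container logs are a reasonable starting point (the grep pattern is just an illustrative filter, not a documented log format):

```bash
# Tail recent application logs and filter for migration activity
docker logs readur-app --since 1h 2>&1 | grep -i migrat
```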
## Next Steps
After successful migration:
- Monitor the application for any storage-related issues
- Update backup procedures to include S3 data
- Configure S3 lifecycle policies for cost optimization (see the sketch after this list)
- Set up monitoring for S3 usage and costs
- Clean up local files once confident in migration success
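One hedged sketch of such a lifecycle policy, transitioning objects to Infrequent Access after 90 days (the prefix, timing, and storage class are assumptions to adapt, not Readur-specific recommendations):

```bash
# lifecycle.json: move all objects to STANDARD_IA after 90 days
cat > lifecycle.json <<'EOF'
{
  "Rules": [
    {
      "ID": "readur-infrequent-access",
      "Status": "Enabled",
      "Filter": {"Prefix": ""},
      "Transitions": [{"Days": 90, "StorageClass": "STANDARD_IA"}]
    }
  ]
}
EOF

aws s3api put-bucket-lifecycle-configuration \
  --bucket your-readur-bucket \
  --lifecycle-configuration file://lifecycle.json
```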
## Support
If you encounter issues during migration:
- Check the troubleshooting guide
- Review application logs for detailed error messages
- Use the `--verbose` flag for detailed migration output
- Keep state files for support debugging
Remember: Always test migrations in a staging environment first when possible.