S3 Storage Troubleshooting Guide

Overview

This guide addresses common issues encountered when using S3 storage with Readur and provides detailed solutions.

Quick Diagnostics

S3 Health Check Script

#!/bin/bash
# s3-health-check.sh

echo "Readur S3 Storage Health Check"
echo "=============================="

# Load configuration
source .env

# Check S3 connectivity
echo -n "1. Checking S3 connectivity... "
if aws s3 ls s3://$S3_BUCKET_NAME --region $S3_REGION > /dev/null 2>&1; then
    echo "✓ Connected"
else
    echo "✗ Failed"
    echo "   Error: Cannot connect to S3 bucket"
    exit 1
fi

# Check bucket permissions
echo -n "2. Checking bucket permissions... "
TEST_FILE="/tmp/readur-test-$$"
echo "test" > $TEST_FILE

if aws s3 cp $TEST_FILE s3://$S3_BUCKET_NAME/test-write-$$ --region $S3_REGION > /dev/null 2>&1; then
    echo "✓ Write permission OK"
    aws s3 rm s3://$S3_BUCKET_NAME/test-write-$$ --region $S3_REGION > /dev/null 2>&1
else
    echo "✗ Write permission failed"
fi
rm -f $TEST_FILE

# Check multipart upload capability
echo -n "3. Checking multipart upload capability... "
UPLOAD_ID=$(aws s3api create-multipart-upload \
    --bucket "$S3_BUCKET_NAME" \
    --key "test-multipart-$$" \
    --region "$S3_REGION" \
    --query UploadId --output text 2>/dev/null)
if [ -n "$UPLOAD_ID" ]; then
    echo "✓ Multipart upload OK"
    aws s3api abort-multipart-upload \
        --bucket "$S3_BUCKET_NAME" \
        --key "test-multipart-$$" \
        --upload-id "$UPLOAD_ID" \
        --region "$S3_REGION" > /dev/null 2>&1
else
    echo "⚠ Cannot initiate multipart uploads (may lack permissions)"
fi

echo ""
echo "Health check complete!"

Common Issues and Solutions

1. Connection Issues

Problem: "Failed to access S3 bucket"

Symptoms:

- Error during startup
- Cannot upload documents
- Migration tool fails immediately

Diagnosis:

# Test basic connectivity
aws s3 ls s3://your-bucket-name

# Check credentials
aws sts get-caller-identity

# Verify region
aws s3api get-bucket-location --bucket your-bucket-name

Solutions:

Incorrect credentials: Verify that your AWS credentials are properly configured and accessible.

# Verify environment variables are set
echo $S3_ACCESS_KEY_ID
echo ${S3_SECRET_ACCESS_KEY:+[secret is set]}  # Avoid printing the secret itself

# Test with AWS CLI
export AWS_ACCESS_KEY_ID=$S3_ACCESS_KEY_ID
export AWS_SECRET_ACCESS_KEY=$S3_SECRET_ACCESS_KEY
aws s3 ls

Wrong region: Ensure the S3 region configuration matches your bucket's actual region.

# Find correct region
aws s3api get-bucket-location --bucket your-bucket-name

# Update configuration
export S3_REGION=correct-region

Network issues: Test network connectivity to S3 endpoints and resolve any connection problems.

# Test network connectivity
curl -I https://s3.amazonaws.com

# Check DNS resolution
nslookup s3.amazonaws.com

# Test with specific endpoint
curl -I https://your-bucket.s3.amazonaws.com

2. Permission Errors

Problem: "AccessDenied: Access Denied"

Symptoms:

- Can list bucket but cannot upload
- Can upload but cannot delete
- Partial operations succeed

Diagnosis:

# Check IAM user permissions
aws iam get-user-policy --user-name readur-user --policy-name ReadurPolicy

# Test specific operations
aws s3api put-object --bucket your-bucket --key test.txt --body /tmp/test.txt
aws s3api get-object --bucket your-bucket --key test.txt /tmp/downloaded.txt
aws s3api delete-object --bucket your-bucket --key test.txt

Solutions:

Update IAM policy: Ensure your IAM user or role has all the necessary permissions for S3 operations.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Resource": "arn:aws:s3:::your-bucket-name"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:DeleteObject",
        "s3:PutObjectAcl",
        "s3:GetObjectAcl"
      ],
      "Resource": "arn:aws:s3:::your-bucket-name/*"
    }
  ]
}
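
To apply the policy, save it to a file and attach it to the IAM user Readur uses (matching the user and policy names from the diagnosis step above):

aws iam put-user-policy \
  --user-name readur-user \
  --policy-name ReadurPolicy \
  --policy-document file://readur-policy.json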

Check bucket policy: Confirm that no bucket policy explicitly denies the operations Readur needs.

aws s3api get-bucket-policy --bucket your-bucket-name

Verify CORS configuration: Browser-based access requires CORS rules that allow the necessary methods and headers.

{
  "CORSRules": [
    {
      "AllowedOrigins": ["*"],
      "AllowedMethods": ["GET", "PUT", "POST", "DELETE", "HEAD"],
      "AllowedHeaders": ["*"],
      "ExposeHeaders": ["ETag"],
      "MaxAgeSeconds": 3000
    }
  ]
}
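
To apply the CORS rules, save them to a file and push them to the bucket:

aws s3api put-bucket-cors \
  --bucket your-bucket-name \
  --cors-configuration file://cors.json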

3. Upload Failures

Problem: Large files fail to upload

Symptoms:

- Small files upload successfully
- Large files timeout or fail
- "RequestTimeout" errors

Diagnosis:

# Check multipart upload configuration
aws s3api list-multipart-uploads --bucket your-bucket-name

# Test large file upload
dd if=/dev/zero of=/tmp/large-test bs=1M count=150
aws s3 cp /tmp/large-test s3://your-bucket-name/test-large

Solutions:

Increase timeouts: Configure longer timeout values to accommodate large file uploads.

// In code configuration
use std::time::Duration;

const UPLOAD_TIMEOUT: Duration = Duration::from_secs(3600); // 1 hour

Optimize chunk size: Adjust multipart upload chunk sizes based on your network conditions.

# For slow connections, use smaller chunks
export S3_MULTIPART_CHUNK_SIZE=8388608  # 8MB chunks
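
When testing transfers with the AWS CLI directly, the equivalent knobs live in the CLI's own configuration:

# Match the CLI's multipart behavior to the 8MB chunk size above
aws configure set default.s3.multipart_chunksize 8MB
aws configure set default.s3.multipart_threshold 64MB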

Resume failed uploads: Clean up incomplete multipart uploads and retry failed transfers.

# List incomplete multipart uploads
aws s3api list-multipart-uploads --bucket your-bucket-name

# Abort stuck uploads
aws s3api abort-multipart-upload \
  --bucket your-bucket-name \
  --key path/to/file \
  --upload-id UPLOAD_ID
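
To keep abandoned uploads from accumulating (and quietly incurring storage charges), a lifecycle rule can abort incomplete multipart uploads automatically; the 7-day window below is a suggestion:

aws s3api put-bucket-lifecycle-configuration \
  --bucket your-bucket-name \
  --lifecycle-configuration '{
    "Rules": [{
      "ID": "abort-incomplete-multipart",
      "Status": "Enabled",
      "Filter": {"Prefix": ""},
      "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7}
    }]
  }'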

4. S3-Compatible Service Issues

Problem: MinIO/Wasabi/Backblaze not working

Symptoms:

- AWS S3 works but compatible service doesn't
- "InvalidEndpoint" errors
- SSL certificate errors

Solutions:

MinIO configuration: Use the correct endpoint format and enable path-style addressing.

# Correct endpoint format
S3_ENDPOINT=http://minio.local:9000    # No https:// for local
S3_ENDPOINT=https://minio.example.com  # With SSL

# Path-style addressing
S3_FORCE_PATH_STYLE=true
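
Before pointing Readur at MinIO, confirm the endpoint is reachable; MinIO exposes a liveness endpoint for exactly this (adjust host and port to your deployment):

# HTTP 200 means the MinIO server is up
curl -i http://minio.local:9000/minio/health/live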

Wasabi configuration: Configure the correct endpoint and region for Wasabi storage.

S3_ENDPOINT=https://s3.wasabisys.com
S3_REGION=us-east-1  # Or your Wasabi region

SSL certificate issues: Handle SSL certificate validation problems with self-signed or custom certificates.

# Trust a custom or self-signed CA certificate (preferred)
export AWS_CA_BUNDLE=/path/to/custom-ca.crt

# Or disable TLS verification outright (development only; affects
# Node-based tooling, not the Readur binary itself)
export NODE_TLS_REJECT_UNAUTHORIZED=0  # Not recommended for production

5. Migration Problems

Problem: Migration tool hangs or fails

Symptoms:

- Migration starts but doesn't progress
- "File not found" errors during migration
- Database inconsistencies after partial migration

Diagnosis:

# Check migration state
jq '.' migration_state.json

# Find failed migrations
jq '.failed_migrations' migration_state.json

# Check for orphaned files
find ./uploads -type f -name "*.pdf" | head -10

Solutions:

Resume from last successful point: Continue migration from where it left off to avoid re-processing completed files.

# Get last successful migration
LAST_ID=$(jq -r '.completed_migrations[-1].document_id' migration_state.json)

# Resume migration
cargo run --bin migrate_to_s3 --features s3 -- --resume-from $LAST_ID
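
The state file can also be used to pull out the IDs of failed migrations for inspection or targeted re-runs (this assumes failed_migrations entries carry a document_id field, like the completed entries above):

# Extract IDs of failed migrations
jq -r '.failed_migrations[].document_id' migration_state.json > failed-ids.txt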

Fix missing local files: Identify and handle documents that reference missing local files.

-- Find documents with missing files
-- (pg_stat_file requires superuser or pg_read_server_files;
--  the missing_ok argument needs PostgreSQL 12+)
SELECT id, filename, file_path 
FROM documents 
WHERE file_path NOT LIKE 's3://%'
AND (pg_stat_file(file_path, true)).size IS NULL;
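
If your database role cannot call pg_stat_file, a shell-side check gives the same answer (a sketch, assuming DATABASE_URL is set as elsewhere in this guide):

# Check each non-S3 path from the database against the filesystem
psql "$DATABASE_URL" -At -c \
  "SELECT file_path FROM documents WHERE file_path NOT LIKE 's3://%'" |
while read -r path; do
  [ -f "$path" ] || echo "MISSING: $path"
done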

Rollback failed migration: Safely revert a partially completed migration to restore the previous state.

# Automatic rollback
cargo run --bin migrate_to_s3 --features s3 -- --rollback

# Manual cleanup
psql $DATABASE_URL -c "UPDATE documents SET file_path = original_path WHERE file_path LIKE 's3://%';"

6. Performance Issues

Problem: Slow document retrieval from S3

Symptoms:

- Document downloads are slow
- High latency for thumbnail loading
- Timeouts on document preview

Diagnosis:

# Measure S3 latency
time aws s3 cp s3://your-bucket/test-file /tmp/test-download

# Check S3 transfer metrics
aws cloudwatch get-metric-statistics \
  --namespace AWS/S3 \
  --metric-name AllRequests \
  --dimensions Name=BucketName,Value=your-bucket \
  --start-time 2024-01-01T00:00:00Z \
  --end-time 2024-01-02T00:00:00Z \
  --period 3600 \
  --statistics Average

Solutions:

Enable S3 Transfer Acceleration: Use AWS Transfer Acceleration to improve upload and download speeds globally.

aws s3api put-bucket-accelerate-configuration \
  --bucket your-bucket-name \
  --accelerate-configuration Status=Enabled

# Update endpoint
S3_ENDPOINT=https://your-bucket.s3-accelerate.amazonaws.com
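
To confirm acceleration is active and actually helping, compare a transfer through the standard endpoint against the accelerated one:

# Verify acceleration status
aws s3api get-bucket-accelerate-configuration --bucket your-bucket-name

# Compare transfer times: standard vs. accelerated endpoint
time aws s3 cp s3://your-bucket-name/test-file /tmp/standard-download
time aws s3 cp s3://your-bucket-name/test-file /tmp/accelerated-download \
  --endpoint-url https://s3-accelerate.amazonaws.com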

Implement caching: Add caching layers to reduce repeated S3 requests and improve response times.

# Nginx caching configuration
proxy_cache_path /var/cache/nginx/s3 levels=1:2 keys_zone=s3_cache:10m max_size=1g;

location /api/documents/ {
    proxy_pass http://127.0.0.1:8000;  # Adjust to your Readur backend address
    proxy_cache s3_cache;
    proxy_cache_valid 200 1h;
    proxy_cache_key "$request_uri";
    add_header X-Cache-Status $upstream_cache_status;  # Expose hit/miss for debugging
}
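
With the X-Cache-Status header in place, a repeated request for the same document should report a hit (the hostname and document path below are placeholders):

# The first request warms the cache; the second should show "X-Cache-Status: HIT"
curl -sI https://readur.example.com/api/documents/some-id/download | grep -i x-cache-status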

Use CloudFront CDN: Deploy a CDN to cache frequently accessed documents closer to users.

# Create CloudFront distribution
aws cloudfront create-distribution \
  --origin-domain-name your-bucket.s3.amazonaws.com \
  --default-root-object index.html

Advanced Debugging

Enable Debug Logging

# Set environment variables
export RUST_LOG=readur=debug,aws_sdk_s3=debug,aws_config=debug
export RUST_BACKTRACE=full

# Run Readur with debug output
./readur 2>&1 | tee readur-debug.log
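
Once the run completes, pull the S3-related entries out of the capture:

# Surface S3 client errors from the debug log
grep -Ei 'aws_sdk_s3|aws_config' readur-debug.log | grep -i error | tail -50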

S3 Request Logging

# Enable S3 access logging
aws s3api put-bucket-logging \
  --bucket your-bucket-name \
  --bucket-logging-status '{
    "LoggingEnabled": {
      "TargetBucket": "your-logs-bucket",
      "TargetPrefix": "s3-access-logs/"
    }
  }'
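
Log delivery can take some time to begin; verify the configuration took effect with:

aws s3api get-bucket-logging --bucket your-bucket-name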

Network Troubleshooting

# Trace S3 requests
tcpdump -i any -w s3-traffic.pcap host s3.amazonaws.com

# Analyze with Wireshark
wireshark s3-traffic.pcap

# Check MTU issues
ping -M do -s 1472 s3.amazonaws.com

Monitoring and Alerts

CloudWatch Metrics

# Create alarm for high error rate
# (S3 request metrics must be enabled on the bucket first;
#  the console's whole-bucket filter is named EntireBucket)
aws cloudwatch put-metric-alarm \
  --alarm-name s3-high-error-rate \
  --alarm-description "Alert when S3 error rate is high" \
  --metric-name 4xxErrors \
  --namespace AWS/S3 \
  --dimensions Name=BucketName,Value=your-bucket-name Name=FilterId,Value=EntireBucket \
  --statistic Sum \
  --period 300 \
  --threshold 10 \
  --comparison-operator GreaterThanThreshold \
  --evaluation-periods 2

Log Analysis

# Parse S3 access logs
aws s3 sync s3://your-logs-bucket/s3-access-logs/ ./logs/

# Find errors
grep -E "4[0-9]{2}|5[0-9]{2}" ./logs/*.log | head -20

# Analyze request patterns
awk '{print $8}' ./logs/*.log | sort | uniq -c | sort -rn | head -20

Recovery Procedures

Corrupted S3 Data

# Verify object integrity
aws s3api head-object --bucket your-bucket --key path/to/document.pdf

# Restore from versioning
aws s3api list-object-versions --bucket your-bucket --prefix path/to/

# Restore specific version
aws s3api get-object \
  --bucket your-bucket \
  --key path/to/document.pdf \
  --version-id VERSION_ID \
  /tmp/recovered-document.pdf
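
Version-based recovery only works if versioning was enabled before the corruption occurred; check the status, and enable it going forward if needed:

# Check versioning status (empty output means it was never enabled)
aws s3api get-bucket-versioning --bucket your-bucket

# Enable versioning for future protection
aws s3api put-bucket-versioning \
  --bucket your-bucket \
  --versioning-configuration Status=Enabled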

Database Inconsistency

-- Find orphaned S3 references
-- (assumes an S3 Inventory report has been loaded into s3_inventory_table)
SELECT id, file_path 
FROM documents 
WHERE file_path LIKE 's3://%'
AND file_path NOT IN (
  SELECT 's3://' || key FROM s3_inventory_table
);

-- Update paths after bucket migration
UPDATE documents 
SET file_path = REPLACE(file_path, 's3://old-bucket/', 's3://new-bucket/')
WHERE file_path LIKE 's3://old-bucket/%';

Prevention Best Practices

  1. Regular Health Checks: Run diagnostic scripts daily (see the cron example after this list)
  2. Monitor Metrics: Set up CloudWatch dashboards
  3. Test Failover: Regularly test backup procedures
  4. Document Changes: Keep configuration changelog
  5. Capacity Planning: Monitor storage growth trends
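
For the first item, a cron entry keeps the health check running on a schedule (the install path, user, and log location below are assumptions; adjust to your deployment):

# /etc/cron.d/readur-s3-health: run the S3 health check daily at 06:00
0 6 * * * readur /opt/readur/s3-health-check.sh >> /var/log/readur/s3-health.log 2>&1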

Getting Help

If issues persist after following this guide:

  1. Collect Diagnostics:

    ./collect-diagnostics.sh > diagnostics.txt
    

  2. Check Logs:

    - Application logs: journalctl -u readur -n 1000
    - S3 access logs: check CloudWatch or the S3 access logs configured above
    - Database logs: tail -f /var/log/postgresql/*.log

  3. Contact Support:

    - Include diagnostics output
    - Provide your configuration (sanitized)
    - Describe symptoms and timeline
    - Share any error messages