Backup and Recovery Guide¶
Overview¶
This guide covers comprehensive backup strategies for Readur, including database backups, document storage, configuration files, and disaster recovery procedures.
What to Back Up¶
Critical Components¶
- PostgreSQL Database - Contains all metadata, user data, and system configuration
- Document Storage - Original documents and processed files
- Configuration Files - Environment variables and settings
- SSL Certificates - If using custom certificates
- Custom Code - Any modifications or plugins
Backup Priority Matrix¶
RPO (Recovery Point Objective) is the maximum acceptable data loss; RTO (Recovery Time Objective) is the maximum acceptable time to restore service.
Component | Priority | RPO | RTO | Backup Frequency |
---|---|---|---|---|
Database | Critical | 1 hour | 30 min | Hourly |
Documents | Critical | 24 hours | 2 hours | Daily |
Config | High | 24 hours | 1 hour | On change |
Logs | Medium | 7 days | N/A | Weekly |
Cache | Low | N/A | N/A | Not required |
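To keep the cleanup steps in the scripts below aligned with this matrix, the retention windows can live in one shared file. A minimal sketch, assuming a hypothetical /etc/readur/backup-retention.conf sourced by each script (the file and variable names are illustrative, not part of Readur):
# /etc/readur/backup-retention.conf (hypothetical; source from the backup scripts)
RETENTION_DATABASE_DAYS=30    # matches the cleanup window in backup-database.sh
RETENTION_DOCUMENTS_DAYS=7    # matches backup-documents.sh
RETENTION_CONFIG_DAYS=30      # matches readur-backup.sh
RETENTION_LOGS_DAYS=7
# Example use in a cleanup step:
#   source /etc/readur/backup-retention.conf
#   find /backup/postgres -name "*.dump.gz" -mtime +"$RETENTION_DATABASE_DAYS" -delete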
Database Backup¶
PostgreSQL Backup Methods¶
Method 1: pg_dump (Logical Backup)¶
#!/bin/bash
# backup-database.sh
# Configuration
DB_NAME="readur"
DB_USER="readur"
BACKUP_DIR="/backup/postgres"
DATE=$(date +%Y%m%d_%H%M%S)
mkdir -p "$BACKUP_DIR"
# Create backup (custom format allows selective restore with pg_restore)
pg_dump -U "$DB_USER" -d "$DB_NAME" -F custom -f "$BACKUP_DIR/readur_$DATE.dump"
# Compress backup (optional: custom-format dumps are already compressed internally)
gzip "$BACKUP_DIR/readur_$DATE.dump"
# Keep only the last 30 days
find "$BACKUP_DIR" -name "*.dump.gz" -mtime +30 -delete
# Upload to S3 (optional)
aws s3 cp "$BACKUP_DIR/readur_$DATE.dump.gz" s3://backup-bucket/postgres/
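Before relying on a custom-format dump, it can be sanity-checked without a full restore by listing its table of contents; a minimal sketch (the filename is an example):
# List the dump's table of contents; failure indicates a corrupt or truncated archive
gunzip -k "$BACKUP_DIR/readur_20240115_020000.dump.gz"
pg_restore --list "$BACKUP_DIR/readur_20240115_020000.dump" > /dev/null \
    && echo "dump is readable" || echo "dump is corrupt or unreadable"
rm "$BACKUP_DIR/readur_20240115_020000.dump"    # remove the temporary uncompressed copy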
Method 2: Physical Backup with pg_basebackup¶
#!/bin/bash
# physical-backup.sh
# Stop application (optional for consistency)
docker-compose stop readur
# Perform base backup (requires a role with the REPLICATION privilege)
pg_basebackup -U replicator -D /backup/pgdata_$(date +%Y%m%d) \
    -Fp -Xs -P -R
# Start application
docker-compose start readur
Method 3: Continuous Archiving (WAL)¶
# postgresql.conf
wal_level = replica
archive_mode = on
archive_command = 'test ! -f /archive/%f && cp %p /archive/%f'
max_wal_senders = 3
wal_keep_size = 1GB    # wal_keep_segments was removed in PostgreSQL 13; use wal_keep_size instead
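Once archiving is enabled, it is worth confirming that WAL segments actually reach the archive; one way to check, sketched below (run as a superuser such as postgres):
# Force a segment switch, then inspect archiver statistics and the archive directory
psql -U postgres -c "SELECT pg_switch_wal();"
psql -U postgres -c "SELECT archived_count, failed_count, last_archived_wal FROM pg_stat_archiver;"
ls -lt /archive | head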
Docker Database Backup¶
#!/bin/bash
# docker-db-backup.sh
# Backup database from Docker container
docker-compose exec -T postgres pg_dump -U readur readur | \
    gzip > backup_$(date +%Y%m%d_%H%M%S).sql.gz
# Alternative: using docker run (requires DB_PASSWORD in the environment; match the image tag to your PostgreSQL version)
docker run --rm \
    --network readur_default \
    -e PGPASSWORD="$DB_PASSWORD" \
    postgres:14 \
    pg_dump -h postgres -U readur readur | \
    gzip > backup_$(date +%Y%m%d_%H%M%S).sql.gz
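A dump that was cut off mid-stream still produces a plausible-looking .gz file, so a quick integrity check after each run is cheap insurance; a minimal sketch (the filename is an example):
# Verify the compressed dump is intact and non-trivial
BACKUP_FILE="backup_20240115_020000.sql.gz"
gzip -t "$BACKUP_FILE" || { echo "corrupt archive"; exit 1; }
zcat "$BACKUP_FILE" | head -n 5    # should show the pg_dump header comments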
Document Storage Backup¶
Local Storage Backup¶
#!/bin/bash
# backup-documents.sh
SOURCE="/data/readur/documents"
BACKUP_DIR="/backup/documents"
DATE=$(date +%Y%m%d)
mkdir -p "$BACKUP_DIR"
# Incremental backup with rsync (files deleted from SOURCE are kept in the dated incremental directory)
rsync -avz --delete \
    --backup --backup-dir="$BACKUP_DIR/incremental_$DATE" \
    "$SOURCE/" "$BACKUP_DIR/current/"
# Create tar archive
tar -czf "$BACKUP_DIR/documents_$DATE.tar.gz" \
    -C "$BACKUP_DIR" current/
# Keep only the last 7 daily backups
find "$BACKUP_DIR" -name "documents_*.tar.gz" -mtime +7 -delete
S3 Storage Backup¶
#!/bin/bash
# backup-s3.sh
# Sync S3 bucket to another bucket
aws s3 sync s3://readur-documents s3://readur-backup \
--delete \
--storage-class GLACIER_IR
# Or to local storage
aws s3 sync s3://readur-documents /backup/s3-documents \
--delete
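Because the sync runs with --delete, an accidental deletion in the source bucket propagates to the backup bucket. Enabling versioning on the backup bucket keeps prior object versions recoverable; a sketch using the bucket names above:
# One-time setup: keep old object versions in the backup bucket
aws s3api put-bucket-versioning \
    --bucket readur-backup \
    --versioning-configuration Status=Enabled
# Inspect object versions when recovery is needed
aws s3api list-object-versions --bucket readur-backup --max-items 10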
Deduplication Strategy¶
#!/bin/bash
# dedup-backup.sh
# Use restic for deduplication (set RESTIC_PASSWORD or use --password-file for unattended runs)
# Initialize the repository (first run only)
restic -r /backup/restic init
# Backup with deduplication
restic -r /backup/restic backup \
    /data/readur/documents \
    --tag documents \
    --host readur-server
# Prune old snapshots
restic -r /backup/restic forget \
    --keep-daily 7 \
    --keep-weekly 4 \
    --keep-monthly 12 \
    --prune
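Restoring from the restic repository is not covered above; a minimal sketch of recovering the latest documents snapshot into a staging directory (the target path is illustrative):
# List snapshots, then restore the newest one tagged "documents"
restic -r /backup/restic snapshots --tag documents
restic -r /backup/restic restore latest \
    --tag documents \
    --target /restore/documents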
Configuration Backup¶
Environment and Settings¶
#!/bin/bash
# backup-config.sh
CONFIG_DIR="/etc/readur"
BACKUP_DIR="/backup/config"
DATE=$(date +%Y%m%d_%H%M%S)
mkdir -p "$BACKUP_DIR"
# Create config archive
tar -czf "$BACKUP_DIR/config_$DATE.tar.gz" \
    "$CONFIG_DIR/.env" \
    "$CONFIG_DIR/docker-compose.yml" \
    "$CONFIG_DIR/nginx.conf" \
    /etc/ssl/certs/readur* \
    /etc/systemd/system/readur*
# Encrypt sensitive configuration
gpg --encrypt --recipient [email protected] \
    "$BACKUP_DIR/config_$DATE.tar.gz"
# Remove the unencrypted archive, leaving only the .gpg file
rm "$BACKUP_DIR/config_$DATE.tar.gz"
Automated Backup Solution¶
Complete Backup Script¶
#!/bin/bash
# readur-backup.sh
set -eo pipefail
# Configuration
BACKUP_ROOT="/backup"
S3_BUCKET="s3://company-backups/readur"
SLACK_WEBHOOK="https://hooks.slack.com/services/XXX"
DATE=$(date +%Y%m%d_%H%M%S)
# STORAGE_BACKEND should be exported in the environment or sourced from .env
# Run relative to the install directory so .env and docker-compose.yml resolve
cd "$(dirname "$0")/.."
# Functions
log() {
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] $1"
}
notify() {
    curl -X POST -H 'Content-type: application/json' \
        --data "{\"text\":\"$1\"}" "$SLACK_WEBHOOK"
}
# Alert on any failure
trap 'notify "Readur backup FAILED at: $BASH_COMMAND"' ERR
# Create backup directories
mkdir -p "$BACKUP_ROOT"/{database,documents,config,logs}
# 1. Database backup
log "Starting database backup..."
docker-compose exec -T postgres pg_dump -U readur readur | \
    gzip > "$BACKUP_ROOT/database/readur_$DATE.sql.gz"
# 2. Documents backup (if local storage)
if [ "$STORAGE_BACKEND" = "local" ]; then
    log "Starting documents backup..."
    rsync -avz --delete \
        /data/readur/documents/ \
        "$BACKUP_ROOT/documents/current/"
    tar -czf "$BACKUP_ROOT/documents/documents_$DATE.tar.gz" \
        -C "$BACKUP_ROOT/documents" current/
fi
# 3. Configuration backup
log "Starting configuration backup..."
tar -czf "$BACKUP_ROOT/config/config_$DATE.tar.gz" \
    .env docker-compose.yml
# 4. Upload to S3
log "Uploading to S3..."
aws s3 sync "$BACKUP_ROOT" "$S3_BUCKET" \
    --exclude "*/current/*" \
    --storage-class STANDARD_IA
# 5. Cleanup old backups
log "Cleaning up old backups..."
find "$BACKUP_ROOT/database" -name "*.sql.gz" -mtime +7 -delete
find "$BACKUP_ROOT/documents" -name "*.tar.gz" -mtime +7 -delete
find "$BACKUP_ROOT/config" -name "*.tar.gz" -mtime +30 -delete
# 6. Verify backup
BACKUP_SIZE=$(du -sh "$BACKUP_ROOT" | cut -f1)
log "Backup completed. Total size: $BACKUP_SIZE"
# 7. Send notification
notify "Readur backup completed successfully. Size: $BACKUP_SIZE"
Cron Schedule¶
# /etc/crontab
# Hourly database backup
0 * * * * root /opt/readur/scripts/backup-database.sh
# Daily full backup at 2 AM
0 2 * * * root /opt/readur/scripts/readur-backup.sh
# Weekly configuration backup
0 3 * * 0 root /opt/readur/scripts/backup-config.sh
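If a backup run ever takes longer than its interval, the next cron invocation can overlap it and collide with the rsync staging directory; wrapping the entries with flock is a simple guard. A sketch of the same schedule with locking (the lock file paths are illustrative):
# /etc/crontab with locking to prevent overlapping runs
0 * * * * root flock -n /var/lock/readur-db-backup.lock /opt/readur/scripts/backup-database.sh
0 2 * * * root flock -n /var/lock/readur-full-backup.lock /opt/readur/scripts/readur-backup.sh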
Recovery Procedures¶
Database Recovery¶
From pg_dump Backup¶
#!/bin/bash
# restore-database.sh
# Restores a plain-SQL dump (.sql.gz); for custom-format .dump files use pg_restore (see below)
BACKUP_FILE="$1"
# Stop application
docker-compose stop readur
# Drop existing database
docker-compose exec postgres psql -U postgres -c "DROP DATABASE IF EXISTS readur;"
docker-compose exec postgres psql -U postgres -c "CREATE DATABASE readur OWNER readur;"
# Restore backup
gunzip -c "$BACKUP_FILE" | docker-compose exec -T postgres psql -U readur readur
# Run migrations
docker-compose exec readur alembic upgrade head
# Start application
docker-compose start readur
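For the custom-format dumps produced by Method 1 (.dump files), psql cannot replay the archive; pg_restore is the counterpart. A sketch of the equivalent restore step (the filename is an example):
# Restore a custom-format dump created with pg_dump -F custom
gunzip readur_20240115_020000.dump.gz
docker-compose exec -T postgres pg_restore -U readur -d readur \
    --clean --if-exists < readur_20240115_020000.dump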
Point-in-Time Recovery¶
# Point-in-time recovery with PostgreSQL 12+: restore a base backup, then replay archived WAL
# 1. Restore the latest base backup into an empty data directory
#    (e.g. copy /backup/pgdata_YYYYMMDD to /var/lib/postgresql/data)
# 2. Add recovery settings to postgresql.conf
restore_command = 'cp /archive/%f %p'
recovery_target_time = '2024-01-15 14:30:00'
# 3. Signal recovery mode and start PostgreSQL; WAL is replayed up to the target time
touch /var/lib/postgresql/data/recovery.signal
Document Recovery¶
#!/bin/bash
# restore-documents.sh
BACKUP_FILE="$1"
TARGET_DIR="/data/readur/documents"
# Extract backup
tar -xzf "$BACKUP_FILE" -C /tmp/
# Restore with verification
rsync -avz --checksum \
/tmp/current/ \
"$TARGET_DIR/"
# Fix permissions
chown -R readur:readur "$TARGET_DIR"
chmod -R 755 "$TARGET_DIR"
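A quick comparison between the extracted backup and the restored tree catches truncated archives before the temporary copy is discarded; a minimal sketch:
# Compare file counts and checksums between the extracted backup and the restored tree
# (files already present in TARGET_DIR but absent from the backup will show as differences)
echo "backup:   $(find /tmp/current -type f | wc -l) files"
echo "restored: $(find "$TARGET_DIR" -type f | wc -l) files"
diff <(cd /tmp/current && find . -type f -exec md5sum {} + | sort) \
     <(cd "$TARGET_DIR" && find . -type f -exec md5sum {} + | sort) \
    && echo "checksums match"
# Clean up the temporary extraction once verified
rm -rf /tmp/current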
Full System Recovery¶
#!/bin/bash
# disaster-recovery.sh
set -e
# 1. Install Docker and dependencies
apt-get update
apt-get install -y docker.io docker-compose
# 2. Restore configuration
gpg --decrypt config_backup.tar.gz.gpg | tar -xzf - -C /etc/readur/
# 3. Pull Docker images and start the database first
docker-compose pull
docker-compose up -d postgres
until docker-compose exec -T postgres pg_isready -U readur; do sleep 2; done
# 4. Restore database
gunzip -c database_backup.sql.gz | \
    docker-compose exec -T postgres psql -U readur readur
# 5. Restore documents
tar -xzf documents_backup.tar.gz -C /data/readur/
# 6. Start the remaining services
docker-compose up -d
# 7. Verify
curl -f http://localhost:8000/health || exit 1
echo "Recovery completed successfully"
Backup Verification¶
Automated Testing¶
#!/bin/bash
# verify-backup.sh
# Restore a plain-SQL dump into a scratch database (run as a role with CREATEDB privilege)
set -e
TEST_DB="readur_test"
# Create test database
createdb "$TEST_DB"
# Restore backup to test database
gunzip -c "$1" | psql "$TEST_DB"
# Verify data integrity
RECORD_COUNT=$(psql -t -c "SELECT COUNT(*) FROM documents" "$TEST_DB")
echo "Restored $RECORD_COUNT documents"
# Cleanup
dropdb "$TEST_DB"
Backup Monitoring¶
#!/usr/bin/env python3
# monitor-backups.py
import os
import time
import smtplib
from email.mime.text import MIMEText

BACKUP_DIR = "/backup"
MAX_AGE_HOURS = 25  # Alert if the newest backup is older than 25 hours

def check_backup_age(directory):
    """Return the newest .gz backup in the directory and its age in hours."""
    latest_backup = None
    latest_time = 0
    if not os.path.isdir(directory):
        return None, float('inf')
    for file in os.listdir(directory):
        if file.endswith('.gz'):
            file_time = os.path.getmtime(os.path.join(directory, file))
            if file_time > latest_time:
                latest_time = file_time
                latest_backup = file
    if latest_backup:
        age = time.time() - latest_time
        return latest_backup, age / 3600  # Age in hours
    return None, float('inf')

def send_alert(message):
    msg = MIMEText(message)
    msg['Subject'] = 'Readur Backup Alert'
    msg['From'] = '[email protected]'
    msg['To'] = '[email protected]'
    s = smtplib.SMTP('localhost')
    s.send_message(msg)
    s.quit()

# Check each backup type
for backup_type in ['database', 'documents', 'config']:
    dir_path = os.path.join(BACKUP_DIR, backup_type)
    filename, age_hours = check_backup_age(dir_path)
    if age_hours > MAX_AGE_HOURS:
        send_alert(f"WARNING: {backup_type} backup is {age_hours:.1f} hours old")
    else:
        print(f"OK: {backup_type} backup is {age_hours:.1f} hours old")
Cloud Backup Solutions¶
AWS Backup Integration¶
# CloudFormation template
Resources:
  BackupPlan:
    Type: AWS::Backup::BackupPlan
    Properties:
      BackupPlan:
        BackupPlanName: ReadurBackupPlan
        BackupPlanRule:
          - RuleName: DailyBackups
            TargetBackupVault: Default
            ScheduleExpression: "cron(0 5 ? * * *)"
            StartWindowMinutes: 60
            CompletionWindowMinutes: 120
            Lifecycle:
              MoveToColdStorageAfterDays: 7
              DeleteAfterDays: 97  # must be at least 90 days after the cold storage transition
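Scheduled plans can be complemented by on-demand jobs, for example immediately before a risky upgrade; a sketch using the AWS CLI (the ARNs are placeholders for your own resources and backup role):
# Trigger an on-demand backup job through AWS Backup (ARNs are placeholders)
aws backup start-backup-job \
    --backup-vault-name Default \
    --resource-arn arn:aws:rds:us-east-1:123456789012:db:readur-postgres \
    --iam-role-arn arn:aws:iam::123456789012:role/service-role/AWSBackupDefaultServiceRole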
Backup to Multiple Destinations¶
#!/bin/bash
# multi-destination-backup.sh
BACKUP_FILE="readur_$(date +%Y%m%d).tar.gz"
# Local backup (NAS mount)
cp "$BACKUP_FILE" /mnt/nas/backups/
# AWS S3
aws s3 cp "$BACKUP_FILE" s3://backup-bucket/
# Google Cloud Storage
gsutil cp "$BACKUP_FILE" gs://backup-bucket/
# Azure Blob Storage (authenticates via AZURE_STORAGE_CONNECTION_STRING or --account-name)
az storage blob upload \
    --container-name backups \
    --name "$BACKUP_FILE" \
    --file "$BACKUP_FILE"
Best Practices¶
Security¶
- Encrypt backups at rest and in transit
- Test recovery procedures regularly
- Store backups in multiple locations
- Rotate credentials used for backup access
- Monitor backup success and failures
Testing¶
- Monthly recovery drills to test procedures
- Quarterly full recovery to separate environment
- Annual disaster recovery exercise
- Document lessons learned and update procedures
Documentation¶
Maintain documentation for:
- Backup schedules and retention policies
- Recovery procedures and contact information
- RTO/RPO requirements
- Backup verification procedures
- Encryption keys and access credentials