Backup and Recovery Guide

Overview

This guide covers comprehensive backup strategies for Readur, including database backups, document storage, configuration files, and disaster recovery procedures.

What to Back Up

Critical Components

  1. PostgreSQL Database - Contains all metadata, user data, and system configuration
  2. Document Storage - Original documents and processed files
  3. Configuration Files - Environment variables and settings
  4. SSL Certificates - If using custom certificates
  5. Custom Code - Any modifications or plugins

Backup Priority Matrix

Component   Priority   RPO        RTO       Backup Frequency
---------   --------   --------   -------   ----------------
Database    Critical   1 hour     30 min    Hourly
Documents   Critical   24 hours   2 hours   Daily
Config      High       24 hours   1 hour    On change
Logs        Medium     7 days     N/A       Weekly
Cache       Low        N/A        N/A       Not required

Database Backup

PostgreSQL Backup Methods

Method 1: pg_dump (Logical Backup)

#!/bin/bash
# backup-database.sh

# Configuration
DB_NAME="readur"
DB_USER="readur"
BACKUP_DIR="/backup/postgres"
DATE=$(date +%Y%m%d_%H%M%S)

# Create backup (the custom format is already compressed, so no separate gzip step is needed)
pg_dump -U "$DB_USER" -d "$DB_NAME" -F custom -f "$BACKUP_DIR/readur_$DATE.dump"

# Keep only the last 30 days
find "$BACKUP_DIR" -name "*.dump" -mtime +30 -delete

# Upload to S3 (optional)
aws s3 cp "$BACKUP_DIR/readur_$DATE.dump" s3://backup-bucket/postgres/
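
A custom-format dump is restored with pg_restore rather than psql. A minimal sketch, assuming the database and role already exist (the timestamped filename is illustrative):

# Restore a custom-format dump, dropping and recreating objects as needed
pg_restore -U readur -d readur --clean --if-exists "$BACKUP_DIR/readur_20240115_020000.dump"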

Method 2: Physical Backup with pg_basebackup

#!/bin/bash
# physical-backup.sh

# Stopping the application is optional; pg_basebackup takes a
# consistent online backup, so downtime is not required
docker-compose stop readur

# Perform base backup
pg_basebackup -U replicator -D /backup/pgdata_$(date +%Y%m%d) \
  -Fp -Xs -P -R

# Start application
docker-compose start readur
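
pg_basebackup connects as a role with the REPLICATION attribute. A minimal one-time setup, assuming the replicator name used above (adjust the address range and auth method for your network):

# Create a replication role (run once as a superuser)
psql -U postgres -c "CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD 'change-me';"

# pg_hba.conf must also permit replication connections, e.g.:
# host  replication  replicator  10.0.0.0/24  scram-sha-256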

Method 3: Continuous Archiving (WAL)

# postgresql.conf
wal_level = replica
archive_mode = on
archive_command = 'test ! -f /archive/%f && cp %p /archive/%f'
max_wal_senders = 3
wal_keep_size = 1GB   # PostgreSQL 13+; versions before 13 use wal_keep_segments = 64
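
After enabling archiving, confirm that segments are actually being shipped; the pg_stat_archiver view tracks successes and failures:

# Check archiver activity (failed_count should stay at zero)
psql -U postgres -c "SELECT archived_count, failed_count, last_archived_wal FROM pg_stat_archiver;"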

Docker Database Backup

#!/bin/bash
# docker-db-backup.sh

# Backup database from Docker container
docker-compose exec -T postgres pg_dump -U readur readur | \
  gzip > backup_$(date +%Y%m%d_%H%M%S).sql.gz

# Alternative: using docker run; PGPASSWORD is assumed to be exported
# with the readur database password, since this connects over the
# network rather than through the compose service
docker run --rm \
  --network readur_default \
  -e PGPASSWORD \
  postgres:14 \
  pg_dump -h postgres -U readur readur | \
  gzip > backup_$(date +%Y%m%d_%H%M%S).sql.gz

Document Storage Backup

Local Storage Backup

#!/bin/bash
# backup-documents.sh

SOURCE="/data/readur/documents"
BACKUP_DIR="/backup/documents"
DATE=$(date +%Y%m%d)

# Incremental backup with rsync
rsync -avz --delete \
  --backup --backup-dir="$BACKUP_DIR/incremental_$DATE" \
  "$SOURCE/" "$BACKUP_DIR/current/"

# Create tar archive
tar -czf "$BACKUP_DIR/documents_$DATE.tar.gz" \
  -C "$BACKUP_DIR" current/

# Keep only last 7 daily backups
find $BACKUP_DIR -name "documents_*.tar.gz" -mtime +7 -delete

S3 Storage Backup

#!/bin/bash
# backup-s3.sh

# Sync S3 bucket to another bucket
aws s3 sync s3://readur-documents s3://readur-backup \
  --delete \
  --storage-class GLACIER_IR

# Or to local storage
aws s3 sync s3://readur-documents /backup/s3-documents \
  --delete
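
Because --delete propagates deletions to the destination, enabling versioning on the backup bucket keeps deleted or overwritten objects recoverable:

# One-time setup: keep prior object versions in the backup bucket
aws s3api put-bucket-versioning \
  --bucket readur-backup \
  --versioning-configuration Status=Enabled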

Deduplication Strategy

#!/bin/bash
# dedup-backup.sh

# Initialize the restic repository (first run only; set RESTIC_PASSWORD
# or restic will prompt for one)
restic -r /backup/restic init

# Backup with deduplication
restic -r /backup/restic backup \
  /data/readur/documents \
  --tag documents \
  --host readur-server

# Prune old snapshots
restic -r /backup/restic forget \
  --keep-daily 7 \
  --keep-weekly 4 \
  --keep-monthly 12 \
  --prune
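
Restores are symmetric; a minimal sketch using the same repository and a scratch target directory:

# List available snapshots
restic -r /backup/restic snapshots

# Restore the most recent snapshot to a scratch directory for inspection
restic -r /backup/restic restore latest --target /restore/documents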

Configuration Backup

Environment and Settings

#!/bin/bash
# backup-config.sh

CONFIG_DIR="/etc/readur"
BACKUP_DIR="/backup/config"
DATE=$(date +%Y%m%d_%H%M%S)

# Create config archive
tar -czf "$BACKUP_DIR/config_$DATE.tar.gz" \
  $CONFIG_DIR/.env \
  $CONFIG_DIR/docker-compose.yml \
  $CONFIG_DIR/nginx.conf \
  /etc/ssl/certs/readur* \
  /etc/systemd/system/readur*

# Encrypt sensitive configuration
gpg --encrypt --recipient backup@example.com \
  "$BACKUP_DIR/config_$DATE.tar.gz"

# Remove unencrypted file
rm "$BACKUP_DIR/config_$DATE.tar.gz"

Automated Backup Solution

Complete Backup Script

#!/bin/bash
# readur-backup.sh

set -e

# Configuration
BACKUP_ROOT="/backup"
S3_BUCKET="s3://company-backups/readur"
SLACK_WEBHOOK="https://hooks.slack.com/services/XXX"
DATE=$(date +%Y%m%d_%H%M%S)

# Functions
log() {
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] $1"
}

notify() {
    curl -X POST -H 'Content-type: application/json' \
      --data "{\"text\":\"$1\"}" $SLACK_WEBHOOK
}

# Create backup directories
mkdir -p "$BACKUP_ROOT"/{database,documents,config,logs}

# 1. Database backup
log "Starting database backup..."
docker-compose exec -T postgres pg_dump -U readur readur | \
  gzip > "$BACKUP_ROOT/database/readur_$DATE.sql.gz"

# 2. Documents backup (if local storage; STORAGE_BACKEND is read from the environment/.env)
if [ "$STORAGE_BACKEND" = "local" ]; then
    log "Starting documents backup..."
    rsync -avz --delete \
      /data/readur/documents/ \
      "$BACKUP_ROOT/documents/current/"

    tar -czf "$BACKUP_ROOT/documents/documents_$DATE.tar.gz" \
      -C "$BACKUP_ROOT/documents" current/
fi

# 3. Configuration backup (run from the directory containing .env and docker-compose.yml)
log "Starting configuration backup..."
tar -czf "$BACKUP_ROOT/config/config_$DATE.tar.gz" \
  .env docker-compose.yml

# 4. Upload to S3
log "Uploading to S3..."
aws s3 sync "$BACKUP_ROOT" "$S3_BUCKET" \
  --exclude "*/current/*" \
  --storage-class STANDARD_IA

# 5. Cleanup old backups
log "Cleaning up old backups..."
find "$BACKUP_ROOT/database" -name "*.sql.gz" -mtime +7 -delete
find "$BACKUP_ROOT/documents" -name "*.tar.gz" -mtime +7 -delete
find "$BACKUP_ROOT/config" -name "*.tar.gz" -mtime +30 -delete

# 6. Verify backup
BACKUP_SIZE=$(du -sh "$BACKUP_ROOT" | cut -f1)
log "Backup completed. Total size: $BACKUP_SIZE"

# 7. Send notification
notify "Readur backup completed successfully. Size: $BACKUP_SIZE"

Cron Schedule

# /etc/crontab
# Hourly database backup
0 * * * * root /opt/readur/scripts/backup-database.sh

# Daily full backup at 2 AM
0 2 * * * root /opt/readur/scripts/readur-backup.sh

# Weekly configuration backup
0 3 * * 0 root /opt/readur/scripts/backup-config.sh
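
If a job can run longer than its interval, wrapping it in flock prevents overlapping runs (the lock file path is illustrative):

# Hourly database backup, skipped if the previous run is still going
0 * * * * root flock -n /var/lock/readur-db-backup.lock /opt/readur/scripts/backup-database.sh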

Recovery Procedures

Database Recovery

From pg_dump Backup

#!/bin/bash
# restore-database.sh

BACKUP_FILE="$1"

# Stop application
docker-compose stop readur

# Drop existing database
docker-compose exec postgres psql -U postgres -c "DROP DATABASE IF EXISTS readur;"
docker-compose exec postgres psql -U postgres -c "CREATE DATABASE readur OWNER readur;"

# Restore backup
gunzip -c "$BACKUP_FILE" | docker-compose exec -T postgres psql -U readur readur

# Run migrations
docker-compose exec readur alembic upgrade head

# Start application
docker-compose start readur

Point-in-Time Recovery

# 1. Restore a base backup into an empty data directory
pg_basebackup -U replicator -D /var/lib/postgresql/data

# 2. Set recovery targets in postgresql.conf (PostgreSQL 12+)
restore_command = 'cp /archive/%f %p'
recovery_target_time = '2024-01-15 14:30:00'

# 3. Create recovery.signal and start the server; WAL from /archive
#    is replayed up to the target time
touch /var/lib/postgresql/data/recovery.signal
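
With hot_standby enabled (the default), the server pauses once the recovery target is reached (recovery_target_action = 'pause'); after confirming the state looks correct, finish recovery so the server accepts writes:

# Confirm the server is still replaying, then promote it
psql -U postgres -c "SELECT pg_is_in_recovery();"
psql -U postgres -c "SELECT pg_promote();"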

Document Recovery

#!/bin/bash
# restore-documents.sh

BACKUP_FILE="$1"
TARGET_DIR="/data/readur/documents"

# Extract backup
tar -xzf "$BACKUP_FILE" -C /tmp/

# Restore with verification
rsync -avz --checksum \
  /tmp/current/ \
  "$TARGET_DIR/"

# Fix ownership and permissions (directories traversable, files readable)
chown -R readur:readur "$TARGET_DIR"
find "$TARGET_DIR" -type d -exec chmod 755 {} +
find "$TARGET_DIR" -type f -exec chmod 644 {} +

Full System Recovery

#!/bin/bash
# disaster-recovery.sh

set -e

# 1. Install Docker and dependencies
apt-get update
apt-get install -y docker.io docker-compose

# 2. Restore configuration
gpg --decrypt config_backup.tar.gz.gpg | tar -xzf - -C /etc/readur/

# 3. Pull Docker images and start the database first
docker-compose pull
docker-compose up -d postgres

# 4. Restore database (the postgres container must be running)
gunzip -c database_backup.sql.gz | \
  docker-compose exec -T postgres psql -U readur readur

# 5. Restore documents
tar -xzf documents_backup.tar.gz -C /data/readur/

# 6. Start the remaining services
docker-compose up -d

# 7. Verify
curl -f http://localhost:8000/health || exit 1

echo "Recovery completed successfully"

Backup Verification

Automated Testing

#!/bin/bash
# verify-backup.sh

# Test database backup
TEST_DB="readur_test"

# Create test database
createdb $TEST_DB

# Restore backup to test database
gunzip -c "$1" | psql $TEST_DB

# Verify data integrity
RECORD_COUNT=$(psql -t -c "SELECT COUNT(*) FROM documents" $TEST_DB)
echo "Restored $RECORD_COUNT documents"

# Cleanup
dropdb $TEST_DB

Backup Monitoring

#!/usr/bin/env python3
# monitor-backups.py

import os
import time
import smtplib
from email.mime.text import MIMEText

BACKUP_DIR = "/backup"
MAX_AGE_HOURS = 25  # Alert if backup older than 25 hours

def check_backup_age(directory):
    latest_backup = None
    latest_time = 0

    if not os.path.isdir(directory):
        return None, float('inf')

    # Config backups are encrypted (.gpg); the rest are gzipped (.gz)
    for file in os.listdir(directory):
        if file.endswith(('.gz', '.gpg')):
            file_time = os.path.getmtime(os.path.join(directory, file))
            if file_time > latest_time:
                latest_time = file_time
                latest_backup = file

    if latest_backup:
        age = time.time() - latest_time
        return latest_backup, age / 3600  # Age in hours
    return None, float('inf')

def send_alert(message):
    msg = MIMEText(message)
    msg['Subject'] = 'Readur Backup Alert'
    msg['From'] = 'readur-backup@example.com'
    msg['To'] = 'ops@example.com'

    s = smtplib.SMTP('localhost')
    s.send_message(msg)
    s.quit()

# Check each backup type
for backup_type in ['database', 'documents', 'config']:
    dir_path = os.path.join(BACKUP_DIR, backup_type)
    filename, age_hours = check_backup_age(dir_path)

    if age_hours > MAX_AGE_HOURS:
        send_alert(f"WARNING: {backup_type} backup is {age_hours:.1f} hours old")
    else:
        print(f"OK: {backup_type} backup is {age_hours:.1f} hours old")

Cloud Backup Solutions

AWS Backup Integration

# CloudFormation template
Resources:
  BackupPlan:
    Type: AWS::Backup::BackupPlan
    Properties:
      BackupPlan:
        BackupPlanName: ReadurBackupPlan
        BackupPlanRule:
          - RuleName: DailyBackups
            TargetBackupVault: Default
            ScheduleExpression: "cron(0 5 ? * * *)"
            StartWindowMinutes: 60
            CompletionWindowMinutes: 120
            Lifecycle:
              MoveToColdStorageAfterDays: 7
              DeleteAfterDays: 97  # must be >= 90 days after the cold storage move
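
Note that a backup plan selects nothing by itself; an AWS::Backup::BackupSelection resource must assign the Readur database and storage volumes to the plan. Deploying the template is a one-liner (template and stack names are illustrative):

# Deploy or update the backup plan stack
aws cloudformation deploy \
  --template-file readur-backup.yml \
  --stack-name readur-backup-plan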

Backup to Multiple Destinations

#!/bin/bash
# multi-destination-backup.sh

BACKUP_FILE="readur_$(date +%Y%m%d).tar.gz"

# Local backup
cp "$BACKUP_FILE" /mnt/nas/backups/

# AWS S3
aws s3 cp "$BACKUP_FILE" s3://backup-bucket/

# Google Cloud Storage
gsutil cp "$BACKUP_FILE" gs://backup-bucket/

# Azure Blob Storage
az storage blob upload \
  --container-name backups \
  --name "$BACKUP_FILE" \
  --file "$BACKUP_FILE"

Best Practices

Security

  1. Encrypt backups at rest and in transit
  2. Test recovery procedures regularly
  3. Store backups in multiple locations
  4. Rotate credentials used for backup access
  5. Monitor backup success and failures (see the checksum sketch below)
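
For point 5, checksum manifests are a lightweight way to catch silent corruption; a sketch assuming the backup layout used earlier:

# Record checksums when the backup is written...
sha256sum /backup/database/*.sql.gz > /backup/database/SHA256SUMS

# ...and verify them before restoring or after copying off-site
sha256sum -c /backup/database/SHA256SUMS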

Testing

  1. Monthly recovery drills to test procedures
  2. Quarterly full recovery to separate environment
  3. Annual disaster recovery exercise
  4. Document lessons learned and update procedures

Documentation

Maintain documentation for:

  - Backup schedules and retention policies
  - Recovery procedures and contact information
  - RTO/RPO requirements
  - Backup verification procedures
  - Encryption keys and access credentials