Command-Line Tools Reference¶
Overview¶
Readur includes several Rust-based command-line utilities for system administration and maintenance. These are compiled binaries (not Python scripts) designed for system administrators and DevOps teams managing Readur deployments.
Available CLI Tools:
- migrate_to_s3 - Migrate documents between storage backends
- batch_ingest - Bulk import documents
- debug_pdf_extraction - Debug PDF processing issues
- enqueue_pending_ocr - Re-queue documents for OCR processing
- test_metadata - Test metadata extraction
- test_runner - Run test suites
migrate_to_s3¶
Purpose: Migrate document storage between backends (Local ↔ S3)
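Usage¶
# Compiled binary (production)
docker exec readur-app /app/migrate_to_s3 [OPTIONS]
# During development with cargo
docker exec readur-app cargo run --bin migrate_to_s3 -- [OPTIONS]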
Command Options¶
Option | Description | Example |
---|---|---|
--dry-run | Test migration without making changes | --dry-run |
--enable-rollback | Enable rollback capabilities with state tracking | --enable-rollback |
--user-id <UUID> | Migrate documents for specific user only | --user-id "123e4567-..." |
--resume-from <FILE> | Resume migration from saved state file | --resume-from /tmp/state.json |
--rollback <FILE> | Rollback previous migration using state file | --rollback /tmp/state.json |
--batch-size <NUM> | Number of documents to process per batch | --batch-size 1000 |
--parallel-uploads <NUM> | Maximum concurrent S3 uploads | --parallel-uploads 5 |
--verbose | Enable detailed output and progress logging | --verbose |
--audit-files | Check file system consistency before migration | --audit-files |
--status | Show status of current/recent migrations | --status |
--help | Display help information | --help |
Examples¶
Basic Migration¶
# Test migration first
docker exec readur-app cargo run --bin migrate_to_s3 -- --dry-run
# Run actual migration with safety features
docker exec readur-app cargo run --bin migrate_to_s3 -- --enable-rollback
# Verbose migration with custom batch size
docker exec readur-app cargo run --bin migrate_to_s3 -- \
--enable-rollback --verbose --batch-size 500
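Since the dry run and the real run accept the same options, the two steps can be chained so the migration only starts after a clean dry run. A minimal sketch, assuming the dry run exits with status 0 when it finds no problems (0 is listed as success in the exit-code table below); migrate_after_dry_run and the CONTAINER variable are hypothetical names.

```shell
# Hypothetical helper: run a dry run first and start the real migration
# (with rollback enabled) only if the dry run exited with status 0.
# Any extra options, such as --batch-size 500, are passed to both runs.
CONTAINER="${CONTAINER:-readur-app}"

migrate_after_dry_run() {
  echo "Starting dry run..."
  if ! docker exec "$CONTAINER" cargo run --bin migrate_to_s3 -- --dry-run "$@"; then
    echo "Dry run failed; not starting the migration." >&2
    return 1
  fi
  echo "Dry run passed; starting migration with rollback enabled..."
  docker exec "$CONTAINER" cargo run --bin migrate_to_s3 -- --enable-rollback "$@"
}

# Usage:
#   migrate_after_dry_run --batch-size 500 --verbose
```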
User-Specific Migration¶
# Get user IDs from database
docker exec readur-app psql -d readur -c \
"SELECT id, email FROM users WHERE email LIKE '%@company.com';"
# Migrate specific user's documents
docker exec readur-app cargo run --bin migrate_to_s3 -- \
--enable-rollback --user-id "uuid-from-above"
Recovery Operations¶
# Resume interrupted migration
docker exec readur-app cargo run --bin migrate_to_s3 -- \
--resume-from /tmp/migration_state_20241201_143022.json
# Rollback completed migration
docker exec readur-app cargo run --bin migrate_to_s3 -- \
--rollback /tmp/migration_state_20241201_143022.json
# Check migration status
docker exec readur-app cargo run --bin migrate_to_s3 -- --status
Performance Optimization¶
# High-performance migration for large datasets
docker exec readur-app cargo run --bin migrate_to_s3 -- \
--enable-rollback \
--batch-size 2000 \
--parallel-uploads 10 \
--verbose
# Conservative migration for limited resources
docker exec readur-app cargo run --bin migrate_to_s3 -- \
--enable-rollback \
--batch-size 100 \
--parallel-uploads 2
State Files¶
The migration tool creates state files to track progress and enable recovery:
Location: /tmp/migration_state_YYYYMMDD_HHMMSS.json
Contents:
{
"migration_id": "uuid",
"started_at": "2024-12-01T14:30:22Z",
"completed_migrations": [
{
"document_id": "uuid",
"original_path": "/app/uploads/doc.pdf",
"s3_key": "documents/user123/doc.pdf",
"migrated_at": "2024-12-01T14:31:15Z"
}
],
"failed_migrations": [],
"total_files": 2500,
"processed_files": 1247,
"rollback_enabled": true
}
Exit Codes¶
Code | Meaning |
---|---|
0 | Success |
1 | General error |
2 | Configuration error |
3 | Database connection error |
4 | S3 access error |
5 | File system error |
10 | Migration already in progress |
11 | State file not found |
12 | Rollback failed |
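In scripts, the exit status can be translated into the meanings above. docker exec returns the exit status of the command it ran, so the same mapping works through Docker. The helper below is a hypothetical convenience that simply restates the table.

```shell
# Hypothetical helper: translate a migrate_to_s3 exit status into the
# meaning listed in the exit-code table.
describe_exit_code() {
  case "$1" in
    0)  echo "Success" ;;
    1)  echo "General error" ;;
    2)  echo "Configuration error" ;;
    3)  echo "Database connection error" ;;
    4)  echo "S3 access error" ;;
    5)  echo "File system error" ;;
    10) echo "Migration already in progress" ;;
    11) echo "State file not found" ;;
    12) echo "Rollback failed" ;;
    *)  echo "Unknown exit code $1" ;;
  esac
}

# Usage:
#   docker exec readur-app /app/migrate_to_s3 --enable-rollback
#   status=$?
#   echo "migrate_to_s3 exited with $status: $(describe_exit_code "$status")"
```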
enqueue_pending_ocr¶
Purpose: Add documents with pending OCR status to the processing queue
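Usage¶
# Compiled binary (production)
docker exec readur-app /app/enqueue_pending_ocr
# During development with cargo
docker exec readur-app cargo run --bin enqueue_pending_ocr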
Description¶
This utility addresses situations where documents are marked as pending OCR in the database but haven't been added to the OCR processing queue. This can happen after:
- Database restoration
- System crashes during OCR processing
- Migration from older versions
Example Output¶
🔍 Scanning for documents with pending OCR status...
📊 Found 45 documents with pending OCR status
🚀 Enqueuing documents for OCR processing...
✅ Successfully enqueued 45 documents
⏱️ Average queue priority: 5
📈 Current queue size: 127 items
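If enqueue_pending_ocr runs from cron or a maintenance script, the summary line can be parsed to record how many documents were queued. This sketch assumes the wording of the "Successfully enqueued N documents" line matches the example output above and may need adjusting if the tool's output differs; enqueued_count is a hypothetical name.

```shell
# Hypothetical helper: read enqueue_pending_ocr output on stdin and print
# the number from the "Successfully enqueued N documents" summary line.
# Prints nothing if no such line is found.
enqueued_count() {
  sed -n 's/.*Successfully enqueued \([0-9][0-9]*\) documents.*/\1/p'
}

# Usage (2>&1 in case the tool writes its progress to stderr):
#   docker exec readur-app cargo run --bin enqueue_pending_ocr 2>&1 | tee enqueue.log
#   count=$(enqueued_count < enqueue.log)
#   echo "Enqueued ${count:-0} documents"
```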
When to Use¶
- After restoring from database backup
- When OCR queue appears empty but documents show "pending" status
- Following system recovery or migration
- As part of maintenance procedures
test_runner¶
Purpose: Execute comprehensive test suites with detailed reporting
Usage¶
docker exec readur-app cargo run --bin test_runner -- [OPTIONS]
Options¶
Option | Description |
---|---|
--unit | Run unit tests only |
--integration | Run integration tests only |
--e2e | Run end-to-end tests only |
--verbose | Detailed test output |
--parallel <N> | Number of parallel test threads |
Examples¶
# Run all tests
docker exec readur-app cargo run --bin test_runner
# Run only integration tests with verbose output
docker exec readur-app cargo run --bin test_runner -- --integration --verbose
# Run tests with limited parallelism
docker exec readur-app cargo run --bin test_runner -- --parallel 2
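For CI, running the suites separately makes it clear which one failed. A sketch, assuming each of --unit, --integration and --e2e can be run on its own as the options table describes; the command prefix is passed in as arguments, and run_suites_in_order is a hypothetical name.

```shell
# Hypothetical wrapper: run the unit, integration and end-to-end suites one
# after another, stopping at the first failing suite.
# "$@" is the command prefix, e.g. docker exec ... cargo run --bin test_runner --
run_suites_in_order() {
  for suite in --unit --integration --e2e; do
    echo "== ${suite#--} tests =="
    if ! "$@" "$suite"; then
      echo "${suite#--} tests failed; stopping." >&2
      return 1
    fi
  done
  echo "All suites passed."
}

# Usage:
#   run_suites_in_order docker exec readur-app cargo run --bin test_runner --
```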
General Usage Patterns¶
Docker Deployments¶
For Docker-based Readur deployments:
# Using compiled binaries (production)
docker exec readur-app /app/migrate_to_s3 [OPTIONS]
# Or during development with cargo
docker exec readur-app cargo run --bin migrate_to_s3 -- [OPTIONS]
# With environment variables
docker exec -e S3_BUCKET_NAME=my-bucket readur-app \
/app/migrate_to_s3 --dry-run
# Interactive mode (if needed)
docker exec -it readur-app /app/migrate_to_s3 --help
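Because the production binary and the cargo run form differ only in how the tool is invoked, a small wrapper can try one and fall back to the other. A sketch under two assumptions: compiled tools live directly under /app (as in the examples above) and the container image provides the test utility; readur_tool and CONTAINER are hypothetical names.

```shell
# Hypothetical helper: run a Readur CLI tool inside the container, using the
# compiled binary under /app when it exists and falling back to cargo run.
CONTAINER="${CONTAINER:-readur-app}"

readur_tool() (
  tool="$1"; shift
  if docker exec "$CONTAINER" test -x "/app/$tool"; then
    docker exec "$CONTAINER" "/app/$tool" "$@"
  else
    docker exec "$CONTAINER" cargo run --bin "$tool" -- "$@"
  fi
)

# Usage:
#   readur_tool migrate_to_s3 --dry-run
#   readur_tool enqueue_pending_ocr
```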
Direct Deployments¶
For direct server deployments:
# Using compiled binaries
/opt/readur/bin/migrate_to_s3 --dry-run
# With environment variables
S3_ACCESS_KEY_ID="your_key" S3_SECRET_ACCESS_KEY="your_secret" \
/opt/readur/bin/migrate_to_s3 --status
# Never pass secrets as command-line arguments; supply them through environment variables or config files
DATABASE_URL="postgresql://user:pass@host/db" /opt/readur/bin/migrate_to_s3
Kubernetes Deployments¶
For Kubernetes environments:
# Find the pod name
kubectl get pods -l app=readur
# Execute tool in pod
kubectl exec deployment/readur -- \
cargo run --bin migrate_to_s3 -- --dry-run
# With environment variable override (kubectl exec has no -e flag, so use env inside the pod)
kubectl exec deployment/readur -- \
env S3_REGION=eu-west-1 cargo run --bin migrate_to_s3 -- --status
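Migration state files are written to /tmp inside the container, so a later --resume-from or --rollback has to run in the same pod. With several replicas, kubectl exec deployment/readur is not guaranteed to pick that pod every time, so it can help to capture one pod name and reuse it. A sketch using a kubectl JSONPath query; the label app=readur comes from the example above, and readur_pod is a hypothetical name.

```shell
# Hypothetical helper: print the name of the first running pod labelled
# app=readur, so follow-up commands can target that exact pod.
readur_pod() {
  kubectl get pods -l app=readur \
    --field-selector=status.phase=Running \
    -o jsonpath='{.items[0].metadata.name}'
}

# Usage:
#   pod="$(readur_pod)"
#   kubectl exec "$pod" -- cargo run --bin migrate_to_s3 -- --enable-rollback
#   kubectl exec "$pod" -- ls /tmp   # the state files live in this pod
```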
Best Practices¶
Before Running Tools¶
- Back up data - Always back up the database and files
- Test in staging - Try commands in non-production first
- Check resources - Ensure sufficient CPU, memory, disk space
- Verify access - Confirm database and S3 connectivity
During Execution¶
- Monitor progress - Watch logs and system resources
- Keep sessions active - Use screen or tmux for long operations
- Save output - Redirect output to files for later analysis
- Document actions - Keep notes of commands and results
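One way to save output while still watching it live is to pipe it through tee and keep the tool's exit status, which a plain pipe would otherwise replace with tee's. A POSIX sh sketch; run_logged is a hypothetical name, and since the command line is written to the log, it should not contain secrets.

```shell
# Hypothetical helper: run a command, append its stdout and stderr (plus a
# timestamped header) to a log file via tee, and return the command's own
# exit status rather than tee's.
run_logged() (
  log="$1"; shift
  status_file=$(mktemp) || exit 1
  {
    echo "=== $(date -u '+%Y-%m-%dT%H:%M:%SZ') running: $*"
    rc=0
    "$@" 2>&1 || rc=$?
    echo "$rc" > "$status_file"
  } | tee -a "$log"
  status=$(cat "$status_file")
  rm -f "$status_file"
  exit "${status:-1}"
)

# Usage (inside screen or tmux):
#   run_logged migration.log docker exec readur-app \
#     cargo run --bin migrate_to_s3 -- --enable-rollback --verbose
```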
After Completion¶
- Verify results - Check that operations completed successfully
- Clean up - Remove temporary files and state data if appropriate
- Update documentation - Record any configuration changes
- Monitor application - Watch for any issues after changes
Environment Variables¶
Common environment variables used by CLI tools:
Variable | Purpose | Example |
---|---|---|
DATABASE_URL | PostgreSQL connection string | postgresql://user:pass@host:5432/readur |
S3_BUCKET_NAME | Target S3 bucket | my-company-readur |
S3_ACCESS_KEY_ID | AWS/S3 access key ID | AKIA... |
S3_SECRET_ACCESS_KEY | AWS/S3 secret access key | ... |
S3_REGION | AWS region | us-east-1 |
S3_ENDPOINT | Custom S3 endpoint | https://minio.company.com |
RUST_LOG | Logging level | debug, info, warn, error |
RUST_BACKTRACE | Error backtraces | 1 or full |
Security Warning: Never pass secrets like S3_SECRET_ACCESS_KEY directly on the command line, as they may be visible in process lists. Always use environment variables, configuration files, or secret management systems.
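A pre-flight check can confirm that the expected variables are present without ever echoing their values. A minimal sketch; which variables a given tool actually requires is an assumption here, so pass the list that matches your deployment. check_required_env is a hypothetical name.

```shell
# Hypothetical pre-flight check: report which of the named environment
# variables are unset or empty, without printing any of their values.
# Only exported variables count, since those are what the tool will see.
check_required_env() (
  missing=0
  for name in "$@"; do
    if [ -z "$(printenv "$name")" ]; then
      echo "Missing environment variable: $name" >&2
      missing=1
    fi
  done
  exit "$missing"
)

# Usage (variable list is an assumption for an S3 migration):
#   check_required_env DATABASE_URL S3_BUCKET_NAME S3_REGION \
#     S3_ACCESS_KEY_ID S3_SECRET_ACCESS_KEY && /opt/readur/bin/migrate_to_s3 --dry-run
```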
Troubleshooting¶
Common Issues¶
- Permission Denied
- Tool Not Found
# List available binaries
docker exec readur-app ls -la /app/ | grep -E '(migrate_to_s3|batch_ingest|debug_pdf|enqueue|test_)'
# For development, build tools if missing
docker exec readur-app cargo build --release --bin migrate_to_s3
docker exec readur-app cargo build --release --bin enqueue_pending_ocr
docker exec readur-app cargo build --release --bin batch_ingest
- Database Connection Issues
# Test database connectivity (requires PGPASSWORD or .pgpass)
docker exec -e PGPASSWORD="${DB_PASSWORD}" readur-app \
psql -h postgres -U readur -d readur -c "SELECT version();"
# Check environment variables (be careful not to expose secrets)
docker exec readur-app env | grep -E '(DATABASE_URL|POSTGRES_)' | sed 's/=.*/=***/'
Getting Help¶
For each tool, use the --help flag:
# Production binaries
docker exec readur-app /app/migrate_to_s3 --help
docker exec readur-app /app/enqueue_pending_ocr --help
# Development with cargo
docker exec readur-app cargo run --bin migrate_to_s3 -- --help
docker exec readur-app cargo run --bin enqueue_pending_ocr -- --help
Logging and Debugging¶
Enable detailed logging:
# Debug level logging
docker exec -e RUST_LOG=debug readur-app \
cargo run --bin migrate_to_s3 -- --verbose
# With backtrace for errors
docker exec -e RUST_BACKTRACE=1 readur-app \
cargo run --bin migrate_to_s3 -- --status
Security Considerations¶
Access Control¶
- CLI tools should only be run by system administrators
- Use proper Docker user contexts
- Limit access to state files containing sensitive information
Credential Handling¶
- Never log full credentials or API keys
- Use environment variables instead of command-line parameters
- Rotate credentials after major operations
Network Security¶
- Ensure TLS/HTTPS for all S3 communications
- Use VPN or private networks when possible
- Monitor network traffic during migrations
Remember: These tools have significant impact on your Readur deployment. Always test in non-production environments first and maintain proper backups.