Test Infrastructure Documentation¶

This document provides a comprehensive guide to the test infrastructure in Readur, including test patterns, utilities, common issues, and best practices.

📋 Table of Contents¶

Test Architecture Overview
TestContext Pattern
Test Utilities
Test Isolation and Environment Variables
Common Patterns
Troubleshooting
Best Practices

Test Architecture Overview¶

Readur uses a three-tier testing approach:

Unit Tests (src/tests/) - Fast, isolated component tests
Integration Tests (tests/) - Full system tests with database
Frontend Tests (frontend/src/__tests__/) - React component and API tests

Test Execution Flow¶

┌─────────────────┐
│   Unit Tests    │ ← No external dependencies
│  (cargo test)   │ ← Milliseconds execution
└────────┬────────┘
         │
┌────────▼────────┐
│Integration Tests│ ← Real database (PostgreSQL)
│ (TestContext)   │ ← In-memory app instance
└────────┬────────┘
         │
┌────────▼────────┐
│ Frontend Tests  │ ← Mocked API responses
│   (Vitest)      │ ← Component isolation
└─────────────────┘

TestContext Pattern¶

The TestContext is the cornerstone of integration testing in Readur. It provides an isolated test environment with a real database.

Basic Usage¶

use readur::test_utils::{TestContext, TestAuthHelper};

#[tokio::test]
async fn test_document_workflow() {
    // Create a new test context with default configuration
    let ctx = TestContext::new().await;

    // Access the app router for making requests
    let app = ctx.app();

    // Access the application state
    let state = ctx.state();

    // Test runs with isolated database
}

How TestContext Works¶

Database Setup: Spins up a PostgreSQL container using testcontainers
Migrations: Runs all SQLx migrations automatically
App Instance: Creates an in-memory Axum router with full API routes
Isolation: Each test gets its own database container

Custom Configuration¶

use readur::test_utils::{TestContext, TestConfigBuilder};

#[tokio::test]
async fn test_with_custom_config() {
    let config = TestConfigBuilder::default()
        .with_concurrent_ocr_jobs(4)
        .with_upload_path("./test-uploads")
        .with_oidc_enabled(false);

    let ctx = TestContext::with_config(config).await;
}

Making Requests¶

use axum::http::{Request, StatusCode};
use axum::body::Body;
use tower::ServiceExt;

// Direct request to the test app
let request = Request::builder()
    .method("GET")
    .uri("/api/health")
    .body(Body::empty())
    .unwrap();

let response = ctx.app().clone().oneshot(request).await.unwrap();
assert_eq!(response.status(), StatusCode::OK);

Test Utilities¶

TestAuthHelper¶

Handles user creation and authentication in tests:

let auth_helper = TestAuthHelper::new(ctx.app().clone());

// Create a regular user
let mut test_user = auth_helper.create_test_user().await;
// Generates unique username: testuser_<pid>_<thread>_<nanos>

// Create an admin user
let admin_user = auth_helper.create_admin_user().await;

// Login and get token
let token = test_user.login(&auth_helper).await.unwrap();

// Make authenticated request
let response = auth_helper.make_authenticated_request(
    "GET",
    "/api/documents",
    None,
    &token
).await;

Document Helpers¶

Test data builders for consistent document creation:

use readur::test_utils::document_helpers::*;

// Basic test document
let doc = create_test_document(user_id);

// Document with specific hash
let doc = create_test_document_with_hash(
    user_id,
    "test.pdf",
    "abc123".to_string()
);

// Low confidence OCR document
let doc = create_low_confidence_document(user_id, 45.0);

// Document with OCR error
let doc = create_document_with_ocr_error(user_id);

Test User Pattern¶

Each test creates unique users to avoid conflicts:

// Unique username pattern: testuser_<process_id>_<thread_id>_<timestamp_nanos>
// Example: testuser_12345_2_1752870966778668050

// This prevents "Username already exists" errors in parallel tests
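The naming scheme above can be sketched with the standard library alone. This is a minimal illustration, not Readur's actual helper: the function name `unique_test_username` is hypothetical, and deriving the thread component from the `Debug` form of `ThreadId` is an assumption made because `ThreadId` has no stable numeric accessor.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Builds a username of the form testuser_<pid>_<thread>_<nanos>.
/// Hypothetical helper for illustration; not Readur's actual code.
fn unique_test_username() -> String {
    let pid = std::process::id();

    // Keep only the digits of the Debug form, e.g. "ThreadId(2)" becomes "2".
    let thread: String = format!("{:?}", std::thread::current().id())
        .chars()
        .filter(|c| c.is_ascii_digit())
        .collect();

    // Nanoseconds since the Unix epoch; 0 if the clock reads before the epoch.
    let nanos = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_nanos())
        .unwrap_or(0);

    format!("testuser_{pid}_{thread}_{nanos}")
}
```

The process and thread components separate tests that run at the same instant, while the timestamp separates repeated calls on one thread. On platforms with a coarse clock, two back-to-back calls on the same thread could still collide, so a counter may be worth adding if that matters.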

Test Isolation and Environment Variables¶

The TESSDATA_PREFIX Problem¶

One of the hardest issues in the test suite involved OCR language validation and process-wide environment variables.

The Issue¶

Tests set the TESSDATA_PREFIX environment variable to point to temporary directories
Environment variables are global and shared across all threads
When tests run in parallel, they overwrite each other's TESSDATA_PREFIX
This caused 400 errors when validating OCR languages
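The overwrite described in the list above can be reproduced in a few lines. The sketch below uses a stand-in variable name (`DEMO_TESSDATA_PREFIX`, an assumption chosen so the demo does not disturb a real setting) and two made-up paths; one thread plays "test A" and a spawned thread plays "test B".

```rust
use std::env;
use std::thread;

/// Sets a variable on the calling thread, overwrites it from a second
/// thread, and returns what the calling thread sees afterwards.
#[allow(unused_unsafe)]
fn env_is_shared_across_threads() -> Option<String> {
    // Stand-in for TESSDATA_PREFIX (hypothetical name).
    const KEY: &str = "DEMO_TESSDATA_PREFIX";

    // "Test A" points the variable at its own directory.
    // set_var is an unsafe fn in the 2024 edition; on older editions
    // the unsafe block is simply redundant.
    unsafe { env::set_var(KEY, "/tmp/test_a") };

    // "Test B" runs on another thread and overwrites the same variable.
    thread::spawn(|| unsafe { env::set_var(KEY, "/tmp/test_b") })
        .join()
        .expect("thread panicked");

    // Back in "test A": the value now points at test B's directory.
    env::var(KEY).ok()
}
```

Calling `env_is_shared_across_threads()` returns `Some("/tmp/test_b")`: the environment belongs to the process, not to the thread that set it.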

The Solution¶

The OCR retry endpoint was modified to accept a custom tessdata path:

// In src/routes/documents/ocr.rs
let health_checker = if let Ok(tessdata_path) = std::env::var("TESSDATA_PREFIX") {
    crate::ocr::health::OcrHealthChecker::new_with_path(tessdata_path)
} else {
    crate::ocr::health::OcrHealthChecker::new()
};
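To illustrate why an explicit path helps, here is a minimal sketch of a validator that receives its directory through the constructor. `LanguageValidator` and `missing_languages` are hypothetical names for illustration only; this is not the real `OcrHealthChecker`, whose internals are not shown in this document.

```rust
use std::path::PathBuf;

/// Hypothetical validator that checks language codes against the
/// `.traineddata` files in one directory. Not Readur's OcrHealthChecker.
struct LanguageValidator {
    tessdata_dir: PathBuf,
}

impl LanguageValidator {
    /// The directory is passed in explicitly, so no global state is read.
    fn new_with_path(dir: impl Into<PathBuf>) -> Self {
        Self { tessdata_dir: dir.into() }
    }

    /// Returns the codes that have no matching `<code>.traineddata` file.
    fn missing_languages(&self, codes: &[&str]) -> Vec<String> {
        codes
            .iter()
            .filter(|code| {
                !self
                    .tessdata_dir
                    .join(format!("{code}.traineddata"))
                    .is_file()
            })
            .map(|code| code.to_string())
            .collect()
    }
}
```

Because each instance carries its own directory, two tests can validate against different tessdata folders at the same time without touching the environment.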

Test Setup Example¶

use std::fs;
use tempfile::TempDir;
use readur::test_utils::TestContext;

#[tokio::test]
async fn test_retry_ocr_with_language() {
    // Create temporary directory for tessdata
    let temp_dir = TempDir::new().unwrap();
    let tessdata_path = temp_dir.path();

    // Create mock language files
    fs::write(tessdata_path.join("eng.traineddata"), "mock").unwrap();
    fs::write(tessdata_path.join("spa.traineddata"), "mock").unwrap();

    // Set environment variable (careful with parallel tests!)
    let tessdata_str = tessdata_path.to_string_lossy().to_string();
    std::env::set_var("TESSDATA_PREFIX", &tessdata_str);

    let ctx = TestContext::new().await;
    // ... rest of test
}

Best Practices for Environment Variables¶

Avoid Global State: Prefer passing configuration through constructors
Use TestContext: It provides isolation for most test scenarios

Serial Execution: For tests that must modify environment variables:

use serial_test::serial;

#[tokio::test]
#[serial]  // Using serial_test crate
async fn test_that_modifies_env() {
    // This test runs in isolation
}
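Where adding a dependency is not desirable, a similar effect can be approximated with a process-wide lock plus a guard that restores the previous value. This is a sketch under stated assumptions: `ENV_LOCK` and `EnvVarGuard` are hypothetical names, and the approach only protects tests that go through the guard.

```rust
use std::env;
use std::ffi::OsString;
use std::sync::{Mutex, MutexGuard};

/// Process-wide lock that every environment-mutating test must take.
static ENV_LOCK: Mutex<()> = Mutex::new(());

/// Holds the lock while alive and restores the old value on drop.
struct EnvVarGuard {
    key: &'static str,
    previous: Option<OsString>,
    _lock: MutexGuard<'static, ()>,
}

#[allow(unused_unsafe)]
impl EnvVarGuard {
    fn set(key: &'static str, value: &str) -> Self {
        // A poisoned lock only means another test panicked; keep going.
        let lock = ENV_LOCK.lock().unwrap_or_else(|e| e.into_inner());
        let previous = env::var_os(key);
        // set_var is an unsafe fn in the 2024 edition.
        unsafe { env::set_var(key, value) };
        Self { key, previous, _lock: lock }
    }
}

#[allow(unused_unsafe)]
impl Drop for EnvVarGuard {
    fn drop(&mut self) {
        // Restore first; the lock field is released after this body runs.
        match self.previous.take() {
            Some(old) => unsafe { env::set_var(self.key, old) },
            None => unsafe { env::remove_var(self.key) },
        }
    }
}
```

A test would write `let _guard = EnvVarGuard::set("TESSDATA_PREFIX", &tessdata_str);` and keep the guard alive for the duration of the test. Code that reads the variable without taking the lock can still observe another test's value, so passing paths explicitly remains the more robust option.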

Common Patterns¶

Authentication Test Pattern¶

#[tokio::test]
async fn test_authenticated_endpoint() {
    let ctx = TestContext::new().await;
    let auth_helper = TestAuthHelper::new(ctx.app().clone());

    // Create and login user
    let mut user = auth_helper.create_test_user().await;
    let token = user.login(&auth_helper).await.unwrap();

    // Make authenticated request
    let request = Request::builder()
        .method("GET")
        .uri("/api/protected")
        .header("Authorization", format!("Bearer {}", token))
        .body(Body::empty())
        .unwrap();

    let response = ctx.app().clone().oneshot(request).await.unwrap();
    assert_eq!(response.status(), StatusCode::OK);
}

Document Upload Pattern¶

#[tokio::test]
async fn test_document_upload() {
    let ctx = TestContext::new().await;
    let auth_helper = TestAuthHelper::new(ctx.app().clone());
    let mut user = auth_helper.create_test_user().await;
    let token = user.login(&auth_helper).await.unwrap();

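    // Create multipart form (`multipart` is reqwest::multipart)
    let form = multipart::Form::new()
        .text("tags", "test,document")
        .part("file", multipart::Part::bytes(b"test content".to_vec())
            .file_name("test.txt")
            .mime_str("text/plain").unwrap());

    // Upload document. Note: this sends a real HTTP request, so it needs a
    // server listening on localhost:8000 that accepts the token created above.
    let response = reqwest::Client::new()
        .post("http://localhost:8000/api/documents")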
        .header("Authorization", format!("Bearer {}", token))
        .multipart(form)
        .send()
        .await
        .unwrap();

    assert_eq!(response.status(), 201);
}
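If the upload should go through the in-process router instead of a live server, the multipart body can be assembled by hand and sent with `oneshot`. The helper below is a hypothetical sketch (the name `multipart_body` and its parameters are assumptions); the field names mirror the upload example above.

```rust
/// Builds a multipart/form-data body with a "tags" text field and a
/// "file" part. Hypothetical helper for illustration.
fn multipart_body(
    boundary: &str,
    tags: &str,
    file_name: &str,
    mime: &str,
    content: &[u8],
) -> Vec<u8> {
    let mut body = Vec::new();

    // Text field "tags".
    body.extend_from_slice(
        format!(
            "--{boundary}\r\nContent-Disposition: form-data; name=\"tags\"\r\n\r\n{tags}\r\n"
        )
        .as_bytes(),
    );

    // File part "file" with its own content type, followed by the raw bytes.
    body.extend_from_slice(
        format!(
            "--{boundary}\r\nContent-Disposition: form-data; name=\"file\"; filename=\"{file_name}\"\r\nContent-Type: {mime}\r\n\r\n"
        )
        .as_bytes(),
    );
    body.extend_from_slice(content);
    body.extend_from_slice(b"\r\n");

    // Closing delimiter.
    body.extend_from_slice(format!("--{boundary}--\r\n").as_bytes());
    body
}
```

The request would then carry a `Content-Type: multipart/form-data; boundary=<boundary>` header alongside the `Authorization` header, with the returned bytes as its body.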

Database Direct Access Pattern¶

#[tokio::test]
async fn test_database_operations() {
    let ctx = TestContext::new().await;
    let user_id = Uuid::new_v4();

    // Direct database access
    sqlx::query!(
        "INSERT INTO users (id, username, email, password_hash, role) 
         VALUES ($1, $2, $3, $4, $5)",
        user_id,
        "testuser",
        "testuser@example.com",
        "hash",
        "user"
    )
    .execute(&ctx.state().db.pool)
    .await
    .unwrap();

    // Verify through API
    // ...
}

Troubleshooting¶

Common Test Failures¶

1. "Username already exists" Error¶

Cause: Parallel tests creating users with the same username

Solution: TestAuthHelper now generates unique usernames with timestamps

// Automatic unique username generation
// (thread_id and timestamp_nanos are computed inside the helper)
let username = format!("testuser_{}_{}_{}",
    std::process::id(),
    thread_id,
    timestamp_nanos
);

2. "Server is not running" (Integration Tests)¶

Cause: Tests expecting an external server on localhost:8000

Solution: Use TestContext instead of external HTTP requests

// ❌ Wrong - expects external server
let response = reqwest::get("http://localhost:8000/api/health").await;

// ✅ Correct - uses TestContext
let response = ctx.app().clone()
    .oneshot(Request::builder()
        .uri("/api/health")
        .body(Body::empty())
        .unwrap())
    .await
    .unwrap();

3. OCR Language Validation Failures (400 errors)¶

Cause: TESSDATA_PREFIX environment variable conflicts

Solution: Use new_with_path() for custom tessdata directories

4. Database Connection Errors¶

Cause: PostgreSQL container not ready or migrations failed

Debug Steps:

# Check if tests can connect to database
RUST_LOG=debug cargo test

# Run single test with output
cargo test test_name -- --nocapture

# Check Docker containers
docker ps

Debugging Techniques¶

Enable Detailed Logging¶

# Full debug output
RUST_LOG=debug cargo test -- --nocapture

# Specific module logging
RUST_LOG=readur::routes=debug cargo test

# With backtrace
RUST_BACKTRACE=1 cargo test

Run Tests Serially¶

# Avoid parallel execution issues
cargo test -- --test-threads=1

Inspect Test Database¶

// Add debug queries in test
let count: i64 = sqlx::query_scalar("SELECT COUNT(*) FROM users")
    .fetch_one(&ctx.state().db.pool)
    .await
    .unwrap();
println!("User count: {}", count);

Best Practices¶

1. Use Unique Identifiers¶

Always use timestamps or UUIDs for test data:

let unique_id = Uuid::new_v4();
let unique_email = format!("test_{}@example.com", unique_id);

2. Clean Test State¶

TestContext automatically provides isolated databases, but clean up external resources:

// TempDir automatically cleans up
let temp_dir = TempDir::new().unwrap();
// Directory deleted when temp_dir drops
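The drop-based cleanup that makes this work can be sketched with the standard library. `ScratchDir` below is a hypothetical, simplified stand-in written for illustration; it is not how `TempDir` is implemented, only the same idea in miniature.

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Minimal stand-in for a temp-dir guard: removes the directory on drop.
struct ScratchDir {
    path: PathBuf,
}

impl ScratchDir {
    /// Creates `<system temp dir>/<label>_<pid>`.
    fn new(label: &str) -> std::io::Result<Self> {
        let path = std::env::temp_dir().join(format!("{label}_{}", std::process::id()));
        fs::create_dir_all(&path)?;
        Ok(Self { path })
    }

    fn path(&self) -> &Path {
        &self.path
    }
}

impl Drop for ScratchDir {
    fn drop(&mut self) {
        // Best effort: errors during cleanup are ignored.
        let _ = fs::remove_dir_all(&self.path);
    }
}
```

Because cleanup happens in `Drop`, it runs when the value leaves scope, including when the test panics and unwinds. A guard bound to `_` is dropped immediately, so bind it to a named variable such as `_dir` to keep the directory alive for the whole test.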

3. Test Both Success and Failure Cases¶

#[tokio::test]
async fn test_endpoint_success() {
    // Happy path test
}

#[tokio::test]
async fn test_endpoint_unauthorized() {
    // No auth token - expect 401
}

#[tokio::test]
async fn test_endpoint_not_found() {
    // Invalid ID - expect 404
}

4. Use Type-Safe Assertions¶

// Parse response to proper types
let body_bytes = axum::body::to_bytes(response.into_body(), usize::MAX)
    .await
    .unwrap();
let document: DocumentResponse = serde_json::from_slice(&body_bytes).unwrap();

// Now assertions are type-safe
assert_eq!(document.filename, "test.pdf");

5. Document Test Purpose¶

#[tokio::test]
async fn test_ocr_retry_with_multiple_languages() {
    // Tests that OCR retry endpoint accepts multiple language codes
    // and validates them against available tessdata files.
    // This ensures multi-language OCR support works correctly.
}

6. Avoid External Dependencies¶

Use TestContext instead of external servers
Mock external services when possible
Use in-memory databases for unit tests
Create test fixtures instead of relying on external files

7. Handle Async Properly¶

// Use tokio::test for async tests
#[tokio::test]
async fn test_async_operation() {
    // Can use .await here
}

// For timeout handling
use tokio::time::{timeout, Duration};

// `result` is Err(Elapsed) if the operation does not finish in time
let result = timeout(
    Duration::from_secs(30),
    long_running_operation()
).await;

Test Organization¶

Directory Structure¶

readur/
├── src/
│   └── tests/          # Unit tests
│       ├── mod.rs
│       ├── auth_tests.rs
│       ├── db_tests.rs
│       └── ...
├── tests/              # Integration tests
│   ├── integration_ocr_language_endpoints.rs
│   ├── integration_settings_tests.rs
│   └── ...
└── frontend/
    └── src/
        └── __tests__/  # Frontend tests
            ├── components/
            └── pages/

Naming Conventions¶

Unit tests: test_<component>_<behavior>
Integration tests: test_<workflow>_<scenario>
Test files: integration_<feature>_tests.rs

Summary¶

The test infrastructure in Readur provides:

Isolation: Each test runs in its own environment
Realism: Integration tests use real databases and full app instances
Speed: Parallel execution with proper isolation
Reliability: Unique identifiers prevent conflicts
Maintainability: Clear patterns and utilities

Key takeaways:

Always use TestContext for integration tests
Generate unique test data to avoid conflicts
Be careful with environment variables in parallel tests
Use the provided test utilities for common operations
Test both success and failure scenarios

For more examples, see the existing test files in the tests/ directory.