Skip to content

Watch Folder Guide

The watch folder feature automatically monitors a directory for new files and processes them with OCR, making them searchable in Readur. Your original files are never modified or deleted - Readur simply copies and processes them while leaving the originals untouched.

What is Watch Folder?

Watch folder allows you to: - Drop files anywhere - Point Readur to any folder (local, network drive, cloud mount) - Automatic processing - New files are automatically detected and processed - Non-destructive - Original files remain exactly where you put them - Background operation - Processing happens in the background while you continue working

Perfect for scenarios where you want to automatically process files from: - Network drives (NFS, SMB shares) - Cloud storage mounts (Google Drive, Dropbox, OneDrive) - Local folders where you save scanned documents - Shared team folders

How It Works

  1. Point Readur to your folder - Set the WATCH_FOLDER path to any directory you want monitored
  2. Drop files - Add documents to that folder (PDFs, images, text files, Word docs)
  3. Automatic detection - Readur notices new files within seconds (local) or minutes (network)
  4. OCR processing - Files are automatically processed to extract searchable text
  5. Search and find - Your documents become searchable in the Readur web interface

Key Features

Works with any storage type - Local drives, network shares, cloud mounts
Smart processing - Only processes supported file types
Duplicate prevention - Won't process the same file twice
Safe operation - Never modifies or deletes your original files
Background processing - Doesn't interrupt your workflow

Quick Setup

Basic Setup (Docker Compose)

  1. Edit your docker-compose.yml:

    services:
      readur:
        image: readur:latest
        volumes:
          # Mount your folder to the watch directory
          - /path/to/your/documents:/app/watch
        environment:
          - WATCH_FOLDER=/app/watch
    

  2. Start Readur:

    docker compose up -d
    

  3. Start dropping files into /path/to/your/documents - they'll be automatically processed!

Configuration Options

Setting Default What it does
WATCH_FOLDER ./watch Which folder to monitor
WATCH_INTERVAL_SECONDS 30 How often to check for new files (network drives)
MAX_FILE_AGE_HOURS (none) Ignore files older than this
ALLOWED_FILE_TYPES pdf,png,jpg,jpeg,tiff,bmp,txt,doc,docx Which file types to process

Usage

Basic Setup

  1. Set the watch folder path:

    export WATCH_FOLDER=/path/to/your/mounted/folder
    

  2. Start the application:

    ./readur
    

  3. Copy files to the watch folder: The application will automatically detect and process new files.

Docker Usage

# Mount your folder to the container's watch directory
docker run -d \
  -v /path/to/your/files:/app/watch \
  -e WATCH_FOLDER=/app/watch \
  -e WATCH_INTERVAL_SECONDS=60 \
  readur:latest

Docker Compose

services:
  readur:
    image: readur:latest
    volumes:
      - /mnt/nfs/documents:/app/watch
      - readur_uploads:/app/uploads
    environment:
      WATCH_FOLDER: /app/watch
      WATCH_INTERVAL_SECONDS: 30
      FILE_STABILITY_CHECK_MS: 1000
      MAX_FILE_AGE_HOURS: 168  # 1 week
    ports:
      - "8000:8000"

Filesystem-Specific Configuration

NFS Mounts

# Recommended settings for NFS
export WATCH_INTERVAL_SECONDS=60
export FILE_STABILITY_CHECK_MS=1000
export FORCE_POLLING_WATCH=1

SMB/CIFS Mounts

# Recommended settings for SMB
export WATCH_INTERVAL_SECONDS=30
export FILE_STABILITY_CHECK_MS=2000

S3 Mounts (s3fs, goofys, etc.)

# Recommended settings for S3
export WATCH_INTERVAL_SECONDS=120
export FILE_STABILITY_CHECK_MS=5000
export FORCE_POLLING_WATCH=1

Local Filesystems

# Optimal settings for local storage (default behavior)
# No special configuration needed - uses inotify automatically

Supported File Types

The watch folder processes these file types for OCR:

  • PDF: *.pdf
  • Images: *.png, *.jpg, *.jpeg, *.tiff, *.bmp, *.gif
  • Text: *.txt
  • Word Documents: *.doc, *.docx

File Processing Priority

Files are prioritized for OCR processing based on:

  1. File Size: Smaller files get higher priority
  2. File Type: Images > Text files > PDFs > Word documents
  3. Queue Time: Older items get higher priority within the same size/type category

Monitoring and Logs

The application provides detailed logging for watch folder operations:

INFO  readur::watcher: Starting hybrid folder watcher on: /app/watch
INFO  readur::watcher: Using watch strategy: Hybrid
INFO  readur::watcher: Started polling-based watcher on: /app/watch
INFO  readur::watcher: Processing new file: "/app/watch/document.pdf"
INFO  readur::watcher: Successfully queued file for OCR: document.pdf (size: 2048 bytes)

Troubleshooting

Files Not Being Detected

  1. Check permissions:

    ls -la /path/to/watch/folder
    chmod 755 /path/to/watch/folder
    

  2. Verify file types:

    # Only supported file types are processed
    echo $ALLOWED_FILE_TYPES
    

  3. Check file stability:

    # Increase stability check time for slow networks
    export FILE_STABILITY_CHECK_MS=2000
    

High CPU Usage

  1. Increase polling interval:

    export WATCH_INTERVAL_SECONDS=120
    

  2. Limit file age:

    export MAX_FILE_AGE_HOURS=24
    

Network Mount Issues

  1. Force polling mode:

    export FORCE_POLLING_WATCH=1
    

  2. Increase stability check:

    export FILE_STABILITY_CHECK_MS=5000
    

Testing

Use the provided test script to verify functionality:

./test_watch_folder.sh

This creates sample files in the watch folder for testing.

Security Considerations

  • Files are copied to a secure upload directory, not processed in-place
  • Original files in the watch folder are never modified or deleted
  • System files and hidden files are automatically excluded
  • File size limits prevent processing of excessively large files (>500MB)

Performance

  • Local filesystems: Near-instant detection via inotify
  • Network filesystems: Detection within polling interval (default 30s)
  • Concurrent processing: Multiple files processed simultaneously
  • Memory efficient: Streams large files without loading entirely into memory

Examples

Basic File Drop

# Copy a file to the watch folder
cp document.pdf /app/watch/
# File will be automatically detected and processed

Batch Processing

# Copy multiple files
cp *.pdf /app/watch/
# All supported files will be queued for processing

Real-time Monitoring

# Watch the logs for processing updates
docker logs -f readur-container | grep watcher