📤 Smart File Upload Guide¶
Readur provides an intuitive drag-and-drop file upload system that supports multiple document formats and batch processing.
Supported File Types¶
-
PDF Files (.pdf)
Direct text extraction and OCR for scanned PDFs -
Images (.png, .jpg, .jpeg, .tiff, .bmp, .webp)
Full OCR text extraction -
Text Files (.txt, .rtf)
Direct text import -
Office Documents (.docx, .doc, .xlsx, .xls, .pptx, .ppt)
Text extraction and OCR
Upload Methods¶
Drag & Drop¶
- Navigate to the main dashboard
- Drag files from your computer directly onto the upload area
- Multiple files can be selected and dropped simultaneously
- Progress indicators show upload and processing status
Browse & Select¶
- Click the "Upload Documents" button
- Use the file browser to select one or multiple files
- Click "Open" to begin the upload process
Batch Processing¶
- Upload multiple files at once for efficient processing
- Each file is processed independently for OCR and text extraction
- Real-time status updates show processing progress
- Failed uploads can be retried individually
Processing Pipeline¶
- File Validation - Verify file type and size limits
- Enhanced File Type Detection (v2.5.4+) - Magic number detection using Rust 'infer' crate
- Storage - Secure file storage with backup (local or S3)
- OCR Processing - Automatic text extraction using Tesseract
- Indexing - Full-text search indexing in PostgreSQL
- Metadata Extraction - File properties and document information
Enhanced File Type Detection (v2.5.4+)¶
Readur now uses content-based file type detection rather than relying solely on file extensions:
- Magic Number Detection: Identifies files by their content signature, not just extension
- Broader Format Support: Automatically recognizes more document and image formats
- Security Enhancement: Prevents malicious files with incorrect extensions from being processed
- Performance: Fast, native Rust implementation for minimal overhead
Automatically Detected Formats: - Documents: PDF, DOCX, XLSX, PPTX, ODT, ODS, ODP - Images: PNG, JPEG, GIF, BMP, TIFF, WebP, HEIC - Archives: ZIP, RAR, 7Z, TAR, GZ - Text: TXT, MD, CSV, JSON, XML
This enhancement ensures files are correctly identified even when extensions are missing or incorrect, improving both reliability and security.
Best Practices¶
-
File Size
Keep individual files under 50MB for optimal performance -
File Names
Use descriptive names for better organization -
Batch Size
Upload 10-20 files at once for best performance -
Network
Stable internet connection recommended for large uploads
Troubleshooting¶
Upload Fails¶
- Check file size limits
- Verify file format is supported
- Ensure stable internet connection
- Try uploading fewer files at once
OCR Issues¶
- Ensure images have good contrast and resolution
- PDF files may need higher quality scans
- Check the OCR Optimization Guide for advanced tips
Security¶
- All uploads are scanned for malicious content
- Files are stored securely with proper access controls
- User permissions apply to all uploaded documents
- Automatic backup ensures data safety