Multi-Language OCR Guide¶

Readur supports multi-language OCR: a single document can be processed with several language models at once, which improves text extraction for documents that mix languages.

🌍 Overview¶

The multi-language OCR system allows you to:

- Process documents in up to 4 languages simultaneously for best results
- Set preferred languages that apply to all your document uploads
- Retry failed OCR with different language combinations
- Automatically optimize text extraction by using multiple language models

🚀 Getting Started¶

Setting Your Language Preferences¶

Navigate to Settings in your account
Select the OCR Languages section
Choose up to 4 preferred languages - these will be used for all new uploads
Set a primary language - this language gets processing priority
Save your preferences

Example preferred language setup:

- Primary: English (eng)
- Additional: Spanish (spa), French (fra)
- Result: Documents processed with English priority, plus Spanish and French recognition
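The limits described above can be checked before a setup is saved. The following is a minimal sketch, not Readur's own validation: the function name `validate_preferences` is hypothetical, and it assumes that the primary language must be one of the preferred languages and that codes look like `eng` or `chi_sim`.

```python
import re

MAX_LANGUAGES = 4  # limit stated in this guide
# Assumed shape of a language code: three letters, optional suffix ("chi_sim")
CODE_PATTERN = re.compile(r"^[a-z]{3}(_[a-z]+)?$")


def validate_preferences(preferred, primary):
    """Return a list of problems; an empty list means the setup looks valid."""
    problems = []
    if not preferred:
        problems.append("at least one language is required")
    if len(preferred) > MAX_LANGUAGES:
        problems.append(f"at most {MAX_LANGUAGES} languages are allowed")
    if len(set(preferred)) != len(preferred):
        problems.append("duplicate language codes")
    for code in preferred:
        if not CODE_PATTERN.match(code):
            problems.append(f"malformed language code: {code!r}")
    # Assumption: the primary language has to appear in the preferred list
    if primary not in preferred:
        problems.append("primary language must be one of the preferred languages")
    return problems


# The example setup from this section: English primary, Spanish and French added
print(validate_preferences(["eng", "spa", "fra"], "eng"))
```

An empty result means the combination respects the 4-language limit; the server remains the final authority on which codes are accepted.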

Language Selection During Upload¶

When uploading documents, you can:

Use your default preferences - no action needed
Override for specific documents:
  Click the language selector in the upload area
  Choose different languages for this upload session
  These languages will be applied to all files in the current upload

📋 Available Languages¶

Readur supports 67+ languages including:

Major World Languages¶

English (eng) - Default and most reliable
Spanish (spa) - Excellent accuracy
French (fra) - High quality results
German (deu) - Strong performance
Italian (ita) - Good accuracy
Portuguese (por) - Reliable processing
Russian (rus) - Solid results

Asian Languages¶

Chinese Simplified (chi_sim)
Chinese Traditional (chi_tra)
Japanese (jpn)
Korean (kor)
Hindi (hin)
Thai (tha)
Vietnamese (vie)

European Languages¶

Dutch (nld)
Swedish (swe)
Norwegian (nor)
Danish (dan)
Finnish (fin)
Polish (pol)
Czech (ces)

And Many More¶

Including Arabic (ara), Hebrew (heb), Turkish (tur), and dozens of other languages.

Tip: For the complete list of available languages, visit the OCR Languages page in your settings or call the API endpoint: GET /api/ocr/languages

🛠️ Using the API¶

Get Available Languages¶

curl -H "Authorization: Bearer YOUR_TOKEN" \
     https://your-readur-instance.com/api/ocr/languages

Response:

{
  "available_languages": [
    {
      "code": "eng",
      "name": "English",
      "installed": true
    },
    {
      "code": "spa", 
      "name": "Spanish",
      "installed": true
    }
  ],
  "current_user_language": "eng"
}
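A client will usually want only the languages that are actually installed. As a minimal sketch, the response above can be filtered like this; the helper name `installed_codes` and the `SAMPLE_RESPONSE` constant are illustrative, and the field names are taken from the sample response only.

```python
import json

# Copy of the sample response shown above
SAMPLE_RESPONSE = """
{
  "available_languages": [
    {"code": "eng", "name": "English", "installed": true},
    {"code": "spa", "name": "Spanish", "installed": true}
  ],
  "current_user_language": "eng"
}
"""


def installed_codes(response_text):
    """Return the codes of all languages the server reports as installed."""
    data = json.loads(response_text)
    return [
        lang["code"]
        for lang in data.get("available_languages", [])
        if lang.get("installed")
    ]


print(installed_codes(SAMPLE_RESPONSE))
```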

Update Language Preferences¶

curl -X PUT \
     -H "Authorization: Bearer YOUR_TOKEN" \
     -H "Content-Type: application/json" \
     -d '{
       "preferred_languages": ["eng", "spa", "fra"],
       "primary_language": "eng"
     }' \
     https://your-readur-instance.com/api/settings
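The same request can be assembled from a script. This is a sketch using only the Python standard library; `build_preferences_request` is a hypothetical helper, and the endpoint, headers, and payload fields are copied from the curl command above. The request is built but not sent.

```python
import json
import urllib.request


def build_preferences_request(base_url, token, preferred, primary):
    """Build (but do not send) the PUT /api/settings request shown above."""
    body = json.dumps(
        {"preferred_languages": preferred, "primary_language": primary}
    ).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/settings",
        data=body,
        method="PUT",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


request = build_preferences_request(
    "https://your-readur-instance.com", "YOUR_TOKEN", ["eng", "spa", "fra"], "eng"
)
# To actually send it: urllib.request.urlopen(request)
print(request.get_method(), request.full_url)
```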

Retry OCR with Different Languages¶

curl -X POST \
     -H "Authorization: Bearer YOUR_TOKEN" \
     -H "Content-Type: application/json" \
     -d '{
       "languages": ["eng", "deu"]
     }' \
     https://your-readur-instance.com/api/documents/DOCUMENT_ID/ocr/retry
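When retries are scripted, the document ID has to be placed into the URL safely. The sketch below builds the retry request without sending it; `build_retry_request` is a hypothetical helper, and the path and payload follow the curl command above.

```python
import json
import urllib.parse
import urllib.request


def build_retry_request(base_url, token, document_id, languages):
    """Build (but do not send) the POST .../ocr/retry request shown above."""
    if not languages:
        raise ValueError("at least one language code is required")
    # Percent-encode the ID so unusual characters cannot alter the path
    safe_id = urllib.parse.quote(str(document_id), safe="")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/documents/{safe_id}/ocr/retry",
        data=json.dumps({"languages": list(languages)}).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


retry = build_retry_request(
    "https://your-readur-instance.com", "YOUR_TOKEN", "DOCUMENT_ID", ["eng", "deu"]
)
# To actually send it: urllib.request.urlopen(retry)
print(retry.full_url)
```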

🎯 Best Practices¶

Language Selection Strategy¶

For Mixed-Language Documents:

- Choose 2-3 languages that appear in your document
- Always include English as a fallback (most reliable)
- Put the dominant language first as your primary language

Examples:

- Business document with English/Spanish: ["eng", "spa"]
- European legal document: ["eng", "fra", "deu"]
- Academic paper with multiple references: ["eng", "spa", "ita"]
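The selection strategy above (dominant language first, English as a fallback, no more than 4 languages) can be expressed as a small helper. This is only an illustration of the rule of thumb; `build_language_list` is a hypothetical name, and placing the fallback last is a choice made for this sketch.

```python
MAX_LANGUAGES = 4  # limit stated in this guide
FALLBACK = "eng"   # English, recommended here as the fallback


def build_language_list(dominant, others=()):
    """Order codes as suggested: dominant language first, English as fallback."""
    ordered = [dominant]
    for code in others:
        if code not in ordered:
            ordered.append(code)
    if dominant != FALLBACK:
        # Reserve the last slot for the fallback, trimming extra languages first
        extras = [code for code in ordered if code != FALLBACK]
        ordered = extras[: MAX_LANGUAGES - 1] + [FALLBACK]
    return ordered[:MAX_LANGUAGES]


print(build_language_list("eng", ["fra", "deu"]))  # European legal document
print(build_language_list("spa"))                  # Spanish with English fallback
```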

Performance Optimization¶

Do:

- ✅ Limit to 2-4 languages for best performance
- ✅ Include English when processing mixed content
- ✅ Use specific language combinations for consistent document types
- ✅ Set realistic expectations for complex multilingual documents

Don't:

- ❌ Select languages not present in your documents
- ❌ Use more than 4 languages simultaneously
- ❌ Expect perfect results with very low-quality scans
- ❌ Mix completely unrelated language families unnecessarily

🔄 Retrying OCR Processing¶

If OCR results are poor, you can retry with different languages:

Via Web Interface¶

Navigate to the document with poor OCR results
Click the "Retry OCR" button
Select different languages that better match your document
Start the retry process

Common Retry Scenarios¶

Scenario 1: Wrong Language Detected

- Original: English-only processing of Spanish document
- Solution: Retry with ["spa", "eng"]

Scenario 2: Mixed Language Document

- Original: Single language processing
- Solution: Add 2-3 relevant languages

Scenario 3: Poor Quality Scan

- Original: Fast processing with limited languages
- Solution: Try with primary language + English fallback

📊 Monitoring OCR Results¶

Understanding OCR Confidence¶

90%+ - Excellent results, high accuracy
70-89% - Good results, minor errors possible
50-69% - Moderate results, review recommended
Below 50% - Poor results, consider retry with different languages
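The confidence bands above map directly onto a small classifier. The sketch below assumes confidence is available as a percentage between 0 and 100; the function names are hypothetical, and fractional values such as 89.5 fall into the lower band because only the thresholds 90, 70, and 50 are compared.

```python
def classify_confidence(percent):
    """Map an OCR confidence percentage to the bands listed above."""
    if not 0 <= percent <= 100:
        raise ValueError("confidence must be between 0 and 100")
    if percent >= 90:
        return "excellent"
    if percent >= 70:
        return "good"
    if percent >= 50:
        return "moderate"
    return "poor"


def should_retry(percent):
    """Below 50% the guide suggests retrying with different languages."""
    return classify_confidence(percent) == "poor"


for value in (95, 72, 50, 31):
    print(value, classify_confidence(value), should_retry(value))
```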

Language-Specific Performance¶

Accuracy varies by language and script:

- English, Spanish, French: Highest accuracy
- German, Dutch: Very good accuracy
- Chinese, Japanese: Good accuracy, provided the fonts are recognized well
- Arabic and Hebrew (right-to-left scripts): Moderate accuracy, depends on text quality

🐛 Troubleshooting¶

Common Issues¶

Problem: "Language not available" error

Solution:

- Check language code spelling (e.g., eng not english)
- Verify language is installed on the server
- Contact administrator if language should be available
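The first two checks can be automated against the list returned by GET /api/ocr/languages. This is a sketch under the assumption that each entry has the `code`, `name`, and `installed` fields shown in the API section; `diagnose_language` and the sample entries (including the uninstalled German entry) are made up for illustration.

```python
def diagnose_language(code, available_languages):
    """Explain why a code might trigger a "Language not available" error."""
    by_code = {lang["code"]: lang for lang in available_languages}
    entry = by_code.get(code)
    if entry is None:
        # A full language name in place of a code is the typo the guide mentions
        by_name = {lang["name"].lower(): lang["code"] for lang in available_languages}
        suggestion = by_name.get(code.lower())
        if suggestion:
            return f"unknown code {code!r}; did you mean {suggestion!r}?"
        return f"unknown code {code!r}; check the spelling"
    if not entry.get("installed"):
        return f"{code!r} is known but not installed; contact your administrator"
    return "ok"


# Hypothetical server listing used only for this example
languages = [
    {"code": "eng", "name": "English", "installed": True},
    {"code": "deu", "name": "German", "installed": False},
]
for candidate in ("eng", "english", "deu", "xyz"):
    print(candidate, "->", diagnose_language(candidate, languages))
```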

Problem: Poor OCR results despite correct language

Solutions:

- Ensure document scan quality is sufficient (300+ DPI recommended)
- Try adding English as a fallback language
- Consider document preprocessing (contrast, rotation correction)
- Retry with fewer languages for better performance
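Whether a scan reaches the recommended 300 DPI can be estimated from its pixel dimensions and the physical page size, since DPI is pixels divided by inches. The helper names below are hypothetical, and the example assumes a US Letter page (8.5 x 11 inches).

```python
RECOMMENDED_DPI = 300  # recommendation from this guide


def scan_dpi(pixels_wide, pixels_high, inches_wide, inches_high):
    """Effective DPI: the smaller of the horizontal and vertical resolution."""
    return min(pixels_wide / inches_wide, pixels_high / inches_high)


def meets_recommended_dpi(pixels_wide, pixels_high, inches_wide, inches_high):
    """True when the scan reaches at least the recommended resolution."""
    return scan_dpi(pixels_wide, pixels_high, inches_wide, inches_high) >= RECOMMENDED_DPI


# US Letter page scanned at 2550 x 3300 pixels, then at half that resolution
print(meets_recommended_dpi(2550, 3300, 8.5, 11))
print(meets_recommended_dpi(1275, 1650, 8.5, 11))
```

A scan that falls short is usually better rescanned at a higher resolution than upscaled, because upscaling adds pixels without adding detail.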

Problem: Slow processing with multiple languages

Solutions:

- Reduce number of selected languages to 2-3
- Use languages only present in your document
- Consider processing during off-peak hours

Getting Help¶

If you're experiencing issues:

Check the OCR Health page - GET /api/ocr/health
Review your language selection - ensure languages match document content
Try with English fallback - adds reliability to processing
Contact support with document ID and language combination used

🔮 Advanced Features¶

Planned Enhancements¶

Auto-language detection: Automatic suggestion of optimal language combinations
Custom language models: Upload your own specialized language data
Batch language updates: Change languages for multiple documents at once
Language-specific confidence thresholds: Fine-tune accuracy requirements per language

Integration Options¶

The multi-language OCR system integrates with:

- Document management workflows
- Automated processing pipelines
- Third-party applications via REST API
- Webhook notifications for completion

📚 Additional Resources¶

API Documentation: Complete endpoint reference
Language Codes Reference: Full list of supported language codes
Performance Guidelines: Optimization recommendations
Migration Guide: Upgrading from single-language setup

Need Help? Contact support or check the system health dashboard for real-time OCR capability status.