
Multi-Language OCR Guide

Readur supports multi-language OCR: a document can be processed with several language models at once, which improves text extraction for mixed-language content.

🌍 Overview

The multi-language OCR system allows you to:

  • Process documents in up to 4 languages simultaneously for best results
  • Set preferred languages that apply to all your document uploads
  • Retry failed OCR with different language combinations
  • Automatically optimize text extraction by using multiple language models

🚀 Getting Started

Setting Your Language Preferences

  1. Navigate to Settings in your account
  2. Select the OCR Languages section
  3. Choose up to 4 preferred languages - these will be used for all new uploads
  4. Set a primary language - this language gets processing priority
  5. Save your preferences

Example preferred language setup:

  • Primary: English (eng)
  • Additional: Spanish (spa), French (fra)
  • Result: Documents processed with English priority, plus Spanish and French recognition
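The limits described in this section (at most 4 languages, one of them primary) can be checked on the client before saving. The following is a minimal sketch, not Readur's own validation: the function name `validate_preferences` is hypothetical, and the rules that the primary language must appear in the preferred list and that at least one language is required are assumptions inferred from the example setup.

```python
MAX_LANGUAGES = 4  # limit stated in this guide


def validate_preferences(preferred, primary):
    """Return a list of problems; an empty list means the selection looks valid."""
    problems = []
    # Assumption: at least one language has to be selected.
    if not preferred:
        problems.append("select at least one language")
    if len(preferred) > MAX_LANGUAGES:
        problems.append(f"select at most {MAX_LANGUAGES} languages")
    if len(set(preferred)) != len(preferred):
        problems.append("remove duplicate language codes")
    # Assumption: the primary language must be one of the preferred languages.
    if primary not in preferred:
        problems.append("primary language must be in the preferred list")
    return problems


print(validate_preferences(["eng", "spa", "fra"], "eng"))  # []
print(validate_preferences(["eng", "spa", "fra", "deu", "ita"], "por"))
```

The second call reports two problems: too many languages, and a primary language that is not in the preferred list.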

Language Selection During Upload

When uploading documents, you can:

  1. Use your default preferences - no action needed
  2. Override for specific documents:
     • Click the language selector in the upload area
     • Choose different languages for this upload session
     • These languages will be applied to all files in the current upload

📋 Available Languages

Readur supports 67+ languages including:

Major World Languages

  • English (eng) - Default and most reliable
  • Spanish (spa) - Excellent accuracy
  • French (fra) - High quality results
  • German (deu) - Strong performance
  • Italian (ita) - Good accuracy
  • Portuguese (por) - Reliable processing
  • Russian (rus) - Solid results

Asian Languages

  • Chinese Simplified (chi_sim)
  • Chinese Traditional (chi_tra)
  • Japanese (jpn)
  • Korean (kor)
  • Hindi (hin)
  • Thai (tha)
  • Vietnamese (vie)

European Languages

  • Dutch (nld)
  • Swedish (swe)
  • Norwegian (nor)
  • Danish (dan)
  • Finnish (fin)
  • Polish (pol)
  • Czech (ces)

And Many More

Including Arabic (ara), Hebrew (heb), Turkish (tur), and dozens of other languages.

Tip: For the complete list of available languages, visit the OCR Languages page in your settings or call the API endpoint: GET /api/ocr/languages

🛠️ Using the API

Get Available Languages

curl -H "Authorization: Bearer YOUR_TOKEN" \
     https://your-readur-instance.com/api/ocr/languages

Response:

{
  "available_languages": [
    {
      "code": "eng",
      "name": "English",
      "installed": true
    },
    {
      "code": "spa",
      "name": "Spanish",
      "installed": true
    }
  ],
  "current_user_language": "eng"
}
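A client will often only need the codes that are actually installed. The sketch below parses a body shaped like the sample response above and filters on the `installed` flag. The helper name `installed_codes` is hypothetical, and the code assumes the real response has the same field names as the sample.

```python
import json

# Sample body with the same fields as the response shown above.
SAMPLE_RESPONSE = """
{
  "available_languages": [
    {"code": "eng", "name": "English", "installed": true},
    {"code": "spa", "name": "Spanish", "installed": true}
  ],
  "current_user_language": "eng"
}
"""


def installed_codes(payload):
    """Return the codes of all languages the server reports as installed."""
    return [
        lang["code"]
        for lang in payload.get("available_languages", [])
        if lang.get("installed")
    ]


payload = json.loads(SAMPLE_RESPONSE)
print(installed_codes(payload))  # ['eng', 'spa']
```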

Update Language Preferences

curl -X PUT \
     -H "Authorization: Bearer YOUR_TOKEN" \
     -H "Content-Type: application/json" \
     -d '{
       "preferred_languages": ["eng", "spa", "fra"],
       "primary_language": "eng"
     }' \
     https://your-readur-instance.com/api/settings

Retry OCR with Different Languages

curl -X POST \
     -H "Authorization: Bearer YOUR_TOKEN" \
     -H "Content-Type: application/json" \
     -d '{
       "languages": ["eng", "deu"]
     }' \
     https://your-readur-instance.com/api/documents/DOCUMENT_ID/ocr/retry
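The same retry call can be prepared from a script. This is a sketch using only Python's standard library. It builds the request without sending it, so it can be inspected first. The helper name `build_retry_request` is hypothetical, and the URL, token, and document ID are the placeholders from the curl example, not real values.

```python
import json
from urllib.parse import quote
from urllib.request import Request


def build_retry_request(base_url, token, document_id, languages):
    """Build (but do not send) the OCR retry request for one document."""
    # Percent-encode the ID so unusual characters cannot break the path.
    doc = quote(str(document_id), safe="")
    url = f"{base_url.rstrip('/')}/api/documents/{doc}/ocr/retry"
    body = json.dumps({"languages": list(languages)}).encode("utf-8")
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    return Request(url, data=body, headers=headers, method="POST")


req = build_retry_request(
    "https://your-readur-instance.com", "YOUR_TOKEN", "DOCUMENT_ID", ["eng", "deu"]
)
print(req.get_method(), req.full_url)
```

To send the request, pass `req` to `urllib.request.urlopen`. This guide does not describe the response body, so the sketch does not parse it.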

🎯 Best Practices

Language Selection Strategy

For Mixed-Language Documents:

  • Choose 2-3 languages that appear in your document
  • Always include English as a fallback (most reliable)
  • Put the dominant language first as your primary language

Examples:

  • Business document with English/Spanish: ["eng", "spa"]
  • European legal document: ["eng", "fra", "deu"]
  • Academic paper with multiple references: ["eng", "spa", "ita"]
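The selection strategy above can be expressed as a small helper that puts the dominant language first, removes duplicates, and keeps English as a fallback. This is an illustrative sketch: `build_language_list` is a hypothetical name, and trimming extra languages to stay within the 4-language limit is an assumption about how to handle an overlong selection.

```python
MAX_LANGUAGES = 4  # limit stated in this guide
FALLBACK = "eng"   # English fallback recommended above


def build_language_list(dominant, others=()):
    """Order codes with the dominant language first and English as a fallback."""
    ordered = []
    for code in (dominant, *others):
        if code not in ordered:  # drop duplicates, keep first occurrence
            ordered.append(code)
    if FALLBACK in ordered[:MAX_LANGUAGES]:
        return ordered[:MAX_LANGUAGES]
    # Assumption: when the list is full, the last slot is reserved for English.
    trimmed = [c for c in ordered if c != FALLBACK][: MAX_LANGUAGES - 1]
    return trimmed + [FALLBACK]


print(build_language_list("spa"))                  # ['spa', 'eng']
print(build_language_list("eng", ["spa", "ita"]))  # ['eng', 'spa', 'ita']
```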

Performance Optimization

Do:

  • ✅ Limit to 2-4 languages for best performance
  • ✅ Include English when processing mixed content
  • ✅ Use specific language combinations for consistent document types
  • ✅ Set realistic expectations for complex multilingual documents

Don't:

  • ❌ Select languages not present in your documents
  • ❌ Use more than 4 languages simultaneously
  • ❌ Expect perfect results with very low-quality scans
  • ❌ Mix completely unrelated language families unnecessarily

🔄 Retrying OCR Processing

If OCR results are poor, you can retry with different languages:

Via Web Interface

  1. Navigate to the document with poor OCR results
  2. Click the "Retry OCR" button
  3. Select different languages that better match your document
  4. Start the retry process

Common Retry Scenarios

Scenario 1: Wrong Language Detected

  • Original: English-only processing of Spanish document
  • Solution: Retry with ["spa", "eng"]

Scenario 2: Mixed Language Document

  • Original: Single language processing
  • Solution: Add 2-3 relevant languages

Scenario 3: Poor Quality Scan

  • Original: Fast processing with limited languages
  • Solution: Try with primary language + English fallback

📊 Monitoring OCR Results

Understanding OCR Confidence

  • 90%+ - Excellent results, high accuracy
  • 70-89% - Good results, minor errors possible
  • 50-69% - Moderate results, review recommended
  • Below 50% - Poor results, consider retry with different languages
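The bands above translate directly into a lookup that a script could use to flag documents for review or retry. This is a sketch under the assumption that the confidence value is available as a percentage between 0 and 100. The guide does not say how Readur exposes it, and both function names are hypothetical.

```python
def classify_confidence(percent):
    """Map an OCR confidence percentage (0-100) to the bands listed above."""
    if not 0 <= percent <= 100:
        raise ValueError("confidence must be between 0 and 100")
    if percent >= 90:
        return "excellent"
    if percent >= 70:
        return "good"
    if percent >= 50:
        return "moderate"
    return "poor"


def should_retry(percent):
    """Suggest a retry with different languages when confidence is below 50%."""
    return classify_confidence(percent) == "poor"


print(classify_confidence(93))  # excellent
print(should_retry(42))         # True
```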

Language-Specific Performance

Different languages have varying accuracy rates:

  • Latin-based scripts (English, Spanish, French): Highest accuracy
  • Germanic languages (German, Dutch): Very good accuracy
  • Asian languages (Chinese, Japanese): Good accuracy with proper font recognition
  • Arabic/Hebrew scripts: Moderate accuracy, depends on text quality

🐛 Troubleshooting

Common Issues

Problem: "Language not available" error
Solutions:

  • Check language code spelling (e.g., eng not english)
  • Verify language is installed on the server
  • Contact administrator if language should be available
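The first two checks can be automated by comparing the requested codes against the list returned by GET /api/ocr/languages. A minimal sketch follows. The helper name `check_languages` is hypothetical, and the entry with `"installed": False` is made-up sample data, since the response example in this guide only shows installed languages.

```python
def check_languages(requested, available_languages):
    """Split requested codes into usable, not-installed, and unknown groups."""
    known = {lang["code"] for lang in available_languages}
    installed = {
        lang["code"] for lang in available_languages if lang.get("installed")
    }
    result = {"ok": [], "not_installed": [], "unknown": []}
    for code in requested:
        if code in installed:
            result["ok"].append(code)
        elif code in known:
            result["not_installed"].append(code)  # ask the administrator
        else:
            result["unknown"].append(code)  # likely a misspelled code
    return result


# Hypothetical sample data in the shape of the "available_languages" array.
available = [
    {"code": "eng", "name": "English", "installed": True},
    {"code": "deu", "name": "German", "installed": False},
]
print(check_languages(["eng", "deu", "english"], available))
```

Here `english` lands in the `unknown` group (a spelling problem), while `deu` lands in `not_installed` (a server-side problem).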

Problem: Poor OCR results despite correct language
Solutions:

  • Ensure document scan quality is sufficient (300+ DPI recommended)
  • Try adding English as a fallback language
  • Consider document preprocessing (contrast, rotation correction)
  • Retry with fewer languages for better performance

Problem: Slow processing with multiple languages
Solutions:

  • Reduce number of selected languages to 2-3
  • Use only languages present in your document
  • Consider processing during off-peak hours

Getting Help

If you're experiencing issues:

  1. Check the OCR Health page - GET /api/ocr/health
  2. Review your language selection - ensure languages match document content
  3. Try with English fallback - adds reliability to processing
  4. Contact support with document ID and language combination used

🔮 Advanced Features

Planned Enhancements

  • Auto-language detection: Automatic suggestion of optimal language combinations
  • Custom language models: Upload your own specialized language data
  • Batch language updates: Change languages for multiple documents at once
  • Language-specific confidence thresholds: Fine-tune accuracy requirements per language

Integration Options

The multi-language OCR system integrates with:

  • Document management workflows
  • Automated processing pipelines
  • Third-party applications via REST API
  • Webhook notifications for completion

📚 Additional Resources

  • API Documentation: Complete endpoint reference
  • Language Codes Reference: Full list of supported language codes
  • Performance Guidelines: Optimization recommendations
  • Migration Guide: Upgrading from single-language setup

Need Help? Contact support or check the system health dashboard for real-time OCR capability status.