Multi-Language OCR Guide¶
Readur supports powerful multi-language OCR capabilities that allow you to process documents in multiple languages simultaneously for optimal text extraction accuracy.
🌍 Overview¶
The multi-language OCR system allows you to: - Process documents in up to 4 languages simultaneously for best results - Set preferred languages that apply to all your document uploads - Retry failed OCR with different language combinations - Automatically optimize text extraction by using multiple language models
🚀 Getting Started¶
Setting Your Language Preferences¶
- Navigate to Settings in your account
- Select OCR Languages section
- Choose up to 4 preferred languages - these will be used for all new uploads
- Set a primary language - this language gets processing priority
- Save your preferences
Example preferred language setup: - Primary: English (eng
) - Additional: Spanish (spa
), French (fra
) - Result: Documents processed with English priority, plus Spanish and French recognition
Language Selection During Upload¶
When uploading documents, you can:
- Use your default preferences - no action needed
- Override for specific documents:
- Click the language selector in the upload area
- Choose different languages for this upload session
- These languages will be applied to all files in the current upload
📋 Available Languages¶
Readur supports 67+ languages including:
Major World Languages¶
- English (
eng
) - Default and most reliable - Spanish (
spa
) - Excellent accuracy - French (
fra
) - High quality results - German (
deu
) - Strong performance - Italian (
ita
) - Good accuracy - Portuguese (
por
) - Reliable processing - Russian (
rus
) - Solid results
Asian Languages¶
- Chinese Simplified (
chi_sim
) - Chinese Traditional (
chi_tra
) - Japanese (
jpn
) - Korean (
kor
) - Hindi (
hin
) - Thai (
tha
) - Vietnamese (
vie
)
European Languages¶
- Dutch (
nld
) - Swedish (
swe
) - Norwegian (
nor
) - Danish (
dan
) - Finnish (
fin
) - Polish (
pol
) - Czech (
ces
)
And Many More¶
Including Arabic (ara
), Hebrew (heb
), Turkish (tur
), and dozens of other languages.
Tip: For the complete list of available languages, visit the OCR Languages page in your settings or call the API endpoint:
GET /api/ocr/languages
🛠️ Using the API¶
Get Available Languages¶
Response:
{
"available_languages": [
{
"code": "eng",
"name": "English",
"installed": true
},
{
"code": "spa",
"name": "Spanish",
"installed": true
}
],
"current_user_language": "eng"
}
Update Language Preferences¶
curl -X PUT \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"preferred_languages": ["eng", "spa", "fra"],
"primary_language": "eng"
}' \
https://your-readur-instance.com/api/settings
Retry OCR with Different Languages¶
curl -X POST \
-H "Authorization: Bearer YOUR_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"languages": ["eng", "deu"]
}' \
https://your-readur-instance.com/api/documents/DOCUMENT_ID/ocr/retry
🎯 Best Practices¶
Language Selection Strategy¶
For Mixed-Language Documents: - Choose 2-3 languages that appear in your document - Always include English as a fallback (most reliable) - Put the dominant language first as your primary language
Examples: - Business document with English/Spanish: ["eng", "spa"]
- European legal document: ["eng", "fra", "deu"]
- Academic paper with multiple references: ["eng", "spa", "ita"]
Performance Optimization¶
Do: - ✅ Limit to 2-4 languages for best performance - ✅ Include English when processing mixed content - ✅ Use specific language combinations for consistent document types - ✅ Set realistic expectations for complex multilingual documents
Don't: - ❌ Select languages not present in your documents - ❌ Use more than 4 languages simultaneously - ❌ Expect perfect results with very low-quality scans - ❌ Mix completely unrelated language families unnecessarily
🔄 Retrying OCR Processing¶
If OCR results are poor, you can retry with different languages:
Via Web Interface¶
- Navigate to the document with poor OCR results
- Click "Retry OCR" button
- Select different languages that better match your document
- Start retry process
Common Retry Scenarios¶
Scenario 1: Wrong Language Detected - Original: English-only processing of Spanish document - Solution: Retry with ["spa", "eng"]
Scenario 2: Mixed Language Document - Original: Single language processing - Solution: Add 2-3 relevant languages
Scenario 3: Poor Quality Scan - Original: Fast processing with limited languages - Solution: Try with primary language + English fallback
📊 Monitoring OCR Results¶
Understanding OCR Confidence¶
- 90%+ - Excellent results, high accuracy
- 70-89% - Good results, minor errors possible
- 50-69% - Moderate results, review recommended
- Below 50% - Poor results, consider retry with different languages
Language-Specific Performance¶
Different languages have varying accuracy rates: - Latin-based scripts (English, Spanish, French): Highest accuracy - Germanic languages (German, Dutch): Very good accuracy - Asian languages (Chinese, Japanese): Good accuracy with proper font recognition - Arabic/Hebrew scripts: Moderate accuracy, depends on text quality
🐛 Troubleshooting¶
Common Issues¶
Problem: "Language not available" error Solution: - Check language code spelling (e.g., eng
not english
) - Verify language is installed on the server - Contact administrator if language should be available
Problem: Poor OCR results despite correct language Solutions: - Ensure document scan quality is sufficient (300+ DPI recommended) - Try adding English as a fallback language - Consider document preprocessing (contrast, rotation correction) - Retry with fewer languages for better performance
Problem: Slow processing with multiple languages
Solutions: - Reduce number of selected languages to 2-3 - Use languages only present in your document - Consider processing during off-peak hours
Getting Help¶
If you're experiencing issues:
- Check the OCR Health page -
GET /api/ocr/health
- Review your language selection - ensure languages match document content
- Try with English fallback - adds reliability to processing
- Contact support with document ID and language combination used
🔮 Advanced Features¶
Planned Enhancements¶
- Auto-language detection: Automatic suggestion of optimal language combinations
- Custom language models: Upload your own specialized language data
- Batch language updates: Change languages for multiple documents at once
- Language-specific confidence thresholds: Fine-tune accuracy requirements per language
Integration Options¶
The multi-language OCR system integrates with: - Document management workflows - Automated processing pipelines
- Third-party applications via REST API - Webhook notifications for completion
📚 Additional Resources¶
- API Documentation: Complete endpoint reference
- Language Codes Reference: Full list of supported language codes
- Performance Guidelines: Optimization recommendations
- Migration Guide: Upgrading from single-language setup
Need Help? Contact support or check the system health dashboard for real-time OCR capability status.