Advanced Search Guide¶

Readur's search goes well beyond simple text matching. This guide covers the search modes, query syntax, advanced filtering, and optimization techniques.

Table of Contents¶

Overview
Search Modes
Query Syntax
Advanced Filtering
Search Interface
Search Optimization
Saved Searches
Search Analytics
API Search
Troubleshooting

Overview¶

Readur's search system is built on PostgreSQL's full-text search capabilities with additional enhancements for document-specific requirements.

Search Capabilities¶

Full-Text Search: Search within document content and OCR-extracted text
Multiple Search Modes: Simple, phrase, fuzzy, and boolean search options
Advanced Filtering: Filter by file type, date, size, labels, and source
Real-Time Suggestions: Auto-complete and query suggestions as you type
Faceted Search: Browse documents by categories and properties
Cross-Language Support: Search in multiple languages with OCR text
Relevance Ranking: Intelligent scoring and result ordering

Search Sources¶

Readur searches across multiple content sources:

Document Content: Original text from text files and PDFs
OCR Text: Extracted text from images and scanned documents
Metadata: File names, descriptions, and document properties
Labels: User-created and system-generated tags
Source Information: Upload source and file paths

Search Modes¶

Simple Search (Smart Search)¶

Best for: General purpose searching and quick document discovery

How it works:

- Automatically applies stemming and fuzzy matching
- Searches across all text content and metadata
- Provides intelligent relevance scoring
- Handles common typos and variations

Example:

invoice 2024

Finds: "Invoice Q1 2024", "invoicing for 2024", "2024 invoice data"

Features:

Simple search applies Auto-stemming, so "running" also matches variations like "run", "runs", and "runner". Fuzzy tolerance corrects common typos, so "recieve" still finds "receive".

Partial matching lets shorter queries find longer terms: typing "doc" matches "document", "documentation", and other related terms. Relevance ranking puts the most pertinent results first by weighing term frequency, document importance, and contextual relevance. The search engine also applies Synonym expansion for common terms and Stop word filtering to improve search quality.

Phrase Search (Exact Match)¶

Best for: Finding exact phrases or specific terminology

How it works:

- Searches for the exact sequence of words
- Case-insensitive but order-sensitive
- Useful for finding specific quotes, names, or technical terms

Syntax: Use quotes around the phrase

"quarterly financial report"
"John Smith"
"error code 404"

Features:

- Exact word order: Only matches the precise sequence
- Case insensitive: "John Smith" matches "john smith"
- Punctuation ignored: "error-code" matches "error code"

Fuzzy Search (Approximate Matching)¶

Best for: Handling typos, OCR errors, and spelling variations

How it works:

- Uses trigram similarity to find approximate matches
- Configurable similarity threshold (default: 0.8)
- Particularly useful for OCR-processed documents with errors

Syntax: Use the ~ operator

invoice~     # Finds "invoice", "invoce", "invoise"
contract~    # Finds "contract", "contarct", "conract"
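
To make "trigram similarity" concrete, here is a minimal sketch of how such a score can be computed for single words. It follows the general approach of PostgreSQL's `pg_trgm` extension (lowercase, pad the word, compare sets of three-character sequences), but it is an illustration only and not Readur's implementation; the function names `trigrams` and `similarity` are made up for this example, and the scores it produces are not necessarily on the same scale as Readur's threshold setting.

```python
def trigrams(word: str) -> set[str]:
    # Lowercase and pad the word (two spaces before, one after), as pg_trgm does,
    # so that word boundaries contribute trigrams too.
    padded = f"  {word.lower()} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def similarity(a: str, b: str) -> float:
    # Shared trigrams divided by all distinct trigrams (Jaccard index).
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)


print(similarity("invoice", "invoice"))   # 1.0
print(similarity("invoice", "invoce"))    # 0.5
print(similarity("invoice", "contract"))  # 0.0
```

A single dropped letter removes several trigrams at once, which is why short words lose similarity quickly; this is worth keeping in mind when adjusting the threshold for OCR-heavy collections.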

Configuration:

Fuzzy search can be fine-tuned through Threshold adjustment in your user settings, allowing you to configure how sensitive the system is to spelling variations. This is particularly important because Language-specific requirements vary - for instance, Germanic languages might need different thresholds than Romance languages due to their structural differences.

The system provides OCR optimization with higher tolerance settings specifically for OCR-processed documents, which often contain character recognition errors. You can also configure Context-aware matching that adjusts fuzzy thresholds based on surrounding words, and set up Custom dictionaries for industry-specific terminology that might not be in standard dictionaries.

Boolean Search (Logical Operators)¶

Best for: Complex queries with multiple conditions and precise control

Operators:

Boolean search provides precise control through logical operators. The AND operator requires both terms to be present in matching documents, perfect for narrowing results to specific criteria. Conversely, the OR operator broadens your search by accepting documents containing either term.

For exclusion logic, the NOT operator removes documents containing unwanted terms from your results. To handle complex queries, Parentheses allow you to group conditions and control the order of operations. The system also supports NEAR operators for proximity searches and XOR for exclusive or logic, giving you complete control over your search logic.

Examples:

budget AND 2024                    # Both "budget" and "2024"
invoice OR receipt                  # Either "invoice" or "receipt"
contract NOT draft                  # "contract" but not "draft"
(budget OR financial) AND 2024      # Complex grouping
marketing AND (campaign OR strategy) # Marketing documents about campaigns or strategy

Advanced Boolean Examples:

# Find completed project documents
project AND (final OR completed OR approved) NOT draft

# Financial documents excluding personal items
(invoice OR receipt OR budget) NOT personal

# Recent important documents
(urgent OR priority OR critical) AND date:this-month
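
The operator semantics in the examples above can be sketched as a small evaluator that checks one document's set of words against a query. This is a simplified illustration, not Readur's parser: it assumes uppercase operators, treats `A NOT B` as `A AND NOT B` (as the examples do), ignores field prefixes and phrases, and raises `IndexError` on malformed input. The name `matches` is hypothetical.

```python
import re


def matches(query: str, words: set[str]) -> bool:
    """Evaluate an AND/OR/NOT query with parentheses against a set of words."""
    tokens = re.findall(r"\(|\)|[^\s()]+", query)
    words = {w.lower() for w in words}
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def parse_or():
        value = parse_and()
        while peek() == "OR":
            take()
            rhs = parse_and()  # always parse, so tokens are consumed
            value = value or rhs
        return value

    def parse_and():
        value = parse_unary()
        while peek() in ("AND", "NOT"):
            if peek() == "AND":
                take()
            rhs = parse_unary()  # a bare NOT is handled inside parse_unary
            value = value and rhs
        return value

    def parse_unary():
        tok = take()
        if tok == "NOT":
            return not parse_unary()
        if tok == "(":
            value = parse_or()
            take()  # closing parenthesis
            return value
        return tok.lower() in words

    return parse_or()


print(matches("contract NOT draft", {"contract", "signed"}))          # True
print(matches("(budget OR financial) AND 2024", {"budget", "2023"}))  # False
```

In this sketch AND binds more tightly than OR, which is why the grouped examples use parentheses to make the intended order explicit.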

Query Syntax¶

Field-Specific Search¶

Search within specific document fields for precise targeting.

Available Fields¶

Field	Description	Example
`filename:`	Search in file names	`filename:invoice`
`content:`	Search in document text	`content:"project status"`
`label:`	Search by labels	`label:urgent`
`type:`	Search by file type	`type:pdf`
`source:`	Search by upload source	`source:webdav`
`size:`	Search by file size	`size:>10MB`
`date:`	Search by date	`date:2024-01-01`

Field Search Examples¶

filename:contract AND date:2024        # Contracts from 2024
label:"high priority" OR label:urgent  # Priority documents
type:pdf AND content:budget            # PDF files containing "budget"
source:webdav AND label:approved       # Approved docs from WebDAV
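
As a rough illustration of how `field:value` terms can be separated from free-text terms, the sketch below splits a simple space-separated query. It is an assumption-laden example rather than Readur's query parser: `split_query` and `KNOWN_FIELDS` are hypothetical names, and boolean operators and parentheses are not interpreted (they would simply come back as free terms).

```python
import shlex

# Field names taken from the "Available Fields" table.
KNOWN_FIELDS = {"filename", "content", "label", "type", "source", "size", "date"}


def split_query(query: str):
    """Return (field_filters, free_terms) for a simple space-separated query."""
    fields, terms = [], []
    # shlex keeps quoted phrases together: content:"project status" is one token.
    for token in shlex.split(query):
        name, sep, value = token.partition(":")
        if sep and name in KNOWN_FIELDS and value:
            fields.append((name, value))
        else:
            terms.append(token)
    return fields, terms


print(split_query('type:pdf content:"project status" budget'))
# ([('type', 'pdf'), ('content', 'project status')], ['budget'])
```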

Range Queries¶

Date Ranges¶

date:2024-01-01..2024-03-31    # Q1 2024 documents
date:>2024-01-01               # After January 1, 2024
date:<2024-12-31               # Before December 31, 2024

Size Ranges¶

size:1MB..10MB                 # Between 1MB and 10MB
size:>50MB                     # Larger than 50MB
size:<1KB                      # Smaller than 1KB
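
The size range syntax can be turned into byte bounds with a few lines of parsing. The sketch below assumes binary units (1 KB = 1024 bytes); the byte values in the API example later in this guide (1048576 and 52428800) are consistent with that, but the guide does not state it explicitly. `to_bytes` and `parse_size_range` are hypothetical helper names, and whether `>` and `<` are inclusive is left to the caller.

```python
import re

# Assumption: binary units (1 KB = 1024 bytes).
UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3}


def to_bytes(text: str) -> int:
    match = re.fullmatch(r"(\d+)(B|KB|MB|GB)", text.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    return int(match.group(1)) * UNITS[match.group(2)]


def parse_size_range(expr: str):
    """Return (lower, upper) in bytes; None means that side is unbounded."""
    if ".." in expr:
        low, high = expr.split("..", 1)
        return to_bytes(low), to_bytes(high)
    if expr.startswith(">"):
        return to_bytes(expr[1:]), None
    if expr.startswith("<"):
        return None, to_bytes(expr[1:])
    size = to_bytes(expr)
    return size, size


print(parse_size_range("1MB..10MB"))  # (1048576, 10485760)
print(parse_size_range(">50MB"))      # (52428800, None)
```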

Wildcard Search¶

Use wildcards for partial matching:

proj*           # Matches "project", "projects", "projection"
*report         # Matches terms ending in "report", e.g. "report", "subreport"
doc?ment        # Matches "document" (? = exactly one character)
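
To check what a wildcard pattern will and will not match, Python's standard `fnmatch` module is a convenient stand-in, since its `*` (any run of characters) and `?` (exactly one character) behave as described here. This is only an approximation of the matching rules: Readur's own matcher may differ in details, and `fnmatch` additionally treats `[...]` as a character class. `wildcard_match` is a hypothetical helper.

```python
from fnmatch import fnmatchcase


def wildcard_match(pattern: str, term: str) -> bool:
    # Compare case-insensitively; fnmatchcase itself is case-sensitive.
    return fnmatchcase(term.lower(), pattern.lower())


print(wildcard_match("proj*", "projects"))      # True
print(wildcard_match("doc?ment", "document"))   # True
print(wildcard_match("doc?ment", "documents"))  # False: nothing covers the final "s"
```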

Exclusion Operators¶

Exclude unwanted results:

invoice -draft                 # Invoices but not drafts
budget NOT personal           # Budget documents excluding personal
-label:archive proposal       # Proposals not in archive

Advanced Filtering¶

File Type Filters¶

Filter by specific file formats:

Common File Types:

The file type filter supports a comprehensive range of formats. Documents include standard formats like PDF, DOC, DOCX, TXT, and RTF, covering most text-based files you'll encounter in a business environment.

Images are fully supported with common formats including PNG, JPG, JPEG, TIFF, BMP, and GIF - all of which can be processed through OCR for text extraction. For data analysis, Spreadsheets in XLS, XLSX, and CSV formats are searchable, with the system able to extract and index tabular data.

Presentations in PPT and PPTX formats are also indexed, including slide text and speaker notes. Beyond these categories, the system handles Archive files (ZIP, RAR, 7Z), Email formats (EML, MSG), and eBook formats (EPUB, MOBI) for comprehensive document management.

Filter Interface:

The filtering interface provides multiple ways to narrow your search by file type. You can use Checkbox Filters to quickly select or deselect multiple file types with a single click, making it easy to focus on specific document formats.

For broader categorization, MIME Type Groups let you filter by general categories like "all documents" or "all images" without selecting individual formats. The system also supports Custom Extensions where you can add specific file extensions that might be unique to your organization. Additionally, there's a Quick Toggle feature for common combinations and Saved Filter Sets that remember your frequently used filter configurations.

Search Syntax:

type:pdf                       # Only PDF files
type:(pdf OR doc)              # PDF or Word documents
-type:image                    # Exclude all images

Date and Time Filters¶

Predefined Ranges:

- Today, Yesterday, This Week, Last Week
- This Month, Last Month, This Quarter, Last Quarter
- This Year, Last Year

Custom Date Ranges:

- Start Date: Documents uploaded after a specific date
- End Date: Documents uploaded before a specific date
- Date Range: Documents within a specific period

Advanced Date Syntax:

created:today                  # Documents uploaded today
modified:>2024-01-01          # Modified after January 1st
accessed:last-week            # Accessed in the last week

Size Filters¶

Size Categories:

File size filtering uses intuitive categories to help you find documents of specific sizes. Small files under 1MB typically include text documents, simple PDFs, and low-resolution images - perfect for quick reference materials.

The Medium category (1MB - 10MB) encompasses most business documents, including formatted reports, presentations, and high-quality images. Large files (10MB - 50MB) often contain detailed technical documentation, multi-page scanned documents, or multimedia presentations.

Documents in the Very Large category (over 50MB) usually include comprehensive manuals, video content, or high-resolution design files. The system also provides Micro (<100KB) and Gigantic (>500MB) categories for edge cases, along with custom size range inputs for precise filtering.

Custom Size Ranges:

size:>10MB                     # Larger than 10MB
size:1MB..5MB                  # Between 1MB and 5MB
size:<100KB                    # Smaller than 100KB

Label Filters¶

Label Selection:

- Multiple Labels: Select multiple labels with AND/OR logic
- Label Hierarchy: Navigate nested label structures
- Label Suggestions: Auto-complete based on existing labels

Label Search Syntax:

label:project                  # Documents with "project" label
label:"high priority"          # Multi-word labels in quotes
label:(urgent OR critical)     # Documents with either label
-label:archive                 # Exclude archived documents

Source Filters¶

Filter by document source or origin:

Source Types:

Documents can enter your system through various channels, each tracked separately for filtering. Manual Upload identifies documents that users have uploaded directly through the web interface or API, typically one-off additions or user-generated content.

WebDAV Sync marks documents automatically synchronized from WebDAV servers like Nextcloud or ownCloud, often representing shared team resources. The Local Folder source indicates documents ingested from watched directories on the server, useful for automated workflows and bulk imports.

S3 Sync identifies documents pulled from Amazon S3 or compatible cloud storage, commonly used for large-scale document repositories. The system also tracks Email Attachments, API Uploads, and Migration Imports as distinct sources for complete visibility into document origins.

Source-Specific Filters:

source:webdav                  # WebDAV synchronized documents
source:manual                  # Manually uploaded documents
source:"My Nextcloud"          # Specific named source

OCR Status Filters¶

Filter by OCR processing status:

Status Options:

OCR status filtering helps you manage document processing states effectively. Documents marked as Completed have been successfully processed with text extraction finished and content indexed for searching.

The Pending status indicates documents still waiting in the OCR queue, which might need attention if the queue is backing up. Failed status highlights documents where OCR processing encountered errors - these might need manual review or reprocessing with different settings.

Documents marked Not Applicable are text-based files that don't require OCR processing, such as native PDFs with embedded text or plain text files. The system also tracks In Progress for actively processing documents, Partial for documents with mixed success, and Skipped for documents excluded by configuration rules.

OCR Quality Filters:

Filter documents based on OCR extraction confidence levels to focus on quality. High Confidence documents (over 90% confidence) contain reliably extracted text suitable for critical searches and automated workflows.

Medium Confidence results (70-90%) represent acceptable quality with occasional errors, typically from slightly degraded originals or handwritten sections. Documents with Low Confidence (below 70%) may contain significant extraction errors and often benefit from manual review or reprocessing.

The system provides additional quality indicators including Language Match confidence, Layout Preservation quality, and Character Recognition accuracy scores, helping you identify documents that might need attention or alternative processing strategies.
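
The confidence bands described above map directly onto a small classification function. The sketch below is illustrative only: the name `confidence_band` is hypothetical, and the handling of the exact boundary values (90 and 70 both fall into "medium" here) is an assumption, since the guide does not say which band they belong to.

```python
def confidence_band(confidence: float) -> str:
    """Map an OCR confidence percentage (0-100) to a band name."""
    if confidence > 90:
        return "high"
    if confidence >= 70:
        return "medium"
    return "low"


for value in (96.5, 90, 70, 42):
    print(value, confidence_band(value))
```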

Search Interface¶

Global Search Bar¶

Location: Available in the header on all pages

Features:

- Real-time suggestions: Shows results as you type
- Quick results: Top 5 matches with snippets
- Fast navigation: Direct access to documents
- Search history: Recent searches for quick access

Usage:

The global search bar gives you access to your documents from anywhere in the application. Click the search bar in the header to activate it and start typing.

As you start typing your query, the system immediately begins processing, showing instant suggestions based on your input and search history. The dropdown displays the top matching results with highlighted snippets, giving you a preview of each document's relevance.

Clicking any result navigates directly to the document viewer, while pressing Enter takes you to the full search results page with all matches. The interface also supports keyboard shortcuts - use arrow keys to navigate suggestions, Tab to autocomplete, and Escape to close the search dropdown.

Advanced Search Page¶

Location: Dedicated search page with full interface

Features:

- Multiple search modes: Toggle between search types
- Filter sidebar: All filtering options in one place
- Result options: Sorting, pagination, view modes
- Export capabilities: Export search results

Interface Sections:

Search Input Area¶

The search input area serves as your command center for building sophisticated queries. The Query builder provides visual query construction tools, including drag-and-drop operators and clickable filter tags that make complex searches accessible to all users.

A Mode selector lets you instantly switch between search types - simple for everyday use, phrase for exact matches, fuzzy for typo tolerance, and boolean for complex logic. The Suggestions system offers intelligent auto-complete based on your search history, popular queries, and document content.

Additionally, the interface includes a Query validator that checks syntax in real-time, Search templates for common query patterns, and a Query history dropdown for quick access to recent searches.

Filter Sidebar¶

The filter sidebar consolidates all filtering options in an intuitive, collapsible panel. File type filters present checkboxes for different formats, organized by category with select-all options and quick presets for common combinations.

The Date range picker offers a calendar interface for precise date selection, including preset ranges like "Last 7 days" or "This quarter" for convenience. Size sliders provide visual range selection with logarithmic scaling to handle the wide range of file sizes effectively.

For organization, the Label selector displays your hierarchical label structure in a tree view, supporting multi-select with AND/OR logic. Source filters let you filter by upload source with usage statistics shown for each source. The sidebar also includes OCR status filters and Custom metadata fields for advanced filtering capabilities.

Results Area¶

The results area adapts to your preferred way of viewing search results. Sort options include relevance scoring, upload date, filename alphabetically, and file size, with secondary sort criteria available for tie-breaking.

View modes cater to different preferences - list view for maximum information density, grid view for visual browsing of documents with thumbnails, and detail view for in-depth document examination without leaving search results. The Pagination controls offer flexible navigation through result pages, with options for 10, 25, 50, or 100 results per page.

For data portability, Export options allow you to download search results in CSV format for spreadsheet analysis or JSON for programmatic processing. The results area also features Bulk actions for operating on multiple documents, Quick preview on hover, and Keyboard navigation for power users.

Search Results¶

Result Display Elements¶

Document Cards:

Each search result is presented as an information-rich card designed for quick evaluation. The Filename serves as the primary identifier, displayed prominently with file type icon and extension for immediate recognition.

A Snippet shows highlighted text excerpts where your search terms appear, with intelligent context extraction ensuring you see the most relevant portions of each document. The Metadata section efficiently displays essential information including file size, document type, upload date, and applied labels in a scannable format.

The Relevance Score provides a numerical ranking (0-100) helping you understand why certain documents rank higher than others. Quick Actions are available on hover or click, offering immediate access to download, view, or edit operations without leaving the search interface. Cards also display OCR confidence, Source information, and Last modified timestamps for complete context.

Highlighting:

Search result highlighting makes it easy to spot relevant content within documents. Search terms are prominently highlighted in yellow (or your chosen color scheme) throughout snippets and document previews, ensuring quick visual scanning.

The system includes sufficient Context around matched terms, showing surrounding sentences to help you understand the relevance without opening the document. When documents contain Multiple matches, all instances are highlighted with a match counter showing how many times your search terms appear.

Snippet length can be adjusted in user settings from compact 100-character excerpts to detailed 500-character passages based on your preference. The highlighting system also supports Synonym highlighting in different colors, Phrase boundary markers for exact matches, and Fuzzy match indicators showing approximate matches with confidence scores.

Result Sorting¶

Sort Options:

Flexible sorting options help you organize search results according to your needs. Relevance sorting (the default) uses sophisticated algorithms to place the best matches first, considering term frequency, document importance, and search context.

Date sorting can display newest or oldest documents first, essential for finding recent updates or historical documents. Filename ordering arranges results alphabetically, useful when you know partial filenames or want to group similar documents.

Size sorting helps identify the largest or smallest files, valuable for storage management or finding specific document types. Beyond these primary options, you can sort by Score for numerical relevance ranking, Modification time for recently edited documents, and Access frequency to surface popular documents.

Secondary Sorting:

- Apply secondary criteria when primary sort values are equal
- Example: Sort by relevance, then by date
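
Secondary sorting can be reproduced on the client side when post-processing results, for example after an export. The sketch below sorts by relevance and breaks ties by upload date, newest first. The field names `score` and `uploaded_at` mirror the API response format shown later in this guide; the sample records and the function name are made up for illustration.

```python
from operator import itemgetter

results = [
    {"filename": "a.pdf", "score": 0.90, "uploaded_at": "2024-01-15T10:30:00Z"},
    {"filename": "b.pdf", "score": 0.95, "uploaded_at": "2024-02-01T08:00:00Z"},
    {"filename": "c.pdf", "score": 0.90, "uploaded_at": "2024-03-10T12:00:00Z"},
]


def sort_by_relevance_then_date(items):
    # Python's sort is stable, so sort by the secondary key first,
    # then by the primary key; ties on score keep the date order.
    by_date = sorted(items, key=itemgetter("uploaded_at"), reverse=True)
    return sorted(by_date, key=itemgetter("score"), reverse=True)


ordered = sort_by_relevance_then_date(results)
print([r["filename"] for r in ordered])  # ['b.pdf', 'c.pdf', 'a.pdf']
```

Comparing the timestamps as strings works here because they are ISO 8601 values in the same timezone, which sort chronologically.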

Search Configuration¶

User Preferences¶

Search Settings (accessible via Settings → Search):

Personalize your search experience through comprehensive user preferences. Results per page can be set to 10, 25, 50, or 100 items depending on your screen size and browsing preference, with the system remembering your choice across sessions.

Snippet length options range from concise 100-character excerpts to detailed 500-character passages, letting you balance information density with readability. The Fuzzy threshold slider adjusts sensitivity for approximate matching, particularly useful if you frequently search OCR documents or deal with technical terms.

Your Default sort preference ensures results always appear in your preferred order, whether that's relevance, date, or another criterion. Search history can be enabled or disabled based on privacy preferences and workflow needs. Additional settings include Highlighting colors, Auto-suggestion delay, and Advanced mode defaults for power users.

Search Behavior¶

Configure how the search system responds to your interactions for an optimized experience. Auto-complete functionality can be toggled to show search suggestions as you type, drawing from your history, popular queries, and document content to speed up query creation.

Real-time search enables instant result updates as you modify your query, perfect for exploratory searching where you're refining terms to find the right documents. This feature can be disabled if you prefer to complete your query before searching.

Search highlighting ensures your search terms stand out in results, with customizable colors and styles to match your visual preferences. Context snippets control how much surrounding text appears with matches, helping you evaluate relevance without opening documents. The system also offers Spell checking with automatic correction suggestions, Search shortcuts for frequently used queries, and Predictive filtering that suggests relevant filters based on your query.

Search Optimization¶

Query Optimization¶

Best Practices¶

Use Specific Terms: More specific queries yield better results
```
Good: "quarterly sales report Q1"
Poor: "document"
```

Combine Search Modes: Use appropriate mode for your needs

Exact phrases: "status update"
Flexible terms: project~
Complex logic: (budget OR financial) AND 2024

Leverage Filters: Combine text search with filters

Query: budget
Filters: Type = PDF, Date = This Quarter, Label = Finance

Use Field Search: Target specific document aspects

filename:invoice date:2024
content:"project milestone" label:important

Performance Tips¶

Efficient Searching¶

Optimize your search strategy for best performance and results. Start Broad, Then Narrow by beginning with general terms to gauge the document landscape, then progressively add filters and specific terms to refine results to exactly what you need.

Use Filters Early in your search process - applying file type, date, or label filters before complex text queries reduces the dataset size and speeds up text matching. This is particularly effective when you know the general characteristics of the documents you're seeking.

When using wildcards, Avoid Wildcards at Start of terms: *report requires scanning every term in the index, while report* can use the index efficiently. On large document collections this difference can noticeably affect search speed.

Combine Short Queries strategically - multiple focused terms often work better than long phrases, as they're more flexible in matching and allow the relevance algorithm to work effectively. Additionally, consider using field-specific searches to target particular document attributes, and leverage search templates for commonly repeated queries.

Search Index Optimization¶

The search system applies several automatic optimizations to keep results fast and accurate. Frequent Terms that appear often in queries are specially indexed with optimized data structures, reducing search latency for common queries.

Document Updates trigger immediate reindexing, ensuring new content becomes searchable within seconds of upload or modification. This real-time indexing eliminates the delays common in batch-processing systems.

Comprehensive Language Support includes language-specific stemming algorithms and analysis rules for over 20 languages, ensuring accurate search regardless of document language. The system automatically detects document language and applies appropriate processing.

Cache Management intelligently stores results from frequent searches, dramatically reducing response time for popular queries while managing memory usage efficiently. The system also performs Index compaction during low-usage periods, Query optimization through automatic rewriting, and Distributed indexing for large deployments to maintain consistent performance at scale.

OCR Search Optimization¶

Handling OCR Text¶

OCR-extracted text may contain errors that affect search:

Strategies:

1. Use Fuzzy Search: Handle OCR errors with approximate matching
2. Try Variations: Search for common OCR mistakes
3. Use Context: Include surrounding words for better matches
4. Check Original: Compare with original document when possible

Common OCR Issues:

Understanding typical OCR errors helps you craft better searches for scanned documents. Character confusion frequently occurs with similar-looking letter combinations - "m" might be read as "rn", "cl" as "d", or "li" as "h", particularly in lower-quality scans.

Word boundaries present another challenge where OCR might incorrectly split or merge words, reading "something" as "some thing" or "can not" as "cannot". These errors are especially common with justified text or unusual fonts.

Special characters including punctuation, symbols, and diacritical marks often get misread or omitted entirely, turning "don't" into "dont" or "café" into "cafe". The system also commonly encounters Case confusion where uppercase I becomes lowercase l, Number-letter swaps like 0/O or 1/I/l, and Ligature problems where connected letters in certain fonts get misinterpreted.
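
The character confusions listed above can be used to generate likely misreadings of a term and combine them into an OR query. The following is a minimal sketch under stated assumptions: it only knows the three confusion pairs mentioned in this section, applies one substitution at a time, and the names `CONFUSIONS`, `ocr_variants`, and `or_query` are hypothetical.

```python
# (correct, misread) pairs taken from the examples in this section.
CONFUSIONS = [("m", "rn"), ("cl", "d"), ("li", "h")]


def ocr_variants(term: str) -> list[str]:
    """Return misread spellings of term, one substitution at a time."""
    variants = set()
    for correct, misread in CONFUSIONS:
        start = term.find(correct)
        while start != -1:
            variants.add(term[:start] + misread + term[start + len(correct):])
            start = term.find(correct, start + 1)
    return sorted(variants)


def or_query(term: str) -> str:
    # Quote each spelling and join with OR, as in the boolean search syntax.
    return " OR ".join(f'"{t}"' for t in [term, *ocr_variants(term)])


print(or_query("claim"))  # "claim" OR "clairn" OR "daim"
```

For longer terms the number of variants grows quickly, so fuzzy search is usually the simpler first choice and explicit variants are a fallback.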

Optimization Examples:

# Original: "invoice"
# OCR might produce: "irwoice", "invoce", "mvoice"
# Solution: Use fuzzy search
invoice~

# Or search for context
"invoice number" OR "irwoice number" OR "invoce number"

Saved Searches¶

Creating Saved Searches¶

Build Your Query: Create a search with desired parameters
Test Results: Verify the search returns expected documents
Save Search: Click "Save Search" button
Name Search: Provide descriptive name
Configure Options: Set update frequency and notifications

Managing Saved Searches¶

Saved Search Features:

Saved searches transform one-time queries into powerful ongoing tools. Quick Access from the sidebar or dashboard means your most important searches are always one click away, eliminating the need to recreate complex queries.

Automatic Updates ensure your saved searches stay current - as new documents matching your criteria are added to the system, they automatically appear in saved search results. This creates dynamic document collections without manual maintenance.

The Shared Access capability (coming soon) will allow you to share carefully crafted searches with team members, ensuring everyone uses consistent search criteria for common tasks. Export Options enable automatic result export on a schedule, perfect for regular reporting needs.

Saved searches also support Change notifications alerting you when new matches appear, Version tracking to see how results change over time, and Search analytics showing usage patterns and result quality metrics.

Search Organization:

- Categories: Group related searches
- Favorites: Mark frequently used searches
- Recent: Quick access to recently used searches

Smart Collections¶

Saved searches that automatically include new documents:

Examples:

- "This Month's Reports": type:pdf AND content:report AND date:this-month
- "Pending Review": label:"needs review" AND -label:completed
- "High Priority Items": label:(urgent OR critical OR "high priority")

Search Analytics¶

Search Performance Metrics¶

Available Metrics:

Comprehensive analytics help you understand and optimize search usage across your organization. Query Performance tracking shows average response times broken down by query complexity, time of day, and result set size, helping identify optimization opportunities.

Popular Searches analysis reveals the most frequently used search terms and queries, providing insights into what information users need most often. This data can inform document organization, labeling strategies, and training priorities.

Result Quality metrics including click-through rates, dwell time, and refinement patterns indicate whether searches are successfully connecting users with needed documents. Low engagement might suggest indexing issues or user training needs.

Search Patterns analysis uncovers common search behaviors, query refinement sequences, and feature usage trends. The system also tracks Failed searches with no results, Search abandonment rates, and Filter usage patterns to provide a complete picture of search effectiveness.

User Search History¶

History Features:

Your search history becomes a powerful tool for improving search efficiency. Recent Searches provides instant access to previous queries through a dropdown menu, eliminating the need to retype complex searches and making it easy to revisit earlier research.

Search Suggestions leverage your personal search history along with successful searches from across the system to offer intelligent query recommendations as you type. These suggestions learn from your patterns over time, becoming more accurate and personalized.

Query Refinement tools analyze your search patterns to suggest improvements - if you frequently refine searches in similar ways, the system learns and suggests these refinements proactively. Export History functionality lets you download your complete search history for analysis, audit purposes, or migration to other systems.

The history system also includes Collaborative filtering to suggest searches based on similar users' patterns, Temporal analysis showing how your search needs change over time, and Privacy controls allowing you to clear or disable history tracking as needed.

API Search¶

Basic Search API¶

GET /api/search?query=invoice&limit=20
Authorization: Bearer <jwt_token>

Query Parameters:

- query: Search query string
- limit: Number of results (default: 50, max: 100)
- offset: Pagination offset
- sort: Sort order (relevance, date, filename, size)
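
A client typically assembles the basic search URL from these parameters. The sketch below builds the URL with Python's standard library and does not send a request. The host `https://readur.example.com` is a placeholder, `build_search_url` is a hypothetical helper, and the lower bound of 1 on `limit` is an assumption (the guide only documents the maximum).

```python
from urllib.parse import urlencode


def build_search_url(base_url: str, query: str, limit: int = 50,
                     offset: int = 0, sort: str = "relevance") -> str:
    if not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")
    params = {"query": query, "limit": limit, "offset": offset, "sort": sort}
    # urlencode escapes spaces and special characters in the query string.
    return f"{base_url.rstrip('/')}/api/search?{urlencode(params)}"


print(build_search_url("https://readur.example.com", "invoice 2024", limit=20))
# https://readur.example.com/api/search?query=invoice+2024&limit=20&offset=0&sort=relevance
```

The request itself still needs the `Authorization: Bearer <jwt_token>` header shown above.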

Advanced Search API¶

POST /api/search/advanced
Authorization: Bearer <jwt_token>
Content-Type: application/json

{
  "query": "budget report",
  "mode": "phrase",
  "filters": {
    "file_types": ["pdf", "docx"],
    "labels": ["Q1 2024", "Finance"],
    "date_range": {
      "start": "2024-01-01",
      "end": "2024-03-31"
    },
    "size_range": {
      "min": 1048576,
      "max": 52428800
    }
  },
  "options": {
    "fuzzy_threshold": 0.8,
    "snippet_length": 200,
    "highlight": true
  }
}
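
The same request can be prepared from Python using only the standard library. This sketch builds the request object without sending it; the host name is a placeholder, `build_advanced_search` is a hypothetical helper, and the payload is a trimmed version of the example above.

```python
import json
from urllib.request import Request


def build_advanced_search(base_url: str, token: str, payload: dict) -> Request:
    # Builds the request only; pass it to urllib.request.urlopen to send it.
    return Request(
        url=f"{base_url.rstrip('/')}/api/search/advanced",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


payload = {
    "query": "budget report",
    "mode": "phrase",
    "filters": {
        "file_types": ["pdf", "docx"],
        "date_range": {"start": "2024-01-01", "end": "2024-03-31"},
    },
    "options": {"snippet_length": 200, "highlight": True},
}
request = build_advanced_search("https://readur.example.com", "<jwt_token>", payload)
print(request.get_method(), request.full_url)
```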

Search Response Format¶

{
  "results": [
    {
      "id": "550e8400-e29b-41d4-a716-446655440000",
      "filename": "Q1_Budget_Report.pdf",
      "snippet": "The quarterly <mark>budget</mark> <mark>report</mark> shows a 10% increase in revenue...",
      "score": 0.95,
      "highlights": ["budget", "report"],
      "metadata": {
        "size": 2048576,
        "type": "application/pdf",
        "uploaded_at": "2024-01-15T10:30:00Z",
        "labels": ["Q1 2024", "Finance", "Budget"],
        "source": "WebDAV Sync"
      }
    }
  ],
  "total": 42,
  "limit": 20,
  "offset": 0,
  "query_time": 0.085
}
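A client typically needs two things from this response: the hits themselves and the offset of the next page. The Python sketch below parses a body shaped like the example above and derives the next offset from `total`, `limit`, and `offset`. The `Hit` class and the function name are illustrative, not part of Readur, and the sketch assumes `offset` counts results rather than pages.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Hit:
    id: str
    filename: str
    score: float
    labels: list = field(default_factory=list)


def parse_search_response(body):
    """Return (hits, next_offset); next_offset is None on the last page."""
    data = json.loads(body)
    hits = [
        Hit(
            id=item["id"],
            filename=item["filename"],
            score=item["score"],
            labels=item.get("metadata", {}).get("labels", []),
        )
        for item in data["results"]
    ]
    # Assumption: offset is a result index, so the next page starts here.
    next_offset = data["offset"] + data["limit"]
    if next_offset >= data["total"]:
        next_offset = None
    return hits, next_offset


SAMPLE = json.dumps({
    "results": [{
        "id": "550e8400-e29b-41d4-a716-446655440000",
        "filename": "Q1_Budget_Report.pdf",
        "score": 0.95,
        "metadata": {"labels": ["Q1 2024", "Finance", "Budget"]},
    }],
    "total": 42,
    "limit": 20,
    "offset": 0,
})

hits, next_offset = parse_search_response(SAMPLE)
print(hits[0].filename, next_offset)
```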

Troubleshooting¶

Common Search Issues¶

No Results Found¶

Possible Causes:

When searches return no results, systematic troubleshooting usually reveals the issue. Typos in search queries are a common cause - even small spelling errors can prevent matches if you're using phrase (exact match) search mode.

Your query might be Too Specific, combining multiple restrictive terms that no single document satisfies. This often happens when copying and pasting long phrases or using very technical terminology. Consider whether you're using the Wrong Mode - exact phrase searches won't find variations, while fuzzy search might be needed for OCR documents with recognition errors.

Filters can inadvertently exclude all results, especially when multiple filters combine with AND logic. Date ranges might be too narrow, or label filters might reference labels that have been renamed or deleted. It's also worth checking if Permission restrictions are limiting visible results, or if Index lag means recently uploaded documents aren't yet searchable.

Solutions:

Resolving "no results" issues requires a systematic approach to identify what's blocking matches. Simplify Query by starting with just one or two broad terms, then gradually add specificity once you confirm documents exist in your target area.

Check Spelling carefully, or switch to fuzzy search mode which tolerates typos and variations. This is particularly important for proper names, technical terms, or content extracted from OCR where errors are common.

Remove Filters systematically - temporarily disable all filters to see if results appear, then reapply them one at a time to identify which filter is too restrictive. Pay special attention to date ranges and source filters.
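The one-at-a-time filter check described above can also be scripted. The Python sketch below runs a query with no filters and then with each filter on its own, and reports which filters return nothing. The `search` argument is a hypothetical callable taking a query and a filter dictionary and returning a result count; here it is backed by a small in-memory list rather than a real Readur call, and the filter names are invented for the demonstration.

```python
def isolate_filters(search, query, filters):
    """Run the query unfiltered, then once per filter applied alone."""
    baseline = search(query, {})
    counts = {
        name: search(query, {name: value}) for name, value in filters.items()
    }
    # A filter that matches nothing on its own is the likeliest culprit.
    culprits = [name for name, count in counts.items() if count == 0]
    return baseline, counts, culprits


# In-memory stand-in for a real search call, for demonstration only.
DOCS = [
    {"file_type": "pdf", "label": "Finance"},
    {"file_type": "pdf", "label": "HR"},
    {"file_type": "docx", "label": "Finance"},
]


def fake_search(query, filters):
    """Count documents matching every filter; the query text is ignored."""
    return sum(
        all(doc.get(key) == value for key, value in filters.items())
        for doc in DOCS
    )


baseline, counts, culprits = isolate_filters(
    fake_search, "budget", {"file_type": "pdf", "label": "Legal"}
)
print(baseline, counts, culprits)
```

If every filter returns results on its own but the combination returns none, the filters are individually valid and only their intersection is empty, so the next step is to relax one of them.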

Try Synonyms and alternative phrasings for your concepts - what you call a "report" might be labeled as "summary", "analysis", or "review" in the actual documents. Additionally, consider searching for Related terms that often appear alongside your target content, use Wildcard searches to catch variations, and check the Search scope to ensure you're searching all available document fields.

Irrelevant Results¶

Possible Causes:

Receiving irrelevant results often stems from queries that cast too wide a net. Too Broad queries using generic terms like "document" or "file" match nearly everything in your repository, burying relevant results in noise.

Searching with Common Terms that appear in most documents - words like "page", "date", or "company" - dilutes result relevance. These terms add little discriminatory value to your search. Using the Wrong Mode can also cause relevance issues; fuzzy search might match too many variations when you need exact phrase matching for specific terminology.

Other factors include Missing context where single words lack the surrounding terms that would clarify intent, Outdated relevance when old but highly-referenced documents outrank newer relevant ones, and Language confusion where multilingual content causes unexpected matches.

Solutions:

Improving result relevance requires refining your search strategy to be more targeted. Add Specificity by including additional context terms that distinguish your desired documents from others - instead of just "budget", try "budget 2024 marketing" for precision.

Use Filters aggressively to narrow the result set before text matching occurs. File type, date ranges, and label filters can eliminate large swaths of irrelevant documents, letting text search focus on a smaller, more relevant set.

Phrase Search with quotation marks ensures multi-word concepts stay together, preventing matches where terms appear separately in unrelated contexts. This is essential for finding specific titles, names, or technical phrases.

Boolean Logic provides surgical precision in defining what should and shouldn't appear in results. Combine AND to require multiple concepts, OR for alternatives, and NOT to exclude irrelevant documents. Beyond these techniques, consider Field-specific searches to target just filenames or content, Relevance tuning in search settings, and Proximity operators to require terms appear near each other.

Slow Search Performance¶

Possible Causes:

Search performance can degrade for several reasons that are usually correctable. Complex Queries with deeply nested boolean logic, multiple wildcards, or extensive OR conditions require more processing time as the system evaluates numerous combinations.

Large Result Sets slow down both search execution and result rendering. When queries match thousands of documents, the system must score and sort all matches before displaying even the first page. Wildcard Overuse, particularly leading wildcards like *report, forces full index scans rather than efficient prefix matching.

Performance also suffers from Fuzzy search overhead with very low similarity thresholds, Uncached queries that are unique or rarely used, and Resource contention when multiple users run complex searches simultaneously.

Solutions:

Optimizing slow searches often involves adjusting your search strategy rather than waiting for results. Simplify Queries by breaking complex boolean expressions into several simpler searches, then compare the result lists yourself or keep each one as a saved search.

Add Filters before text search to reduce the document pool being searched. Date ranges, file types, and source filters can eliminate 90% of documents before expensive text matching begins, dramatically improving speed.

Avoid Leading Wildcards which require examining every term in the index. Replace *report with report* or search for "report" without wildcards, using fuzzy matching if variation tolerance is needed.
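A client can catch leading wildcards before a query is submitted. The Python sketch below flags terms that begin with `*` and rewrites them into the prefix form recommended above. It is deliberately naive: it splits on whitespace, so it does not understand quoted phrases or field prefixes, and both function names are illustrative. The rewrite changes what the query matches (`*report` and `report*` are different searches), so it is best offered as a suggestion instead of being applied silently.

```python
def _is_leading_wildcard(term):
    # "*report" qualifies; a bare "*" or "**" has no stem and is left alone.
    return term.startswith("*") and term.strip("*") != ""


def leading_wildcard_terms(query):
    """Return the whitespace-separated terms that start with a wildcard."""
    return [term for term in query.split() if _is_leading_wildcard(term)]


def rewrite_leading_wildcards(query):
    """Move each leading wildcard to the end of its term (*report -> report*)."""
    rewritten = []
    for term in query.split():
        if _is_leading_wildcard(term):
            stem = term.lstrip("*")
            term = stem if stem.endswith("*") else stem + "*"
        rewritten.append(term)
    return " ".join(rewritten)


query = "*report budget"
print(leading_wildcard_terms(query))
print(rewrite_leading_wildcards(query))
```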

Use Pagination effectively by requesting smaller result sets (25-50 results) rather than the largest page allowed (the Basic Search API caps limit at 100). The most relevant results appear early, so you rarely need to see everything at once. Additional optimizations include Caching frequent searches by saving them, Scheduling complex searches for off-peak hours, and Using search templates that are pre-optimized for common query patterns.
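The pagination advice above translates into fetching small pages lazily and stopping as soon as enough results have been seen. The following Python sketch does this with a generator. `fetch_page` is a hypothetical callable that accepts `limit` and `offset` and returns a dictionary with `results` and `total` keys, mirroring the response format in the API Search section; the in-memory version below stands in for a real HTTP call.

```python
def iter_results(fetch_page, page_size=25, max_results=None):
    """Yield results page by page, stopping early once max_results is reached."""
    offset = 0
    yielded = 0
    while max_results is None or yielded < max_results:
        page = fetch_page(limit=page_size, offset=offset)
        results = page["results"]
        if max_results is not None:
            # Trim the final page so no more than max_results are yielded.
            results = results[: max_results - yielded]
        yield from results
        yielded += len(results)
        offset += page_size
        if not page["results"] or offset >= page["total"]:
            return


# In-memory stand-in for the search endpoint, for demonstration only.
ALL_HITS = [f"doc-{n}" for n in range(60)]
calls = []


def fake_fetch(limit, offset):
    calls.append(offset)
    return {"results": ALL_HITS[offset:offset + limit], "total": len(ALL_HITS)}


first_ten = list(iter_results(fake_fetch, page_size=25, max_results=10))
print(first_ten[:3], "pages fetched:", len(calls))
```

Because the generator is lazy, a caller that only needs the top few hits triggers a single request, while a caller that iterates to the end fetches every page.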

OCR Search Issues¶

OCR Text Not Searchable¶

Symptoms: Can't find text that's visible in document images

Solutions:

1. Check OCR Status: Verify OCR processing completed
2. Retry OCR: Manually retry OCR processing
3. Use Fuzzy Search: OCR might have character recognition errors
4. Check Language Settings: Ensure correct OCR language is configured

Poor OCR Search Quality¶

Symptoms: Fuzzy search required for most queries on scanned documents

Solutions:

1. Improve Source Quality: Use higher resolution scans (300+ DPI)
2. OCR Language: Verify correct language setting for documents
3. Image Enhancement: Enable OCR preprocessing options
4. Manual Correction: Consider manual text correction for important documents

Search Configuration Issues¶

Settings Not Applied¶

Symptoms: Search settings changes don't take effect

Solutions:

1. Reload Page: Refresh browser to apply settings
2. Clear Cache: Clear browser cache and cookies
3. Check Permissions: Ensure user has permission to modify settings
4. Database Issues: Check if settings are being saved to database

Filter Problems¶

Symptoms: Filters not working as expected

Solutions:

1. Clear All Filters: Reset filters and apply one at a time
2. Check Filter Logic: Ensure AND/OR logic is correct
3. Label Validation: Verify labels exist and are spelled correctly
4. Date Format: Ensure dates are in correct format
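For the date format check, a client-side validation step can catch malformed ranges before they reach the server. The Python sketch below assumes the `YYYY-MM-DD` format used in the Advanced Search API example; whether Readur accepts other formats is not stated in this guide, and the function name is illustrative.

```python
import re
from datetime import date

_ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def validate_date_range(start, end):
    """Parse two YYYY-MM-DD strings and check that start is not after end."""
    parsed = []
    for label, text in (("start", start), ("end", end)):
        if not _ISO_DATE.fullmatch(text):
            raise ValueError(f"{label} date {text!r} is not in YYYY-MM-DD format")
        try:
            # Rejects impossible dates such as 2024-02-30.
            parsed.append(date.fromisoformat(text))
        except ValueError as exc:
            raise ValueError(f"{label} date {text!r} is not a real date") from exc
    if parsed[0] > parsed[1]:
        raise ValueError("start date is after end date")
    return parsed[0], parsed[1]


print(validate_date_range("2024-01-01", "2024-03-31"))
```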

Next Steps¶

Explore labels and organization for better search categorization
Set up sources for automatic content ingestion
Review user guide for general search tips
Check API reference for programmatic search integration
Configure OCR optimization for better text extraction
