Architecture

OneSearch's default Docker setup runs as one container: nginx, the FastAPI backend, and managed Meilisearch supervised together. The legacy two-container setup is still available for installs that want external Meilisearch.

High-Level Overview

OneSearch consists of three main runtime pieces:

nginx serves the React frontend and proxies API requests to the backend. Users only interact with nginx on port 8000.

Backend (FastAPI) handles indexing and search requests. It walks your file system, extracts content from documents, and talks to Meilisearch.

Meilisearch is the search engine. It stores the full-text index and handles search queries with typo tolerance and relevance ranking.

In the default managed setup, Meilisearch listens on 127.0.0.1:7700 inside the app container and is not exposed to the host.

Container Layout

┌─────────────────────────────────────────────┐
│             onesearch container             │
│  ┌───────────────────────────────────────┐  │
│  │              supervisord              │  │
│  │  ┌──────────┐ ┌─────────┐ ┌────────┐  │  │
│  │  │  nginx   │ │ uvicorn │ │ meili  │  │  │
│  │  │  :8000   │ │ :8001   │ │ :7700  │  │  │
│  │  │ frontend │ │ backend │ │ local  │  │  │
│  │  └──────────┘ └─────────┘ └────────┘  │  │
│  └───────────────────────────────────────┘  │
└─────────────────────────────────────────────┘

Supervisord manages nginx, uvicorn, and managed Meilisearch inside the OneSearch container. In legacy mode, Meilisearch runs as a separate container or external service instead.

Data Flow

Data moves through OneSearch along two paths: indexing, which builds the search index, and search, which queries it.

Indexing Flow

1. The user adds a source via the web UI, CLI, or API.
2. The source configuration (name, path, patterns) is stored in SQLite.
3. The user triggers a reindex.
4. The scanner walks the directory and applies the include/exclude glob patterns.
5. For each file:
   - Check whether it changed by comparing its modified time, size, and hash with the indexed_files table.
   - If it changed, extract its content using the appropriate extractor.
   - Send the normalized document to Meilisearch.
   - Update the indexed_files table with the file's metadata.
6. Search queries go to Meilisearch, which returns results with highlighted snippets.
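A minimal sketch of the scanning step (step 4), assuming patterns are matched with Python's `fnmatch` against each file's path relative to the source root, and that an empty include list means "include everything". The function name `iter_source_files` is hypothetical, and OneSearch's actual glob semantics (for example how `**` is treated) may differ.

```python
import fnmatch
import os
from pathlib import Path
from typing import Iterator, Sequence


def _matches(rel_path: str, patterns: Sequence[str]) -> bool:
    # fnmatch's "*" also matches "/", so "*.pdf" matches PDFs at any depth.
    return any(fnmatch.fnmatchcase(rel_path, p) for p in patterns)


def iter_source_files(root: str, include: Sequence[str], exclude: Sequence[str]) -> Iterator[Path]:
    """Yield files under root that match an include pattern and no exclude pattern."""
    root_path = Path(root)
    for dirpath, dirnames, filenames in os.walk(root_path):
        dirnames.sort()  # sorting in place makes os.walk visit subdirectories in a stable order
        for name in sorted(filenames):
            full = Path(dirpath) / name
            rel = full.relative_to(root_path).as_posix()
            if include and not _matches(rel, include):
                continue  # not selected by any include pattern
            if _matches(rel, exclude):
                continue  # explicitly excluded
            yield full
```

For example, `iter_source_files("/data/nas", ["*.pdf", "*.md"], ["node_modules/*"])` would yield every PDF and Markdown file outside a top-level `node_modules` directory.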

Reindexing a large library from scratch is slow, so OneSearch tracks file metadata in SQLite and only processes files that changed. Each file type has its own extractor (text, markdown, PDF, Office docs) that returns the same normalized document structure. Meilisearch handles the search side, providing typo tolerance and relevance ranking out of the box.
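The incremental check can be sketched as a cheap, stat-only comparison. `IndexedRecord` and `needs_reindex` are hypothetical names, and this sketch compares only size and modification time; the hash comparison mentioned in the flow above is left out.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IndexedRecord:
    """Hypothetical stand-in for the stored indexed_files row of one file."""
    size_bytes: int
    modified_at: float  # seconds since the epoch


def needs_reindex(path: str, record: Optional[IndexedRecord]) -> bool:
    """Return True if the file is new, or its size or mtime differ from the stored record."""
    if record is None:
        return True  # never indexed before
    st = os.stat(path)  # reads metadata only, not file content
    # Compare mtimes at whole-second resolution, assuming the stored
    # DATETIME value does not keep sub-second precision.
    return st.st_size != record.size_bytes or int(st.st_mtime) != int(record.modified_at)
```

Because only `os.stat` is called for unchanged files, a rescan of an unchanged library touches metadata alone and skips extraction entirely.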

Database Schema

OneSearch uses SQLite for metadata. The main tables are:

sources

Stores source configurations.

Column	Type	Description
id	TEXT	Primary key (user-defined or auto-generated)
name	TEXT	Display name
root_path	TEXT	Container path to index
include_patterns	TEXT	JSON array of glob patterns, stored as text
exclude_patterns	TEXT	JSON array of glob patterns, stored as text
scan_schedule	TEXT	Cron expression or preset (`@hourly`, `@daily`, `@weekly`)
last_scan_at	DATETIME	Last completed scan timestamp
next_scan_at	DATETIME	Next scheduled scan timestamp
created_at	DATETIME	Creation timestamp
updated_at	DATETIME	Last update timestamp
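Storing pattern lists as JSON text means every read and write goes through `json.dumps`/`json.loads`. The sketch below uses the standard-library `sqlite3` module rather than the SQLAlchemy models OneSearch actually uses, and the helpers `add_source` and `load_patterns` are hypothetical.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Column names and types follow the sources table described above.
conn.execute(
    """
    CREATE TABLE sources (
        id TEXT PRIMARY KEY,
        name TEXT,
        root_path TEXT,
        include_patterns TEXT,
        exclude_patterns TEXT,
        scan_schedule TEXT,
        last_scan_at DATETIME,
        next_scan_at DATETIME,
        created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
        updated_at DATETIME DEFAULT CURRENT_TIMESTAMP
    )
    """
)


def add_source(conn, source_id, name, root_path, include, exclude, schedule=None):
    # Pattern lists are serialized to JSON strings before storage.
    conn.execute(
        "INSERT INTO sources (id, name, root_path, include_patterns, exclude_patterns, scan_schedule) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (source_id, name, root_path, json.dumps(include), json.dumps(exclude), schedule),
    )


def load_patterns(conn, source_id):
    row = conn.execute(
        "SELECT include_patterns, exclude_patterns FROM sources WHERE id = ?", (source_id,)
    ).fetchone()
    # Decode the JSON text back into Python lists.
    return json.loads(row[0]), json.loads(row[1])
```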

indexed_files

Tracks all indexed files for incremental updates.

Column	Type	Description
id	INTEGER	Primary key
source_id	TEXT	Foreign key to sources
path	TEXT	Full file path
size_bytes	INTEGER	File size in bytes
modified_at	DATETIME	File modified timestamp
indexed_at	DATETIME	When we indexed it
hash	TEXT	SHA256 hash of path (for document ID)
status	TEXT	success, failed, skipped
error_message	TEXT	Error if failed

A unique constraint on (source_id, path) prevents duplicate entries for the same file.
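The unique constraint also lets "insert or update" be a single statement. A sketch with plain `sqlite3`, assuming SQLite 3.24 or later for `ON CONFLICT ... DO UPDATE`; `record_result` is a hypothetical helper, not OneSearch's code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE indexed_files (
        id INTEGER PRIMARY KEY,
        source_id TEXT,
        path TEXT,
        size_bytes INTEGER,
        modified_at DATETIME,
        indexed_at DATETIME,
        hash TEXT,
        status TEXT,
        error_message TEXT,
        UNIQUE (source_id, path)
    );
    """
)

# On a (source_id, path) conflict, overwrite the existing row with the new values
# ("excluded" refers to the row that failed to insert).
UPSERT = """
INSERT INTO indexed_files
    (source_id, path, size_bytes, modified_at, indexed_at, hash, status, error_message)
VALUES (?, ?, ?, ?, CURRENT_TIMESTAMP, ?, ?, ?)
ON CONFLICT (source_id, path) DO UPDATE SET
    size_bytes = excluded.size_bytes,
    modified_at = excluded.modified_at,
    indexed_at = excluded.indexed_at,
    hash = excluded.hash,
    status = excluded.status,
    error_message = excluded.error_message
"""


def record_result(conn, source_id, path, size_bytes, modified_at, path_hash, status, error=None):
    conn.execute(UPSERT, (source_id, path, size_bytes, modified_at, path_hash, status, error))
```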

Meilisearch Document Schema

Every document in Meilisearch follows this structure:

{
  "id": "source1--a1b2c3d4e5f6",
  "source_id": "source1",
  "source_name": "NAS Documents",
  "path": "/path/to/file.pdf",
  "basename": "file.pdf",
  "extension": "pdf",
  "type": "pdf",
  "size_bytes": 123456,
  "modified_at": 1732896000,
  "indexed_at": 1732896000,
  "content": "Full extracted text content...",
  "title": "Optional document title",
  "metadata": {}
}

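Document IDs use the format {source_id}--{sha256_hash[:12]}, where the hash is derived from the file path. Meilisearch only accepts alphanumeric characters, hyphens, and underscores in string document IDs, so hashing the path avoids problems with slashes, spaces, and other characters that appear in paths. The source_id prefix keeps IDs from different sources apart, and 12 hex characters (48 bits) make collisions within a source very unlikely.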
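The ID scheme fits in a few lines. This sketch assumes the path is hashed as UTF-8 (with `surrogateescape`, so filenames containing undecodable bytes still hash) and that the 12 characters are taken from the hex digest; `make_document_id` is a hypothetical name.

```python
import hashlib


def make_document_id(source_id: str, path: str) -> str:
    """Build a Meilisearch-safe ID: {source_id}--{first 12 hex chars of sha256(path)}."""
    digest = hashlib.sha256(path.encode("utf-8", "surrogateescape")).hexdigest()
    return f"{source_id}--{digest[:12]}"
```

The result is deterministic, so reindexing the same file replaces its existing Meilisearch document instead of adding a duplicate.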

Searchable fields: content, basename, path, title

Filterable fields: source_id, type, extension, modified_at

Extractor System

Extractors live in backend/app/extractors/ and follow a simple pattern:

base.py defines the abstract BaseExtractor class. All extractors inherit from it.

Concrete extractors:
- text.py: plain text files with encoding detection
- markdown.py: Markdown with YAML front-matter parsing
- pdf.py: PDFs, using pypdf for text extraction
- office.py: Word, Excel, and PowerPoint, using python-docx, openpyxl, and python-pptx
- rtf.py, epub.py, subtitles.py, comic.py: rich document and archive-like formats
- images.py, media.py, metadata.py: images, RAW photos, audio/video metadata, and a metadata-only fallback

Each extractor:
- Takes a file path
- Returns a normalized Document object
- Has timeout protection, so corrupt or huge files can't hang indexing
- Handles errors gracefully: failed files get logged and indexing continues
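One way to get both timeout protection and graceful failure, sketched with a thread pool. This is an assumption about the mechanism, not OneSearch's implementation: a Python thread cannot be killed, so a timed-out extraction keeps running in the background while indexing moves on, and a process-based approach would be needed to stop it outright. `extract_safely` and `ExtractResult` are hypothetical names.

```python
import concurrent.futures
import logging
from dataclasses import dataclass
from typing import Callable, Optional

log = logging.getLogger("onesearch.extract")  # hypothetical logger name

# Shared worker pool; the size is an arbitrary choice for this sketch.
_POOL = concurrent.futures.ThreadPoolExecutor(max_workers=4)


@dataclass
class ExtractResult:
    path: str
    status: str  # "success" or "failed", mirroring indexed_files.status
    content: Optional[str] = None
    error_message: Optional[str] = None


def extract_safely(extract: Callable[[str], str], path: str, timeout_s: float) -> ExtractResult:
    """Run one extraction with a deadline; never raise, always return a result."""
    future = _POOL.submit(extract, path)
    try:
        content = future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        future.cancel()  # only has an effect if the task has not started yet
        log.warning("extraction timed out after %ss: %s", timeout_s, path)
        return ExtractResult(path, "failed", error_message=f"timeout after {timeout_s}s")
    except Exception as exc:  # corrupt file, parser bug, permission error, etc.
        log.warning("extraction failed for %s: %s", path, exc)
        return ExtractResult(path, "failed", error_message=str(exc))
    return ExtractResult(path, "success", content=content)
```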

Adding new file format support means creating a new extractor and registering it with the extractor registry.
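A sketch of what an abstract base class plus an extension-keyed registry can look like. Apart from the name `BaseExtractor`, everything here (the `Document` fields, `extract_text`, `register`, `get_extractor`) is hypothetical, and UTF-8 decoding with replacement characters stands in for real encoding detection.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from pathlib import Path
from typing import ClassVar, Dict, Optional, Type


@dataclass
class Document:
    """Hypothetical normalized output, loosely following the Meilisearch document fields."""
    path: str
    basename: str
    extension: str
    type: str
    content: str
    title: Optional[str] = None
    metadata: dict = field(default_factory=dict)


class BaseExtractor(ABC):
    extensions: ClassVar[tuple] = ()  # lowercase extensions this extractor handles
    doc_type: ClassVar[str] = "unknown"

    @abstractmethod
    def extract_text(self, path: Path) -> str:
        """Format-specific part: return the file's text content."""

    def extract(self, path: str) -> Document:
        # Shared part: wrap the text in the normalized Document structure.
        p = Path(path)
        return Document(path=str(p), basename=p.name, extension=p.suffix.lstrip(".").lower(),
                        type=self.doc_type, content=self.extract_text(p))


_REGISTRY: Dict[str, BaseExtractor] = {}


def register(cls: Type[BaseExtractor]) -> Type[BaseExtractor]:
    """Class decorator: map each of the class's extensions to a shared instance."""
    instance = cls()
    for ext in cls.extensions:
        _REGISTRY[ext.lower()] = instance
    return cls


def get_extractor(path: str) -> Optional[BaseExtractor]:
    return _REGISTRY.get(Path(path).suffix.lstrip(".").lower())


@register
class TextExtractor(BaseExtractor):
    extensions = ("txt", "log")
    doc_type = "text"

    def extract_text(self, path: Path) -> str:
        return path.read_text(encoding="utf-8", errors="replace")
```

With this layout, supporting a new format is one subclass plus the `@register` decorator; the indexer only ever calls `get_extractor` and `extract`.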

Backend Structure

The FastAPI application is organized into layers:

backend/app/
├── main.py              # FastAPI app setup, CORS, static files
├── config.py            # Settings from environment variables
├── models.py            # SQLAlchemy ORM models
├── schemas.py           # Pydantic request/response schemas
├── api/                 # API route handlers
│   ├── search.py        # POST /api/search
│   ├── sources.py       # CRUD for /api/sources
│   └── status.py        # GET /api/health, /api/status
├── services/            # Business logic
│   ├── indexer.py       # Orchestrates indexing
│   ├── scanner.py       # File system walker
│   └── search.py        # Meilisearch client wrapper
├── extractors/          # Document parsers
└── db/
    └── database.py      # SQLAlchemy setup

API routes are thin handlers: business logic lives in services, and ORM models stay separate from the request/response schemas. FastAPI's dependency injection system supplies database sessions to route handlers.

Frontend Structure

React SPA using functional components and hooks:

frontend/src/
├── main.tsx             # Entry point
├── App.tsx              # Router + TanStack Query provider
├── pages/
│   ├── SearchPage.tsx   # Main search (/)
│   ├── DocumentPage.tsx # Document preview
│   └── admin/
│       ├── SourcesPage.tsx   # Manage sources
│       └── StatusPage.tsx    # Indexing status
├── components/
│   ├── SearchBox.tsx
│   ├── ResultCard.tsx
│   ├── SourceForm.tsx
│   └── ui/              # shadcn/ui components
├── lib/
│   ├── api.ts           # API client (fetch wrappers)
│   └── utils.ts         # Utilities
└── types/
    └── api.ts           # TypeScript interfaces

State management:

TanStack Query (React Query) manages server state - search results, sources, status. It handles caching, refetching, and invalidation automatically.

React hooks (useState, useEffect) manage local UI state - form inputs, modals, etc.

No global state library needed. Server state lives in TanStack Query, UI state in component hooks.

Performance Considerations

Incremental indexing is the most important optimization. Always check indexed_files before reprocessing.

Extractor timeouts prevent hanging on corrupt or huge files. The defaults are 30 seconds for PDFs and 5 seconds for text.

Meilisearch batching - Send documents to Meilisearch in batches of 100-1000 rather than one at a time, which cuts per-request overhead.
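A minimal batching helper. `send` stands in for whatever call pushes one batch to Meilisearch (with the official Python client that could be an index's `add_documents` method). `batched` and `send_in_batches` are hypothetical names, and the default of 500 is simply a value inside the range above. On Python 3.12+, `itertools.batched` does the same job, yielding tuples instead of lists.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be >= 1")
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch


def send_in_batches(docs: Iterable[T], send: Callable[[List[T]], object], batch_size: int = 500) -> int:
    """Push documents through `send` one batch at a time; return the number of batches sent."""
    count = 0
    for batch in batched(docs, batch_size):
        send(batch)
        count += 1
    return count
```

Because `batched` consumes its input lazily, documents can be streamed straight from the scanner without holding the whole library in memory.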

Read-only mounts - Mount Docker volumes with the :ro flag. OneSearch only reads files and never writes to them.

Deployment

The unified Docker image contains everything:
- nginx (compiled frontend)
- uvicorn (backend)
- managed Meilisearch
- CLI tool
- runtime dependencies

Supervisord manages nginx, uvicorn, and Meilisearch. One container, simple deployment. Legacy external-Meilisearch installs can still run the search engine separately when needed.

Next Steps

Want to contribute? Check out:

Backend Development - How to develop the backend
Frontend Development - How to develop the frontend
Adding Extractors - Add support for new file types
Contributing Guide - General contribution guidelines