Backend Development¶

The OneSearch backend is built with FastAPI and Python 3.11+.

Getting Started¶

Prerequisites¶

Python 3.11 or later
uv (recommended) or pip
Docker + Docker Compose (for Meilisearch)

Initial Setup¶

Clone the repository and set up the backend:

git clone https://github.com/demigodmode/OneSearch.git
cd OneSearch/backend

Install dependencies using uv (faster than pip):

uv sync

This creates a .venv directory and installs all dependencies from pyproject.toml.

Or use pip if you prefer:

python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate
pip install -e .

Start Meilisearch¶

The backend needs Meilisearch running:

docker-compose up -d meilisearch

Set your Meilisearch master key in .env:

MEILI_MASTER_KEY=your-key-here

Run the Development Server¶

uv run uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

Or if your venv is activated:

uvicorn app.main:app --reload

The --reload flag enables auto-reload on code changes.

API docs are available at http://localhost:8000/docs.

Project Structure¶

backend/
├── app/
│   ├── main.py              # FastAPI app, startup, CORS
│   ├── config.py            # Settings from environment
│   ├── models.py            # SQLAlchemy ORM models
│   ├── schemas.py           # Pydantic request/response schemas
│   ├── api/                 # API endpoints
│   │   ├── search.py        # POST /api/search
│   │   ├── sources.py       # CRUD for /api/sources
│   │   └── status.py        # GET /api/health, /api/status
│   ├── services/            # Business logic
│   │   ├── indexer.py       # Orchestrates indexing
│   │   ├── scanner.py       # File system walker
│   │   └── search.py        # Meilisearch wrapper
│   ├── extractors/          # Document parsers
│   │   ├── base.py          # BaseExtractor abstract class
│   │   ├── text.py          # Text files
│   │   ├── markdown.py      # Markdown
│   │   ├── pdf.py           # PDFs
│   │   └── office.py        # Office documents
│   └── db/
│       └── database.py      # SQLAlchemy setup
├── tests/                   # Tests
├── alembic/                 # Database migrations
├── pyproject.toml           # Dependencies (uv/pip)
└── uv.lock                  # Lock file (commit this!)

Development Workflow¶

Making Changes¶

1. Create a feature branch:
```
git checkout -b feature/your-feature
```
2. Make your changes.
3. Run tests:
```
uv run pytest
```
4. Commit and push:

git add .
git commit -m "add feature description"
git push origin feature/your-feature

5. Create a pull request.

Adding Dependencies¶

Use uv to add packages:

# Regular dependency
uv add package-name

# Development dependency
uv add --dev package-name

This updates pyproject.toml and uv.lock. Always commit the lock file.

Key Concepts¶

API Endpoints¶

API routes live in app/api/. Each module handles a resource:

search.py - Search endpoint

sources.py - Source CRUD operations

status.py - Health checks and status

Example endpoint structure:

from fastapi import APIRouter, Depends
from sqlalchemy.orm import Session
from ..db.database import get_db
from ..schemas import SourceCreate, SourceResponse
from ..services import source_service

router = APIRouter(prefix="/api/sources", tags=["sources"])

@router.post("/", response_model=SourceResponse)
def create_source(
    source: SourceCreate,
    db: Session = Depends(get_db)
):
    return source_service.create(db, source)

Use Pydantic schemas for validation, dependency injection for database sessions, and FastAPI's HTTPException for error responses.

Services¶

Business logic lives in app/services/. Keep API routes thin - they should just validate input and call service functions.

indexer.py orchestrates the indexing process:

1. Scan directories
2. Extract content
3. Send to Meilisearch
4. Update metadata

scanner.py walks the file system and applies glob patterns.

search.py wraps the Meilisearch client.
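
The four indexing steps and the scanner's glob matching can be sketched together as follows. This is a minimal illustration rather than the actual indexer.py or scanner.py: `scan`, `index_source`, and the `extract`, `send_batch`, and `record` callables are hypothetical names, and matching on the file name alone is an assumption about how the patterns are applied.

```python
from fnmatch import fnmatch
from pathlib import Path
from typing import Callable, Iterable, Iterator


def scan(root: str, include: Iterable[str]) -> Iterator[Path]:
    """Step 1: walk the tree and yield files whose name matches a glob pattern."""
    patterns = list(include)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and any(fnmatch(path.name, p) for p in patterns):
            yield path


def index_source(
    root: str,
    include: Iterable[str],
    extract: Callable[[Path], dict],
    send_batch: Callable[[list[dict]], None],
    record: Callable[[Path], None],
) -> int:
    """Run scan -> extract -> send -> update metadata; return the document count."""
    paths = list(scan(root, include))
    docs = [extract(p) for p in paths]  # Step 2: extract content
    if docs:
        send_batch(docs)  # Step 3: send to Meilisearch (stubbed by the caller here)
    for p in paths:
        record(p)  # Step 4: update indexing metadata
    return len(docs)
```

Passing the extract, send, and record steps in as callables keeps the orchestration testable without a database or a running Meilisearch instance.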

Extractors¶

Extractors parse file content. All inherit from BaseExtractor:

from abc import ABC, abstractmethod
from ..schemas import Document

class BaseExtractor(ABC):
    @abstractmethod
    async def extract(self, file_path: str) -> Document:
        pass

Example extractor:

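import asyncio
from pathlib import Path

class TextExtractor(BaseExtractor):
    async def extract(self, file_path: str) -> Document:
        # Simplified: the real extractor also applies a timeout and detects the encoding.
        # Read in a worker thread so the blocking I/O does not stall the event loop.
        content = await asyncio.to_thread(
            Path(file_path).read_text, encoding="utf-8", errors="replace"
        )
        # Return normalized Document
        return Document(
            path=file_path,
            content=content,
            type="text",
            # ... other fields
        )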

All extractors share the same interface, so error handling and timeout protection live in one place and adding new file types is just a new class.
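
Because every extractor exposes the same extract() coroutine, one wrapper can apply the timeout and error policy for all of them. The sketch below is an assumption about how that might look, not the project's code: `Document` is a stand-in dataclass, the `Extractor` protocol replaces BaseExtractor for brevity, and `safe_extract`, its `timeout` parameter, and `StuckExtractor` are hypothetical names.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass
class Document:  # stand-in for app.schemas.Document
    path: str
    content: str
    type: str


class Extractor(Protocol):
    async def extract(self, file_path: str) -> Document: ...


async def safe_extract(
    extractor: Extractor, file_path: str, timeout: float = 30.0
) -> Optional[Document]:
    """Apply one timeout and error policy to any extractor."""
    try:
        return await asyncio.wait_for(extractor.extract(file_path), timeout)
    except (asyncio.TimeoutError, OSError, ValueError):
        # The caller can record the file as failed and continue with the next one.
        return None


class StuckExtractor:
    """Hypothetical extractor that never finishes, to exercise the timeout."""

    async def extract(self, file_path: str) -> Document:
        await asyncio.sleep(3600)
        return Document(path=file_path, content="", type="text")
```

For example, `asyncio.run(safe_extract(StuckExtractor(), "big.pdf", timeout=0.05))` gives up after 50 ms and returns None instead of blocking the indexing run.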

Database Models¶

SQLAlchemy models in app/models.py:

class Source(Base):
    __tablename__ = "sources"
    id = Column(String, primary_key=True)
    name = Column(String, nullable=False)
    root_path = Column(String, nullable=False)
    # ...

Compare these with the Pydantic schemas in app/schemas.py:

class SourceCreate(BaseModel):
    name: str
    root_path: str
    # ...

class SourceResponse(BaseModel):
    id: str
    name: str
    # ...

Models define the database structure; schemas define API contracts. They change for different reasons, so they stay separate.

Database Migrations¶

OneSearch uses Alembic for migrations.

Create a migration after changing models:

uv run alembic revision --autogenerate -m "add new column"

Review the generated file in alembic/versions/ before applying it. Autogenerate does not detect everything: a renamed column or table, for example, shows up as a drop followed by an add.

Apply migrations:

uv run alembic upgrade head

Testing¶

Run all tests:

uv run pytest

Run specific tests:

uv run pytest tests/test_extractors.py
uv run pytest tests/test_api.py::test_search

Verbose output:

uv run pytest -v

With coverage:

uv run pytest --cov=app

Writing Tests¶

Use pytest fixtures for setup:

import asyncio

import pytest

from app.extractors.text import TextExtractor


@pytest.fixture
def sample_text_file(tmp_path):
    file_path = tmp_path / "sample.txt"
    file_path.write_text("Sample content")
    return str(file_path)


def test_text_extractor(sample_text_file):
    extractor = TextExtractor(source_id="test", source_name="Test")
    # extract() is declared async on BaseExtractor, so run the coroutine to completion.
    doc = asyncio.run(extractor.extract(sample_text_file))
    assert doc.content == "Sample content"
    assert doc.source_id == "test"

Mock external services rather than calling a live Meilisearch instance in unit tests:

from unittest.mock import AsyncMock


def test_search_service_handles_empty_results():
    meili = AsyncMock()
    meili.search.return_value = {
        "hits": [],
        "estimatedTotalHits": 0,
        "processingTimeMs": 1,
    }
    # Pass the mock into the service under test and assert the app-level response.

Raw Meilisearch responses use hits; OneSearch API responses use results.
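
A complete version of that test might look like the following. `SearchService` is a hypothetical stand-in written for this example (the real wrapper lives in app/services/search.py and may differ), and the `total` and `processing_time_ms` keys are assumed names; the sketch only shows the mapping from Meilisearch's hits to an app-level results key.

```python
import asyncio
from unittest.mock import AsyncMock


class SearchService:
    """Hypothetical wrapper: maps a raw Meilisearch reply to the API shape."""

    def __init__(self, client):
        self.client = client

    async def search(self, query: str) -> dict:
        raw = await self.client.search(query)
        return {
            "results": raw["hits"],
            "total": raw["estimatedTotalHits"],
            "processing_time_ms": raw["processingTimeMs"],
        }


def test_search_service_handles_empty_results():
    meili = AsyncMock()
    meili.search.return_value = {
        "hits": [],
        "estimatedTotalHits": 0,
        "processingTimeMs": 1,
    }
    # Drive the coroutine without needing an async test plugin.
    response = asyncio.run(SearchService(meili).search("nothing"))
    assert response == {"results": [], "total": 0, "processing_time_ms": 1}
    meili.search.assert_awaited_once_with("nothing")
```

Asserting on the mock's await call as well as the response catches the case where the service returns a sensible default without ever querying the client.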

Common Tasks¶

Adding a New Endpoint¶

Define Pydantic schemas in schemas.py
Implement route handler in api/
Add service logic in services/
Write tests
Update API docs if needed

Adding a New Extractor¶

See Adding File Extractors for a complete guide.

Quick version:

1. Create extractor class in extractors/
2. Inherit from BaseExtractor
3. Implement extract() method
4. Register with the extractor registry
5. Add tests with sample files
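
As a rough illustration of steps 1 to 4, the sketch below adds a CSV extractor. The `EXTRACTORS` dict, the `register` decorator, the `extractor_for` helper, and the `Document` dataclass are hypothetical stand-ins; the Adding File Extractors guide describes the project's actual registry.

```python
import asyncio
import csv
from abc import ABC, abstractmethod
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Document:  # stand-in for app.schemas.Document
    path: str
    content: str
    type: str


class BaseExtractor(ABC):
    @abstractmethod
    async def extract(self, file_path: str) -> Document: ...


EXTRACTORS: dict[str, type[BaseExtractor]] = {}  # hypothetical registry


def register(*suffixes: str):
    """Class decorator that maps file suffixes to an extractor class."""
    def wrap(cls: type[BaseExtractor]) -> type[BaseExtractor]:
        for suffix in suffixes:
            EXTRACTORS[suffix] = cls
        return cls
    return wrap


@register(".csv")
class CsvExtractor(BaseExtractor):
    def _read(self, file_path: str) -> str:
        # Flatten rows to space-separated lines so the text is searchable.
        with open(file_path, newline="", encoding="utf-8") as handle:
            return "\n".join(" ".join(row) for row in csv.reader(handle))

    async def extract(self, file_path: str) -> Document:
        content = await asyncio.to_thread(self._read, file_path)
        return Document(path=file_path, content=content, type="csv")


def extractor_for(file_path: str) -> BaseExtractor:
    """Look up the extractor by file suffix (raises KeyError if unsupported)."""
    return EXTRACTORS[Path(file_path).suffix.lower()]()
```

With a registry keyed by suffix, the indexer never needs to know which extractor classes exist; supporting a new format does not touch the orchestration code.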

Debugging¶

Use Python's standard logging module (FastAPI does not ship its own logger):

import logging
logger = logging.getLogger(__name__)

logger.debug("Debug message")
logger.info("Info message")
logger.error("Error message")

Set LOG_LEVEL=DEBUG in .env to see all logs.
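
How LOG_LEVEL reaches the logging module depends on app/config.py, which is not shown here. A minimal sketch, assuming the .env value has already been loaded into the process environment (the `configure_logging` helper is hypothetical):

```python
import logging
import os


def configure_logging(default: str = "INFO") -> int:
    """Set the root log level from LOG_LEVEL, falling back to `default`."""
    name = os.environ.get("LOG_LEVEL", default).upper()
    level = logging.getLevelName(name)  # returns an int for known level names
    if not isinstance(level, int):  # unknown name such as "LOUD"
        level = logging.INFO
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
        force=True,  # replace handlers installed by earlier calls
    )
    return level
```

Calling this once at startup, before the first logger.debug() call, is enough for module-level loggers created with logging.getLogger(__name__) to inherit the level.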

Or use the Python debugger (the built-in breakpoint() call does the same on Python 3.7+):

import pdb; pdb.set_trace()

Code Style¶

Follow PEP 8. Use type hints:

def process_file(path: str, source_id: str) -> Document:
    ...

Keep functions small and focused. Extract complex logic into helper functions.

Don't over-comment obvious code. Comment WHY, not WHAT.

Performance Tips¶

Async where it matters - File I/O and network calls benefit from async. Pure Python computation doesn't.

Batch operations - Send documents to Meilisearch in batches, not one at a time.

Database connections - Use dependency injection to manage sessions properly.

Timeouts - Always set timeouts on external calls and file operations.
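
For the batching tip, a small helper can chunk documents before they are sent. A sketch, assuming a hypothetical `send_batch` callable in place of the real Meilisearch client call and an arbitrary default batch size of 500:

```python
from itertools import islice
from typing import Callable, Iterable, Iterator


def batched(items: Iterable[dict], size: int) -> Iterator[list[dict]]:
    """Yield lists of at most `size` items until the input is exhausted."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while batch := list(islice(iterator, size)):
        yield batch


def send_in_batches(
    docs: Iterable[dict],
    send_batch: Callable[[list[dict]], None],
    size: int = 500,
) -> int:
    """Send documents in batches and return the number of requests made."""
    requests = 0
    for batch in batched(docs, size):
        send_batch(batch)
        requests += 1
    return requests
```

Because `batched` consumes a plain iterator, documents can be streamed from the extractors without holding the whole source in memory. On Python 3.12 and later, itertools.batched offers the same chunking but yields tuples.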

Troubleshooting¶

Import errors¶

Make sure you're in the venv:

source .venv/bin/activate  # Unix
.venv\Scripts\activate     # Windows

Or use uv to run directly:

uv run uvicorn ...

Database migrations failing¶

Check current version:

uv run alembic current

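Reset and reapply (development databases only: downgrade base reverts every migration, which normally drops the tables and their data):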

uv run alembic downgrade base
uv run alembic upgrade head

Meilisearch connection errors¶

Verify Meilisearch is running:

docker-compose ps meilisearch

Check that the master key in .env matches the one in docker-compose.yml.

Next Steps¶

Architecture - Understand the system design
Frontend Development - Work on the web UI
Adding Extractors - Add file format support
Contributing - Contribution guidelines