Backend Development¶
The OneSearch backend is built with FastAPI and Python 3.11+.
Getting Started¶
Prerequisites¶
- Python 3.11 or later
- uv (recommended) or pip
- Docker + Docker Compose (for Meilisearch)
Initial Setup¶
Clone the repository and set up the backend:
Install dependencies using uv (faster than pip):
This creates a .venv directory and installs all dependencies from pyproject.toml.
Or use pip if you prefer:
Start Meilisearch¶
The backend needs Meilisearch running:
Set your Meilisearch master key in .env:
Run the Development Server¶
Or if your venv is activated:
The --reload flag enables auto-reload on code changes.
API docs are available at http://localhost:8000/docs.
Project Structure¶
backend/
├── app/
│ ├── main.py # FastAPI app, startup, CORS
│ ├── config.py # Settings from environment
│ ├── models.py # SQLAlchemy ORM models
│ ├── schemas.py # Pydantic request/response schemas
│ ├── api/ # API endpoints
│ │ ├── search.py # POST /api/search
│ │ ├── sources.py # CRUD for /api/sources
│ │ └── status.py # GET /api/health, /api/status
│ ├── services/ # Business logic
│ │ ├── indexer.py # Orchestrates indexing
│ │ ├── scanner.py # File system walker
│ │ └── search.py # Meilisearch wrapper
│ ├── extractors/ # Document parsers
│ │ ├── base.py # BaseExtractor abstract class
│ │ ├── text.py # Text files
│ │ ├── markdown.py # Markdown
│ │ ├── pdf.py # PDFs
│ │ └── office.py # Office documents
│ └── db/
│ └── database.py # SQLAlchemy setup
├── tests/ # Tests
├── alembic/ # Database migrations
├── pyproject.toml # Dependencies (uv/pip)
└── uv.lock # Lock file (commit this!)
Development Workflow¶
Making Changes¶
- Create a feature branch:
- Make your changes
- Run tests:
- Commit and push:
- Create a pull request
Adding Dependencies¶
Use uv to add packages:
This updates pyproject.toml and uv.lock. Always commit the lock file.
Key Concepts¶
API Endpoints¶
API routes live in app/api/. Each module handles a resource:
- search.py - Search endpoint
- sources.py - Source CRUD operations
- status.py - Health checks and status
Example endpoint structure:
from fastapi import APIRouter, Depends
from sqlalchemy.orm import Session

from ..db.database import get_db
from ..schemas import SourceCreate, SourceResponse
from ..services import source_service

router = APIRouter(prefix="/api/sources", tags=["sources"])


@router.post("/", response_model=SourceResponse)
def create_source(
    source: SourceCreate,
    db: Session = Depends(get_db),
):
    return source_service.create(db, source)
Use Pydantic schemas for validation, dependency injection for database sessions, and FastAPI exceptions for errors.
Services¶
Business logic lives in app/services/. Keep API routes thin - they should just validate input and call service functions.
indexer.py orchestrates the indexing process:
1. Scan directories
2. Extract content
3. Send to Meilisearch
4. Update metadata
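The four indexing steps can be sketched as a small pipeline. This is a minimal illustration, not the actual indexer.py: the function name run_indexing and its four callable parameters (scan, extract, send, record) are hypothetical stand-ins for the scanner, the extractors, the Meilisearch wrapper, and the metadata update.

```python
from typing import Callable, Iterable


def run_indexing(
    scan: Callable[[], Iterable[str]],
    extract: Callable[[str], dict],
    send: Callable[[list[dict]], None],
    record: Callable[[str], None],
) -> int:
    """Run the four indexing steps in order and return the number of documents sent.

    All four callables are hypothetical stand-ins, injected so the flow
    can be exercised without a file system or a search server.
    """
    documents: list[dict] = []
    paths: list[str] = []
    for path in scan():                  # 1. Scan directories
        documents.append(extract(path))  # 2. Extract content
        paths.append(path)
    if documents:
        send(documents)                  # 3. Send to Meilisearch
    for path in paths:
        record(path)                     # 4. Update metadata
    return len(documents)
```

Passing the steps in as callables keeps the orchestration easy to unit test with plain lists and lambdas.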
scanner.py walks the file system and applies glob patterns.
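As a rough sketch of that idea, the walker below uses only the standard library. The name scan_files and its include/exclude parameters are assumptions for illustration, and the real scanner.py may match patterns differently.

```python
import fnmatch
import os
from typing import Iterator


def scan_files(root: str, include: list[str], exclude: list[str]) -> Iterator[str]:
    """Yield files under root that match an include pattern and no exclude pattern.

    Patterns are tested against the path relative to root, written with
    forward slashes. With fnmatch, "*" also matches "/", so "*.md" matches
    files at any depth.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            full_path = os.path.join(dirpath, name)
            rel = os.path.relpath(full_path, root).replace(os.sep, "/")
            if not any(fnmatch.fnmatch(rel, pattern) for pattern in include):
                continue
            if any(fnmatch.fnmatch(rel, pattern) for pattern in exclude):
                continue
            yield full_path
```

For example, scan_files(root, ["*.md", "*.txt"], ["drafts/*"]) would yield Markdown and text files while skipping anything under a hypothetical drafts/ directory.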
search.py wraps the Meilisearch client.
Extractors¶
Extractors parse file content. All inherit from BaseExtractor:
from abc import ABC, abstractmethod

from ..schemas import Document


class BaseExtractor(ABC):
    @abstractmethod
    async def extract(self, file_path: str) -> Document:
        pass
Example extractor:
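class TextExtractor(BaseExtractor):
    async def extract(self, file_path: str) -> Document:
        # Simplified placeholder read. The real extractor also applies a
        # timeout and detects the file encoding before decoding.
        with open(file_path, encoding="utf-8", errors="replace") as f:
            content = f.read()
        # Return normalized Document
        return Document(
            path=file_path,
            content=content,
            type="text",
            # ... other fields
        )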
All extractors share the same interface, so error handling and timeout protection live in one place and adding new file types is just a new class.
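One way to picture that single place is a wrapper that every extractor call goes through. The sketch below is an assumption about the shape of such a wrapper, not the project's actual code: extract_with_timeout and its 30-second default are hypothetical.

```python
import asyncio
import logging
from typing import Awaitable, Callable, Optional

logger = logging.getLogger(__name__)


async def extract_with_timeout(
    extract: Callable[[str], Awaitable[object]],
    file_path: str,
    timeout_seconds: float = 30.0,
) -> Optional[object]:
    """Run any extractor's extract() with a timeout; return None if it fails."""
    try:
        return await asyncio.wait_for(extract(file_path), timeout=timeout_seconds)
    except asyncio.TimeoutError:
        # The extractor took too long; skip this file instead of stalling the run.
        logger.warning("Extraction timed out: %s", file_path)
        return None
    except Exception:
        # Any parser-specific failure is handled the same way for every file type.
        logger.exception("Extraction failed: %s", file_path)
        return None
```

Because the wrapper only depends on the shared extract(file_path) signature, a new extractor class gets the same protection without any extra code.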
Database Models¶
SQLAlchemy models in app/models.py:
class Source(Base):
    __tablename__ = "sources"

    id = Column(String, primary_key=True)
    name = Column(String, nullable=False)
    root_path = Column(String, nullable=False)
    # ...
Pydantic schemas, by contrast, live in app/schemas.py:
class SourceCreate(BaseModel):
    name: str
    root_path: str
    # ...


class SourceResponse(BaseModel):
    id: str
    name: str
    # ...
Models define database structure, schemas define API contracts. They change for different reasons so they stay separate.
Database Migrations¶
OneSearch uses Alembic for migrations.
Create a migration after changing models:
Review the generated file in alembic/versions/. Alembic's autogenerate does not detect every change (table and column renames, for example), so check the file before applying it.
Apply migrations:
Testing¶
Run all tests:
Run specific tests:
Verbose output:
With coverage:
Writing Tests¶
Use pytest fixtures for setup:
import asyncio

import pytest

from app.extractors.text import TextExtractor


@pytest.fixture
def sample_text_file(tmp_path):
    file_path = tmp_path / "sample.txt"
    file_path.write_text("Sample content")
    return str(file_path)


def test_text_extractor(sample_text_file):
    extractor = TextExtractor(source_id="test", source_name="Test")
    # extract() is a coroutine, so run it to completion before asserting.
    doc = asyncio.run(extractor.extract(sample_text_file))
    assert doc.content == "Sample content"
    assert doc.source_id == "test"
Mock external services rather than calling a live Meilisearch instance in unit tests:
from unittest.mock import AsyncMock


def test_search_service_handles_empty_results():
    meili = AsyncMock()
    meili.search.return_value = {
        "hits": [],
        "estimatedTotalHits": 0,
        "processingTimeMs": 1,
    }
    # Pass the mock into the service under test and assert the app-level response.
Raw Meilisearch responses use hits; OneSearch API responses use results.
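The renaming can be illustrated with a small mapping function. Only the hits-to-results rename comes from this guide; the function name to_api_response and the output keys total and processing_time_ms are hypothetical, chosen just to make the sketch complete.

```python
from typing import Any


def to_api_response(raw: dict[str, Any]) -> dict[str, Any]:
    """Map a raw Meilisearch search response onto an app-level payload.

    "results" mirrors the naming described above. "total" and
    "processing_time_ms" are illustrative key names, not confirmed
    parts of the OneSearch API.
    """
    return {
        "results": raw.get("hits", []),
        "total": raw.get("estimatedTotalHits", 0),
        "processing_time_ms": raw.get("processingTimeMs", 0),
    }
```

A unit test can feed the mocked raw response into a function like this and assert on the results key, which keeps the test independent of a running Meilisearch instance.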
Common Tasks¶
Adding a New Endpoint¶
- Define Pydantic schemas in schemas.py
- Implement route handler in api/
- Add service logic in services/
- Write tests
- Update API docs if needed
Adding a New Extractor¶
See Adding File Extractors for a complete guide.
Quick version:
1. Create extractor class in extractors/
2. Inherit from BaseExtractor
3. Implement extract() method
4. Register with the extractor registry
5. Add tests with sample files
Debugging¶
Use Python's standard logging module, which works alongside FastAPI:
import logging
logger = logging.getLogger(__name__)
logger.debug("Debug message")
logger.info("Info message")
logger.error("Error message")
Set LOG_LEVEL=DEBUG in .env to see all logs.
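How OneSearch reads LOG_LEVEL is not shown in this guide, so the following is only a sketch of one common approach. It assumes the variable has already been loaded into the process environment, and configure_logging is a hypothetical helper name.

```python
import logging
import os


def configure_logging(default: str = "INFO") -> int:
    """Configure root logging from the LOG_LEVEL environment variable.

    Returns the numeric level that was applied. Unrecognized values
    fall back to INFO instead of raising.
    """
    name = os.getenv("LOG_LEVEL", default).upper()
    level = getattr(logging, name, None)
    if not isinstance(level, int):
        level = logging.INFO
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
        force=True,  # replace any handlers configured earlier
    )
    return level
```

With LOG_LEVEL=DEBUG set, logger.debug() calls become visible; with the variable unset, only INFO and above are shown.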
Or use the Python debugger:
Code Style¶
Follow PEP 8. Use type hints:
Keep functions small and focused. Extract complex logic into helper functions.
Don't over-comment obvious code. Comment WHY, not WHAT.
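A short illustration of these guidelines, using hypothetical helper names that are not part of the OneSearch codebase: each function is fully type-hinted, does one thing, and the comment explains a reason instead of restating the code.

```python
from pathlib import Path


def normalize_extension(path: Path) -> str:
    """Return the file extension in lowercase, without the leading dot."""
    # Lowercase so "Report.PDF" and "report.pdf" are treated as the same type.
    return path.suffix.lower().lstrip(".")


def is_supported(path: Path, supported: frozenset[str]) -> bool:
    """Return True if the file's extension is in the supported set."""
    return normalize_extension(path) in supported
```

Splitting the normalization into its own helper keeps is_supported to a single line and lets each piece be tested on its own.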
Performance Tips¶
Async where it matters - File I/O and network calls benefit from async. Pure Python computation doesn't.
Batch operations - Send documents to Meilisearch in batches, not one at a time.
Database connections - Use dependency injection to manage sessions properly.
Timeouts - Always set timeouts on external calls and file operations.
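The batching tip can be sketched with a small generator. The helper name batched is an assumption for illustration, and the batch size to use depends on document size and server limits.

```python
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int) -> Iterator[list[T]]:
    """Yield consecutive lists of at most `size` items from `items`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    batch: list[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly shorter, batch.
        yield batch
```

Each yielded list can then be sent to Meilisearch in one request, so indexing 10,000 documents with a batch size of 500 takes 20 calls instead of 10,000.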
Troubleshooting¶
Import errors¶
Make sure you're in the venv:
Or use uv to run directly:
Database migrations failing¶
Check current version:
Reset and reapply:
Meilisearch connection errors¶
Verify Meilisearch is running:
Check the master key matches in both .env and docker-compose.yml.
Next Steps¶
- Architecture - Understand the system design
- Frontend Development - Work on the web UI
- Adding Extractors - Add file format support
- Contributing - Contribution guidelines