Understanding Indexing¶

Indexing is the scan-and-extract step that turns files into searchable documents.

A normal run does three things:

Walk the source directory.
Skip files that do not match the source patterns.
Extract text and metadata from new or changed files.

Deleted files are removed from the search index during the same pass.

Incremental indexing¶

Reindexing is incremental by default. OneSearch tracks each file's path, modified time, size, and status. If a file has not changed, it is skipped.

That makes regular indexing cheap after the first scan. A big source might take a while the first time, then finish quickly on later runs.

Run it from the web UI with Admin → Sources → Reindex, or from the CLI:

onesearch source reindex documents

Full reindex¶

A full reindex clears OneSearch's indexed metadata for that source and rebuilds it from disk.

Use it when:

you changed indexing settings and want old documents refreshed
you migrated Meilisearch data
search results look stale or inconsistent
an extractor changed and you want everything reprocessed

CLI:

onesearch source reindex documents --full

API:

curl -X POST "http://localhost:8000/api/sources/documents/reindex?full=true" \
  -H "Authorization: Bearer $TOKEN"

What gets indexed¶

Supported formats get full text extraction where possible. PDFs, Office docs, Markdown, text, code, RTF, EPUB, subtitles, comics, images, RAW photos, and media files all have dedicated handling.

Unsupported files can still be indexed as metadata-only entries if Admin → Settings → Indexing → Unsupported files is set to metadata-only. That makes filenames and paths searchable even when OneSearch cannot read the file contents.

Failures and skipped files¶

A failed file does not stop the whole source. OneSearch records the error and keeps going.

Common reasons:

file is too large for the configured limit
file disappeared while indexing was running
extractor timed out
file is corrupt or encrypted
source mount is no longer available

Check Admin → Status for failed file details.

Schedules¶

Sources can be manual-only or scheduled. Presets are hourly, daily, and weekly, with custom cron available if you need it.

Schedules run incremental indexing. If a source is already being indexed, another run for that source is skipped instead of running two scans at once.

See Scheduling for cron examples.

Changing indexing settings

Settings such as unsupported-file behavior, RAW metadata extraction, GPS metadata, and extraction size limits apply to future indexing. Run a full reindex if existing documents need to be refreshed with new settings.