Sift

Indexed grep for codebases. Build an index once, then search with a grep-like CLI or the sift-core library, up to 60x faster than ripgrep on indexed queries.

Quick Start

Install (GitHub Release)

curl -fsSL https://raw.githubusercontent.com/botirk38/sift/master/scripts/install.sh | sh

Installs to $HOME/.local/bin/sift. Override with PREFIX=/usr/local.

Updating

sift update

Alternatively, re-run the install script; it performs a fresh install over the existing binary.

From Source

cargo build --release -p sift-grep
./target/release/sift --sift-dir .sift index build /path/to/corpus
./target/release/sift --sift-dir .sift "pattern"

Patterns use Rust regex syntax by default. Pass -F to treat the pattern as a fixed string, and put -- before a pattern that would otherwise be parsed as a subcommand (e.g. sift -- index build).

Architecture

sift/
├── crates/
│   ├── core/           # sift-core: index-backed query planner and search engine
│   └── cli/            # sift-cli: grep-like CLI over sift-core
├── fuzz/               # cargo-fuzz targets (standalone, nightly)
├── benchsuite/         # rg vs sift comparative benchmarks
├── scripts/            # bench.sh, fuzz.sh, install.sh
├── skills/             # Agent skill for searching with sift (npx skills)
└── docs/               # Performance snapshots and compatibility matrix

Crates

Crate	Package	Description
`crates/core`	`sift-core`	Index-backed query planner, candidate narrowing, and parallel search engine
`crates/cli`	`sift-cli`	`sift` binary with ripgrep-compatible flags
`fuzz/`	n/a	libFuzzer targets for `sift-core` (excluded from workspace)

How It Works

Sift uses on-disk indexes to skip files that cannot match your query. The core idea is simple: build indexes based on your workload, then let the planner use them to narrow candidates before running the regex engine.

Build: walk the corpus respecting .gitignore rules, extract indexable features from every file, and persist them as memory-mapped tables. The shipped index type is a trigram index, which records overlapping 3-byte sequences from each file.
Plan: extract required literals from the regex pattern, decompose them into index-friendly terms, and intersect posting lists to narrow the candidate set.
Search: scan only candidate files with the full regex engine, optionally parallelized via Rayon when the candidate count justifies it.

Queries that hit the index skip most of the corpus entirely. Queries with no extractable literal (e.g. \p{Greek}) fall back to a full scan, which runs at roughly ripgrep speed (about 1.0x to 1.1x in the benchmarks below).
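
The Build, Plan, and Search steps above can be illustrated with a toy in-memory version. This is a minimal sketch under stated assumptions, not sift-core's implementation: the names TrigramIndex, candidates, and search are hypothetical, the index lives in a HashMap rather than in memory-mapped tables, the query is a plain literal, and str::contains stands in for the regex engine.

```rust
use std::collections::{BTreeSet, HashMap};

/// Toy trigram index (hypothetical, in-memory): maps each 3-byte window
/// to the set of file ids that contain it.
struct TrigramIndex {
    postings: HashMap<[u8; 3], BTreeSet<usize>>,
    file_count: usize,
}

impl TrigramIndex {
    /// Build: record every overlapping 3-byte window of every file.
    fn build(files: &[&str]) -> Self {
        let mut postings: HashMap<[u8; 3], BTreeSet<usize>> = HashMap::new();
        for (id, text) in files.iter().enumerate() {
            for w in text.as_bytes().windows(3) {
                postings.entry([w[0], w[1], w[2]]).or_default().insert(id);
            }
        }
        Self { postings, file_count: files.len() }
    }

    /// Plan: intersect the posting lists of the literal's trigrams.
    /// Returns None when the literal is shorter than 3 bytes, meaning the
    /// index cannot narrow the query and the caller must scan everything.
    fn candidates(&self, literal: &str) -> Option<BTreeSet<usize>> {
        let mut result: Option<BTreeSet<usize>> = None;
        for w in literal.as_bytes().windows(3) {
            let list = self
                .postings
                .get(&[w[0], w[1], w[2]])
                .cloned()
                .unwrap_or_default();
            result = Some(match result {
                None => list,
                Some(acc) => acc.intersection(&list).copied().collect(),
            });
        }
        result
    }
}

/// Search: run the real matcher only on candidates. Candidates can contain
/// false positives (all trigrams present, but not adjacent), so this
/// verification step is what decides the final result.
fn search(index: &TrigramIndex, files: &[&str], literal: &str) -> Vec<usize> {
    let ids: Vec<usize> = match index.candidates(literal) {
        Some(set) => set.into_iter().collect(),
        None => (0..index.file_count).collect(), // full-scan fallback
    };
    ids.into_iter()
        .filter(|&id| files[id].contains(literal))
        .collect()
}
```

In this sketch a file containing "help yellow" is a candidate for the literal "hello", because it contains the trigrams hel, ell, and llo, but the Search step rejects it. The index only has to be conservative, never exact.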

The SearchIndex trait makes the system pluggable. New index kinds (suffix arrays, symbol tables, etc.) can be added alongside the trigram index, and the Indexes registry will intersect their candidate sets for even tighter narrowing.
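
To make the pluggable design concrete, here is one way such a trait and registry could look. This is a hedged sketch: only the names SearchIndex and Indexes come from this README. The method signatures, the register method, and the StaticIndex toy type are assumptions made for illustration, and the real API in sift-core may differ.

```rust
use std::collections::BTreeSet;

/// Hypothetical trait shape; the real `SearchIndex` in sift-core may differ.
trait SearchIndex {
    /// Some(ids) when this index can narrow the query,
    /// None when it has nothing to say about it.
    fn candidates(&self, literal: &str) -> Option<BTreeSet<usize>>;
}

/// Hypothetical registry that combines every index kind it holds.
struct Indexes {
    kinds: Vec<Box<dyn SearchIndex>>,
}

impl Indexes {
    fn new() -> Self {
        Self { kinds: Vec::new() }
    }

    fn register(&mut self, index: Box<dyn SearchIndex>) {
        self.kinds.push(index);
    }

    /// Intersect the candidate sets of all indexes that can narrow the
    /// query. Returns None when no index can, i.e. a full scan is needed.
    fn candidates(&self, literal: &str) -> Option<BTreeSet<usize>> {
        self.kinds
            .iter()
            .filter_map(|index| index.candidates(literal))
            .reduce(|acc, next| acc.intersection(&next).copied().collect())
    }
}

/// Toy index for the demo: answers with a fixed id set once the literal
/// is at least `min_len` bytes long.
struct StaticIndex {
    min_len: usize,
    ids: BTreeSet<usize>,
}

impl SearchIndex for StaticIndex {
    fn candidates(&self, literal: &str) -> Option<BTreeSet<usize>> {
        (literal.len() >= self.min_len).then(|| self.ids.clone())
    }
}
```

Because every index is allowed to over-approximate, intersecting their answers is safe: a file is dropped only when at least one index is sure it cannot match.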

Performance

Benchsuite snapshot against the Linux kernel corpus:

Search Class	Speedup vs `rg`	Mechanism
Indexed literals	~60x	Index narrowing eliminates most files
Indexed word matches	~60x	Whole-word literal shaping stays cheap
Indexed alternation	~31x	Multi-arm candidate narrowing
Full-scan Unicode	~1.0x	Near parity; the regex engine scans every file
Full-scan no-literal	~1.1x	Comparable full-scan performance

Correctness parity: sift's output matched ripgrep's on 11 of 11 benchmarks. See crates/core/benches/README.md for the full benchmark and profiling workflow, and benchsuite/ for the comparative suite.
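
The "multi-arm candidate narrowing" row in the table can be read as follows: each arm of an alternation such as foo|bar gets its own candidate set, and a file is worth scanning if it is a candidate for any arm. The sketch below shows that combination step under this assumption. The function name alternation_candidates is hypothetical and the code is not taken from sift-core.

```rust
use std::collections::BTreeSet;

/// Combine per-arm candidate sets for an alternation (hypothetical helper).
/// A file may match if any arm may match, so the sets are unioned.
/// If a single arm cannot be narrowed (None), that arm could match any
/// file, so the whole alternation cannot be narrowed either.
fn alternation_candidates(arms: &[Option<BTreeSet<usize>>]) -> Option<BTreeSet<usize>> {
    let mut union = BTreeSet::new();
    for arm in arms {
        union.extend(arm.as_ref()?.iter().copied());
    }
    Some(union)
}
```

A union is larger than either arm's set alone, which is consistent with alternation showing a smaller speedup than single literals in the table.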

Differences from ripgrep

Requires a prior index (sift index build) before searching; refresh with sift index update.
Search paths must sit under the indexed corpus root.
Suppresses file names with --no-filename, not -h; in sift, -h prints help.

See docs/rg-compat-matrix.md for the full flag compatibility matrix.
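
The restriction that search paths must sit under the indexed corpus root can be pictured as a component-wise prefix check. The helper below is a hypothetical illustration using the standard library, not sift's actual validation code; it is purely lexical and assumes both paths are already absolute and normalized.

```rust
use std::path::Path;

/// Returns true when `search_path` is the corpus root itself or lies
/// beneath it. Hypothetical helper; compares whole path components, so
/// a sibling directory that merely shares a name prefix is rejected.
fn is_under_root(root: &Path, search_path: &Path) -> bool {
    search_path.starts_with(root)
}
```

With a root of /srv/code, the path /srv/code/kernel passes while /srv/code-old/kernel does not, because Path::starts_with matches whole components rather than raw string prefixes.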

Requirements

Component	Version
Rust	2024 edition (stable toolchain, 1.85 or newer)
OS	Linux, macOS, Windows (CI-tested)

Development

cargo fmt --all -- --check
cargo clippy --workspace --all-targets --all-features -- -D warnings
cargo test --workspace --all-features

CI runs fmt, clippy (-D warnings), and tests on Linux, macOS, and Windows. See .github/workflows/ci.yml.

License

MIT OR Apache-2.0. See Cargo.toml.
