hash-anu/snkv


SNKV

A simple, crash-safe embedded key-value store



What is SNKV?

SNKV is a lightweight, ACID-compliant embedded key-value store built directly on SQLite's storage engine, without the SQL layer.

The idea: bypass the SQL layer entirely and talk directly to SQLite's storage engine. No SQL parser. No query planner. No virtual machine. Just a clean KV API on top of a proven, battle-tested storage core.

SQLite-grade reliability. KV-first design. Lower overhead for read-heavy and mixed key-value workloads.


Quick Start

Single-header integration — drop it in and go:

#include <stdio.h>                /* printf */

#define SNKV_IMPLEMENTATION
#include "snkv.h"

int main(void) {
    KVStore *db;
    kvstore_open("mydb.db", &db, KVSTORE_JOURNAL_WAL);   /* open the store in WAL mode */

    kvstore_put(db, "key", 3, "value", 5);               /* lengths are given in bytes */

    void *val; int len;
    kvstore_get(db, "key", 3, &val, &len);
    printf("%.*s\n", len, (char*)val);
    snkv_free(val);                                      /* caller frees the returned buffer */

    kvstore_close(db);
    return 0;
}

C/C++ API Reference: hash-anu.github.io/snkv/api.html


TTL — Native Key Expiry

SNKV has built-in TTL (Time-To-Live) support. Keys expire automatically — no background thread, no cron job. Expiry is enforced lazily on access and can be bulk-purged on demand.

/* C API */
int64_t expire_ms = kvstore_now_ms() + 60000;  /* 60 seconds from now */
kvstore_put_ttl(db, "session", 7, "tok123", 6, expire_ms);

void *val; int len; int64_t remaining_ms;
kvstore_get_ttl(db, "session", 7, &val, &len, &remaining_ms);

int n;
kvstore_purge_expired(db, &n);  /* bulk-delete all expired keys */

# Python API
db.put(b"session", b"tok123", ttl=60)      # expires in 60 seconds
db[b"token", 30] = b"bearer-xyz"           # dict-style TTL

val = db.get(b"session")                   # None if expired
try:
    remaining = db.ttl(b"session")         # seconds remaining, or None if no expiry
except NotFoundError:
    remaining = None                        # key does not exist
n = db.purge_expired()                     # bulk-delete expired keys

TTL is supported on both the default store and on column families. Expired keys are lazily deleted on get() and exists() — consistent results without a background thread.
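
The lazy-expiry behaviour described above can be illustrated with a small, self-contained Python model. This is only a sketch of the general technique, not SNKV's implementation: the `LazyTTLStore` class, its method names, and the injectable `now_ms` clock are hypothetical and exist purely for illustration.

```python
import time


class LazyTTLStore:
    """Toy in-memory model of lazy key expiry (hypothetical, not SNKV code)."""

    def __init__(self, now_ms=None):
        # An injectable clock keeps the example deterministic.
        self._now_ms = now_ms or (lambda: int(time.time() * 1000))
        self._data = {}  # key -> (value, expire_ms or None)

    def put(self, key, value, ttl_ms=None):
        expire_ms = None if ttl_ms is None else self._now_ms() + ttl_ms
        self._data[key] = (value, expire_ms)

    def _expired(self, expire_ms):
        return expire_ms is not None and expire_ms <= self._now_ms()

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expire_ms = entry
        if self._expired(expire_ms):
            del self._data[key]  # lazy eviction happens on access
            return None
        return value

    def purge_expired(self):
        dead = [k for k, (_, exp) in self._data.items() if self._expired(exp)]
        for k in dead:
            del self._data[k]
        return len(dead)


clock = [0]
store = LazyTTLStore(now_ms=lambda: clock[0])
store.put(b"session", b"tok123", ttl_ms=60_000)
store.put(b"config", b"v1")              # no TTL, never expires
store.put(b"otp", b"9911", ttl_ms=30_000)

clock[0] = 45_000                        # 45 s later
session_at_45s = store.get(b"session")   # still live
purged = store.purge_expired()           # removes the expired "otp" key

clock[0] = 61_000                        # past the session TTL
session_at_61s = store.get(b"session")   # expired, evicted on this read
```

No timer or thread is involved: a key that is never read again simply stays on disk until a purge call removes it, which is the trade-off lazy expiry makes.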


Encryption

SNKV has built-in at-rest encryption. Every value is encrypted with XChaCha20-Poly1305; passwords are stretched with Argon2id. All existing APIs work transparently on encrypted stores.

/* C API */
KVStore *db;
kvstore_open_encrypted("mydb.db", "hunter2", 7, &db, NULL);

kvstore_put(db, "secret", 6, "classified", 10);   /* encrypted transparently */

void *val; int len;
kvstore_get(db, "secret", 6, &val, &len);          /* decrypted on read */
snkv_free(val);

kvstore_reencrypt(db, "new-pass", 8);              /* change password in-place */
kvstore_close(db);

# Python API
with KVStore.open_encrypted("mydb.db", b"hunter2") as db:
    db[b"secret"] = b"classified"       # encrypted transparently
    print(db[b"secret"])                # b"classified" — decrypted on read
    db.reencrypt(b"new-pass")           # change password in-place

Cryptographic details: XChaCha20-Poly1305 per value · Argon2id KDF (64 MB, 3 iterations) · 40-byte overhead per value · encryption key wiped from memory on close · wrong password returns KVSTORE_AUTH_FAILED / raises AuthError.
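
As a rough guide to what the 40-byte overhead means for storage, the arithmetic can be sketched as below. XChaCha20-Poly1305 uses a 24-byte nonce and a 16-byte authentication tag; that these two fields account for the whole documented overhead is an assumption here, since the README does not give the breakdown. The function names are hypothetical.

```python
# Standard XChaCha20-Poly1305 parameter sizes.
NONCE_BYTES = 24   # XChaCha20 extended nonce
TAG_BYTES = 16     # Poly1305 authentication tag

# Assumption: the documented 40-byte overhead is nonce + tag per value.
OVERHEAD_BYTES = NONCE_BYTES + TAG_BYTES


def stored_size(plaintext_len):
    """Bytes needed for one encrypted value under the assumption above."""
    return plaintext_len + OVERHEAD_BYTES


def total_overhead(num_values):
    """Extra bytes across a store holding num_values encrypted values."""
    return num_values * OVERHEAD_BYTES


size_of_classified = stored_size(len(b"classified"))   # 10 + 40
overhead_1m = total_overhead(1_000_000)                # extra bytes for 1M values
```

The overhead is fixed per value, so it matters most for stores with many very small values.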


Vector Search (C API)

SNKV includes a native C vector search layer built on usearch's HNSW index. Vectors and KV data share the same .db file. The index is rebuilt from the database on open and optionally saved to a .usearch sidecar for fast reload.

#include "kvstore_vec.h"

/* Open (or create) a 128-dim cosine store */
KVVecStore *vs = NULL;
kvstore_vec_open("store.db", 128, KVVEC_SPACE_COSINE,
                 0, 0, 0, KVVEC_DTYPE_F32, NULL, 0, &vs);

/* Insert key + value + vector (+ optional JSON metadata) */
float vec[128] = { /* ... */ };
kvstore_vec_put(vs, "doc:1", 5, "hello world", 11,
                vec, 0, "{\"tag\":\"ai\"}", 12);

/* Approximate nearest-neighbour search */
float query[128] = { /* ... */ };   /* query vector, same dimension as the store */
KVVecSearchResult *res = NULL; int n = 0;
kvstore_vec_search(vs, query, /*top_k=*/5, /*rerank=*/0,
                   /*oversample=*/0, /*max_dist=*/0.0f, &res, &n);
for (int i = 0; i < n; i++)
    printf("%.*s  dist=%.4f\n", res[i].nKey, (char*)res[i].pKey, res[i].distance);
kvstore_vec_free_results(res, n);

kvstore_vec_close(vs);

Distance spaces

| Constant | Description |
| --- | --- |
| KVVEC_SPACE_L2 | Squared Euclidean (‖a−b‖²), without the square root; ordering matches true L2, but the values are not metric L2 distances |
| KVVEC_SPACE_COSINE | Cosine distance (1 − dot(a,b) / (‖a‖·‖b‖)) |
| KVVEC_SPACE_IP | Inner product (negative dot product) |
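
To make the three formulas in the table concrete, here is a plain-Python sketch that evaluates each one on small vectors. It follows the definitions as written above and is not usearch's or SNKV's code; the function names are hypothetical, and zero-length vectors are not handled.

```python
import math


def l2_squared(a, b):
    """Squared Euclidean distance: sum of squared differences, no sqrt."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine_distance(a, b):
    """1 - cos(angle); 0 for same direction, 1 for orthogonal vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def inner_product_distance(a, b):
    """Negative dot product, so a larger dot product means a smaller distance."""
    return -sum(x * y for x, y in zip(a, b))


a = [1.0, 0.0]
b = [0.0, 2.0]
c = [3.0, 0.0]

d_l2 = l2_squared(a, b)              # (1-0)^2 + (0-2)^2
d_cos_ab = cosine_distance(a, b)     # orthogonal vectors
d_cos_ac = cosine_distance(a, c)     # same direction, different length
d_ip = inner_product_distance(a, c)  # -(1*3)
```

Cosine distance ignores vector length (`a` and `c` are at distance 0), while the inner-product space rewards it, which is why the choice of space should match how the embeddings were trained.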

Index precision

| Constant | RAM usage | Notes |
| --- | --- | --- |
| KVVEC_DTYPE_F32 | Full | Default |
| KVVEC_DTYPE_F16 | Half | Negligible recall loss |
| KVVEC_DTYPE_I8 | Quarter | Cosine-like metrics only |
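
The Full / Half / Quarter column follows from the width of one vector component: 4 bytes for float32, 2 for float16, 1 for int8. A back-of-the-envelope estimate might look like the sketch below. It counts only raw vector payload; the HNSW graph links and any per-entry bookkeeping are not included, and the `"f32"`, `"f16"`, `"i8"` labels are shorthand chosen for this example.

```python
# Bytes per vector component for each index precision.
BYTES_PER_COMPONENT = {"f32": 4, "f16": 2, "i8": 1}


def vector_bytes(num_vectors, dim, dtype):
    """Raw vector payload only; graph links and metadata are excluded."""
    return num_vectors * dim * BYTES_PER_COMPONENT[dtype]


n, dim = 1_000_000, 128
f32_bytes = vector_bytes(n, dim, "f32")
f16_bytes = vector_bytes(n, dim, "f16")
i8_bytes = vector_bytes(n, dim, "i8")
```

For one million 128-dimensional vectors this gives 512 MB, 256 MB, and 128 MB of payload respectively.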

Key features

  • Atomic writes: every put is one KVStore transaction (5 internal CFs); usearch is updated after commit
  • Exact rerank: pass rerank=1 to fetch oversample×top_k candidates and re-score them with exact float32 distances
  • TTL: pass expire_ms > 0 to kvstore_vec_put; expired vectors are lazily evicted on search/get and bulk-removable via kvstore_vec_purge_expired
  • Sidecar persistence: close saves the HNSW graph to {path}.usearch; open loads it in O(1) instead of an O(n·dim) rebuild (disabled for encrypted stores)
  • Encryption: pass a password to kvstore_vec_open; values and vectors are encrypted, and the sidecar is disabled
  • Batch insert: kvstore_vec_put_batch writes N items in one atomic transaction
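
The exact-rerank idea (fetch oversample×top_k approximate candidates, then re-score them exactly) can be sketched in a few lines of Python. This is an illustration of the two-stage pattern only, under loud assumptions: a coarse `quantize` function stands in for the approximate index, the scan is exhaustive rather than an HNSW graph walk, and all names are hypothetical.

```python
def l2_squared(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(v, step=0.5):
    """Crude stand-in for a lower-precision index representation."""
    return [round(x / step) * step for x in v]


def search(vectors, query, top_k, rerank=False, oversample=4):
    """Return top_k (key, distance) pairs, optionally re-scored exactly."""
    # Stage 1: rank everything by approximate (quantized) distance.
    q = quantize(query)
    approx = sorted(
        ((key, l2_squared(quantize(vec), q)) for key, vec in vectors.items()),
        key=lambda kv: kv[1],
    )
    if not rerank:
        return approx[:top_k]
    # Stage 2: keep oversample * top_k candidates and re-score them exactly.
    candidates = [key for key, _ in approx[: oversample * top_k]]
    exact = sorted(
        ((key, l2_squared(vectors[key], query)) for key in candidates),
        key=lambda kv: kv[1],
    )
    return exact[:top_k]


vectors = {
    "a": [0.24, 0.0],   # quantizes to the same point as "b"
    "b": [0.1, 0.1],    # truly the closest to the query
    "c": [0.26, 0.0],
    "d": [1.0, 1.0],
}
query = [0.0, 0.0]

no_rerank = search(vectors, query, top_k=1)
reranked = search(vectors, query, top_k=1, rerank=True, oversample=2)
```

Without rerank the approximate stage cannot tell `a` and `b` apart and returns whichever comes first; with rerank the exact distances put `b` on top. The cost is `oversample * top_k` exact distance computations per query.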

Build

The vector layer requires g++ to compile the usearch C++ core:

make vector               # builds libsnkv_vec.a  (core + usearch)
make vector-examples      # compiles examples/vector.c
make run-vector-examples  # runs the example
make test-vector          # runs the test suite

See examples/vector.c for a complete walkthrough of every API, and api.html for the full C reference.


Configuration

Use kvstore_open_v2 to control how the store is opened. Zero-initialise the config and set only what you need — unset fields resolve to safe defaults.

KVStoreConfig cfg = {0};
cfg.journalMode = KVSTORE_JOURNAL_WAL;   /* WAL mode (default) */
cfg.syncLevel   = KVSTORE_SYNC_NORMAL;   /* survives process crash (default) */
cfg.cacheSize   = 4000;                  /* ~16 MB page cache (default 2000 ≈ 8 MB) */
cfg.pageSize    = 4096;                  /* DB page size, new DBs only (default 4096) */
cfg.busyTimeout = 5000;                  /* retry 5 s on SQLITE_BUSY (default 0) */
cfg.readOnly    = 0;                     /* read-write (default) */

KVStore *db;
kvstore_open_v2("mydb.db", &db, &cfg);

| Field | Default | Options |
| --- | --- | --- |
| journalMode | KVSTORE_JOURNAL_WAL | KVSTORE_JOURNAL_DELETE |
| syncLevel | KVSTORE_SYNC_NORMAL | KVSTORE_SYNC_OFF, KVSTORE_SYNC_FULL |
| cacheSize | 2000 pages (~8 MB) | Any positive integer |
| pageSize | 4096 bytes | Power of 2, 512–65536; new DBs only |
| readOnly | 0 | 1 to open read-only |
| busyTimeout | 0 (fail immediately) | Milliseconds; useful for multi-process use |

kvstore_open remains fully supported; it takes the journal mode as its third argument and uses the defaults for every other setting.
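
When picking cacheSize and pageSize values, the relationship between them is simple arithmetic: the cache footprint is roughly pages times page size, and the page size must be a power of two in the documented range. The helper names below are hypothetical and only illustrate the calculation; they are not part of the SNKV API.

```python
def cache_bytes(cache_pages, page_size):
    """Approximate page-cache footprint: number of pages times page size."""
    return cache_pages * page_size


def is_valid_page_size(page_size):
    """True for powers of two in the documented 512..65536 range."""
    return 512 <= page_size <= 65536 and (page_size & (page_size - 1)) == 0


default_cache = cache_bytes(2000, 4096)   # documented default
larger_cache = cache_bytes(4000, 4096)    # the cfg.cacheSize = 4000 example
```

The default works out to 8,192,000 bytes and the 4000-page example to 16,384,000 bytes, which is where the "~8 MB" and "~16 MB" figures come from.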


Installation & Build

Linux / macOS

make                      # builds libsnkv.a (pure gcc, no C++)
make snkv.h               # generates single-header version
make examples             # builds examples
make run-examples         # run all examples
make test                 # run all tests (CI suite)

make vector               # builds libsnkv_vec.a (core + usearch, requires g++)
make vector-examples      # builds examples/vector
make run-vector-examples  # run the vector example
make test-vector          # run the vector test suite

make clean

Windows (MSYS2 / MinGW64)

1. Install MSYS2.

2. Launch "MSYS2 MinGW 64-bit" from the Start menu (not the plain MSYS2 terminal).

3. Install the toolchain:

pacman -S --needed mingw-w64-x86_64-gcc make

4. Clone and build:

git clone https://github.com/hash-anu/snkv.git
cd snkv
make              # builds libsnkv.a
make snkv.h       # generates single-header
make examples     # builds .exe examples
make run-examples
make test

All commands must be run from the MSYS2 MinGW64 shell. Running mingw32-make from a native cmd.exe or PowerShell window will not work — the Makefile relies on sh and standard Unix tools that are only available inside the MSYS2 environment.


Python Bindings

Available on PyPI — no compiler needed:

pip install snkv           # KV store, TTL, encryption, column families
pip install snkv[vector]   # + HNSW vector search (usearch + numpy)

from snkv import KVStore

with KVStore("mydb.db") as db:
    db["hello"] = "world"
    print(db["hello"].decode())   # world

Vector search — integrated HNSW approximate nearest-neighbour index backed by usearch. Vectors and KV data share the same .db file. Supports metadata filtering, exact rerank, TTL on vectors, quantization (f32/f16/i8), sidecar index persistence, and encryption. Available in both C and Python.

from snkv.vector import VectorStore
import numpy as np

with VectorStore("store.db", dim=128, space="cosine") as vs:
    vs.vector_put(b"doc:1", b"hello world", np.random.rand(128).astype("f4"))
    results = vs.search(np.random.rand(128).astype("f4"), top_k=5)
    for r in results:
        print(r.key, r.distance, r.value)

Full documentation — installation, API reference, examples, and thread-safety notes — is in python/README.md.



10 GB Crash-Safety Stress Test

A production-scale kill-9 test is included but kept separate from the CI suite. It writes unique deterministic key-value pairs into a 10 GB WAL-mode database, forcibly kills the writer with SIGKILL during active writes, and verifies on restart that every committed transaction is present with byte-exact values, no partial transactions are visible, and the database has zero corruption.

make test-crash-10gb          # run full 5-cycle kill-9 + verify (Linux / macOS)

# individual modes
./tests/test_crash_10gb write  tests/crash_10gb.db   # continuous writer
./tests/test_crash_10gb verify tests/crash_10gb.db   # post-crash verifier
./tests/test_crash_10gb clean  tests/crash_10gb.db   # remove DB files

Requires ~11 GB of free disk space. The run mode is POSIX-only; write and verify work on all platforms.
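
The value of deterministic key-value pairs is that a verifier can recompute every expected value from the key alone, without keeping a second copy of the data. The sketch below shows one way such a scheme could work. It is not the scheme used by tests/test_crash_10gb, whose details are not described here; `make_pair`, `verify`, and the SHA-256 derivation are assumptions for illustration, and a plain dict stands in for the database.

```python
import hashlib


def make_pair(index, value_len=64):
    """Derive a unique key and a reproducible value from a sequence number."""
    key = b"key:%012d" % index
    # The SHA-256 digest is repeated and truncated to the requested length.
    digest = hashlib.sha256(key).digest()
    value = (digest * (value_len // len(digest) + 1))[:value_len]
    return key, value


def verify(store, committed_count):
    """Check that every committed pair is present and byte-exact."""
    for i in range(committed_count):
        key, expected = make_pair(i)
        if store.get(key) != expected:
            return False
    return True


# A plain dict stands in for the database in this sketch.
store = dict(make_pair(i) for i in range(1000))
ok_before = verify(store, 1000)

store[make_pair(500)[0]] = b"corrupted"   # simulate a damaged value
ok_after = verify(store, 1000)
```

A real crash test also has to know how many transactions were committed before the kill; how the bundled test tracks that is outside the scope of this sketch.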


How It Works

Standard database path:

Application → SQL Parser → Query Planner → VDBE (VM) → B-Tree → Disk

SNKV path:

Application → KV API → B-Tree → Disk

By removing the layers you don't need for key-value workloads, SNKV keeps the proven storage core and cuts the overhead.

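| Layer | SQLite | SNKV |
| --- | --- | --- |
| SQL Parser | Yes | No |
| Query Planner | Yes | No |
| VDBE (VM) | Yes | No |
| B-Tree Engine | Yes | Yes |
| Pager / WAL | Yes | Yes |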

Benchmarks

1M records, Linux, averaged across 3 runs. Both SNKV and SQLite use identical settings: WAL mode, synchronous=NORMAL, 2000-page (8 MB) page cache, 4096-byte pages.

Benchmark source: SNKV · SQLite

SNKV vs SQLite (KV workloads)

The SQLite benchmark uses a WITHOUT ROWID table with a BLOB primary key, so both engines work on a single B-tree keyed on the same field. With the storage settings matched as listed above, the comparison isolates the cost of the SQL layer for KV operations.

Note: SNKV and SQLite (WITHOUT ROWID) show the same peak RSS (~10.8 MB), since they share the same underlying pager and page cache infrastructure.

| Benchmark | SQLite | SNKV | Notes |
| --- | --- | --- | --- |
| Sequential writes | 142K ops/s | 232K ops/s | SNKV 1.64x faster |
| Random reads | 90K ops/s | 160K ops/s | SNKV 1.77x faster |
| Sequential scan | 1.56M ops/s | 2.89M ops/s | SNKV 1.85x faster |
| Random updates | 16K ops/s | 31K ops/s | SNKV 1.9x faster |
| Random deletes | 16K ops/s | 31K ops/s | SNKV ~2x faster |
| Exists checks | 93K ops/s | 173K ops/s | SNKV 1.85x faster |
| Mixed workload | 34K ops/s | 62K ops/s | SNKV 1.79x faster |
| Bulk insert | 211K ops/s | 248K ops/s | SNKV 1.17x faster |

With identical storage configuration, SNKV wins across every benchmark. The gains come entirely from bypassing the SQL layer — no parsing, no query planner, no VDBE — and a per-column-family cached read cursor that eliminates repeated cursor open/close overhead on the hot read path. Updates and deletes show the biggest gains (~2x) since SQLite must parse, plan, and execute a full SQL statement per operation. Bulk insert is the closest (17%) because both commit a single large B-tree transaction with minimal per-row overhead.
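
For readers who want to rerun the comparison and recompute the speedups themselves, the ratio is simply SNKV throughput divided by SQLite throughput. The short script below does this for the rounded figures in the table above; the dictionary layout and variable names are just one way to organise it.

```python
# Throughput in thousands of ops/s, copied from the benchmark table.
results = {
    "Sequential writes": (142, 232),
    "Random reads": (90, 160),
    "Sequential scan": (1560, 2890),
    "Random updates": (16, 31),
    "Random deletes": (16, 31),
    "Exists checks": (93, 173),
    "Mixed workload": (34, 62),
    "Bulk insert": (211, 248),
}

# Speedup = SNKV throughput / SQLite throughput.
speedups = {name: snkv / sqlite for name, (sqlite, snkv) in results.items()}

for name, ratio in speedups.items():
    print(f"{name:<18} {ratio:.2f}x")

smallest = min(speedups, key=speedups.get)
largest = max(speedups, key=speedups.get)
```

Because the table lists rounded throughput, ratios recomputed this way can differ from the Notes column in the second decimal place.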


Running your own LMDB / RocksDB comparison

If you want to benchmark SNKV against LMDB or RocksDB, the benchmark harnesses are here:


When to Use SNKV

SNKV is a good fit if:

  • Your workload is read-heavy or mixed (reads + writes)
  • You need native TTL — sessions, rate limiting, caches, OTP codes, leases
  • You're running in a memory-constrained or embedded environment
  • You want a clean KV API without writing SQL strings, preparing statements, and binding parameters
  • You need single-header C integration with no external dependencies
  • You want predictable latency — no compaction stalls, no mmap tuning

Consider alternatives if:

  • You need maximum write/update/delete throughput → RocksDB (LSM-tree)
  • You need maximum read/scan speed and memory isn't a constraint → LMDB (memory-mapped)
  • You already use SQL elsewhere and want to consolidate → SQLite directly

Features

  • ACID Transactions — commit / rollback safety
  • WAL Mode — concurrent readers + single writer
  • Native TTL — per-key expiry with lazy eviction and purge_expired(); no background thread required
  • Encryption — per-value XChaCha20-Poly1305 with Argon2id key derivation; transparent to all existing APIs
  • Column Families — logical namespaces within a single database
  • Iterators — ordered key traversal
  • Thread Safe — built-in synchronization
  • Single-header — drop snkv.h into any C/C++ project
  • Zero memory leaks — verified with Valgrind
  • SSD-friendly — WAL appends sequentially, reducing random writes
  • Python Bindings — idiomatic Python 3.8+ API with dict-style access, TTL, encryption, column families, iterators, and typed exceptions — see python/README.md
  • Vector Search (C) — native HNSW index via make vector; ANN search, exact rerank, TTL, sidecar persistence, encryption, batch insert — see examples/vector.c
  • Vector Search (Python): pip install snkv[vector]; metadata filtering, exact rerank, TTL on vectors, quantization (f32/f16/i8), sidecar persistence (see python/README.md#vector-search)
  • Android cross-compilation: SNKV is a self-contained embedded database with vector support, so it can be cross-compiled for Android platforms
  • Suited to AI workloads: SNKV is built solely on SQLite's battle-tested B-tree engine. Read transactions are enabled by default, which speeds up reads, and bypassing the SQL layer removes the overhead of writing row metadata because data is written directly to the B-tree. Together these give a speedup of 1.5x or more on mixed workloads.

Backup & Tooling Compatibility

Because SNKV uses SQLite's file format and pager layer, backup tools that operate at the WAL or page level work out of the box:

  • LiteFS — distributed SQLite replication works with SNKV databases
  • WAL-based backup tools — any tool consuming WAL files works correctly
  • Rollback journal tools — journal mode is fully supported

Note: Tools that rely on SQLite's schema layer — like the sqlite3 CLI or DB Browser for SQLite — won't work. SNKV bypasses the schema layer entirely by design.


Internals & Documentation

I documented the SQLite internals explored while building this:


Design Principles

  • Minimalism wins — fewer layers, less overhead
  • Proven foundations — reuse battle-tested storage, don't reinvent it
  • Predictable performance — no hidden query costs, no compaction stalls
  • Honest tradeoffs — SNKV is not the fastest at everything; it's optimized for its target use case

Third-Party Licenses

SNKV embeds the following third-party libraries:

| Library | Version | License | Notes |
| --- | --- | --- | --- |
| SQLite | 3.x (amalgamation subset) | Public Domain | B-tree, pager, WAL, OS layer |
| Monocypher | 4.x | CC0-1.0 (Public Domain) | XChaCha20-Poly1305 + Argon2id |
| usearch | ≥ 2.9 | Apache 2.0 | HNSW vector index (optional; C: make vector, Python: pip install snkv[vector]) |

SQLite and Monocypher are statically compiled into libsnkv and snkv.h. No dynamic linking or separate installation is required.

SQLite and Monocypher are public domain — no attribution is legally required, but credit is given here in the spirit of good practice. usearch is an optional runtime dependency and is not bundled.


License

Apache License 2.0 © 2025 Hash Anu
