Add RocksDB L0/L1 compression config#4865

Draft
pcholakov wants to merge 1 commit into main from pavel/rocksdb-l0-l1-compression

@pcholakov
Contributor

@pcholakov pcholakov commented Jun 2, 2026

Summary

  • Add rocksdb-l0-l1-compression to select the RocksDB compression algorithm for L0/L1 SST files.
  • Support zstd, lz4, snappy, and none.
  • Keep L2+ SST files on ZSTD.
  • Preserve the original rocksdb-disable-l0-l1-compression key for backwards compatibility.

Example

Global L0/L1 compression configuration:

rocksdb-l0-l1-compression = "lz4"

Compatibility

  • Existing deployments keep the current ZSTD default.
  • Existing configs with rocksdb-disable-l0-l1-compression = true still disable L0/L1 compression.
  • Existing configs with rocksdb-disable-l0-l1-compression = false still use ZSTD for L0/L1 compression.
  • If rocksdb-disable-l0-l1-compression and rocksdb-l0-l1-compression are both set in the same config block, the original rocksdb-disable-l0-l1-compression key takes precedence.
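The compatibility rules above can be read as a small resolution function. This is a minimal sketch under stated assumptions, not the code in this PR: the type `L0L1Compression`, the function `resolve_l0_l1_compression`, and the choice to accept only lowercase values are all hypothetical names and decisions made for illustration.

```rust
use std::str::FromStr;

/// Hypothetical enum mirroring the four values listed in the summary.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum L0L1Compression {
    Zstd,
    Lz4,
    Snappy,
    NoCompression, // the "none" config value
}

impl FromStr for L0L1Compression {
    type Err = String;

    /// Assumption: values are matched as lowercase strings, as in the example.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "zstd" => Ok(Self::Zstd),
            "lz4" => Ok(Self::Lz4),
            "snappy" => Ok(Self::Snappy),
            "none" => Ok(Self::NoCompression),
            other => Err(format!("unsupported L0/L1 compression: {other}")),
        }
    }
}

/// Resolves the effective L0/L1 codec for one config block.
/// `legacy_disable` stands for rocksdb-disable-l0-l1-compression,
/// `selected` for rocksdb-l0-l1-compression; `None` means "not set".
fn resolve_l0_l1_compression(
    legacy_disable: Option<bool>,
    selected: Option<L0L1Compression>,
) -> L0L1Compression {
    match (legacy_disable, selected) {
        // The original key takes precedence whenever it is set.
        (Some(true), _) => L0L1Compression::NoCompression,
        (Some(false), _) => L0L1Compression::Zstd,
        // Only the new key is set: use the selected algorithm.
        (None, Some(algo)) => algo,
        // Neither key is set: keep the current ZSTD default.
        (None, None) => L0L1Compression::Zstd,
    }
}
```

For example, a block that sets both `rocksdb-disable-l0-l1-compression = true` and `rocksdb-l0-l1-compression = "lz4"` would resolve to no compression under this reading, because the original key wins.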

Validation

  • cargo check
  • cargo nextest run --all-features
  • cargo fmt --all -- --check
  • cargo clippy --all-features --all-targets --workspace -- -D warnings

pcholakov force-pushed the pavel/rocksdb-l0-l1-compression branch from dd36405 to 652ec74 (June 2, 2026 20:22)
@github-actions

github-actions Bot commented Jun 2, 2026

Test Results

  8 files  ±0    8 suites  ±0   4m 55s ⏱️ +13s
 60 tests ±0   60 ✅ ±0  0 💤 ±0  0 ❌ ±0 
267 runs  ±0  267 ✅ ±0  0 💤 ±0  0 ❌ ±0 

Results for commit b26bfeb. ± Comparison against base commit 2ef318f.

♻️ This comment has been updated with latest results.

pcholakov force-pushed the pavel/rocksdb-l0-l1-compression branch from 652ec74 to 583b3a4 (June 3, 2026 10:51)
@pcholakov
Contributor Author

Ran a per-algorithm benchmark of this on a 3×24-CPU GCP cell — sharing the performance observations.

Workload regime

A parametric agent load-gen (agent.run_turn): 8k concurrent suspending turns, 4 steps × 4 KiB ~4×-compressible state writes per turn (repeated-base pattern — pure-random would be incompressible and the disk axis meaningless), sustained. One arm per algorithm: set rocksdb-l0-l1-compression → roll → 5 min warm-up (so L0/L1 is rewritten with the algo) → 10 min steady measure → full drain → next. Metrics are rate[2m] over each arm's steady window.
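One way to picture a "repeated-base" payload is sketched below. This is an illustrative guess, not the load generator's actual code: the chunk size, the fraction of fresh bytes, and the names `XorShift64` and `make_payload` are all assumptions. Each 4 KiB payload repeats a random base chunk and overwrites only the first quarter of every copy with fresh random bytes, so roughly three quarters of the bytes are redundant. The ratio a real codec achieves on it will differ by codec and block size.

```rust
/// Tiny xorshift64 PRNG so the sketch needs no external crates.
/// The seed must be non-zero.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Fills `buf` with pseudo-random bytes.
    fn fill(&mut self, buf: &mut [u8]) {
        for chunk in buf.chunks_mut(8) {
            let bytes = self.next_u64().to_le_bytes();
            chunk.copy_from_slice(&bytes[..chunk.len()]);
        }
    }
}

const PAYLOAD_LEN: usize = 4096; // 4 KiB per state write, as in the workload description
const CHUNK_LEN: usize = 256; // assumption: size of the repeated base chunk
const FRESH_LEN: usize = 64; // assumption: a quarter of each chunk is fresh entropy

/// Builds one payload: a random base chunk repeated to 4 KiB, with the
/// first FRESH_LEN bytes of every copy replaced by new random bytes.
fn make_payload(rng: &mut XorShift64) -> Vec<u8> {
    let mut base = [0u8; CHUNK_LEN];
    rng.fill(&mut base);

    let mut out = Vec::with_capacity(PAYLOAD_LEN);
    while out.len() < PAYLOAD_LEN {
        let mut chunk = base; // arrays are Copy, so this duplicates the base
        rng.fill(&mut chunk[..FRESH_LEN]);
        out.extend_from_slice(&chunk);
    }
    out
}
```

A fully random payload would correspond to `FRESH_LEN == CHUNK_LEN`, which is the incompressible case the description rules out.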

Results

| rocksdb-l0-l1-compression | commit/s | server CPU (of 72) | disk write MiB/s | RocksDB write-stall |
|---|---|---|---|---|
| none | 68.8k | 18.4c | 1491 | 0.85 s/s |
| snappy | 98.9k | 31.8c | 1121 | 0.03 s/s |
| lz4 | 100.2k | 31.7c | 1168 | 0 |
| zstd | 100.2k | 37.6c | 1107 | 0 |

[Figure: CPU + RocksDB throughput across the four arms. Summary: none saturates disk; lz4 = zstd result at ~16% less CPU.]

Observations

  • none saturates the disk even at this load (0.85 s/s write stall), giving ~30% lower throughput (69k vs 100k commits/s). It also writes the most: ~2× the bytes per commit of the compressed arms.
  • Any compression removes the disk bottleneck here (stall drops to ~0, ~100k commits/s sustained).
  • lz4 is the standout: it matches zstd's throughput and zero stall at ~16% less CPU (31.7 vs 37.6 cores). zstd writes marginally fewer bytes but uses the most CPU; snappy ≈ lz4 on CPU but leaves a little residual stall. On ~4×-compressible data, zstd's extra compression buys little disk saving for a real CPU premium, so the ability to pick lz4 (this PR) is genuinely valuable for write-heavy/disk-bound cells.
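The percentages in these observations follow from the results table. The sketch below recomputes them from the table's rounded values; the struct and function names are hypothetical, and the absolute KiB-per-commit figure assumes the commit/s and disk-write columns cover the same scope (the ratios between arms do not depend on that).

```rust
/// One benchmark arm, copied from the results table.
struct Arm {
    codec: &'static str,
    commits_per_s: f64,
    cpu_cores: f64,
    disk_write_mib_s: f64,
}

static ARMS: [Arm; 4] = [
    Arm { codec: "none", commits_per_s: 68_800.0, cpu_cores: 18.4, disk_write_mib_s: 1491.0 },
    Arm { codec: "snappy", commits_per_s: 98_900.0, cpu_cores: 31.8, disk_write_mib_s: 1121.0 },
    Arm { codec: "lz4", commits_per_s: 100_200.0, cpu_cores: 31.7, disk_write_mib_s: 1168.0 },
    Arm { codec: "zstd", commits_per_s: 100_200.0, cpu_cores: 37.6, disk_write_mib_s: 1107.0 },
];

/// Looks up an arm by its rocksdb-l0-l1-compression value.
fn arm(codec: &str) -> Option<&'static Arm> {
    ARMS.iter().find(|a| a.codec == codec)
}

/// Disk bytes written per commit, in KiB.
fn kib_per_commit(arm: &Arm) -> f64 {
    arm.disk_write_mib_s * 1024.0 / arm.commits_per_s
}

/// How much lower `candidate` is than `baseline`, in percent.
fn pct_lower(candidate: f64, baseline: f64) -> f64 {
    (1.0 - candidate / baseline) * 100.0
}
```

With these inputs, `kib_per_commit` for none is about 1.9× that of lz4 and about 2.0× that of zstd, `pct_lower(31.7, 37.6)` is about 15.7 (the "~16% less CPU"), and `pct_lower(68_800.0, 100_200.0)` is about 31 (the "~30% lower throughput").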

Caveat for whoever sets the default

This is a disk-pressured regime, where fewer-bytes wins. On a CPU-bound cell the ranking flips (an earlier 8-CPU test showed disabling compression won outright). So there's no universal winner — it's a CPU↔disk trade, which is exactly why per-algorithm selection is the right call. zstd is a safe default for disk-bound; lz4 is the better balance when CPU is also contended; none only when CPU-starved with disk headroom.

(Synthetic ~4×-compressible payload; real agent JSON may compress more and nudge zstd up a bit.)

@pcholakov
Contributor Author

Follow-up: disk-saturation A/B (zstd vs lz4) — the optimal codec flips with the bottleneck

My earlier benchmark compared the four codecs at a sustainable, sub-saturating load (equal
throughput, comparing CPU vs disk-IO cost), and I noted "lz4 ≈ zstd at ~16% less CPU." That framing
favors lz4, but only when CPU is the scarce resource. I re-ran zstd vs lz4 pushed to write
saturation (Connor-style mix ramped 10k → 22k resident, two freshly-wiped arms) to see which codec
gives more headroom when the disk is the binding constraint. The answer reverses.

zstd vs lz4 at disk saturation

Per-stage averages (3×24-CPU nodes, provisioned-bandwidth SSD ~400 MiB/s/disk):

| concurrency | codec | commit/s | CPU (of 72) | disk write /pod | RocksDB LSM write /pod |
|---|---|---|---|---|---|
| 14k | zstd | 287k | 43c | 345 MiB/s | 143 MiB/s |
| 14k | lz4 | 294k | 38c | 401 MiB/s | 212 MiB/s |
| 18k | zstd | 282k | 47c | 457 MiB/s | 179 MiB/s |
| 18k | lz4 | 280k | 39c | 456 MiB/s | 290 MiB/s |

Observations

  • This deployment is disk-write-bound: throughput plateaus at ~280–294k commits/s while per-pod disk
    write climbs to ~450–460 MiB/s (at the provisioned bandwidth), and CPU never exceeds ~48/72. The
    binding constraint is per-disk write bandwidth, not CPU.
  • zstd and lz4 reach the same commit ceiling. lz4 uses ~10–18% less CPU but writes ~50–60% more
    bytes to disk (it compresses less; at 18k, lz4 = 82% of zstd's CPU but 162% of its LSM bytes).
  • So when disk bandwidth is scarce and CPU is abundant (large-core cloud nodes on provisioned-IOPS
    disks), zstd wins: tighter compression means more concurrency headroom before the write wall.
    When CPU is scarce (small / few-core nodes), lz4 wins.
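The CPU and LSM-byte ratios quoted above can be rechecked from the per-stage table. A small sketch using the table's rounded values follows; the names `Stage`, `stage`, and `lz4_vs_zstd` are hypothetical.

```rust
/// One row of the saturation table (per-stage averages).
struct Stage {
    concurrency_k: u32,
    codec: &'static str,
    cpu_cores: f64,
    lsm_write_mib_s: f64,
}

static STAGES: [Stage; 4] = [
    Stage { concurrency_k: 14, codec: "zstd", cpu_cores: 43.0, lsm_write_mib_s: 143.0 },
    Stage { concurrency_k: 14, codec: "lz4", cpu_cores: 38.0, lsm_write_mib_s: 212.0 },
    Stage { concurrency_k: 18, codec: "zstd", cpu_cores: 47.0, lsm_write_mib_s: 179.0 },
    Stage { concurrency_k: 18, codec: "lz4", cpu_cores: 39.0, lsm_write_mib_s: 290.0 },
];

/// Finds the row for a given concurrency level (in thousands) and codec.
fn stage(concurrency_k: u32, codec: &str) -> Option<&'static Stage> {
    STAGES
        .iter()
        .find(|s| s.concurrency_k == concurrency_k && s.codec == codec)
}

/// Returns (CPU ratio, LSM write ratio) of lz4 relative to zstd,
/// or None if the table has no rows for that concurrency level.
fn lz4_vs_zstd(concurrency_k: u32) -> Option<(f64, f64)> {
    let lz4 = stage(concurrency_k, "lz4")?;
    let zstd = stage(concurrency_k, "zstd")?;
    Some((
        lz4.cpu_cores / zstd.cpu_cores,
        lz4.lsm_write_mib_s / zstd.lsm_write_mib_s,
    ))
}
```

At 18k this gives roughly (0.83, 1.62), and at 14k roughly (0.88, 1.48). The rounded table yields 83% rather than the quoted 82% for CPU, presumably because the comment used unrounded measurements.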

Takeaway for this PR: there's no single best default across deployment shapes: the optimum codec
flips depending on whether the cell is disk-bound or CPU-bound. That's exactly what makes
per-algorithm selection worth having. If a default is needed, zstd is the safer pick for the typical
large-core / bandwidth-capped-disk cloud deployment; lz4 is the better opt-in for CPU-constrained
nodes.
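As a rough illustration of that takeaway, an operator could compare CPU and disk utilization and pick the codec accordingly. The heuristic below is purely hypothetical and not part of this PR: `CellLoad`, `bottleneck`, and `suggested_l0_l1_compression` are invented names, and treating "whichever resource is more utilized" as the bottleneck is a simplifying assumption.

```rust
#[derive(Debug, PartialEq)]
enum Bottleneck {
    Disk,
    Cpu,
}

/// Observed load on a cell; both pairs must use consistent units and scope.
struct CellLoad {
    cpu_cores_used: f64,
    cpu_cores_total: f64,
    disk_write_mib_s: f64,
    disk_provisioned_mib_s: f64,
}

/// Assumption: the resource with the higher utilization is the bottleneck.
fn bottleneck(load: &CellLoad) -> Bottleneck {
    let cpu_util = load.cpu_cores_used / load.cpu_cores_total;
    let disk_util = load.disk_write_mib_s / load.disk_provisioned_mib_s;
    if disk_util >= cpu_util {
        Bottleneck::Disk
    } else {
        Bottleneck::Cpu
    }
}

/// Maps the bottleneck to a rocksdb-l0-l1-compression value,
/// following the disk-bound → zstd, CPU-bound → lz4 guidance.
fn suggested_l0_l1_compression(load: &CellLoad) -> &'static str {
    match bottleneck(load) {
        Bottleneck::Disk => "zstd",
        Bottleneck::Cpu => "lz4",
    }
}
```

Feeding in the 18k zstd stage (47 of 72 cores, 457 MiB/s against ~400 MiB/s provisioned) classifies the cell as disk-bound and suggests `"zstd"`, which matches the conclusion above. Real deployments would likely want thresholds and hysteresis rather than a single comparison.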

pcholakov force-pushed the pavel/rocksdb-l0-l1-compression branch from 583b3a4 to b26bfeb (June 4, 2026 09:58)