Add RocksDB L0/L1 compression config by pcholakov · Pull Request #4865 · restatedev/restate

pcholakov · 2026-06-02T20:11:33Z

Summary

Add rocksdb-l0-l1-compression to select the RocksDB compression algorithm for L0/L1 SST files.
Support zstd, lz4, snappy, and none.
Keep L2+ SST files on ZSTD.
Preserve the original rocksdb-disable-l0-l1-compression key for backwards compatibility.
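
A minimal sketch of how the four accepted values and the "L2+ stays on ZSTD" rule could be modelled. The `Codec` enum, its `FromStr` impl, and `per_level_codecs` are hypothetical names for illustration, not the types this PR adds. It assumes exact lowercase matching and that levels are indexed from 0.

```rust
use std::str::FromStr;

/// Hypothetical codec enum for illustration; not Restate's actual type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Codec {
    Zstd,
    Lz4,
    Snappy,
    Uncompressed,
}

impl FromStr for Codec {
    type Err = String;

    /// Accepts the four values listed in the summary (assumed lowercase, exact match).
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "zstd" => Ok(Codec::Zstd),
            "lz4" => Ok(Codec::Lz4),
            "snappy" => Ok(Codec::Snappy),
            "none" => Ok(Codec::Uncompressed),
            other => Err(format!("unsupported L0/L1 compression: {other}")),
        }
    }
}

/// Builds a per-level codec list: the chosen codec for L0 and L1,
/// ZSTD for L2 and every deeper level.
pub fn per_level_codecs(l0_l1: Codec, num_levels: usize) -> Vec<Codec> {
    (0..num_levels)
        .map(|level| if level < 2 { l0_l1 } else { Codec::Zstd })
        .collect()
}
```

With this sketch, `"lz4".parse::<Codec>()` yields `Ok(Codec::Lz4)`, and `per_level_codecs(Codec::Lz4, 4)` yields LZ4 for the first two levels and ZSTD for the rest.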

Example

Global L0/L1 compression configuration:

rocksdb-l0-l1-compression = "lz4"

Compatibility

Existing deployments keep the current ZSTD default.
Existing configs with rocksdb-disable-l0-l1-compression = true still disable L0/L1 compression.
Existing configs with rocksdb-disable-l0-l1-compression = false still use ZSTD for L0/L1 compression.
If rocksdb-disable-l0-l1-compression and rocksdb-l0-l1-compression are both set in the same config block, the original rocksdb-disable-l0-l1-compression key takes precedence.
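
The compatibility rules above can be read as a small resolution function. This is a sketch under assumptions: `L0L1Compression` and `effective_l0_l1_codec` are hypothetical names, and each key is modelled as an `Option` that is `None` when absent from the config block. It is not the PR's actual implementation.

```rust
/// Hypothetical codec enum for illustration; not Restate's actual type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum L0L1Compression {
    Zstd,
    Lz4,
    Snappy,
    Uncompressed,
}

/// Resolves the effective L0/L1 codec from the legacy and new keys.
///
/// `legacy_disable` models `rocksdb-disable-l0-l1-compression`;
/// `selected` models `rocksdb-l0-l1-compression`.
pub fn effective_l0_l1_codec(
    legacy_disable: Option<bool>,
    selected: Option<L0L1Compression>,
) -> L0L1Compression {
    match legacy_disable {
        // The legacy key wins whenever it is set, even if the new key is set too.
        Some(true) => L0L1Compression::Uncompressed,
        Some(false) => L0L1Compression::Zstd,
        // Legacy key absent: use the new key, falling back to the ZSTD default.
        None => selected.unwrap_or(L0L1Compression::Zstd),
    }
}
```

Under this reading, a block that sets both `rocksdb-disable-l0-l1-compression = false` and `rocksdb-l0-l1-compression = "lz4"` resolves to ZSTD, because the legacy key takes precedence.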

Validation

cargo check
cargo nextest run --all-features
cargo fmt --all -- --check
cargo clippy --all-features --all-targets --workspace -- -D warnings

github-actions · 2026-06-02T20:49:14Z

Test Results

8 files ±0 8 suites ±0 4m 55s ⏱️ +13s
60 tests ±0 60 ✅ ±0 0 💤 ±0 0 ❌ ±0
267 runs ±0 267 ✅ ±0 0 💤 ±0 0 ❌ ±0

Results for commit b26bfeb. ± Comparison against base commit 2ef318f.

♻️ This comment has been updated with latest results.

pcholakov · 2026-06-03T19:18:44Z

Ran a per-algorithm benchmark of this on a 3×24-CPU GCP cell — sharing the performance observations.

Workload regime

A parametric agent load-gen (agent.run_turn): 8k concurrent suspending turns, 4 steps × 4 KiB ~4×-compressible state writes per turn (repeated-base pattern — pure-random would be incompressible and the disk axis meaningless), sustained. One arm per algorithm: set rocksdb-l0-l1-compression → roll → 5 min warm-up (so L0/L1 is rewritten with the algo) → 10 min steady measure → full drain → next. Metrics are rate[2m] over each arm's steady window.

Results

| `rocksdb-l0-l1-compression` | commit/s | server CPU (cores, of 72) | disk write (MiB/s) | RocksDB write-stall (s/s) |
| --- | --- | --- | --- | --- |
| none | 68.8k | 18.4 | 1491 | 0.85 |
| snappy | 98.9k | 31.8 | 1121 | 0.03 |
| lz4 | 100.2k | 31.7 | 1168 | 0 |
| zstd | 100.2k | 37.6 | 1107 | 0 |

Observations

none saturates the disk even at this load (0.85 s/s stall), giving ~30% lower throughput (69k vs 100k commit/s) while writing the most: ~2× the bytes per commit vs the compressed arms.
Any compression removes the disk bottleneck here (stall drops to 0, or 0.03 s/s for snappy, with ~99–100k commit/s sustained).
lz4 is the standout: it matches zstd's throughput and zero stall at ~16% less CPU (31.7 vs 37.6 cores). zstd writes marginally fewer bytes for the most CPU; snappy ≈ lz4 on CPU but leaves a little residual stall. On ~4×-compressible data, zstd's extra compression buys little disk saving for a real CPU premium — so the ability to pick lz4 (this PR) is genuinely valuable for write-heavy/disk-bound cells.
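
The ratios quoted in these observations can be rechecked from the results table. The sketch below copies the table values and assumes "k" means ×1000 and MiB means 1024 KiB. The `Arm` struct and helper names are made up for illustration.

```rust
/// One row of the results table (values copied from the table above).
pub struct Arm {
    pub name: &'static str,
    pub commits_per_sec: f64,
    pub cpu_cores: f64,
    pub disk_write_mib_per_sec: f64,
}

pub const ARMS: [Arm; 4] = [
    Arm { name: "none", commits_per_sec: 68_800.0, cpu_cores: 18.4, disk_write_mib_per_sec: 1491.0 },
    Arm { name: "snappy", commits_per_sec: 98_900.0, cpu_cores: 31.8, disk_write_mib_per_sec: 1121.0 },
    Arm { name: "lz4", commits_per_sec: 100_200.0, cpu_cores: 31.7, disk_write_mib_per_sec: 1168.0 },
    Arm { name: "zstd", commits_per_sec: 100_200.0, cpu_cores: 37.6, disk_write_mib_per_sec: 1107.0 },
];

/// KiB written to disk per commit for one arm.
pub fn kib_per_commit(arm: &Arm) -> f64 {
    arm.disk_write_mib_per_sec * 1024.0 / arm.commits_per_sec
}

/// Fractional reduction of `value` relative to `baseline` (0.25 means 25% less).
pub fn relative_saving(value: f64, baseline: f64) -> f64 {
    1.0 - value / baseline
}
```

With these numbers, `kib_per_commit` gives roughly 22.2 KiB for none against roughly 11.3 to 11.9 KiB for the compressed arms (about 2×), and `relative_saving(31.7, 37.6)` is about 0.157, matching the ~16% CPU saving for lz4 over zstd.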

Caveat for whoever sets the default

This is a disk-pressured regime, where fewer-bytes wins. On a CPU-bound cell the ranking flips (an earlier 8-CPU test showed disabling compression won outright). So there's no universal winner — it's a CPU↔disk trade, which is exactly why per-algorithm selection is the right call. zstd is a safe default for disk-bound; lz4 is the better balance when CPU is also contended; none only when CPU-starved with disk headroom.

(Synthetic ~4×-compressible payload; real agent JSON may compress more and nudge zstd up a bit.)

pcholakov · 2026-06-04T03:42:56Z

Follow-up: disk-saturation A/B (zstd vs lz4) — the optimal codec flips with the bottleneck

My earlier benchmark compared the four codecs at a sustainable, sub-saturating load (equal
throughput, comparing CPU vs disk-IO cost), and I noted "lz4 ≈ zstd at ~16% less CPU." That framing
favors lz4, but only when CPU is the scarce resource. I re-ran zstd vs lz4 pushed to write
saturation (Connor-style mix ramped 10k → 22k resident, two freshly-wiped arms) to see which codec
gives more headroom when the disk is the binding constraint. The answer reverses.

Per-stage averages (3×24-CPU nodes, provisioned-bandwidth SSD ~400 MiB/s/disk):

| concurrency | codec | commit/s | CPU (cores, of 72) | disk write per pod (MiB/s) | RocksDB LSM write per pod (MiB/s) |
| --- | --- | --- | --- | --- | --- |
| 14k | zstd | 287k | 43 | 345 | 143 |
| 14k | lz4 | 294k | 38 | 401 | 212 |
| 18k | zstd | 282k | 47 | 457 | 179 |
| 18k | lz4 | 280k | 39 | 456 | 290 |

Observations

This deployment is disk-write-bound: throughput plateaus ~280–294k commits/s while per-pod disk
write climbs to ~450–460 MiB/s (at the provisioned bandwidth), and CPU never exceeds ~48/72. The
binding constraint is per-disk write bandwidth, not CPU.
zstd and lz4 reach the same commit ceiling. lz4 uses ~10–18% less CPU but writes ~50–60%
more bytes to disk (it compresses less — at 18k, lz4 = 82% of zstd's CPU but 162% of its LSM bytes).
So when disk bandwidth is scarce and CPU is abundant (large-core cloud nodes on provisioned-IOPS
disks), zstd wins — tighter compression = more concurrency headroom before the write wall.
When CPU is scarce (small / few-core nodes), lz4 wins.
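
The lz4-vs-zstd percentages above follow from the per-stage table. A small sketch of that arithmetic, using the rounded table values; the `Stage` struct and `percent_of` helper are hypothetical names. Because the inputs are rounded, the 18k CPU figure comes out near 83% rather than the quoted 82%.

```rust
/// zstd and lz4 figures for one concurrency stage (copied from the table above).
pub struct Stage {
    pub concurrency: u32,
    pub zstd_cpu_cores: f64,
    pub lz4_cpu_cores: f64,
    pub zstd_lsm_mib_per_sec: f64,
    pub lz4_lsm_mib_per_sec: f64,
}

pub const STAGES: [Stage; 2] = [
    Stage {
        concurrency: 14_000,
        zstd_cpu_cores: 43.0,
        lz4_cpu_cores: 38.0,
        zstd_lsm_mib_per_sec: 143.0,
        lz4_lsm_mib_per_sec: 212.0,
    },
    Stage {
        concurrency: 18_000,
        zstd_cpu_cores: 47.0,
        lz4_cpu_cores: 39.0,
        zstd_lsm_mib_per_sec: 179.0,
        lz4_lsm_mib_per_sec: 290.0,
    },
];

/// `value` expressed as a percentage of `baseline`.
pub fn percent_of(value: f64, baseline: f64) -> f64 {
    100.0 * value / baseline
}

/// lz4 CPU as a percentage of zstd CPU for a stage.
pub fn lz4_cpu_percent(stage: &Stage) -> f64 {
    percent_of(stage.lz4_cpu_cores, stage.zstd_cpu_cores)
}

/// lz4 LSM write rate as a percentage of zstd's for a stage.
pub fn lz4_lsm_percent(stage: &Stage) -> f64 {
    percent_of(stage.lz4_lsm_mib_per_sec, stage.zstd_lsm_mib_per_sec)
}
```

For the 14k stage this gives about 88% of the CPU and 148% of the LSM bytes; for the 18k stage, about 83% and 162%.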

Takeaway for this PR: there's no single best default across deployment shapes — the optimum codec
flips depending on whether the cell is disk-bound or CPU-bound. That's exactly what makes per-algorithm
selection worth having. If a default is needed, zstd is the safer pick for the typical large-core /
bandwidth-capped-disk cloud deployment; lz4 is the better opt-in for CPU-constrained nodes.

pcholakov force-pushed the pavel/rocksdb-l0-l1-compression branch from dd36405 to 652ec74 Compare June 2, 2026 20:22

pcholakov force-pushed the pavel/rocksdb-l0-l1-compression branch from 652ec74 to 583b3a4 Compare June 3, 2026 10:51

Add RocksDB L0/L1 compression config

b26bfeb

pcholakov force-pushed the pavel/rocksdb-l0-l1-compression branch from 583b3a4 to b26bfeb Compare June 4, 2026 09:58

pcholakov wants to merge 1 commit into main from pavel/rocksdb-l0-l1-compression
