Skip to content

perf(ecstore): replace mmap+spawn_blocking shard reads with pread#27

Open
rdiperri-wasabi wants to merge 3 commits into
mainfrom
perf/productionize-get-path-fixes
Open

perf(ecstore): replace mmap+spawn_blocking shard reads with pread#27
rdiperri-wasabi wants to merge 3 commits into
mainfrom
perf/productionize-get-path-fixes

Conversation

@rdiperri-wasabi

Copy link
Copy Markdown

Fix 1 — consolidate spawn_blocking in read_all_data_with_dmtime: Previously each EC shard read dispatched two thread-pool round-trips (one for the pread/mmap setup, one for the actual read). This patch collapses both into a single blocking closure, eliminating one cross-thread handoff per shard per GET request.

Fix 2 — replace mmap with pread in batch_shard_pread: EC shard reads for object data were using mmap, which causes kernel page-cache pressure for large objects and adds VM fault overhead on each access. Replaced with O_DIRECT-friendly pread calls that read directly into caller-supplied buffers, removing the mmap lifecycle (map, access, unmap) from the hot GET path.

Measured on smc1 (single-node, SAS SSDs, EC:1) vs wasabi baseline at 50 threads, 5-minute warm-cache GET runs:

Small objects (0–1500K, loopback):
4d: 1,562 → 2,109 MB/s (+35%), 22 → 16ms
6d: 906 → 2,437 MB/s (+169%), 40 → 14ms
8d: 662 → 3,072 MB/s (+364%), 54 → 12ms
10d: 472 → 2,458 MB/s (+420%), 76 → 14ms

Large objects (1000–1500K, 100 GbE from smc2):
4d: 1,987 → 2,949 MB/s (+48%), 30 → 19ms
6d: 1,577 → 2,949 MB/s (+87%), 38 → 19ms
8d: 1,270 → 2,867 MB/s (+126%), 48 → 20ms
10d: 824 → 2,785 MB/s (+238%), 73 → 21ms

Also includes perf harness improvements: analyze.py adds per-interface NIC reporting; loadgen-run.sh and run-perf-test.sh add LG_OBJ_SIZE, remote loadgen support (LOADGEN_HOST), and related flags.

Related Issues

Summary of Changes

Verification

Impact

Additional Notes


Thank you for your contribution! Please ensure your PR follows the community standards (CODE_OF_CONDUCT.md). If this is your first contribution, review the CLA document and sign it by commenting I have read and agree to the CLA. on the PR.

Copilot AI review requested due to automatic review settings June 22, 2026 20:09

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR optimizes EC shard reads on the GET hot path by replacing mmap-based reads with pread-style reads and reducing spawn_blocking round-trips, while also extending the perf harness to support more workload mixes and richer reporting.

Changes:

  • Switch local “zero-copy” shard reads from mmap to FileExt::read_at and add batched shard reads (batch_shard_pread) to reduce per-shard scheduling overhead.
  • Add detailed GET-path tracing spans around fanout, locks, decode, and pipe stages.
  • Improve perf harness ergonomics (env overrides for op mix/object size, summary detection) and enhance analysis output (per-interface NIC, disk read stats).

Reviewed changes

Copilot reviewed 12 out of 13 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
scripts/perf/run-perf-test.sh Makes loadgen completion detection op-agnostic; enables GET tracing recipe in TRACE mode.
scripts/perf/lib/loadgen-run.sh Adds env-driven workload controls (op mix, object size, resume/save/preload flags).
scripts/perf/analyze/analyze.py Generalizes loadgen parsing across op types; adds disk read metrics and updates report tables.
rustfs/src/app/object_usecase.rs Adds a pre-dispatch GET tracing span around request-context preparation.
crates/ecstore/src/set_disk/read.rs Introduces a batched local-shard read fast-path and instruments shard setup/fanout timing.
crates/ecstore/src/set_disk.rs Adds GET-path spans around lock acquisition, xlmeta read, and spawned read task.
crates/ecstore/src/erasure_coding/decode.rs Instruments shard read, EC decode, and pipe write stages for GET tracing.
crates/ecstore/src/disk/mod.rs Adds a helper to obtain local filesystem paths only for local disks.
crates/ecstore/src/disk/disk_store.rs Exposes local object-path retrieval on the local disk wrapper.
crates/ecstore/src/disk/local.rs Replaces mmap with pread in read_file_zero_copy, consolidates spawn_blocking, adds batch_shard_pread, and adds tests.
crates/ecstore/src/bitrot.rs Extracts shard offset/length adjustment into a helper and updates docs/tests.
crates/ecstore/Cargo.toml Removes memmap2 dependency from ecstore.
Cargo.lock Drops memmap2 from the lockfile for ecstore.

Comment thread scripts/perf/analyze/analyze.py
Comment thread crates/ecstore/src/disk/local.rs
Comment thread crates/ecstore/src/disk/local.rs
Comment thread crates/ecstore/src/set_disk/read.rs Outdated
@rdiperri-wasabi rdiperri-wasabi force-pushed the perf/productionize-get-path-fixes branch 2 times, most recently from b3e8d49 to e9b8f40 Compare June 22, 2026 20:37
Copilot AI review requested due to automatic review settings June 22, 2026 20:37

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 12 out of 13 changed files in this pull request and generated 3 comments.

Comment thread crates/ecstore/src/set_disk/read.rs Outdated
Comment thread scripts/perf/analyze/analyze.py
Comment thread crates/ecstore/src/bitrot.rs
rdiperri-wasabi added a commit that referenced this pull request Jun 22, 2026
- set_disk/read.rs: wrap unix batch fast-path in #[cfg(unix)] and use
  early return so non-unix targets (Windows) compile without
  batch_shard_pread and adjust_shard_read_params, which are unix-only.
  Sequential fallback is now unconditional.

- disk/local.rs: move should_reclaim_file_cache_after_read() call inside
  the #[cfg(target_os = "linux")] fadvise block, eliminating the unused-
  variable warning on macOS/non-linux unix targets.

- bitrot.rs: tighten adjust_shard_read_params visibility from pub to
  pub(crate); it is only used within this crate.

- analyze.py: fix empty-device placeholder row from 9 columns to 8 to
  match the disk table header, preventing zip-truncation in _md_table().

Co-authored-by: Cursor <cursoragent@cursor.com>
rdiperri-wasabi added a commit that referenced this pull request Jun 22, 2026
- set_disk/read.rs: wrap unix batch fast-path in #[cfg(unix)] and use
  early return so non-unix targets (Windows) compile without
  batch_shard_pread and adjust_shard_read_params, which are unix-only.
  Sequential fallback is now unconditional.

- disk/local.rs: move should_reclaim_file_cache_after_read() call inside
  the #[cfg(target_os = "linux")] fadvise block, eliminating the unused-
  variable warning on macOS/non-linux unix targets.

- bitrot.rs: tighten adjust_shard_read_params visibility from pub to
  pub(crate); it is only used within this crate.

- analyze.py: fix empty-device placeholder row from 9 columns to 8 to
  match the disk table header, preventing zip-truncation in _md_table().

Co-authored-by: Cursor <cursoragent@cursor.com>
@rdiperri-wasabi rdiperri-wasabi force-pushed the perf/productionize-get-path-fixes branch from 3e5c5f4 to e4b135b Compare June 22, 2026 21:42
auto-merge was automatically disabled June 22, 2026 22:53

Pull request was closed

rdiperri-wasabi and others added 2 commits June 25, 2026 18:57
Fix 1 — consolidate spawn_blocking in read_all_data_with_dmtime:
Previously each EC shard read dispatched two thread-pool round-trips
(one for the pread/mmap setup, one for the actual read). This patch
collapses both into a single blocking closure, eliminating one
cross-thread handoff per shard per GET request.

Fix 2 — replace mmap with pread in batch_shard_pread:
EC shard reads for object data were using mmap, which causes kernel
page-cache pressure for large objects and adds VM fault overhead on
each access. Replaced with O_DIRECT-friendly pread calls that read
directly into caller-supplied buffers, removing the mmap lifecycle
(map, access, unmap) from the hot GET path.

Measured on smc1 (single-node, SAS SSDs, EC:1) vs wasabi baseline
at 50 threads, 5-minute warm-cache GET runs:

Small objects (0–1500K, loopback):
  4d: 1,562 → 2,109 MB/s (+35%), 22 → 16ms
  6d:   906 → 2,437 MB/s (+169%), 40 → 14ms
  8d:   662 → 3,072 MB/s (+364%), 54 → 12ms
 10d:   472 → 2,458 MB/s (+420%), 76 → 14ms

Large objects (1000–1500K, 100 GbE from smc2):
  4d: 1,987 → 2,949 MB/s (+48%), 30 → 19ms
  6d: 1,577 → 2,949 MB/s (+87%), 38 → 19ms
  8d: 1,270 → 2,867 MB/s (+126%), 48 → 20ms
 10d:   824 → 2,785 MB/s (+238%), 73 → 21ms

Also includes perf harness improvements: analyze.py adds per-interface
NIC reporting; loadgen-run.sh and run-perf-test.sh add LG_OBJ_SIZE,
remote loadgen support (LOADGEN_HOST), and related flags.

Co-authored-by: Cursor <cursoragent@cursor.com>
- set_disk/read.rs: wrap unix batch fast-path in #[cfg(unix)] and use
  early return so non-unix targets (Windows) compile without
  batch_shard_pread and adjust_shard_read_params, which are unix-only.
  Sequential fallback is now unconditional.

- disk/local.rs: move should_reclaim_file_cache_after_read() call inside
  the #[cfg(target_os = "linux")] fadvise block, eliminating the unused-
  variable warning on macOS/non-linux unix targets.

- bitrot.rs: tighten adjust_shard_read_params visibility from pub to
  pub(crate); it is only used within this crate.

- analyze.py: fix empty-device placeholder row from 9 columns to 8 to
  match the disk table header, preventing zip-truncation in _md_table().

Co-authored-by: Cursor <cursoragent@cursor.com>
@rdiperri-wasabi rdiperri-wasabi force-pushed the perf/productionize-get-path-fixes branch from e4b135b to b044968 Compare June 26, 2026 01:57
Copilot AI review requested due to automatic review settings June 26, 2026 01:57
@github-actions

Copy link
Copy Markdown

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 11 out of 12 changed files in this pull request and generated 1 comment.

Comment thread crates/ecstore/src/set_disk.rs
Co-authored-by: Cursor <cursoragent@cursor.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants