[Enhancement] Split publish-trace SST miss counters by local-cache vs remote#73087
Conversation
|
@codex review |
|
Codex Review: Didn't find any major issues. 🎉 ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
a32dc24 to
8b43c0d
Compare
|
@codex review |
|
Codex Review: Didn't find any major issues. 👍 ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
Publish-path overhead A/BRe-checked the cost of the new SST IO breakdown counters on a real shared-data cluster before and after switching from the Setup
Original version (snapshot via
|
| base | PR | diff | |
|---|---|---|---|
| 20k-row PK upsert | 134.09 ms | 141.95 ms | +7.86 ms (PR ~5.9% slower) |
20k-row DUP control (no multi_get) |
97 ms | 96 ms | 0 (clusters equivalent) |
Per-call cost was concentrated in starlet's NumericStatistics adapter (fs_starlet.cpp:206-216): 1× make_unique<NumericStatistics> + 2× vector alloc (reserve(11)) + 11× std::string constructor in append(). Per multi_get snapshot ran twice. The publish path always has an active Trace, so the cost was always paid.
Light version (this commit — IoStatsSnapshot struct)
InputStream now exposes a virtual get_io_stats_snapshot() const returning a 13×int64 struct. Starlet override copies directly from IOStats — no heap alloc, no strings, no vector.
Two runs (5 rounds each):
| Run | base | PR | diff |
|---|---|---|---|
| 1 | 172.95 ms | 167.68 ms | -5.27 ms |
| 2 | 165.94 ms | 151.99 ms | -13.95 ms |
| aggregate (10 rounds) | 169.45 ms | 159.84 ms | -9.61 ms (PR ~5.7% faster) |
The original ~6% slowdown is gone. The "PR is now faster" reading is biased by ordering (each round measures base first, then PR, so PR benefits from warmer caches); the honest read is that the lightweight snapshot drops the overhead into the cluster-level noise, well below resolution at this workload size.
Bonus: easier to extend
Adding a new counter to the publish trace in the future is now:
- Add the field to
io::IoStatsSnapshotinbe/src/io/core/input_stream.h. - Fill it in the starlet override in
be/src/fs/fs_starlet.cpp.
No changes to the wrapper chain (SeekableInputStreamWrapper, SharedBufferedInputStream, CompressedInputStream, CompressedSeekableInputStream, BundleSeekableInputStream), no changes to other callers, no changes to the UT fake.
a4fd3cd to
c2d974c
Compare
|
@codex review |
|
Codex Review: Didn't find any major issues. What shall we delve into next? ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
c2d974c to
316e4d0
Compare
|
@codex review |
|
Codex Review: Didn't find any major issues. More of your lovely PRs please. ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
316e4d0 to
f2c8a12
Compare
|
@codex review |
|
Codex Review: Didn't find any major issues. Nice work! ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
… remote The existing `read_block_miss_cache_cnt` trace counter on the persistent-index sstable MultiGet path only tells you that the in-memory sstable block cache missed; it does not say whether the resulting file read was served by the local data cache or by the remote object store (S3/OSS). For shared-data publish traces that distinction is what we actually need to diagnose. Snapshot the underlying RandomAccessFile's NumericStatistics (`bytes_read_local_disk`, `bytes_read_remote`, `io_count_local_disk`, `io_count_remote`) before and after `Table::MultiGet`, and emit the delta as four new TRACE counters alongside the existing miss counter: sstable_io_local_disk_bytes sstable_io_remote_bytes sstable_io_count_local_disk sstable_io_count_remote Streams without NumericStatistics (e.g. plain POSIX in shared-nothing UTs) report all-zero, so the wiring is safe in every build flavor. UT: drive the breakdown end-to-end by wrapping the on-disk stream with a FakeStatsInputStream that attributes every successful `read_at_fully` to either the local-disk bucket or the remote bucket, and assert the published trace metrics. A third case exercises the nullptr-stats path. Signed-off-by: luohaha <18810541851@163.com>
f2c8a12 to
b13a360
Compare
|
@codex review |
|
Codex Review: Didn't find any major issues. 🎉 ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
[Java-Extensions Incremental Coverage Report]✅ pass : 0 / 0 (0%) |
[FE Incremental Coverage Report]✅ pass : 0 / 0 (0%) |
[BE Incremental Coverage Report]✅ pass : 37 / 37 (100.00%) file detail
|
|
@Mergifyio backport branch-4.1 |
✅ Backports have been createdDetails
Cherry-pick of 2b6cbed has failed: To fix up this pull request, you can check it out locally. See documentation: https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/checking-out-pull-requests-locally |
… remote (#73087) Signed-off-by: luohaha <18810541851@163.com>
Why I'm doing:
In shared-data publish traces today we have
read_block_miss_cache_cnton the persistent-index sstableMultiGetpath. It tells us the in-memory sstable block cache missed, but not whether the resulting file read was served by the local data cache or went out to the remote object store (S3/OSS). That distinction is what we actually need to diagnose slow PK publishes — a miss that hits local disk is dramatically different from one that hits remote.What I'm doing:
In
PersistentIndexSstable::multi_get, snapshot the underlyingRandomAccessFile'sNumericStatistics(bytes_read_local_disk,bytes_read_remote,io_count_local_disk,io_count_remote) before and afterTable::MultiGet, and emit the delta as four new trace counters alongside the existing miss counter:sstable_io_local_disk_bytessstable_io_remote_bytessstable_io_count_local_disksstable_io_count_remoteStreams without
NumericStatistics(e.g. plain POSIX in shared-nothing UTs) report all-zero, so the wiring is safe across build flavors.UT: a
FakeStatsInputStreamwraps the on-disk stream and attributes every successfulread_at_fullyto either the local-disk bucket or the remote bucket; three new cases (test_multi_get_io_breakdown_local_disk,test_multi_get_io_breakdown_remote,test_multi_get_io_breakdown_no_stats) drive the breakdown end-to-end and assert the published trace metrics — including the nullptr-stats branch.Fixes #issue
What type of PR is this:
Does this PR entail a change in behavior?
If yes, please specify the type of change:
Checklist:
Bugfix cherry-pick branch check: