
[Enhancement] Split publish-trace SST miss counters by local-cache vs remote #73087

Merged
luohaha merged 1 commit into
StarRocks:main from
luohaha:apo-sstable-io-breakdown
May 13, 2026

@luohaha
Contributor

@luohaha luohaha commented May 11, 2026

Why I'm doing:

In shared-data publish traces today we have read_block_miss_cache_cnt on the persistent-index sstable MultiGet path. It tells us the in-memory sstable block cache missed, but not whether the resulting file read was served by the local data cache or went out to the remote object store (S3/OSS). That distinction is what we actually need to diagnose slow PK publishes — a miss that hits local disk is dramatically different from one that hits remote.

What I'm doing:

In PersistentIndexSstable::multi_get, snapshot the underlying RandomAccessFile's NumericStatistics (bytes_read_local_disk, bytes_read_remote, io_count_local_disk, io_count_remote) before and after Table::MultiGet, and emit the delta as four new trace counters alongside the existing miss counter:

  • sstable_io_local_disk_bytes
  • sstable_io_remote_bytes
  • sstable_io_count_local_disk
  • sstable_io_count_remote

Streams without NumericStatistics (e.g. plain POSIX in shared-nothing UTs) report all-zero, so the wiring is safe across build flavors.
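A minimal C++ sketch of the before/after snapshot pattern described above, assuming a plain struct of the four counters and a `std::map` standing in for the publish Trace. `IoCounters`, `TraceCounters`, `read_counters` and `multi_get_with_io_breakdown` are hypothetical names for illustration, not StarRocks APIs; the real code reads the statistics from the sstable's underlying stream and wraps `Table::MultiGet`.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <string>

// Hypothetical struct holding the four per-stream IO counters named in the PR.
struct IoCounters {
    int64_t bytes_read_local_disk = 0;
    int64_t bytes_read_remote = 0;
    int64_t io_count_local_disk = 0;
    int64_t io_count_remote = 0;
};

// Hypothetical stand-in for the publish Trace: counter name -> accumulated value.
using TraceCounters = std::map<std::string, int64_t>;

// A stream without statistics (modeled as stats == nullptr) reports all zeros.
inline IoCounters read_counters(const IoCounters* stats) {
    return stats != nullptr ? *stats : IoCounters{};
}

// Snapshot before, run the lookup, snapshot after, and add the delta to the trace.
inline void multi_get_with_io_breakdown(const IoCounters* stats,
                                        const std::function<void()>& multi_get,
                                        TraceCounters* trace) {
    const IoCounters before = read_counters(stats);
    multi_get();  // may issue reads that advance *stats
    const IoCounters after = read_counters(stats);
    // The counters are assumed to be monotonic, so every delta is non-negative.
    assert(after.io_count_local_disk >= before.io_count_local_disk);
    assert(after.io_count_remote >= before.io_count_remote);
    (*trace)["sstable_io_local_disk_bytes"] += after.bytes_read_local_disk - before.bytes_read_local_disk;
    (*trace)["sstable_io_remote_bytes"] += after.bytes_read_remote - before.bytes_read_remote;
    (*trace)["sstable_io_count_local_disk"] += after.io_count_local_disk - before.io_count_local_disk;
    (*trace)["sstable_io_count_remote"] += after.io_count_remote - before.io_count_remote;
}
```

With `stats == nullptr` both snapshots are all-zero, so every emitted delta is zero, which mirrors the "report all-zero" behavior described for plain POSIX streams.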

UT: a FakeStatsInputStream wraps the on-disk stream and attributes every successful read_at_fully to either the local-disk bucket or the remote bucket. Three new cases (test_multi_get_io_breakdown_local_disk, test_multi_get_io_breakdown_remote, test_multi_get_io_breakdown_no_stats) drive the breakdown end-to-end and assert the published trace metrics, including the nullptr-stats branch.
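The fake-stream idea can be sketched roughly as below. This is a hypothetical reconstruction, not the PR's actual `FakeStatsInputStream`: `ReadAtStream`, `BufferStream` and `FakeStatsStream` are illustrative names, the "on-disk" stream is an in-memory buffer, and a constructor flag selects which bucket each successful `read_at_fully` is charged to.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <utility>

// Hypothetical counters: one local-disk bucket and one remote bucket.
struct IoCounters {
    int64_t bytes_read_local_disk = 0;
    int64_t bytes_read_remote = 0;
    int64_t io_count_local_disk = 0;
    int64_t io_count_remote = 0;
};

// Hypothetical minimal stream interface with a positional "read exactly n bytes" call.
class ReadAtStream {
public:
    virtual ~ReadAtStream() = default;
    // Returns true and fills `out` only if all n bytes at `offset` were read.
    virtual bool read_at_fully(int64_t offset, void* out, int64_t n) = 0;
};

// Stand-in for the on-disk stream: serves reads from an in-memory buffer.
class BufferStream : public ReadAtStream {
public:
    explicit BufferStream(std::string data) : _data(std::move(data)) {}
    bool read_at_fully(int64_t offset, void* out, int64_t n) override {
        if (offset < 0 || n < 0 || offset + n > static_cast<int64_t>(_data.size())) return false;
        std::memcpy(out, _data.data() + offset, static_cast<std::size_t>(n));
        return true;
    }

private:
    std::string _data;
};

// Wraps another stream and charges every successful read to a single bucket.
class FakeStatsStream : public ReadAtStream {
public:
    FakeStatsStream(ReadAtStream* inner, bool remote) : _inner(inner), _remote(remote) {}
    bool read_at_fully(int64_t offset, void* out, int64_t n) override {
        if (!_inner->read_at_fully(offset, out, n)) return false;  // failed reads are not counted
        if (_remote) {
            _stats.bytes_read_remote += n;
            _stats.io_count_remote += 1;
        } else {
            _stats.bytes_read_local_disk += n;
            _stats.io_count_local_disk += 1;
        }
        return true;
    }
    const IoCounters& stats() const { return _stats; }

private:
    ReadAtStream* _inner;  // not owned
    bool _remote;
    IoCounters _stats;
};
```

Counting only after the inner read succeeds keeps failed reads out of both buckets, so a test can compare the trace deltas against the bytes it actually fetched.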

Fixes #issue

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Does this PR entail a change in behavior?

  • Yes, this PR will result in a change in behavior.
  • No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • Parameter changes: default values, similar parameters but with different default values
  • Policy changes: use new policy to replace old one, functionality automatically enabled
  • Feature removed
  • Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This pr needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function
    • This pr needs auto generate documentation
  • This is a backport pr

Bugfix cherry-pick branch check:

  • I have checked the version labels which the pr will be auto-backported to the target branch
    • 4.1
    • 4.0
    • 3.5
    • 3.4

@luohaha luohaha requested a review from a team as a code owner May 11, 2026 08:00
@github-actions github-actions Bot requested a review from srlch May 11, 2026 08:02
@github-actions github-actions Bot added the 4.1 label May 11, 2026
@CelerData-Reviewer

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. 🎉

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@luohaha luohaha force-pushed the apo-sstable-io-breakdown branch 2 times, most recently from a32dc24 to 8b43c0d May 12, 2026 06:20
@luohaha luohaha requested a review from a team as a code owner May 12, 2026 06:20
@CelerData-Reviewer

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. 👍

@luohaha
Contributor Author

luohaha commented May 12, 2026

Publish-path overhead A/B

Re-checked the cost of the new SST IO breakdown counters on a real shared-data cluster before and after switching from the get_numeric_statistics() adapter to a lightweight IoStatsSnapshot struct.

Setup

  • 2 identical shared-data clusters (1 FE 4c16g + 2 CN 4c16g, aliyun, same main tip 4ee8f9c).
  • Workload: INSERT ... SELECT generate_series(1, 20000) upserts into a single-bucket PK table seeded with 200k rows. enable_sync_publish=true, so per-statement wall clock includes the publish phase.
  • 5 rounds × 10 iters per cluster, alternating PR ↔ baseline; each round's mean excludes the iter-1 cold-cache spike. Results below are the per-INSERT mean across rounds.
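The per-round aggregation described in the setup might look like the sketch below. It assumes the per-INSERT figure is an unweighted mean of the round means (the comment does not say how rounds are combined), and `round_mean_excluding_first` / `mean_across_rounds` are hypothetical helper names.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Mean latency of one round, skipping iteration 1 (index 0), the cold-cache spike.
inline double round_mean_excluding_first(const std::vector<double>& iter_ms) {
    assert(iter_ms.size() >= 2);
    const double sum = std::accumulate(iter_ms.begin() + 1, iter_ms.end(), 0.0);
    return sum / static_cast<double>(iter_ms.size() - 1);
}

// Assumed aggregation: plain average of the round means, each round weighted equally.
inline double mean_across_rounds(const std::vector<std::vector<double>>& rounds) {
    assert(!rounds.empty());
    double total = 0.0;
    for (const auto& round : rounds) total += round_mean_excluding_first(round);
    return total / static_cast<double>(rounds.size());
}
```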

Original version (snapshot via get_numeric_statistics() adapter)

Workload                              base        PR          diff
20k-row PK upsert                     134.09 ms   141.95 ms   +7.86 ms (PR ~5.9% slower)
20k-row DUP control (no multi_get)    97 ms       96 ms       ~0 (clusters equivalent)

Per-call cost was concentrated in starlet's NumericStatistics adapter (fs_starlet.cpp:206-216): 1× make_unique<NumericStatistics> + 2× vector alloc (reserve(11)) + 11× std::string constructor in append(). Each multi_get took the snapshot twice (before and after), and because the publish path always has an active Trace, the cost was always paid.

Light version (this commit — IoStatsSnapshot struct)

InputStream now exposes a virtual get_io_stats_snapshot() const returning a 13×int64 struct. Starlet override copies directly from IOStats — no heap alloc, no strings, no vector.
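A sketch of the shape described here, assuming an all-zero default on the base stream and a leaf override that copies plain integers. Only four of the 13 fields are shown, and `IOStats`, its field names, and `StarletLikeStream` are hypothetical stand-ins rather than starlet's or StarRocks' actual definitions.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical subset of the snapshot; the comment says the real struct has 13 int64 fields.
struct IoStatsSnapshot {
    int64_t bytes_read_local_disk = 0;
    int64_t bytes_read_remote = 0;
    int64_t io_count_local_disk = 0;
    int64_t io_count_remote = 0;
};
// Plain integers only: copying a snapshot never touches the heap.
static_assert(std::is_trivially_copyable<IoStatsSnapshot>::value, "snapshot must stay a plain value type");

// Base stream: streams that keep no statistics inherit the all-zero default.
class InputStream {
public:
    virtual ~InputStream() = default;
    virtual IoStatsSnapshot get_io_stats_snapshot() const { return IoStatsSnapshot{}; }
};

// Hypothetical stand-in for the counters a starlet-backed file keeps internally.
struct IOStats {
    int64_t local_bytes = 0;
    int64_t remote_bytes = 0;
    int64_t local_ios = 0;
    int64_t remote_ios = 0;
};

// Leaf override: field-by-field copy, with no heap allocation, strings, or vectors.
class StarletLikeStream : public InputStream {
public:
    IOStats stats;
    IoStatsSnapshot get_io_stats_snapshot() const override {
        IoStatsSnapshot s;
        s.bytes_read_local_disk = stats.local_bytes;
        s.bytes_read_remote = stats.remote_bytes;
        s.io_count_local_disk = stats.local_ios;
        s.io_count_remote = stats.remote_ios;
        return s;
    }
};
```

Returning the struct by value costs a handful of integer copies, in contrast to the `make_unique` plus vector and `std::string` construction that the comment identifies in the adapter.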

Two runs (5 rounds each):

Run                     base        PR          diff
1                       172.95 ms   167.68 ms   -5.27 ms
2                       165.94 ms   151.99 ms   -13.95 ms
aggregate (10 rounds)   169.45 ms   159.84 ms   -9.61 ms (PR ~5.7% faster)

The original ~6% slowdown is gone. The "PR is now faster" reading is biased by ordering (each round measures base first, then PR, so PR benefits from warmer caches); the honest read is that the lightweight snapshot drops the overhead into the cluster-level noise, well below resolution at this workload size.
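The headline percentages can be rechecked with plain arithmetic. This sketch (hypothetical helpers `mean_ms` and `pct_change`) only reuses numbers from the two tables; because both runs have 5 rounds, the mean of the two run means equals the 10-round aggregate.

```cpp
#include <cassert>
#include <cmath>
#include <numeric>
#include <vector>

// Arithmetic mean of a non-empty list of latencies in milliseconds.
inline double mean_ms(const std::vector<double>& xs) {
    assert(!xs.empty());
    return std::accumulate(xs.begin(), xs.end(), 0.0) / static_cast<double>(xs.size());
}

// Relative change of PR vs. baseline in percent (positive means the PR is slower).
inline double pct_change(double base_ms, double pr_ms) {
    return (pr_ms - base_ms) / base_ms * 100.0;
}
```

For the original adapter version, `pct_change(134.09, 141.95)` is about +5.86%; for the light version, `pct_change(mean_ms({172.95, 165.94}), mean_ms({167.68, 151.99}))` is about -5.67%, matching the rounded figures in the tables.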

Bonus: easier to extend

Adding a new counter to the publish trace in the future is now:

  1. Add the field to io::IoStatsSnapshot in be/src/io/core/input_stream.h.
  2. Fill it in the starlet override in be/src/fs/fs_starlet.cpp.

No changes to the wrapper chain (SeekableInputStreamWrapper, SharedBufferedInputStream, CompressedInputStream, CompressedSeekableInputStream, BundleSeekableInputStream), no changes to other callers, no changes to the UT fake.
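Why the wrapper chain needs no edits can be sketched as follows, assuming each wrapper forwards the whole snapshot struct to the stream it wraps instead of copying individual fields. `ForwardingWrapper` and `LeafStream` are hypothetical names, not the actual StarRocks wrapper classes.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>

// Hypothetical, trimmed snapshot struct; a new counter would be added here.
struct IoStatsSnapshot {
    int64_t bytes_read_remote = 0;
    int64_t io_count_remote = 0;
};

class InputStream {
public:
    virtual ~InputStream() = default;
    virtual IoStatsSnapshot get_io_stats_snapshot() const { return IoStatsSnapshot{}; }
};

// Wrapper in the style of a buffered or compressed stream: it returns the inner
// stream's struct as-is and never names individual fields.
class ForwardingWrapper : public InputStream {
public:
    explicit ForwardingWrapper(std::unique_ptr<InputStream> inner) : _inner(std::move(inner)) {}
    IoStatsSnapshot get_io_stats_snapshot() const override { return _inner->get_io_stats_snapshot(); }

private:
    std::unique_ptr<InputStream> _inner;
};

// Leaf stream: the only place that fills fields from its own counters.
class LeafStream : public InputStream {
public:
    int64_t remote_bytes = 0;
    IoStatsSnapshot get_io_stats_snapshot() const override {
        IoStatsSnapshot s;
        s.bytes_read_remote = remote_bytes;
        return s;
    }
};
```

Adding a field to `IoStatsSnapshot` then only requires the leaf to fill it; each wrapper keeps returning the inner struct unchanged, however deep the chain is.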

xiangguangyxg previously approved these changes May 12, 2026
@CelerData-Reviewer

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. What shall we delve into next?

@luohaha luohaha force-pushed the apo-sstable-io-breakdown branch from c2d974c to 316e4d0 May 12, 2026 11:28
@CelerData-Reviewer

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. More of your lovely PRs please.

@luohaha luohaha force-pushed the apo-sstable-io-breakdown branch from 316e4d0 to f2c8a12 May 12, 2026 15:23
@CelerData-Reviewer

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Nice work!

… remote

The existing `read_block_miss_cache_cnt` trace counter on the persistent-index
sstable MultiGet path only tells you that the in-memory sstable block cache
missed; it does not say whether the resulting file read was served by the
local data cache or by the remote object store (S3/OSS). For shared-data
publish traces that distinction is what we actually need to diagnose.

Snapshot the underlying RandomAccessFile's NumericStatistics
(`bytes_read_local_disk`, `bytes_read_remote`, `io_count_local_disk`,
`io_count_remote`) before and after `Table::MultiGet`, and emit the delta as
four new TRACE counters alongside the existing miss counter:

  sstable_io_local_disk_bytes
  sstable_io_remote_bytes
  sstable_io_count_local_disk
  sstable_io_count_remote

Streams without NumericStatistics (e.g. plain POSIX in shared-nothing UTs)
report all-zero, so the wiring is safe in every build flavor.

UT: drive the breakdown end-to-end by wrapping the on-disk stream with a
FakeStatsInputStream that attributes every successful `read_at_fully` to
either the local-disk bucket or the remote bucket, and assert the published
trace metrics. A third case exercises the nullptr-stats path.

Signed-off-by: luohaha <18810541851@163.com>
@luohaha luohaha force-pushed the apo-sstable-io-breakdown branch from f2c8a12 to b13a360 May 12, 2026 16:43
@CelerData-Reviewer

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. 🎉

@github-actions
Contributor

[Java-Extensions Incremental Coverage Report]

pass : 0 / 0 (0%)

@github-actions
Contributor

[FE Incremental Coverage Report]

pass : 0 / 0 (0%)

@github-actions
Contributor

[BE Incremental Coverage Report]

pass : 37 / 37 (100.00%)

file detail

path                                                  covered_line  new_line  coverage  not_covered_line_detail
🔵 be/src/fs/fs_starlet.cpp                           12            12        100.00%   []
🔵 be/src/fs/bundle_file.cpp                          2             2         100.00%   []
🔵 be/src/io/core/seekable_input_stream.cpp           2             2         100.00%   []
🔵 be/src/io/core/input_stream.cpp                    4             4         100.00%   []
🔵 be/src/io/shared_buffered_input_stream.cpp         2             2         100.00%   []
🔵 be/src/storage/lake/persistent_index_sstable.cpp   11            11        100.00%   []
🔵 be/src/io/compressed_input_stream.cpp              4             4         100.00%   []

@luohaha luohaha enabled auto-merge (squash) May 13, 2026 03:24
@luohaha luohaha merged commit 2b6cbed into StarRocks:main May 13, 2026
57 of 58 checks passed
@github-actions
Contributor

@Mergifyio backport branch-4.1

@github-actions github-actions Bot removed the 4.1 label May 13, 2026
@mergify
Contributor

mergify Bot commented May 13, 2026

backport branch-4.1

✅ Backports have been created


Cherry-pick of 2b6cbed has failed:

On branch mergify/bp/branch-4.1/pr-73087
Your branch is up to date with 'origin/branch-4.1'.

You are currently cherry-picking commit 2b6cbed582.
  (fix conflicts and run "git cherry-pick --continue")
  (use "git cherry-pick --skip" to skip this patch)
  (use "git cherry-pick --abort" to cancel the cherry-pick operation)

Changes to be committed:
	modified:   be/src/fs/bundle_file.cpp
	modified:   be/src/fs/bundle_file.h
	modified:   be/src/fs/fs_starlet.cpp
	modified:   be/src/io/compressed_input_stream.cpp
	modified:   be/src/io/compressed_input_stream.h
	modified:   be/src/io/input_stream.h
	modified:   be/src/io/seekable_input_stream.cpp
	modified:   be/src/io/seekable_input_stream.h
	modified:   be/src/io/shared_buffered_input_stream.cpp
	modified:   be/src/io/shared_buffered_input_stream.h
	modified:   be/test/fs/fs_starlet_test.cpp
	modified:   be/test/io/compressed_input_stream_test.cpp

Unmerged paths:
  (use "git add <file>..." to mark resolution)
	both modified:   be/src/io/CMakeLists.txt
	added by them:   be/src/io/input_stream.cpp
	both modified:   be/src/storage/lake/persistent_index_sstable.cpp
	both modified:   be/test/storage/lake/persistent_index_sstable_test.cpp

To fix up this pull request, you can check it out locally. See documentation: https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/checking-out-pull-requests-locally

luohaha added a commit that referenced this pull request May 13, 2026
… remote (#73087)

Signed-off-by: luohaha <18810541851@163.com>
wanpengfei-git pushed a commit that referenced this pull request May 13, 2026
… remote (backport #73087) (#73191)

Signed-off-by: luohaha <18810541851@163.com>
Co-authored-by: Yixin Luo <18810541851@163.com>