
[Enhancement] Split publish-trace SST miss counters by local-cache vs remote (backport #73087) #73191

Merged
wanpengfei-git merged 1 commit into branch-4.1 from mergify/bp/branch-4.1/pr-73087
May 13, 2026



@mergify mergify Bot commented May 13, 2026

Why I'm doing:

In shared-data publish traces today we have read_block_miss_cache_cnt on the persistent-index sstable MultiGet path. It tells us the in-memory sstable block cache missed, but not whether the resulting file read was served by the local data cache or went out to the remote object store (S3/OSS). That distinction is what we actually need to diagnose slow PK publishes: a miss served from local disk typically costs far less latency than one that goes to the remote object store.

What I'm doing:

In PersistentIndexSstable::multi_get, snapshot the underlying RandomAccessFile's NumericStatistics (bytes_read_local_disk, bytes_read_remote, io_count_local_disk, io_count_remote) before and after Table::MultiGet, and emit the delta as four new trace counters alongside the existing miss counter:

  • sstable_io_local_disk_bytes
  • sstable_io_remote_bytes
  • sstable_io_count_local_disk
  • sstable_io_count_remote
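
The snapshot-and-delta pattern described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `IoCounters` and `measure_io` are hypothetical names, and the real NumericStatistics type is assumed to have a different shape. Only the four field names and the four trace counter names are taken from the description.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for the four fields the PR reads from NumericStatistics.
struct IoCounters {
    int64_t bytes_read_local_disk = 0;
    int64_t bytes_read_remote = 0;
    int64_t io_count_local_disk = 0;
    int64_t io_count_remote = 0;
};

// Runs `read` and returns how much each counter in `live` grew, keyed by
// the trace counter names listed in the PR description.
template <typename ReadFn>
std::map<std::string, int64_t> measure_io(const IoCounters& live, ReadFn&& read) {
    const IoCounters before = live;  // snapshot (copy) before the lookup
    read();                          // stands in for Table::MultiGet; may bump `live`
    const IoCounters& after = live;  // re-read the live counters afterwards

    // Assumption: the underlying counters only ever grow.
    assert(after.bytes_read_local_disk >= before.bytes_read_local_disk);
    assert(after.bytes_read_remote >= before.bytes_read_remote);
    assert(after.io_count_local_disk >= before.io_count_local_disk);
    assert(after.io_count_remote >= before.io_count_remote);

    return {
        {"sstable_io_local_disk_bytes", after.bytes_read_local_disk - before.bytes_read_local_disk},
        {"sstable_io_remote_bytes", after.bytes_read_remote - before.bytes_read_remote},
        {"sstable_io_count_local_disk", after.io_count_local_disk - before.io_count_local_disk},
        {"sstable_io_count_remote", after.io_count_remote - before.io_count_remote},
    };
}
```

Reporting a delta rather than the absolute values keeps reads issued before this lookup on the same file handle out of the per-call numbers.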

Streams without NumericStatistics (e.g. plain POSIX in shared-nothing UTs) report all-zero, so the wiring is safe across build flavors.
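
One way the all-zero fallback could look is sketched below. The `Stream` interface and its `get_io_stats()` accessor are assumptions made for illustration; the actual accessor name and return type in the codebase may differ.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the four fields read from NumericStatistics.
struct IoCounters {
    int64_t bytes_read_local_disk = 0;
    int64_t bytes_read_remote = 0;
    int64_t io_count_local_disk = 0;
    int64_t io_count_remote = 0;
};

// Hypothetical stream interface: by default a stream tracks no I/O
// statistics and returns nullptr (assumed here for plain POSIX files).
class Stream {
public:
    virtual ~Stream() = default;
    virtual const IoCounters* get_io_stats() const { return nullptr; }
};

// A stream that does track statistics exposes its live counters.
class TrackedStream : public Stream {
public:
    IoCounters counters;
    const IoCounters* get_io_stats() const override { return &counters; }
};

// Copies the current counters, or returns all zeros when the stream has none.
// Two zero snapshots subtract to a zero delta, so callers need no special case.
inline IoCounters snapshot(const Stream& stream) {
    const IoCounters* stats = stream.get_io_stats();
    return stats != nullptr ? *stats : IoCounters{};
}
```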

UT: a FakeStatsInputStream wraps the on-disk stream and attributes every successful read_at_fully to either the local-disk bucket or the remote bucket; three new cases (test_multi_get_io_breakdown_local_disk, test_multi_get_io_breakdown_remote, test_multi_get_io_breakdown_no_stats) drive the breakdown end-to-end and assert the published trace metrics — including the nullptr-stats branch.
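
A simplified sketch of the attribution idea behind such a test double follows. It is not the FakeStatsInputStream from this PR: the class below reads from an in-memory buffer instead of wrapping an on-disk stream, and `FakeStatsStream`, `Bucket`, and the `read_at_fully` signature shown are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <utility>

// Hypothetical stand-in for the four fields read from NumericStatistics.
struct IoCounters {
    int64_t bytes_read_local_disk = 0;
    int64_t bytes_read_remote = 0;
    int64_t io_count_local_disk = 0;
    int64_t io_count_remote = 0;
};

enum class Bucket { kLocalDisk, kRemote };

// Test double: every successful read is attributed to one fixed bucket.
class FakeStatsStream {
public:
    FakeStatsStream(std::string data, Bucket bucket)
            : _data(std::move(data)), _bucket(bucket) {}

    // Reads exactly `count` bytes at `offset` into `out`. Returns false and
    // records nothing when the requested range is out of bounds.
    bool read_at_fully(size_t offset, void* out, size_t count) {
        if (offset > _data.size() || count > _data.size() - offset) {
            return false;
        }
        std::memcpy(out, _data.data() + offset, count);
        if (_bucket == Bucket::kLocalDisk) {
            _stats.bytes_read_local_disk += static_cast<int64_t>(count);
            _stats.io_count_local_disk += 1;
        } else {
            _stats.bytes_read_remote += static_cast<int64_t>(count);
            _stats.io_count_remote += 1;
        }
        return true;
    }

    const IoCounters& stats() const { return _stats; }

private:
    std::string _data;
    Bucket _bucket;
    IoCounters _stats;
};
```

Fixing the bucket per stream instance lets a test assert that only the expected pair of counters moved while the other pair stayed at zero.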

Fixes #issue

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Does this PR entail a change in behavior?

  • Yes, this PR will result in a change in behavior.
  • No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • Parameter changes: default values, similar parameters but with different default values
  • Policy changes: use new policy to replace old one, functionality automatically enabled
  • Feature removed
  • Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This pr needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function
    • This pr needs auto generate documentation
  • This is a backport pr

Bugfix cherry-pick branch check:

@mergify mergify Bot added the conflicts label May 13, 2026

mergify Bot commented May 13, 2026

Cherry-pick of 2b6cbed has failed:

On branch mergify/bp/branch-4.1/pr-73087
Your branch is up to date with 'origin/branch-4.1'.

You are currently cherry-picking commit 2b6cbed582.
  (fix conflicts and run "git cherry-pick --continue")
  (use "git cherry-pick --skip" to skip this patch)
  (use "git cherry-pick --abort" to cancel the cherry-pick operation)

Changes to be committed:
	modified:   be/src/fs/bundle_file.cpp
	modified:   be/src/fs/bundle_file.h
	modified:   be/src/fs/fs_starlet.cpp
	modified:   be/src/io/compressed_input_stream.cpp
	modified:   be/src/io/compressed_input_stream.h
	modified:   be/src/io/input_stream.h
	modified:   be/src/io/seekable_input_stream.cpp
	modified:   be/src/io/seekable_input_stream.h
	modified:   be/src/io/shared_buffered_input_stream.cpp
	modified:   be/src/io/shared_buffered_input_stream.h
	modified:   be/test/fs/fs_starlet_test.cpp
	modified:   be/test/io/compressed_input_stream_test.cpp

Unmerged paths:
  (use "git add <file>..." to mark resolution)
	both modified:   be/src/io/CMakeLists.txt
	added by them:   be/src/io/input_stream.cpp
	both modified:   be/src/storage/lake/persistent_index_sstable.cpp
	both modified:   be/test/storage/lake/persistent_index_sstable_test.cpp

To fix up this pull request, you can check it out locally. See documentation: https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/checking-out-pull-requests-locally

@wanpengfei-git wanpengfei-git enabled auto-merge (squash) May 13, 2026 03:28
@mergify mergify Bot closed this May 13, 2026
auto-merge was automatically disabled May 13, 2026 03:28

Pull request was closed


mergify Bot commented May 13, 2026

@mergify[bot]: Backport conflict, please resolve the conflict and resubmit the PR

@luohaha luohaha reopened this May 13, 2026
@luohaha luohaha closed this May 13, 2026
@luohaha luohaha reopened this May 13, 2026
@wanpengfei-git wanpengfei-git enabled auto-merge (squash) May 13, 2026 04:34
… remote (#73087)

Signed-off-by: luohaha <18810541851@163.com>
@luohaha luohaha force-pushed the mergify/bp/branch-4.1/pr-73087 branch from d926ea4 to 1a9dc09 Compare May 13, 2026 04:50
@wanpengfei-git wanpengfei-git merged commit fdf9b5d into branch-4.1 May 13, 2026
31 checks passed
@wanpengfei-git wanpengfei-git deleted the mergify/bp/branch-4.1/pr-73087 branch May 13, 2026 06:04