fix: lfc read/write seconds calculation bug fix#12852
Open
kevin336 wants to merge 1 commit into
Open
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
Bug 1: LFC Write metrics overflow (file_cache_write_wait_seconds_sum)
The
INSTR_TIME_SUBTRACTarguments were reversed, causing a negative time value:When this negative value was interpreted as uint64, it resulted in values close to 2^64 (~18 quintillion), making the metric unusable.
Observed symptom: file_cache_write_wait_seconds_sum showing values like 18446744073709.54 instead of realistic values.
Bug 2: LFC Read metrics always zero (file_cache_read_wait_seconds_sum)
The read path was missing time measurement code entirely. io_time_us was initialized to 0 but never updated after the preadv() call:
uint64 io_time_us = 0; // initialized
// ... preadv() called without timing ...
inc_page_cache_read_wait(io_time_us); // always 0
Observed symptom: file_cache_read_wait_seconds_sum always showing 0 regardless of actual I/O time.
Solution
Fix 1: Correct the argument order for write metrics
// After (correct): io_end - io_start = positive value
INSTR_TIME_SUBTRACT(io_end, io_start);
time_spent_us = INSTR_TIME_GET_MICROSEC(io_end);
This fix is applied in two locations:
After fixing the problem I can see below

Fix 2: Add time measurement for read metrics
Added proper instrumentation around preadv():
instr_time read_start, read_end;
INSTR_TIME_SET_CURRENT(read_start);
rc = preadv(lfc_desc, ...);
INSTR_TIME_SET_CURRENT(read_end);
INSTR_TIME_SUBTRACT(read_end, read_start);
io_time_us = INSTR_TIME_GET_MICROSEC(read_end);
After fixing the problem I can see below

Summary of changes
This PR fixes two bugs in the Local File Cache (LFC) I/O latency metrics that caused incorrect values to be reported.