fix: lfc read/write seconds calculation bug fix #12852

Open
kevin336 wants to merge 1 commit into neondatabase:main from kevin336:fix/lfc_seconds_calc
Conversation

@kevin336

Problem

Bug 1: LFC Write metrics overflow (file_cache_write_wait_seconds_sum)

The INSTR_TIME_SUBTRACT arguments were reversed, causing a negative time value:

// Before (incorrect): io_start - io_end = negative value
INSTR_TIME_SUBTRACT(io_start, io_end);
time_spent_us = INSTR_TIME_GET_MICROSEC(io_start);

When this negative value was interpreted as uint64, it resulted in values close to 2^64 (~18 quintillion), making the metric unusable.

image

Observed symptom: file_cache_write_wait_seconds_sum showing values like 18446744073709.54 instead of realistic values.
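The wraparound can be reproduced outside PostgreSQL with a minimal sketch. It assumes timestamps held as signed microseconds; `fake_time_us`, `elapsed_us_reversed`, and `elapsed_us_correct` are hypothetical names for illustration only and are not the `instr_time` type or macros used in the LFC code.

```c
#include <stdint.h>

/* Hypothetical stand-in for a timestamp in signed microseconds.
 * This is NOT PostgreSQL's instr_time; it only illustrates the arithmetic. */
typedef int64_t fake_time_us;

/* Mirrors the reversed order (start - end): the difference is negative,
 * and converting it to uint64_t wraps it to a value just below 2^64. */
uint64_t elapsed_us_reversed(fake_time_us start, fake_time_us end)
{
    return (uint64_t) (start - end);
}

/* Mirrors the corrected order (end - start): a small positive duration. */
uint64_t elapsed_us_correct(fake_time_us start, fake_time_us end)
{
    return (uint64_t) (end - start);
}
```

For an I/O that took 250 microseconds, the reversed version yields 2^64 - 250. Divided by 10^6 to convert to seconds, that is roughly 1.8446744073709 × 10^13, which is consistent with the value in the observed symptom.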

Bug 2: LFC Read metrics always zero (file_cache_read_wait_seconds_sum)

image

The read path was missing time measurement code entirely. io_time_us was initialized to 0 but never updated after the preadv() call:

uint64 io_time_us = 0; // initialized
// ... preadv() called without timing ...
inc_page_cache_read_wait(io_time_us); // always 0

Observed symptom: file_cache_read_wait_seconds_sum always showing 0 regardless of actual I/O time.

Solution

Fix 1: Correct the argument order for write metrics

// After (correct): io_end - io_start = positive value
INSTR_TIME_SUBTRACT(io_end, io_start);
time_spent_us = INSTR_TIME_GET_MICROSEC(io_end);

This fix is applied in two locations:

  • lfc_prefetch() (single block write)
  • lfc_writev() (multi-block write)

After the fix, the write metric is reported as shown below:
image

Fix 2: Add time measurement for read metrics

Added proper instrumentation around preadv():

instr_time read_start, read_end;

INSTR_TIME_SET_CURRENT(read_start);
rc = preadv(lfc_desc, ...);
INSTR_TIME_SET_CURRENT(read_end);
INSTR_TIME_SUBTRACT(read_end, read_start);
io_time_us = INSTR_TIME_GET_MICROSEC(read_end);

After the fix, the read metric is reported as shown below:
image
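The same start/stop pattern can be sketched as a standalone helper for readers who want to try it outside the PostgreSQL tree. This is an assumption-laden illustration: it uses POSIX `clock_gettime(CLOCK_MONOTONIC, ...)` in place of the `INSTR_TIME_*` macros, and `timespec_diff_us` and `timed_preadv` are hypothetical names that do not exist in the LFC code.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <time.h>

/* Difference end - start in microseconds (note the argument order). */
int64_t timespec_diff_us(struct timespec start, struct timespec end)
{
    return (int64_t) (end.tv_sec - start.tv_sec) * 1000000
        + (end.tv_nsec - start.tv_nsec) / 1000;
}

/* Hypothetical wrapper: runs preadv() and reports the time it took.
 * The elapsed time is stored even when preadv() fails. */
ssize_t timed_preadv(int fd, const struct iovec *iov, int iovcnt,
                     off_t offset, uint64_t *io_time_us)
{
    struct timespec start, end;
    ssize_t rc;
    int64_t us;

    clock_gettime(CLOCK_MONOTONIC, &start);
    rc = preadv(fd, iov, iovcnt, offset);
    clock_gettime(CLOCK_MONOTONIC, &end);

    us = timespec_diff_us(start, end);
    /* Defensive clamp so a negative difference can never wrap when
     * stored in the unsigned counter. */
    *io_time_us = us > 0 ? (uint64_t) us : 0;
    return rc;
}
```

The clamp before the unsigned store is a design choice of this sketch, not something the PR does; it means a mistake in argument order would show up as zero rather than as a value near 2^64.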

Summary of changes

This PR fixes two bugs in the Local File Cache (LFC) I/O latency metrics that caused incorrect values to be reported.

@kevin336 kevin336 requested review from a team as code owners February 10, 2026 05:33
@kevin336 kevin336 changed the title from "fix: lfc read/write seconds calculation bug fixed" to "fix: lfc read/write seconds calculation bug fix" Feb 10, 2026