fix(stats): report wall-clock uptime so dashboard matches systemd #283
Merged
Uptime was derived from Instant, whose monotonic clock freezes during host suspend on Linux (CLOCK_MONOTONIC) and macOS (CLOCK_UPTIME_RAW). After the machine slept, the dashboard undercounted versus systemd's "active (running) since N ago", which reckons from the real-time clock. Anchor started_at to SystemTime instead; systemd measures uptime the same way, so the two agree across suspend and clock steps. elapsed() clamps to zero on a backward real-time step. Closes #281.
Reproduces the monotonic-clock undercount without a real host suspend: runs numa under libfaketime with DONT_FAKE_MONOTONIC=1 so CLOCK_REALTIME (SystemTime) advances while CLOCK_MONOTONIC (Instant) stays frozen — the exact condition a suspend creates. Jumps the fake wall-clock +10h and asserts uptime followed. Passes on this branch, fails on main.
c4cc596 to 9306280
Problem
Issue #281: a user running numa 0.20.0 under systemd saw `systemctl status` report the service `active (running) since … 14h ago`, while the dashboard reported uptime of only 4h 34min.

Cause
Uptime was computed from `std::time::Instant`, whose monotonic clock stops advancing while the host is suspended: `CLOCK_MONOTONIC` on Linux, `CLOCK_UPTIME_RAW` / `mach_absolute_time` on macOS. systemd's "active since" is reckoned from the real-time clock, which keeps counting through sleep. After an overnight suspend (~9.5h here, i.e. 14h - 4h 34min), the two diverge by the suspend duration.

This is current Rust behavior: the PR to switch `Instant` to `CLOCK_BOOTTIME` was closed without merging (Dec 2022).

Fix
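To make the divergence concrete, here is a toy model (the `SimClocks` type and `hm` helper are hypothetical, not numa code) in which awake time advances both clocks but suspended time advances only the real-time one. It plugs in the #281 numbers, assuming for simplicity one awake stretch and one overnight suspend and treating systemd's "14h" as exact; any interleaving of awake and suspended intervals gives the same totals.

```rust
use std::time::Duration;

/// Toy model of the two clocks (hypothetical, not numa code):
/// `realtime` stands in for CLOCK_REALTIME (what SystemTime reads),
/// `monotonic` for CLOCK_MONOTONIC / CLOCK_UPTIME_RAW (what Instant reads).
#[derive(Default)]
struct SimClocks {
    realtime: Duration,
    monotonic: Duration,
}

impl SimClocks {
    /// Host awake: both clocks advance.
    fn run(&mut self, d: Duration) {
        self.realtime += d;
        self.monotonic += d;
    }

    /// Host suspended: only the real-time clock advances.
    fn suspend(&mut self, d: Duration) {
        self.realtime += d;
    }
}

/// Helper: hours and minutes as a Duration.
fn hm(h: u64, m: u64) -> Duration {
    Duration::from_secs(h * 3600 + m * 60)
}

fn main() {
    let mut clocks = SimClocks::default();
    // #281 numbers, assuming one awake stretch and one overnight suspend.
    clocks.run(hm(4, 34));
    clocks.suspend(hm(9, 26));

    println!(
        "systemd-style: {}s, Instant-based: {}s",
        clocks.realtime.as_secs(),
        clocks.monotonic.as_secs()
    );
    assert_eq!(clocks.realtime, hm(14, 0)); // what `systemctl status` shows
    assert_eq!(clocks.monotonic, hm(4, 34)); // what the dashboard showed
    // The gap is exactly the time spent suspended.
    assert_eq!(clocks.realtime - clocks.monotonic, hm(9, 26));
}
```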
Anchor `started_at` to `std::time::SystemTime` directly (no wrapper type). systemd also measures uptime from the real-time clock, so the dashboard and systemd now move together across both suspends and NTP/clock steps. The three read sites use `.elapsed().unwrap_or_default()`, which clamps to zero if the real-time clock steps back past the start point (`SystemTime::elapsed` returns `Err` in that case, and `unwrap_or_default` maps it to `Duration::ZERO`).

`CLOCK_BOOTTIME` (Linux-only, via `libc`) was considered but rejected: it fixes only Linux (macOS `Instant` has the same bug), adds `unsafe` + `cfg`, and would actually diverge from systemd's display on an NTP step. For a human-facing uptime readout whose job is to agree with systemd, matching systemd's clock source is the simpler and more correct choice. Uptime is never used for interval timing here, so giving up monotonicity costs nothing.

Verification
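A minimal sketch of the fix, assuming a hypothetical `Stats` struct: only the `started_at` field name and the `.elapsed().unwrap_or_default()` call come from this PR, and `uptime_secs()` is an illustrative accessor named after the dashboard field. The last lines of `main` simulate a backward real-time step by placing the start point in the future.

```rust
use std::time::{Duration, SystemTime};

/// Hypothetical stand-in for numa's stats state; only the
/// `started_at` field name comes from the PR.
struct Stats {
    started_at: SystemTime,
}

impl Stats {
    fn new() -> Self {
        Stats { started_at: SystemTime::now() }
    }

    /// Wall-clock uptime. `SystemTime::elapsed` returns `Err` when the
    /// real-time clock now reads earlier than `started_at` (a backward
    /// step); `unwrap_or_default` turns that into `Duration::ZERO`.
    fn uptime(&self) -> Duration {
        self.started_at.elapsed().unwrap_or_default()
    }

    /// Hypothetical accessor mirroring the dashboard's `uptime_secs`.
    fn uptime_secs(&self) -> u64 {
        self.uptime().as_secs()
    }
}

fn main() {
    let stats = Stats::new();
    println!("uptime_secs = {}", stats.uptime_secs());

    // Simulate the real-time clock having stepped back an hour past the
    // start point: `started_at` now lies in the "future".
    let stepped = Stats {
        started_at: SystemTime::now() + Duration::from_secs(3600),
    };
    assert_eq!(stepped.uptime(), Duration::ZERO);
}
```

Clamping to zero rather than panicking or going negative keeps the readout sane during a backward step; once the real-time clock passes `started_at` again, uptime resumes counting from there.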
Automated: `tests/docker/issue-281-repro.sh` reproduces the bug deterministically without a real suspend. A container can't suspend, but a suspend is observationally just "real-time advanced while monotonic did not", which `libfaketime` with `DONT_FAKE_MONOTONIC=1` reproduces exactly (it fakes `CLOCK_REALTIME` / `SystemTime` and passes `CLOCK_MONOTONIC` / `Instant` through). The script starts numa at a fake T0, jumps the wall clock +10h, and asserts that uptime followed. It passes on this branch; on a `main`-built binary the delta is ~0 (the #281 undercount), so it doubles as a regression guard. It builds numa inside the image, so it is independent of host OS and architecture.

Manual (real suspend): run the service, note the dashboard's `uptime_secs`, `systemctl suspend`, then resume. Before this fix the dashboard lags `systemctl status` … `active since` by the suspend duration; after it, the two agree (±NTP jitter).

Existing `health`/`stats` unit tests pass; clippy is clean on the changed files (the unrelated `srtt.rs`/`recursive.rs` 1.94 lints are pre-existing toolchain drift on `main`).

Closes #281.
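The repro's pass/fail logic can be modelled without libfaketime. This sketch uses hypothetical `Reading`, `uptime_old`, `uptime_new`, and `uptime_followed` names, an arbitrary fake T0, and an assumed 60 s tolerance (the script's actual threshold isn't stated here). It holds the monotonic reading still while jumping the real-time reading +10h, as `DONT_FAKE_MONOTONIC=1` does.

```rust
use std::time::Duration;

/// Hypothetical snapshot of both clocks at one moment.
#[derive(Clone, Copy)]
struct Reading {
    realtime: Duration,  // what SystemTime would report (since epoch)
    monotonic: Duration, // what Instant would report (since boot)
}

/// Uptime as computed on `main`: from the monotonic clock.
fn uptime_old(start: Reading, now: Reading) -> Duration {
    now.monotonic.saturating_sub(start.monotonic)
}

/// Uptime as computed on this branch: from the real-time clock,
/// clamped to zero on a backward step (like `unwrap_or_default`).
fn uptime_new(start: Reading, now: Reading) -> Duration {
    now.realtime.checked_sub(start.realtime).unwrap_or_default()
}

/// Pass criterion: after a `jump` of the real-time clock, uptime must
/// have grown by `jump`, give or take `tolerance`.
fn uptime_followed(before: Duration, after: Duration, jump: Duration, tolerance: Duration) -> bool {
    let delta = after.saturating_sub(before);
    delta + tolerance >= jump && delta <= jump + tolerance
}

fn main() {
    let ten_hours = Duration::from_secs(10 * 3600);
    let tol = Duration::from_secs(60); // assumed slack

    // numa starts at an arbitrary fake T0.
    let start = Reading { realtime: Duration::from_secs(1_700_000_000), monotonic: Duration::from_secs(500) };
    // A few seconds of real runtime pass on both clocks...
    let t1 = Reading { realtime: start.realtime + Duration::from_secs(5), monotonic: start.monotonic + Duration::from_secs(5) };
    // ...then the fake wall clock jumps +10h while monotonic stays put.
    let t2 = Reading { realtime: t1.realtime + ten_hours, monotonic: t1.monotonic };

    let new_ok = uptime_followed(uptime_new(start, t1), uptime_new(start, t2), ten_hours, tol);
    let old_ok = uptime_followed(uptime_old(start, t1), uptime_old(start, t2), ten_hours, tol);
    println!("this branch: {new_ok}, main: {old_ok}");
    assert!(new_ok); // uptime followed the jump
    assert!(!old_ok); // delta ~0: the #281 undercount
}
```

The monotonic-based version reports a delta of ~0 and fails the check, which corresponds to the `main`-built binary; the real-time version follows the jump.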