fix(stats): report wall-clock uptime so dashboard matches systemd #283

Merged
razvandimescu merged 2 commits into main from fix/uptime-wall-clock
Jun 4, 2026
@razvandimescu razvandimescu commented Jun 4, 2026

Problem

Issue #281: a user running numa 0.20.0 under systemd saw systemctl status report the service active (running) since … 14h ago, while the dashboard reported uptime of only 4h 34min.

Cause

Uptime was computed from std::time::Instant, whose monotonic clock freezes while the host is suspended (CLOCK_MONOTONIC on Linux, CLOCK_UPTIME_RAW/mach_absolute_time on macOS). systemd's "active since" reckons from the real-time clock, which keeps counting through sleep. After an overnight suspend (~9.5h here) the two diverge by the suspend duration.
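To make the divergence concrete, here is a minimal sketch that anchors both clocks at the same moment and reports how far the wall-clock reading has run ahead of the monotonic one. The `Anchors` type and its methods are hypothetical names for illustration, not code from numa. On a host that never suspends (and sees no clock step) the divergence stays near zero; after a suspend it should be roughly the suspend duration.

```rust
use std::time::{Duration, Instant, SystemTime};

/// Hypothetical helper: the same start moment captured on both clocks.
struct Anchors {
    monotonic: Instant,
    wall: SystemTime,
}

impl Anchors {
    /// Capture both clocks back to back.
    fn now() -> Self {
        Self {
            monotonic: Instant::now(),
            wall: SystemTime::now(),
        }
    }

    /// Returns (monotonic uptime, wall-clock uptime).
    /// The wall-clock reading falls back to zero if the real-time clock
    /// has stepped back past the anchor.
    fn uptimes(&self) -> (Duration, Duration) {
        (
            self.monotonic.elapsed(),
            self.wall.elapsed().unwrap_or_default(),
        )
    }

    /// How far the wall-clock uptime is ahead of the monotonic uptime.
    /// Saturates at zero if the wall clock is behind.
    fn divergence(&self) -> Duration {
        let (mono, wall) = self.uptimes();
        wall.saturating_sub(mono)
    }
}
```

With the numbers from issue #281, systemd's ~14h against the dashboard's 4h 34min gives a divergence of about 9h 26min, in line with the ~9.5h suspend.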

This is current Rust behavior: the PR to switch Instant to CLOCK_BOOTTIME was closed without merge (Dec 2022).

Fix

Anchor started_at to std::time::SystemTime directly (no wrapper type). systemd measures uptime from the real-time clock too, so dashboard and systemd now move together across both suspend and NTP/clock steps. The three read sites use .elapsed().unwrap_or_default(), clamping to zero if the real-time clock steps back past the start point.
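A minimal sketch of the read pattern described above, assuming a hypothetical `Stats` holder; only the `started_at` field name and the `.elapsed().unwrap_or_default()` call come from this description, everything else (including `with_start`) is illustrative. `SystemTime::elapsed` returns an `Err` when the anchor is later than the current time, and `unwrap_or_default` turns that into a zero `Duration`.

```rust
use std::time::{Duration, SystemTime};

/// Hypothetical stats holder; only `started_at` mirrors the PR text.
pub struct Stats {
    started_at: SystemTime,
}

impl Stats {
    /// Anchor the start point to the real-time (wall) clock.
    pub fn new() -> Self {
        Self {
            started_at: SystemTime::now(),
        }
    }

    /// Illustrative constructor so the clamping can be exercised
    /// without touching the system clock.
    pub fn with_start(started_at: SystemTime) -> Self {
        Self { started_at }
    }

    /// Wall-clock uptime; zero if the clock stepped back past the start.
    pub fn uptime(&self) -> Duration {
        self.started_at.elapsed().unwrap_or_default()
    }

    /// Whole seconds, as a dashboard-style field might expose it.
    pub fn uptime_secs(&self) -> u64 {
        self.uptime().as_secs()
    }
}
```

Constructing `Stats::with_start` with a start point one hour in the future stands in for a backward clock step: `uptime_secs()` reads 0 rather than panicking or wrapping.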

CLOCK_BOOTTIME (Linux-only, via libc) was considered but rejected: it only fixes Linux (macOS Instant has the same bug), adds unsafe + cfg, and would actually diverge from systemd's display on an NTP step. For a human-facing uptime readout whose job is to agree with systemd, matching systemd's clock source is the simpler and more correct choice. Uptime is never used for interval timing here, so giving up monotonicity costs nothing.

Verification

Automated: tests/docker/issue-281-repro.sh reproduces the bug deterministically without a real suspend. A container can't suspend, but suspend is observationally just "real-time advanced while monotonic did not", which libfaketime with DONT_FAKE_MONOTONIC=1 reproduces exactly (fakes CLOCK_REALTIME/SystemTime, passes CLOCK_MONOTONIC/Instant through). The script starts numa at a fake T0, jumps the wall-clock +10h, and asserts uptime followed:

✓ numa up; uptime before jump = 0s
jumped wall-clock +10h; uptime after = 35999s
✓ PASS uptime tracked the +10h wall-clock jump (Δ=35999s) — matches systemd

The script passes on this branch; on a binary built from main the delta is ~0 (the #281 undercount), so it doubles as a regression guard. It builds numa inside the image, so it is independent of host OS and architecture.
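The reasoning behind the repro ("real-time advanced while monotonic did not") can be shown with a toy model. This is not the repro script and does not use libfaketime; `ModelClocks` and the two uptime functions are hypothetical names that only model the two counters as plain durations.

```rust
use std::time::Duration;

/// Toy model of a host's two clocks (plain counters, not real OS clocks).
#[derive(Clone, Copy)]
struct ModelClocks {
    realtime: Duration,
    monotonic: Duration,
}

impl ModelClocks {
    fn new() -> Self {
        Self {
            realtime: Duration::ZERO,
            monotonic: Duration::ZERO,
        }
    }

    /// Host is awake: both clocks advance.
    fn run(&mut self, d: Duration) {
        self.realtime += d;
        self.monotonic += d;
    }

    /// Host is suspended (or the wall clock is jumped, as in the repro):
    /// only the real-time clock advances.
    fn suspend(&mut self, d: Duration) {
        self.realtime += d;
    }
}

/// Uptime as the old Instant-based code would see it.
fn uptime_monotonic(start: ModelClocks, now: ModelClocks) -> Duration {
    now.monotonic.saturating_sub(start.monotonic)
}

/// Uptime as the SystemTime-based code (and systemd) would see it.
fn uptime_wall(start: ModelClocks, now: ModelClocks) -> Duration {
    now.realtime.saturating_sub(start.realtime)
}
```

In this model, a 10h `suspend` leaves `uptime_monotonic` unchanged while `uptime_wall` grows by 36000s, which is the difference the script asserts between a main-built binary and this branch.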

Manual (real suspend): run the service, note dashboard uptime_secs, systemctl suspend, resume. Before this fix the dashboard lags systemctl status … active since by the suspend duration; after, they agree (±NTP jitter).

Existing health/stats unit tests pass; clippy clean on the changed files (the unrelated srtt.rs/recursive.rs 1.94 lints are pre-existing toolchain drift on main).

Closes #281.

Uptime was derived from Instant, whose monotonic clock freezes during
host suspend on Linux (CLOCK_MONOTONIC) and macOS (CLOCK_UPTIME_RAW).
After the machine slept, the dashboard undercounted versus systemd's
"active (running) since N ago", which reckons from the real-time clock.

Anchor started_at to SystemTime instead; systemd measures uptime the
same way, so the two agree across suspend and clock steps. elapsed()
clamps to zero on a backward real-time step.

Closes #281.

Reproduces the monotonic-clock undercount without a real host suspend:
runs numa under libfaketime with DONT_FAKE_MONOTONIC=1 so CLOCK_REALTIME
(SystemTime) advances while CLOCK_MONOTONIC (Instant) stays frozen — the
exact condition a suspend creates. Jumps the fake wall-clock +10h and
asserts uptime followed. Passes on this branch, fails on main.
@razvandimescu razvandimescu force-pushed the fix/uptime-wall-clock branch from c4cc596 to 9306280 Compare June 4, 2026 13:40
@razvandimescu razvandimescu merged commit 3522654 into main Jun 4, 2026
7 checks passed
@razvandimescu razvandimescu deleted the fix/uptime-wall-clock branch June 4, 2026 19:41