watcher: coalesce per-save event bursts into a single --hot reload#30617
watcher: coalesce per-save event bursts into a single --hot reload#30617robobun wants to merge 12 commits into
Conversation
|
Found 1 issue this PR may fix:
🤖 Generated with Claude Code |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughLinux, macOS, and Windows file watchers now coalesce rapid event bursts using a configurable quiet-period and bounded drain iterations (default 10ms). Hot reloader deduplicates pending file-hash IDs. Tests add a regression asserting a burst produces a single reload. ChangesFile Event Coalescing Across Watchers
🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Warning Review ran into problems🔥 ProblemsStopped waiting for pipeline failures after 30000ms. One of your pipelines takes longer than our 30000ms fetch window to run, so review may not consider pipeline-failure results for inline comments if any failures occurred after the fetch window. Increase the timeout if you want to wait longer or run a Comment |
There was a problem hiding this comment.
Additional findings (outside current diff — PR may have been updated during review):
-
🟡
test/cli/hot/hot.test.ts:658-662— TheBUN_INOTIFY_COALESCE_INTERVALoverride here is only honored on Linux —KEventWatcher.zighardcodescoalesce_interval_ns = 10_000_000andWindowsWatcher.zighardcodesTimeout.coalesce = 10, neither reads the env var. On macOS/Windows the test still runs with the new 10 ms window, so per the comment above, the self-write gets absorbed and the reject→report race this test guards never opens (the test still passes, it just stops exercising the regression). Might be worth having the kqueue/Windows watchers honor the same override, or at least noting that this race is now only meaningfully exercised on Linux.Extended reasoning...
What the issue is
The
should not remap against a stale sourcemap after a partial-file reloadtest setsenv: { ...bunEnv, BUN_INOTIFY_COALESCE_INTERVAL: "100000" }to pin the watcher's coalesce window back to 0.1 ms so the file's self-write is dispatched before the rejection is reported. However,BUN_INOTIFY_COALESCE_INTERVALis only consumed byINotifyWatcher.init()(src/watcher/INotifyWatcher.zig:114). The macOS watcher hardcodesconst coalesce_interval_ns = 10_000_000(src/watcher/KEventWatcher.zig:12) and the Windows watcher hardcodesTimeout.coalesce = 10(src/watcher/WindowsWatcher.zig) — neither consults the env var. So on macOS and Windows the env override is a no-op and the test runs under the new 10 ms coalesce window introduced by this PR.How it manifests
The test's own comment explains why this matters: "the default 10 ms coalesce would absorb it into the next
writeFulland the race under test never opens." That is exactly what now happens on macOS/Windows. ThewriteFileSync(__filename, "// stub")self-write triggers the watcher's 10 ms coalesce wait; during that window the rejection is printed (correctly mapped — no stale-sourcemap race), and then the watcher delivers the event(s) after the report. The reject→report window the test was written to exercise is therefore closed on those two platforms.Why the test still passes
I traced
driveErrorReloadCycleunder the 10 ms window and the test does not fail or flake on macOS/Windows. There are two possible interleavings:- The self-write and the test's subsequent
writeFull(N+1)both land inside the same 10 ms coalesce window → one reload with the fullN+1content →error: N+1prints →verifyLinepasses. - The self-write is delivered alone → the watcher reloads the comment-only stub, which throws nothing and emits nothing to stderr (
--hotkeeps the process alive via the keep-alive timer) → the nextwriteFull(N+1)triggers another reload →error: N+1prints.
In both cases
driveErrorReloadCycleobserves the expectederror: Nsequence and the assertions pass. So this is purely a test-coverage degradation, not a functional bug or a flaky test.Step-by-step proof
- On macOS,
KEventWatcher.init()is called → it never readsBUN_INOTIFY_COALESCE_INTERVAL;coalesce_interval_nsstays at the file-level constant10_000_000. - Test spawns
bun --hotwith the env var set → the kqueue watcher ignores it. - The hot module evaluates: it does
writeFileSync(__filename, "// stub N")thenthrow new Error('N'). - The kqueue thread receives
NOTE_WRITE, then enters thewhile (count < changelist_count …)loop with a 10 ms timeout, blocking dispatch. - Meanwhile the JS thread reports the rejection — correctly mapped, since no second reload has retranspiled the file yet. The race the test is meant to catch (second reload overwriting
source_mappings[path]between reject and report) cannot occur because the watcher is still inside its 10 ms drain. driveErrorReloadCycleseeserror: N, callswriteFull(N+1), the watcher (either coalescing it with the stub write or in a fresh cycle) reloads, and the loop progresses.- Test reaches
reloadCounter === 20and passes — but at no point did it actually open the stale-sourcemap window.
Impact and fix
The sourcemap-remapping code being guarded is platform-independent VM code, so Linux CI still catches regressions, and the env var name (
INOTIFY) signals Linux-only intent — this may even be a conscious tradeoff. But the test silently degrades to a near-tautology on macOS/Windows after this PR, which is worth flagging. Two options:- Have
KEventWatcherandWindowsWatcherhonor the same env var (or a generically-named one) so the override works everywhere. - Add a comment on the test noting the race is only exercised on Linux.
This is a nit — coverage concern only, no production-code impact, and the test should not flake. It should not block the PR.
- The self-write and the test's subsequent
|
Addressed the coverage note in 0c76ed9: The earlier |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/watcher/WindowsWatcher.zig`:
- Line 222: The loop in WindowsWatcher.zig uses an off-by-one condition: change
the while condition in the coalescing loop from "iterations <=
max_coalesce_iterations" to use "<" so it matches KEventWatcher and
INotifyWatcher; locate the while loop in the WindowsWatcher coalescing logic
(the loop that increments "iterations" and references "max_coalesce_iterations")
and replace the comparison operator to ensure the loop runs at most
max_coalesce_iterations times.
In `@test/cli/hot/hot.test.ts`:
- Line 335: Replace usage of tmpdirSync with the harness helper tempDir: import
tempDir from the test harness and swap const dir = tmpdirSync(); to create the
temporary directory via tempDir() (and await it if the helper is async), and
update any other occurrences in this file (e.g., the beforeEach that uses
tmpdirSync) to use tempDir as well; ensure variable names and cleanup logic
still work with the tempDir return shape.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 4249892f-5e6a-4983-bdea-1a1f404e2abc
📒 Files selected for processing (6)
src/bun_core/env_var.zigsrc/jsc/hot_reloader.zigsrc/watcher/INotifyWatcher.zigsrc/watcher/KEventWatcher.zigsrc/watcher/WindowsWatcher.zigtest/cli/hot/hot.test.ts
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/watcher/KEventWatcher.zig`:
- Around line 76-85: The loop drains events unconditionally which can trap if
the initial kevent syscall returns -1; check the syscall result stored in
`extra` and handle errors before slicing with `@intCast(count)`: if `extra ==
-1` and `count == 0` return or propagate the syscall error from
`std.posix.system.kevent`, otherwise only enter the draining logic when `count >
0` (i.e., guard the slices `changelist[`@intCast`(count)..]` and the secondary
kevent call) so you never slice with a zero/invalid `count` after a failed
syscall in the `KEventWatcher` coalescing loop (variables: `iterations`,
`count`, `changelist_count`, `max_coalesce_iterations`, `extra`).
In `@src/watcher/WindowsWatcher.zig`:
- Around line 144-147: The current conversion truncates sub-millisecond positive
values to 0ms; modify the conversion of
bun.env_var.BUN_INOTIFY_COALESCE_INTERVAL.get() (ns) before assigning
this.coalesce_interval_ms so that an explicit 0 stays 0, but any positive ns is
rounded up to at least 1ms (e.g., compute ms = ns / std.time.ns_per_ms using
integer ceil or set ms = 1 when ns > 0 and computed ms == 0), then cast to
std.math.cast(w.DWORD, ms) and fall back to default_coalesce_interval_ms if the
env var is absent.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: d3b22350-fda9-4dde-9dd1-ec20825a0867
📒 Files selected for processing (2)
src/watcher/KEventWatcher.zigsrc/watcher/WindowsWatcher.zig
There was a problem hiding this comment.
No new issues found, but this changes the coalesce behavior of all three platform watcher backends and bumps the per-reload latency floor from ~0.1 ms to ~10 ms — worth a human sanity-check on the tradeoff and the timing-sensitive test before merge.
Extended reasoning...
Overview
The PR reworks event coalescing in all three filesystem-watcher backends (INotifyWatcher, KEventWatcher, WindowsWatcher): the single 0.1 ms extra read is replaced with a bounded drain loop that waits up to 10 ms (×32 iterations) for the queue to go quiet, the BUN_INOTIFY_COALESCE_INTERVAL default is raised 100×, and HotReloader.Task.append now dedups path hashes. A new burst-write test is added and an existing stale-sourcemap test is pinned to the old 0.1 ms window via env var.
Security risks
None identified. No auth, crypto, network, or untrusted-input parsing is involved; the change is confined to local filesystem-event timing and an internal env var.
Level of scrutiny
Moderate-to-high. The watcher backends are shared by --hot, --watch, bun test --watch, and the bake dev server, so a regression here is broadly user-visible. The change also encodes a UX tradeoff (every hot reload now waits ~10 ms after the last event before firing) and replaces a simple one-shot read with a bounded loop on three different kernel APIs — each with subtly different early-return/timer-resolution semantics that the PR thread already iterated on (kqueue iteration cap, Windows <= off-by-one, timespec ≥1 s EINVAL). That's exactly the kind of cross-platform timing logic a maintainer should eyeball.
Other factors
- The earlier macOS
hot.test.tsfailures and the kqueue iteration-cap bug were addressed in 8731480; CI results for that commit aren't reflected in the timeline yet. - One non-blocking inline comment (timespec normalization for env-var values ≥1 s) is still open.
- The new test is intentionally timing-dependent (10×
writeSyncwithsleepSync(2)vs a 10 ms window); the author has acknowledged they'll tighten it if CI proves it flaky, but that's a judgment call better confirmed by a human. - The
Timeoutenum removal inWindowsWatcher.zigchanges a public-ish signature (next); a quick check shows no other callers, but it's another small surface-area change.
There was a problem hiding this comment.
All my earlier inline concerns are addressed and I didn't find new issues, but this reworks the coalesce loop across all three watcher backends and bumps the default coalesce window 100× — worth a human sanity-check on the latency/flake tradeoff before merge.
Extended reasoning...
Overview
This PR restructures the event-coalescing logic in all three filesystem watcher backends (INotifyWatcher.zig, KEventWatcher.zig, WindowsWatcher.zig), raises the default BUN_INOTIFY_COALESCE_INTERVAL from 0.1 ms to 10 ms, adds hash dedup in HotReloader.Task.append, adds a new timing-sensitive test, and refactors an existing test to fix a pre-existing flake. The diff is ~250 lines of behavior change plus extensive comments.
Security risks
None. This is local filesystem-watcher / dev-loop tooling with no auth, crypto, network, or untrusted-input surface.
Level of scrutiny
Moderate-to-high. The change is well-reasoned and well-documented, but it:
- Touches three independent OS-specific syscall loops (inotify/ppoll, kqueue/kevent, IOCP/GetQueuedCompletionStatus) with subtly different semantics that were already the subject of several review rounds (iteration-cap parity, timespec normalization, negative-count guard, sub-ms rounding).
- Changes a user-visible default by 100× — every
--hot/--watchreload now waits up to ~10 ms longer for the queue to go quiet. That's almost certainly the right tradeoff vs. duplicate reloads, but it's a product decision. - Adds a test whose correctness depends on
sleepSync(2)gaps staying under the 10 ms window across CI platforms; the author explicitly chose to keep it at the production default and tighten later if CI proves it flaky.
Other factors
All prior review feedback (mine and CodeRabbit's) has been addressed in 8731480 and e508ede; all inline threads are resolved. The latest CI run on 8731480 shows only unrelated failures (s3-storage-class, http-emit-close), so the new test hasn't been disproven on macOS/Windows yet either. Given the cross-platform surface and the acknowledged residual timing sensitivity, a maintainer should sign off rather than auto-approve.
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@test/cli/hot/hot.test.ts`:
- Around line 440-471: The initial-output wait can hang if the child process
exits before emitting the first "[#!root]" line; replace the plain await initial
with a guarded wait that races the initial promise against the child's early
exit so the test proceeds instead of hanging. Locate the initial/sawInitial pair
and the reader that consumes runner.stdout (symbols: initial, sawInitial,
reader, runner, finished) and change the wait to something like awaiting
Promise.race between initial and a promise that resolves when runner exits (or
when finished becomes true) so the test unblocks on early child termination.
- Around line 360-371: The async IIFE reading from runner.stdout currently
swallows all errors with .catch(() => {}), which hides parser/stream failures
and causes the test to hang; update the catch to propagate the error instead
(e.g., .catch(err => { /* surface */ throw err }) or store the error into a
shared readerError promise/variable that the test checks alongside evals) so any
failure in the stdout reader (the async IIFE that manipulates buffered and
pushes into evals) will fail the test immediately rather than being silently
ignored.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 76414366-f4d7-45cf-883d-3796fcdab39b
📒 Files selected for processing (1)
test/cli/hot/hot.test.ts
There was a problem hiding this comment.
I didn't find any new issues, but this touches all three platform watcher backends plus a user-facing default (0.1 ms → 10 ms coalesce window, worst-case ~320 ms reload latency under continuous writes), so it's worth a human sanity-check on the timing trade-offs and the Intel-macOS test skip.
Extended reasoning...
Overview
This PR reworks event coalescing in all three filesystem-watcher backends (INotifyWatcher, KEventWatcher, WindowsWatcher) so that the burst of kernel events produced by a single editor save collapses into one onFileUpdate call, fixing #13511 where --hot re-evaluated the entry point once per event. It also adds a dedup in HotReloader.Task.append, raises the BUN_INOTIFY_COALESCE_INTERVAL default from 0.1 ms to 10 ms, adds a new timing-sensitive regression test, and adjusts two existing tests to account for the new window.
Security risks
None. The change is confined to local filesystem-watch timing and an internal env var; no auth, crypto, network, or untrusted-input surfaces are touched.
Level of scrutiny
Medium-high. The diff is not large, but it changes platform-specific syscall loops (ppoll/kevent/GetQueuedCompletionStatus) with subtle semantics (early-return on event-ready, timespec normalization, Windows timer quantization), and it shifts a default that affects every --hot/--watch user — each reload now waits up to ~10 ms (or up to ~320 ms if writes never stop) before dispatching. The PR has already been through several correction rounds (kqueue iteration cap 5→32, count > 0 guard, sec/nsec split, divCeil rounding), which suggests the edge cases here are easy to get wrong and merit human eyes.
Other factors
All three of my earlier inline findings have been addressed and resolved. The new test is inherently timing-dependent (10× writeSync with sleepSync(2) vs a 10 ms window) and is already skipIf(isIntelMacOS); the author explained why the window can't be widened without defeating the regression gate. Two unresolved CodeRabbit nits remain on the test's stdout-reader error handling — minor, but still open. Given the cross-platform blast radius and the latency/design trade-offs (10 ms default, 32-iteration cap), this is beyond what I'd auto-approve.
Status: diff is green — CI red is infra/upstream noise, needs a maintainer to mergeBug (#13511): a single editor save in Root cause: editors emit several filesystem events per logical save, spread over a few milliseconds. The watcher's coalesce window was 0.1 ms (inotify/kqueue) or 0 ms (Windows), so one save split across multiple watch-loop cycles → one reload each. A second path: Fix (both the Zig reference files and the shipping Rust port):
Verification: CI history — no watcher/hot-reload test has failed on any platform in any run:
None of the failing steps touch files in this PR. |
|
Rebased onto main (post-Rust-rewrite, resolved the
Note: this and #30644 both touch the same watcher files; whichever lands second will need a trivial merge. |
0de4f5d to
cd15cac
Compare
6c5f2ab to
e77c065
Compare
There was a problem hiding this comment.
No issues found on this revision, but this touches all three platform watcher backends (inotify/kqueue/ReadDirectoryChangesW, in both Zig and Rust) plus cross-thread hot-reload scheduling, and changes the user-facing --hot reload latency default — worth a human look before merging.
Extended reasoning...
Overview
This PR rewrites the event-coalescing logic in all three filesystem watcher backends (INotifyWatcher, KEventWatcher, WindowsWatcher) — in both the Zig originals and the newly-ported Rust implementations — replacing a single fixed extra read with a bounded drain-until-quiet loop. It also raises the BUN_INOTIFY_COALESCE_INTERVAL default from 0.1 ms to 10 ms, adds hash deduplication in HotReloader.Task.append to prevent mid-onFileUpdate flushes, adds a timing-based regression test, and pins an existing test to the old interval via env var.
Security risks
None. No auth, crypto, network, or untrusted-input parsing surfaces are touched. The only external input is the BUN_INOTIFY_COALESCE_INTERVAL env var, which is bounds-handled (sec/nsec split, divCeil, iteration cap) and only affects local watcher timing.
Level of scrutiny
High. This is core runtime infrastructure for --hot/--watch mode, with platform-specific syscall semantics (ppoll, kevent, GetQueuedCompletionStatus) that differ subtly in early-return and timer-resolution behaviour. The change introduces a deliberate ~10 ms latency floor on every hot reload — a UX tradeoff a maintainer should sign off on. The Task.append dedup interacts with the pending_count atomic and the while-swap loop in Task.run across the watcher and JS threads. The PR has gone through several rounds of bot feedback (iteration-cap parity, count > 0 guard, timespec normalization, ms rounding) and a Rust port mid-review; a human should confirm the Zig and Rust paths remain semantically equivalent and that the new test's timing margin is acceptable for CI.
Other factors
All prior inline review threads (mine and CodeRabbit's) are resolved. The new test is skipIf(isIntelMacOS) and relies on a 2 ms write gap vs a 10 ms coalesce window, which the author has acknowledged as a deliberate margin against the production default. The author also flagged a known merge interaction with #30644. Given the breadth (11 files, 6 platforms × language combinations) and the design decision embedded in the new default, this exceeds the bar for bot-only approval.
e77c065 to
692e279
Compare
A single editor save typically emits several filesystem events a few milliseconds apart (truncate+write, plus matching events on the parent-directory watch). The watcher's coalesce window was 0.1 ms and performed at most one extra read, so most of those events landed in separate watch-loop cycles and --hot re-evaluated the entry point once per cycle — the user saw their script's output repeated for one save. - INotifyWatcher/KEventWatcher/WindowsWatcher: after the first read, keep draining until the queue stays quiet for ~10 ms (bounded by byte/iteration caps so a continuously-written file cannot starve the loop). Raise the default BUN_INOTIFY_COALESCE_INTERVAL to 10 ms. - hot_reloader.Task.append: dedup by hash so the many directory-watch events that all name the same file don't overflow the 8-slot buffer and flush mid-onFileUpdate, which let the JS thread start a reload while the watcher thread was still appending and produced a second reload for the same save. Pin the stale-sourcemap regression test to the old 0.1 ms interval: it depends on the self-write event being dispatched before the rejection is reported, which the new window absorbs. Fixes #13511
The stale-sourcemap test pins the coalesce window to 0.1 ms so the self-write event reaches the JS thread inside the reject→report window it was written to exercise. That override was only read by the inotify backend, so on macOS/FreeBSD/Windows the test ran with the new 10 ms window and no longer opened the race it guards. Have the kqueue and Windows watchers read the same env var (the Windows backend rounds to milliseconds) so the override works uniformly.
kevent() returns as soon as one event is ready rather than waiting the full timeout, so a burst of N writes a few ms apart consumes ~N drain iterations. With the kqueue cap at 5 the new coalesce test's 10-write burst spilled into a second watchLoopCycle on macOS and produced a second reload. Bump kqueue and Windows to 32 to match the inotify backend; the quiet-timeout break still terminates the common case after one idle interval.
- KEventWatcher/INotifyWatcher: split the coalesce interval into sec+nsec so a user-supplied value >= 1 s doesn't make kevent/ppoll fail with EINVAL (which the lenient error handling would silently turn into 'quiet' and disable coalescing). - KEventWatcher: guard the drain loop with count > 0 so a failed initial kevent() (count == -1) doesn't trap on @intcast. - WindowsWatcher: round the ns->ms conversion up so a sub-millisecond override (e.g. the 0.1 ms a test pins for the other backends) becomes 1 ms rather than truncating to 0 and disabling the wait. - hot.test.ts 'should not hot reload when a random file is written': wait for the initial evaluation to land before starting the 200 ms quiet window. On debug/ASAN builds process startup alone exceeded 200 ms, so the test asserted reloadCounter == 0 before the first line ever arrived.
With a non-zero coalesce interval the drain loop can race the self-rewriting hot file in a way that loses the pending rejection (the pre-existing edge case noted in #29740), which on release builds manifests as a 30 s timeout. A zero interval makes the drain loop non-blocking — it polls once and processes whatever is already buffered — which is the closest analogue to the pre-loop behaviour this test was tuned against, and keeps the self-write event landing in the reject→report window on all build profiles.
Intel macOS CI runners stretch sleepSync(2) well past the 10 ms coalesce window (timer coalescing + scheduler load), so the 10-write burst splits into several watch-loop cycles there and the test sees evals=[1..9] instead of [1,2]. arm64 macOS (14 and 26) and all Linux lanes pass; the reported bug (#13511) was Windows + Linux only, and the issue's own comments note it does not reproduce on macOS.
This reverts commit f400c3d.
The workspace now denies unused-doc-comments; doc comments on macro invocations don't attach to the generated item.
692e279 to
802d7e8
Compare
There was a problem hiding this comment.
No issues found in this pass, but this rewrites the coalesce loop across all three platform watcher backends (inotify/kqueue/IOCP, in both the Zig reference and the shipping Rust) and changes a user-visible timing default — worth a human look before merging.
Extended reasoning...
Overview
This PR changes how the filesystem watcher coalesces rapid event bursts so that a single editor save triggers exactly one --hot reload instead of several. It touches eleven files: the three platform watcher backends (INotifyWatcher, KEventWatcher, WindowsWatcher) in both the Zig reference and the shipping Rust port, hot_reloader.{zig,rs} (hash dedup in Task.append), env_var.{zig,rs} (default BUN_INOTIFY_COALESCE_INTERVAL 0.1 ms → 10 ms), and test/cli/hot/hot.test.ts (new regression test plus pinning an existing test to the old interval).
Security risks
None identified. The change only adjusts how long the watcher thread waits for additional kernel events after the first read and adds a linear-scan dedup over an 8-element array. No new external input is parsed, no auth/crypto/permissions surface is touched, and the env var was already user-controllable. The drain loops are bounded by both byte count and a 32-iteration cap, so a continuously-written file cannot starve the watch loop.
Level of scrutiny
This warrants human review rather than auto-approval. It is a behaviour change in a production-critical, cross-platform code path that every --hot/--watch user exercises, implemented twice (Zig + Rust) across three OS-specific syscall loops with subtly different semantics (e.g. kevent early return vs ppoll timeout, Windows ~15.6 ms timer granularity). It also embeds design decisions a maintainer should sign off on: the 10 ms default window (trades ~10 ms of reload latency for dedup), the 32-iteration cap, and reusing the Linux-named env var for all backends. The PR went through several correctness iterations during review (kqueue iteration cap 5→32, count > 0 guard, timespec sec/nsec split, ns→ms divCeil, an accidental full src/ revert), which is normal for this kind of change but underscores that it is not mechanical.
Other factors
All prior inline review threads (CodeRabbit and earlier claude runs) are resolved, and the current bug-hunting pass found nothing. The new test is well-commented and has a verification table showing it fails without the fix and passes with it, but it is inherently timing-dependent (10× writeSync with sleepSync(2) gaps vs a 10 ms window) and is already skipIf(isIntelMacOS); the author has explicitly said they will tighten the burst rather than the window if a CI lane proves it flaky. The author also flagged a likely merge conflict with #30644. None of these are blockers, but together with the scope they make this a clear case for a human reviewer rather than bot approval.
Fixes #13511.
Repro
Save
script.jsin an editor (or justtouch/ rewrite it) →Hellois printed two or more times for one save.Minimal programmatic repro on Linux: two
writeFileSyncs ~2 ms apart reliably produce two reloads on 1.3.x.Cause
A single editor save emits several filesystem events spread over a few milliseconds —
open(O_TRUNC)+write()(or write + rename for atomic saves), each mirrored on the parent-directory watch. The cross-platform watcher tried to coalesce these, but:watchLoopCycle;GetQueuedCompletionStatuscalls, so it only swept what was already queued;HotReloader.Task.append, every event for the same file re-appended the same path hash; once the fixed 8-slot buffer filled it calledenqueue()mid-onFileUpdate, letting the JS thread start a reload while the watcher thread was still appending. Thewhileloop inTask.runthen converted the laterpending_countincrements into a second reload for the same save.Each extra
onFileUpdate→Task.enqueue→vm.reload()→ full re-evaluation of the entry point.Fix
max_countminimum-size events) and an iteration cap so a continuously-written file can't starve the loop. In the common case the loop exits one coalesce interval after the last event in the burst.BUN_INOTIFY_COALESCE_INTERVALdefault raised from100_000ns to10_000_000ns.HotReloader.Task.append: skip hashes already in the buffer, so a burst of events for one file never triggers the mid-onFileUpdateflush. (VirtualMachine.reloadignores the hash list anyway; only bake's dev server consumes it, and dedup doesn't change its semantics.)The
should not remap against a stale sourcemap after a partial-file reloadtest is pinned to the old 0.1 ms interval viaBUN_INOTIFY_COALESCE_INTERVAL: it deliberately relies on the self-write event being dispatched before the rejection is reported, which the new window absorbs.Verification
New test
coalesces a burst of writes into a single reloadintest/cli/hot/hot.test.ts: spawnsbun --hot, does 10 ×writeSyncwithsleepSync(2)between them, and asserts exactly one reload.USE_SYSTEM_BUN=1 bun test(1.3.14)evals: [1, 2, 3, …]bun bdwithoutsrc/changesbun bdwith fixevals: [1, 2]Full
test/cli/hot/hot.test.tspasses (the pre-existingshould not hot reload when a random file is writtenflake on slow ASAN debug builds is unchanged frommain).zig:check-allis green across all targets.