Skip to content

watcher: coalesce per-save event bursts into a single --hot reload#30617

Open
robobun wants to merge 12 commits into
mainfrom
farm/1f3aa35a/hot-coalesce-watcher-events
Open

watcher: coalesce per-save event bursts into a single --hot reload#30617
robobun wants to merge 12 commits into
mainfrom
farm/1f3aa35a/hot-coalesce-watcher-events

Conversation

@robobun

@robobun robobun commented May 13, 2026

Copy link
Copy Markdown
Collaborator

Fixes #13511.

Repro

// script.js
console.log("Hello");
bun --hot script.js

Save script.js in an editor (or just touch / rewrite it) → Hello is printed two or more times for one save.

Minimal programmatic repro on Linux: two writeFileSyncs ~2 ms apart reliably produce two reloads on 1.3.x.

Cause

A single editor save emits several filesystem events spread over a few milliseconds — open(O_TRUNC) + write() (or write + rename for atomic saves), each mirrored on the parent-directory watch. The cross-platform watcher tried to coalesce these, but:

  • the inotify/kqueue coalesce window was 0.1 ms and performed at most one extra read, so any events arriving after that spilled into the next watchLoopCycle;
  • the Windows watcher used a 0 ms timeout for subsequent GetQueuedCompletionStatus calls, so it only swept what was already queued;
  • in HotReloader.Task.append, every event for the same file re-appended the same path hash; once the fixed 8-slot buffer filled it called enqueue() mid-onFileUpdate, letting the JS thread start a reload while the watcher thread was still appending. The while loop in Task.run then converted the later pending_count increments into a second reload for the same save.

Each extra onFileUpdateTask.enqueuevm.reload() → full re-evaluation of the entry point.

Fix

  • INotifyWatcher / KEventWatcher / WindowsWatcher: after the first read, keep draining until the queue stays quiet for ~10 ms, bounded by byte count (≤ max_count minimum-size events) and an iteration cap so a continuously-written file can't starve the loop. In the common case the loop exits one coalesce interval after the last event in the burst. BUN_INOTIFY_COALESCE_INTERVAL default raised from 100_000 ns to 10_000_000 ns.
  • HotReloader.Task.append: skip hashes already in the buffer, so a burst of events for one file never triggers the mid-onFileUpdate flush. (VirtualMachine.reload ignores the hash list anyway; only bake's dev server consumes it, and dedup doesn't change its semantics.)

The should not remap against a stale sourcemap after a partial-file reload test is pinned to the old 0.1 ms interval via BUN_INOTIFY_COALESCE_INTERVAL: it deliberately relies on the self-write event being dispatched before the rejection is reported, which the new window absorbs.

Verification

New test coalesces a burst of writes into a single reload in test/cli/hot/hot.test.ts: spawns bun --hot, does 10 × writeSync with sleepSync(2) between them, and asserts exactly one reload.

build result
USE_SYSTEM_BUN=1 bun test (1.3.14) 20/20 failevals: [1, 2, 3, …]
bun bd without src/ changes 50/50 fail
bun bd with fix 50/50 passevals: [1, 2]

Full test/cli/hot/hot.test.ts passes (the pre-existing should not hot reload when a random file is written flake on slow ASAN debug builds is unchanged from main). zig:check-all is green across all targets.

@robobun

robobun commented May 13, 2026

Copy link
Copy Markdown
Collaborator Author

@github-actions

Copy link
Copy Markdown
Contributor

Found 1 issue this PR may fix:

  1. Watch mode log two times errors in the console #10463 - Reports errors appearing twice in --watch mode on save, plausibly caused by multiple filesystem events from a single editor save each triggering a separate re-execution

If this is helpful, copy the block below into the PR description to auto-close this issue on merge.

Fixes #10463

🤖 Generated with Claude Code

@coderabbitai

coderabbitai Bot commented May 13, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Linux, macOS, and Windows file watchers now coalesce rapid event bursts using a configurable quiet-period and bounded drain iterations (default 10ms). Hot reloader deduplicates pending file-hash IDs. Tests add a regression asserting a burst produces a single reload.

Changes

File Event Coalescing Across Watchers

Layer / File(s) Summary
Coalescing Configuration Defaults
src/bun_core/env_var.zig, src/watcher/INotifyWatcher.zig
Default coalescing interval changed to 10ms (BUN_INOTIFY_COALESCE_INTERVAL) and default_coalesce_interval_ns exported as the inotify fallback.
Linux inotify Event Coalescing
src/watcher/INotifyWatcher.zig
init reads the env-var; read() replaces threshold-based double-read with a bounded ppoll + non-blocking read drain loop, handling EAGAIN/EINTR and capping iterations.
macOS kqueue Event Coalescing
src/watcher/KEventWatcher.zig
Adds coalesce_interval_ns, reads env-var in init, and replaces the single extra kevent call with a bounded loop that repeatedly calls kevent using a quiet timeout until quiet or iteration cap.
Windows Event Coalescing Loop
src/watcher/WindowsWatcher.zig
Adds coalesce_interval_ms, converts nanoseconds→milliseconds with round-up in init, changes next to accept timeout_ms, and uses an INFINITE first wait then bounded coalesce sweeps capped by iterations.
Hot Reloader Hash Deduplication
src/jsc/hot_reloader.zig
Task.append returns early if the incoming file-hash ID already exists in the pending hashes, preventing duplicate reload scheduling.
Test Coverage for Coalescing Behavior
test/cli/hot/hot.test.ts
Expanded imports and determinism: added a regression that writes a burst to a watched file and asserts a single hot reload; adjusted timing tests to await initial eval and allow overriding coalesce interval via env.
🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main change: coalescing filesystem watcher events into a single hot reload, which is the core objective across all modified files.
Description check ✅ Passed The description follows the template structure with both required sections (What does this PR do? and How did you verify your code works?) fully completed with detailed explanations, root cause analysis, fixes, and verification results.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Warning

Review ran into problems

🔥 Problems

Stopped waiting for pipeline failures after 30000ms. One of your pipelines takes longer than our 30000ms fetch window to run, so review may not consider pipeline-failure results for inline comments if any failures occurred after the fetch window. Increase the timeout if you want to wait longer or run a @coderabbit review after the pipeline has finished.


Comment @coderabbitai help to get the list of available commands and usage tips.

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional findings (outside current diff — PR may have been updated during review):

  • 🟡 test/cli/hot/hot.test.ts:658-662 — The BUN_INOTIFY_COALESCE_INTERVAL override here is only honored on Linux — KEventWatcher.zig hardcodes coalesce_interval_ns = 10_000_000 and WindowsWatcher.zig hardcodes Timeout.coalesce = 10, neither reads the env var. On macOS/Windows the test still runs with the new 10 ms window, so per the comment above, the self-write gets absorbed and the reject→report race this test guards never opens (the test still passes, it just stops exercising the regression). Might be worth having the kqueue/Windows watchers honor the same override, or at least noting that this race is now only meaningfully exercised on Linux.

    Extended reasoning...

    What the issue is

    The should not remap against a stale sourcemap after a partial-file reload test sets env: { ...bunEnv, BUN_INOTIFY_COALESCE_INTERVAL: "100000" } to pin the watcher's coalesce window back to 0.1 ms so the file's self-write is dispatched before the rejection is reported. However, BUN_INOTIFY_COALESCE_INTERVAL is only consumed by INotifyWatcher.init() (src/watcher/INotifyWatcher.zig:114). The macOS watcher hardcodes const coalesce_interval_ns = 10_000_000 (src/watcher/KEventWatcher.zig:12) and the Windows watcher hardcodes Timeout.coalesce = 10 (src/watcher/WindowsWatcher.zig) — neither consults the env var. So on macOS and Windows the env override is a no-op and the test runs under the new 10 ms coalesce window introduced by this PR.

    How it manifests

    The test's own comment explains why this matters: "the default 10 ms coalesce would absorb it into the next writeFull and the race under test never opens." That is exactly what now happens on macOS/Windows. The writeFileSync(__filename, "// stub") self-write triggers the watcher's 10 ms coalesce wait; during that window the rejection is printed (correctly mapped — no stale-sourcemap race), and then the watcher delivers the event(s) after the report. The reject→report window the test was written to exercise is therefore closed on those two platforms.

    Why the test still passes

    I traced driveErrorReloadCycle under the 10 ms window and the test does not fail or flake on macOS/Windows. There are two possible interleavings:

    1. The self-write and the test's subsequent writeFull(N+1) both land inside the same 10 ms coalesce window → one reload with the full N+1 content → error: N+1 prints → verifyLine passes.
    2. The self-write is delivered alone → the watcher reloads the comment-only stub, which throws nothing and emits nothing to stderr (--hot keeps the process alive via the keep-alive timer) → the next writeFull(N+1) triggers another reload → error: N+1 prints.

    In both cases driveErrorReloadCycle observes the expected error: N sequence and the assertions pass. So this is purely a test-coverage degradation, not a functional bug or a flaky test.

    Step-by-step proof

    1. On macOS, KEventWatcher.init() is called → it never reads BUN_INOTIFY_COALESCE_INTERVAL; coalesce_interval_ns stays at the file-level constant 10_000_000.
    2. Test spawns bun --hot with the env var set → the kqueue watcher ignores it.
    3. The hot module evaluates: it does writeFileSync(__filename, "// stub N") then throw new Error('N').
    4. The kqueue thread receives NOTE_WRITE, then enters the while (count < changelist_count …) loop with a 10 ms timeout, blocking dispatch.
    5. Meanwhile the JS thread reports the rejection — correctly mapped, since no second reload has retranspiled the file yet. The race the test is meant to catch (second reload overwriting source_mappings[path] between reject and report) cannot occur because the watcher is still inside its 10 ms drain.
    6. driveErrorReloadCycle sees error: N, calls writeFull(N+1), the watcher (either coalescing it with the stub write or in a fresh cycle) reloads, and the loop progresses.
    7. Test reaches reloadCounter === 20 and passes — but at no point did it actually open the stale-sourcemap window.

    Impact and fix

    The sourcemap-remapping code being guarded is platform-independent VM code, so Linux CI still catches regressions, and the env var name (INOTIFY) signals Linux-only intent — this may even be a conscious tradeoff. But the test silently degrades to a near-tautology on macOS/Windows after this PR, which is worth flagging. Two options:

    • Have KEventWatcher and WindowsWatcher honor the same env var (or a generically-named one) so the override works everywhere.
    • Add a comment on the test noting the race is only exercised on Linux.

    This is a nit — coverage concern only, no production-code impact, and the test should not flake. It should not block the PR.

@robobun

robobun commented May 13, 2026

Copy link
Copy Markdown
Collaborator Author

Addressed the coverage note in 0c76ed9: BUN_INOTIFY_COALESCE_INTERVAL is now honoured by the kqueue and Windows watcher backends too (despite the Linux-centric name), so the stale-sourcemap test's 0.1 ms override applies uniformly across platforms.

The earlier windows-x64-baseline-build-zig failure was a Zig compiler panic (reached unreachable code in Sema.handleExternLibName while analyzing src/runtime/image/backend_wic.zig, which this PR doesn't touch) — the non-baseline windows-x64-build-zig lane passed on the same commit.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/watcher/WindowsWatcher.zig`:
- Line 222: The loop in WindowsWatcher.zig uses an off-by-one condition: change
the while condition in the coalescing loop from "iterations <=
max_coalesce_iterations" to use "<" so it matches KEventWatcher and
INotifyWatcher; locate the while loop in the WindowsWatcher coalescing logic
(the loop that increments "iterations" and references "max_coalesce_iterations")
and replace the comparison operator to ensure the loop runs at most
max_coalesce_iterations times.

In `@test/cli/hot/hot.test.ts`:
- Line 335: Replace usage of tmpdirSync with the harness helper tempDir: import
tempDir from the test harness and swap const dir = tmpdirSync(); to create the
temporary directory via tempDir() (and await it if the helper is async), and
update any other occurrences in this file (e.g., the beforeEach that uses
tmpdirSync) to use tempDir as well; ensure variable names and cleanup logic
still work with the tempDir return shape.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 4249892f-5e6a-4983-bdea-1a1f404e2abc

📥 Commits

Reviewing files that changed from the base of the PR and between 3bf4b33 and 0c76ed9.

📒 Files selected for processing (6)
  • src/bun_core/env_var.zig
  • src/jsc/hot_reloader.zig
  • src/watcher/INotifyWatcher.zig
  • src/watcher/KEventWatcher.zig
  • src/watcher/WindowsWatcher.zig
  • test/cli/hot/hot.test.ts

Comment thread src/watcher/WindowsWatcher.zig
Comment thread test/cli/hot/hot.test.ts
Comment thread src/watcher/KEventWatcher.zig Outdated
Comment thread test/cli/hot/hot.test.ts
Comment thread src/watcher/KEventWatcher.zig Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/watcher/KEventWatcher.zig`:
- Around line 76-85: The loop drains events unconditionally which can trap if
the initial kevent syscall returns -1; check the syscall result stored in
`extra` and handle errors before slicing with `@intCast(count)`: if `extra ==
-1` and `count == 0` return or propagate the syscall error from
`std.posix.system.kevent`, otherwise only enter the draining logic when `count >
0` (i.e., guard the slices `changelist[`@intCast`(count)..]` and the secondary
kevent call) so you never slice with a zero/invalid `count` after a failed
syscall in the `KEventWatcher` coalescing loop (variables: `iterations`,
`count`, `changelist_count`, `max_coalesce_iterations`, `extra`).

In `@src/watcher/WindowsWatcher.zig`:
- Around line 144-147: The current conversion truncates sub-millisecond positive
values to 0ms; modify the conversion of
bun.env_var.BUN_INOTIFY_COALESCE_INTERVAL.get() (ns) before assigning
this.coalesce_interval_ms so that an explicit 0 stays 0, but any positive ns is
rounded up to at least 1ms (e.g., compute ms = ns / std.time.ns_per_ms using
integer ceil or set ms = 1 when ns > 0 and computed ms == 0), then cast to
std.math.cast(w.DWORD, ms) and fall back to default_coalesce_interval_ms if the
env var is absent.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: d3b22350-fda9-4dde-9dd1-ec20825a0867

📥 Commits

Reviewing files that changed from the base of the PR and between 0c76ed9 and 8731480.

📒 Files selected for processing (2)
  • src/watcher/KEventWatcher.zig
  • src/watcher/WindowsWatcher.zig

Comment thread src/watcher/KEventWatcher.zig Outdated
Comment thread src/watcher/WindowsWatcher.zig Outdated

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No new issues found, but this changes the coalesce behavior of all three platform watcher backends and bumps the per-reload latency floor from ~0.1 ms to ~10 ms — worth a human sanity-check on the tradeoff and the timing-sensitive test before merge.

Extended reasoning...

Overview

The PR reworks event coalescing in all three filesystem-watcher backends (INotifyWatcher, KEventWatcher, WindowsWatcher): the single 0.1 ms extra read is replaced with a bounded drain loop that waits up to 10 ms (×32 iterations) for the queue to go quiet, the BUN_INOTIFY_COALESCE_INTERVAL default is raised 100×, and HotReloader.Task.append now dedups path hashes. A new burst-write test is added and an existing stale-sourcemap test is pinned to the old 0.1 ms window via env var.

Security risks

None identified. No auth, crypto, network, or untrusted-input parsing is involved; the change is confined to local filesystem-event timing and an internal env var.

Level of scrutiny

Moderate-to-high. The watcher backends are shared by --hot, --watch, bun test --watch, and the bake dev server, so a regression here is broadly user-visible. The change also encodes a UX tradeoff (every hot reload now waits ~10 ms after the last event before firing) and replaces a simple one-shot read with a bounded loop on three different kernel APIs — each with subtly different early-return/timer-resolution semantics that the PR thread already iterated on (kqueue iteration cap, Windows <= off-by-one, timespec ≥1 s EINVAL). That's exactly the kind of cross-platform timing logic a maintainer should eyeball.

Other factors

  • The earlier macOS hot.test.ts failures and the kqueue iteration-cap bug were addressed in 8731480; CI results for that commit aren't reflected in the timeline yet.
  • One non-blocking inline comment (timespec normalization for env-var values ≥1 s) is still open.
  • The new test is intentionally timing-dependent (10×writeSync with sleepSync(2) vs a 10 ms window); the author has acknowledged they'll tighten it if CI proves it flaky, but that's a judgment call better confirmed by a human.
  • The Timeout enum removal in WindowsWatcher.zig changes a public-ish signature (next); a quick check shows no other callers, but it's another small surface-area change.

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All my earlier inline concerns are addressed and I didn't find new issues, but this reworks the coalesce loop across all three watcher backends and bumps the default coalesce window 100× — worth a human sanity-check on the latency/flake tradeoff before merge.

Extended reasoning...

Overview

This PR restructures the event-coalescing logic in all three filesystem watcher backends (INotifyWatcher.zig, KEventWatcher.zig, WindowsWatcher.zig), raises the default BUN_INOTIFY_COALESCE_INTERVAL from 0.1 ms to 10 ms, adds hash dedup in HotReloader.Task.append, adds a new timing-sensitive test, and refactors an existing test to fix a pre-existing flake. The diff is ~250 lines of behavior change plus extensive comments.

Security risks

None. This is local filesystem-watcher / dev-loop tooling with no auth, crypto, network, or untrusted-input surface.

Level of scrutiny

Moderate-to-high. The change is well-reasoned and well-documented, but it:

  • Touches three independent OS-specific syscall loops (inotify/ppoll, kqueue/kevent, IOCP/GetQueuedCompletionStatus) with subtly different semantics that were already the subject of several review rounds (iteration-cap parity, timespec normalization, negative-count guard, sub-ms rounding).
  • Changes a user-visible default by 100× — every --hot/--watch reload now waits up to ~10 ms longer for the queue to go quiet. That's almost certainly the right tradeoff vs. duplicate reloads, but it's a product decision.
  • Adds a test whose correctness depends on sleepSync(2) gaps staying under the 10 ms window across CI platforms; the author explicitly chose to keep it at the production default and tighten later if CI proves it flaky.

Other factors

All prior review feedback (mine and CodeRabbit's) has been addressed in 8731480 and e508ede; all inline threads are resolved. The latest CI run on 8731480 shows only unrelated failures (s3-storage-class, http-emit-close), so the new test hasn't been disproven on macOS/Windows yet either. Given the cross-platform surface and the acknowledged residual timing sensitivity, a maintainer should sign off rather than auto-approve.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/cli/hot/hot.test.ts`:
- Around line 440-471: The initial-output wait can hang if the child process
exits before emitting the first "[#!root]" line; replace the plain await initial
with a guarded wait that races the initial promise against the child's early
exit so the test proceeds instead of hanging. Locate the initial/sawInitial pair
and the reader that consumes runner.stdout (symbols: initial, sawInitial,
reader, runner, finished) and change the wait to something like awaiting
Promise.race between initial and a promise that resolves when runner exits (or
when finished becomes true) so the test unblocks on early child termination.
- Around line 360-371: The async IIFE reading from runner.stdout currently
swallows all errors with .catch(() => {}), which hides parser/stream failures
and causes the test to hang; update the catch to propagate the error instead
(e.g., .catch(err => { /* surface */ throw err }) or store the error into a
shared readerError promise/variable that the test checks alongside evals) so any
failure in the stdout reader (the async IIFE that manipulates buffered and
pushes into evals) will fail the test immediately rather than being silently
ignored.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 76414366-f4d7-45cf-883d-3796fcdab39b

📥 Commits

Reviewing files that changed from the base of the PR and between 575f9c3 and 487b821.

📒 Files selected for processing (1)
  • test/cli/hot/hot.test.ts

Comment thread test/cli/hot/hot.test.ts
Comment thread test/cli/hot/hot.test.ts Outdated

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't find any new issues, but this touches all three platform watcher backends plus a user-facing default (0.1 ms → 10 ms coalesce window, worst-case ~320 ms reload latency under continuous writes), so it's worth a human sanity-check on the timing trade-offs and the Intel-macOS test skip.

Extended reasoning...

Overview

This PR reworks event coalescing in all three filesystem-watcher backends (INotifyWatcher, KEventWatcher, WindowsWatcher) so that the burst of kernel events produced by a single editor save collapses into one onFileUpdate call, fixing #13511 where --hot re-evaluated the entry point once per event. It also adds a dedup in HotReloader.Task.append, raises the BUN_INOTIFY_COALESCE_INTERVAL default from 0.1 ms to 10 ms, adds a new timing-sensitive regression test, and adjusts two existing tests to account for the new window.

Security risks

None. The change is confined to local filesystem-watch timing and an internal env var; no auth, crypto, network, or untrusted-input surfaces are touched.

Level of scrutiny

Medium-high. The diff is not large, but it changes platform-specific syscall loops (ppoll/kevent/GetQueuedCompletionStatus) with subtle semantics (early-return on event-ready, timespec normalization, Windows timer quantization), and it shifts a default that affects every --hot/--watch user — each reload now waits up to ~10 ms (or up to ~320 ms if writes never stop) before dispatching. The PR has already been through several correction rounds (kqueue iteration cap 5→32, count > 0 guard, sec/nsec split, divCeil rounding), which suggests the edge cases here are easy to get wrong and merit human eyes.

Other factors

All three of my earlier inline findings have been addressed and resolved. The new test is inherently timing-dependent (10× writeSync with sleepSync(2) vs a 10 ms window) and is already skipIf(isIntelMacOS); the author explained why the window can't be widened without defeating the regression gate. Two unresolved CodeRabbit nits remain on the test's stdout-reader error handling — minor, but still open. Given the cross-platform blast radius and the latency/design trade-offs (10 ms default, 32-iteration cap), this is beyond what I'd auto-approve.

@robobun

robobun commented May 13, 2026

Copy link
Copy Markdown
Collaborator Author

Status: diff is green — CI red is infra/upstream noise, needs a maintainer to merge

Bug (#13511): a single editor save in --hot mode re-evaluates the entry point 2+ times on Linux/Windows, printing duplicate output.

Root cause: editors emit several filesystem events per logical save, spread over a few milliseconds. The watcher's coalesce window was 0.1 ms (inotify/kqueue) or 0 ms (Windows), so one save split across multiple watch-loop cycles → one reload each. A second path: HotReloader.Task's fixed 8-slot buffer could overflow mid-onFileUpdate and enqueue an extra reload for the same save.

Fix (both the Zig reference files and the shipping Rust port):

  • Default coalesce window 0.1 ms → 10 ms; after the first read, keep draining until the fd stays quiet for the window (capped at 32 iterations so nonstop writers can't starve the loop). BUN_INOTIFY_COALESCE_INTERVAL (ns) overrides it on all three backends; Windows rounds up to ms.
  • Dedup path hashes in HotReloader.Task.append so repeated events for one file can't overflow the buffer into a second reload.

Verification: test/cli/hot/hot.test.ts → "coalesces a burst of writes into a single reload" fails without the fix (observes 3+ evals) and passes with it (exactly 2: initial + one reload). Full hot.test.ts suite passes 13/13 locally after every rebase (four rebases as main moved: #30412, #31254, #31746, #31783).

CI history — no watcher/hot-reload test has failed on any platform in any run:

  • Build 60211 (sha 692e279): only red was bunx.test.ts "should handle package that requires node 24" (bunx --bun @angular/cli@latest, live npm registry) — failed identically on four unrelated PR builds (60208–60211) — plus two single-lane known flakes (test-retry-repeats-basic, streams-leak).
  • Build 60762 (sha 802d7e8, current): same bunx.test.ts @angular/cli@latest failure on every platform (still broken upstream), the verify-baseline step hit a 0-byte Intel SDE download (agent infra), and two unrelated single-lane flakes (terminal.test.ts, net-mongodb-pattern-leak.test.ts on darwin).

None of the failing steps touch files in this PR.

Comment thread test/cli/hot/hot.test.ts
@robobun

robobun commented May 14, 2026

Copy link
Copy Markdown
Collaborator Author

Rebased onto main (post-Rust-rewrite, resolved the hot.test.ts conflict — main already has the random-file deflake) and ported:

  • src/bun_core/env_var.rs: BUN_INOTIFY_COALESCE_INTERVAL default 100 µs → 10 ms
  • src/jsc/hot_reloader.rs: Task::append dedups the hash before pushing
  • src/watcher/INotifyWatcher.rs: DEFAULT_COALESCE_INTERVAL_NS / MAX_COALESCE_ITERATIONS constants; drain-until-quiet loop with sec/nsec split for ppoll
  • src/watcher/KEventWatcher.rs: coalesce_interval_ns field read from env var in init; iteration-capped drain loop with extra <= 0 break
  • src/watcher/WindowsWatcher.rs: coalesce_interval_ms field with ns→ms div_ceil conversion; Timeout enum replaced with raw w::DWORD; iteration-capped outer loop

rust:check-all clean on all targets. Gate verified: without the .rs change a 10-write burst triggers 4 reloads ([1,2,3,4]); with it, one ([1,2]). All 13 hot.test.ts tests pass. .zig kept as reference.

Note: this and #30644 both touch the same watcher files; whichever lands second will need a trivial merge.

@robobun robobun force-pushed the farm/1f3aa35a/hot-coalesce-watcher-events branch 2 times, most recently from 0de4f5d to cd15cac Compare May 21, 2026 09:54
@robobun robobun force-pushed the farm/1f3aa35a/hot-coalesce-watcher-events branch from 6c5f2ab to e77c065 Compare May 24, 2026 06:21

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues found on this revision, but this touches all three platform watcher backends (inotify/kqueue/ReadDirectoryChangesW, in both Zig and Rust) plus cross-thread hot-reload scheduling, and changes the user-facing --hot reload latency default — worth a human look before merging.

Extended reasoning...

Overview

This PR rewrites the event-coalescing logic in all three filesystem watcher backends (INotifyWatcher, KEventWatcher, WindowsWatcher) — in both the Zig originals and the newly-ported Rust implementations — replacing a single fixed extra read with a bounded drain-until-quiet loop. It also raises the BUN_INOTIFY_COALESCE_INTERVAL default from 0.1 ms to 10 ms, adds hash deduplication in HotReloader.Task.append to prevent mid-onFileUpdate flushes, adds a timing-based regression test, and pins an existing test to the old interval via env var.

Security risks

None. No auth, crypto, network, or untrusted-input parsing surfaces are touched. The only external input is the BUN_INOTIFY_COALESCE_INTERVAL env var, which is bounds-handled (sec/nsec split, divCeil, iteration cap) and only affects local watcher timing.

Level of scrutiny

High. This is core runtime infrastructure for --hot/--watch mode, with platform-specific syscall semantics (ppoll, kevent, GetQueuedCompletionStatus) that differ subtly in early-return and timer-resolution behaviour. The change introduces a deliberate ~10 ms latency floor on every hot reload — a UX tradeoff a maintainer should sign off on. The Task.append dedup interacts with the pending_count atomic and the while-swap loop in Task.run across the watcher and JS threads. The PR has gone through several rounds of bot feedback (iteration-cap parity, count > 0 guard, timespec normalization, ms rounding) and a Rust port mid-review; a human should confirm the Zig and Rust paths remain semantically equivalent and that the new test's timing margin is acceptable for CI.

Other factors

All prior inline review threads (mine and CodeRabbit's) are resolved. The new test is skipIf(isIntelMacOS) and relies on a 2 ms write gap vs a 10 ms coalesce window, which the author has acknowledged as a deliberate margin against the production default. The author also flagged a known merge interaction with #30644. Given the breadth (11 files, 6 platforms × language combinations) and the design decision embedded in the new default, this exceeds the bar for bot-only approval.

@robobun robobun force-pushed the farm/1f3aa35a/hot-coalesce-watcher-events branch from e77c065 to 692e279 Compare June 3, 2026 23:44
robobun and others added 6 commits June 5, 2026 15:29
A single editor save typically emits several filesystem events a few
milliseconds apart (truncate+write, plus matching events on the
parent-directory watch). The watcher's coalesce window was 0.1 ms and
performed at most one extra read, so most of those events landed in
separate watch-loop cycles and --hot re-evaluated the entry point once
per cycle — the user saw their script's output repeated for one save.

- INotifyWatcher/KEventWatcher/WindowsWatcher: after the first read,
  keep draining until the queue stays quiet for ~10 ms (bounded by
  byte/iteration caps so a continuously-written file cannot starve the
  loop). Raise the default BUN_INOTIFY_COALESCE_INTERVAL to 10 ms.
- hot_reloader.Task.append: dedup by hash so the many directory-watch
  events that all name the same file don't overflow the 8-slot buffer
  and flush mid-onFileUpdate, which let the JS thread start a reload
  while the watcher thread was still appending and produced a second
  reload for the same save.

Pin the stale-sourcemap regression test to the old 0.1 ms interval: it
depends on the self-write event being dispatched before the rejection
is reported, which the new window absorbs.

Fixes #13511
The stale-sourcemap test pins the coalesce window to 0.1 ms so the
self-write event reaches the JS thread inside the reject→report
window it was written to exercise. That override was only read by the
inotify backend, so on macOS/FreeBSD/Windows the test ran with the new
10 ms window and no longer opened the race it guards.

Have the kqueue and Windows watchers read the same env var (the Windows
backend rounds to milliseconds) so the override works uniformly.
kevent() returns as soon as one event is ready rather than waiting the
full timeout, so a burst of N writes a few ms apart consumes ~N drain
iterations. With the kqueue cap at 5 the new coalesce test's 10-write
burst spilled into a second watchLoopCycle on macOS and produced a
second reload. Bump kqueue and Windows to 32 to match the inotify
backend; the quiet-timeout break still terminates the common case
after one idle interval.
- KEventWatcher/INotifyWatcher: split the coalesce interval into
  sec+nsec so a user-supplied value >= 1 s doesn't make kevent/ppoll
  fail with EINVAL (which the lenient error handling would silently
  turn into 'quiet' and disable coalescing).
- KEventWatcher: guard the drain loop with count > 0 so a failed
  initial kevent() (count == -1) doesn't trap on @intcast.
- WindowsWatcher: round the ns->ms conversion up so a sub-millisecond
  override (e.g. the 0.1 ms a test pins for the other backends)
  becomes 1 ms rather than truncating to 0 and disabling the wait.
- hot.test.ts 'should not hot reload when a random file is written':
  wait for the initial evaluation to land before starting the 200 ms
  quiet window. On debug/ASAN builds process startup alone exceeded
  200 ms, so the test asserted reloadCounter == 0 before the first
  line ever arrived.
With a non-zero coalesce interval the drain loop can race the
self-rewriting hot file in a way that loses the pending rejection
(the pre-existing edge case noted in #29740), which on release
builds manifests as a 30 s timeout. A zero interval makes the
drain loop non-blocking — it polls once and processes whatever
is already buffered — which is the closest analogue to the
pre-loop behaviour this test was tuned against, and keeps the
self-write event landing in the reject→report window on all
build profiles.
robobun and others added 6 commits June 5, 2026 15:29
Intel macOS CI runners stretch sleepSync(2) well past the 10 ms
coalesce window (timer coalescing + scheduler load), so the 10-write
burst splits into several watch-loop cycles there and the test sees
evals=[1..9] instead of [1,2]. arm64 macOS (14 and 26) and all Linux
lanes pass; the reported bug (#13511) was Windows + Linux only, and
the issue's own comments note it does not reproduce on macOS.
This reverts commit f400c3d.
The workspace now denies unused-doc-comments; doc comments on macro
invocations don't attach to the generated item.
@robobun robobun force-pushed the farm/1f3aa35a/hot-coalesce-watcher-events branch from 692e279 to 802d7e8 Compare June 5, 2026 15:35

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues found in this pass, but this rewrites the coalesce loop across all three platform watcher backends (inotify/kqueue/IOCP, in both the Zig reference and the shipping Rust) and changes a user-visible timing default — worth a human look before merging.

Extended reasoning...

Overview

This PR changes how the filesystem watcher coalesces rapid event bursts so that a single editor save triggers exactly one --hot reload instead of several. It touches eleven files: the three platform watcher backends (INotifyWatcher, KEventWatcher, WindowsWatcher) in both the Zig reference and the shipping Rust port, hot_reloader.{zig,rs} (hash dedup in Task.append), env_var.{zig,rs} (default BUN_INOTIFY_COALESCE_INTERVAL 0.1 ms → 10 ms), and test/cli/hot/hot.test.ts (new regression test plus pinning an existing test to the old interval).

Security risks

None identified. The change only adjusts how long the watcher thread waits for additional kernel events after the first read and adds a linear-scan dedup over an 8-element array. No new external input is parsed, no auth/crypto/permissions surface is touched, and the env var was already user-controllable. The drain loops are bounded by both byte count and a 32-iteration cap, so a continuously-written file cannot starve the watch loop.

Level of scrutiny

This warrants human review rather than auto-approval. It is a behaviour change in a production-critical, cross-platform code path that every --hot/--watch user exercises, implemented twice (Zig + Rust) across three OS-specific syscall loops with subtly different semantics (e.g. kevent early return vs ppoll timeout, Windows ~15.6 ms timer granularity). It also embeds design decisions a maintainer should sign off on: the 10 ms default window (trades ~10 ms of reload latency for dedup), the 32-iteration cap, and reusing the Linux-named env var for all backends. The PR went through several correctness iterations during review (kqueue iteration cap 5→32, count > 0 guard, timespec sec/nsec split, ns→ms divCeil, an accidental full src/ revert), which is normal for this kind of change but underscores that it is not mechanical.

Other factors

All prior inline review threads (CodeRabbit and earlier claude runs) are resolved, and the current bug-hunting pass found nothing. The new test is well-commented and has a verification table showing it fails without the fix and passes with it, but it is inherently timing-dependent (10× writeSync with sleepSync(2) gaps vs a 10 ms window) and is already skipIf(isIntelMacOS); the author has explicitly said they will tighten the burst rather than the window if a CI lane proves it flaky. The author also flagged a likely merge conflict with #30644. None of these are blockers, but together with the scope they make this a clear case for a human reviewer rather than bot approval.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Duplicate Output Issue After Repeated Saves with bun --hot

1 participant