Fix four file-watcher crash signatures (kevent panic, dangling watch paths, unlocked Windows scan, watchFile join overflow)#31695
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (5)
WalkthroughThis PR hardens the watcher subsystem across four critical paths: ensuring all watcher paths are cloned to prevent use-after-free, handling buffer overflows gracefully in path normalization, eliminating unsafe FFI calls in kqueue polling, and protecting watchlist traversal with proper locking on Windows. A regression test validates the fixes for outside-root dependency hot reloading. ChangesWatcher Safety and Concurrency Improvements
Possibly Related PRs
Suggested Reviewers
🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
|
Found 2 issues this PR may fix:
🤖 Generated with Claude Code |
|
Checked both suggested issues — neither is a confident match, so I'm not adding
|
|
Heads-up: #31697 (bake dev server crash fixes) contains the same one-line #31697 keeps its copy of the hunk because its regression test ("hot update touching multiple client component boundaries") exercises file edits under |
There was a problem hiding this comment.
All my earlier findings are addressed and nothing new surfaced, but the Windows watcher's drop/re-acquire of the non-recursive mutex mid-scan and the kevent error-propagation change are concurrency-sensitive enough (and unverifiable from the Linux env) that a human pass on the lock discipline is worth having.
Extended reasoning...
Overview
Four independent crash fixes in bun_watcher plus one regression test: (1) KEventWatcher::watch_loop_cycle switches from raw kevent(2) + usize::try_from(count).expect(...) to the EINTR-retrying bun_sys::kevent wrapper, with the coalescing re-read made best-effort and Output::flush moved to an RAII guard; (2) bundle_v2.rs flips add_file::<{cfg!(windows)}> to add_file::<true> so watcher paths are always cloned rather than borrowed from the per-build arena; (3) WindowsWatcher::watch_loop_cycle now holds this.mutex over the watchlist scan, dropping/re-acquiring around process_watch_event_batch and converting the inner for to a while that re-reads the live length; (4) StatWatcher::init switches to join_abs_string_buf_checked with a heap-scratch fallback. A new bake dev test covers (2), now skip: ["win32"] per my earlier comment.
Security risks
None identified. No auth/crypto/permissions surface; the changes harden against panics and use-after-free in the watcher thread. The new locking cannot deadlock against process_watch_event_batch because the guard is explicitly dropped first, and is never held across the blocking platform.next() (taken after it returns). The watchFile overflow fallback allocates a bounded heap scratch (cwd.len + slice.len + 3) and is not attacker-amplifiable beyond what PathLike::from_js already admits.
Level of scrutiny
Medium-high. This is production watcher/hot-reload code that runs on a dedicated thread and coordinates with the JS thread, transpiler workers, and the bundle thread via a non-recursive mutex. Fix (3) in particular changes lock scope and introduces a drop/relock pattern whose correctness rests on invariants spelled out in comments (lock not held across blocking wait; item_idx re-checked against live length after relock; documented swap-behind-cursor trade-off). Fix (1) changes error semantics on the macOS/FreeBSD path. Both are platform-gated and cannot be exercised from the Linux container the PR was verified in — they rely on the macOS/Windows CI lanes.
Other factors
I previously flagged three issues on this PR (Windows test hang, swap-remove scan-skip trade-off, stale flush-justification comment); all were addressed in 663a055 and bb488e3 and the threads are resolved. The current bug-hunting pass found nothing further. The robobun build comment shows a musl LTO link failure that is an infrastructure issue unrelated to these source changes. Given the concurrency sensitivity and the two unverifiable-here platform paths, I'm deferring rather than approving — not because I see a problem, but because mutex drop/re-acquire patterns in a hot watcher loop are exactly where a human reviewer's pass on lock discipline pays for itself.
There was a problem hiding this comment.
All three of my earlier inline concerns are addressed (win32 skip, scan-compaction trade-off documented, RAII flush guard). I'm not auto-approving because the Windows watcher lock drop/re-acquire around process_watch_event_batch and the kevent error-propagation change are concurrency-sensitive and couldn't be exercised on their target platforms from the author's environment — worth human eyes.
Extended reasoning...
Overview
Four independent crash fixes in the file-watcher stack: (1) KEventWatcher::watch_loop_cycle switches from a raw kevent(2) call whose -1 return panicked a checked usize::try_from to the EINTR-retrying bun_sys::kevent wrapper with ? propagation; (2) bundle_v2.rs flips add_file::<{cfg!(windows)}> to add_file::<true> so watcher entries always clone the path instead of borrowing arena-backed bytes that DevServer frees per build; (3) WindowsWatcher::watch_loop_cycle now holds this.mutex while scanning watchlist.items_file_path(), dropping and re-acquiring it around the mid-batch process_watch_event_batch flush (which locks the same non-recursive mutex internally), and converts the for 0..n_items loop into a while item_idx < live_len loop; (4) StatWatcher::init uses join_abs_string_buf_checked with a heap-scratch fallback so an over-long cwd-joined path creates a watcher whose stat polls fail with ENAMETOOLONG instead of panicking. A new bake regression test covers (2) on POSIX. I previously flagged three issues (Windows test hang, swap_remove-behind-cursor skip, stale flush justification comment); all three were addressed in 663a055 / bb488e3.
Security risks
None apparent. No auth/crypto/permissions surface; the changes tighten lifetime and locking discipline in an internal watcher subsystem. The path-join fallback allocates a bounded Vec (cwd.len() + slice.len() + 3) and does not expose any new untrusted-input parsing.
Level of scrutiny
High. This is core infrastructure (dev-server hot reload, --watch, fs.watchFile) and three of the four fixes are concurrency- or lifetime-sensitive. The WindowsWatcher change in particular introduces a lock → drop → callback-that-relocks → re-acquire → continue pattern with a documented best-effort trade-off; correctness depends on process_watch_event_batch being the only thing that needs the lock dropped, on platform.next() / iter.next() not blocking under the lock, and on the re-read length guard being sufficient after compaction. The KEventWatcher change alters error semantics (propagate vs. swallow) for the watch loop. The author explicitly notes that (1) and (3) target macOS/Windows code paths that could not be run in their Linux verification environment, so CI on those lanes is the first real exercise.
Other factors
- All of my earlier inline findings are resolved and the resolutions look correct in the current diff (
skip: ["win32"]is present, the swap-behind-cursor comment is in place,Output::flush_guard()replaces the trailing flush + stale comment). - The bug-hunting system found nothing new on the current revision.
- The robobun CI comment shows failures on bb488e3 followed by a
ci: retriggercommit; I can't see the post-retrigger result, so a human should confirm green CI on macOS/Windows lanes before merge. - There is a known one-line overlap with PR #31697 in
bundle_v2.rs(sameadd_file::<true>fix), noted by the author — comment-only conflict for whichever lands second.
Given the concurrency subtlety and the untested-on-target-platform caveat, this should get a human reviewer rather than an auto-approval.
|
CI status for maintainers: the diff is green on every lane that ran — 70/73 checks pass, including all build lanes on every platform (darwin rust/cpp/bun builds compile the The only 3 failures — |
- KEventWatcher: kevent(2) returning -1 (e.g. EINTR) panicked the watch loop on usize::try_from(count); use the EINTR-retrying bun_sys::kevent wrapper, treat the coalescing re-read as best-effort, and propagate real errors. - bundle_v2: watcher paths were appended borrowed (CLONE_FILE_PATH=false on POSIX). Paths from dupe_alloc's disjoint text/pretty branch (imports outside the project root, bake's ssr:-prefixed pretty paths) live in the per-build bundle arena, which DevServer frees after every build, leaving dangling watchlist entries the watcher thread later reads in on_file_update. Always clone the path into the watchlist. - WindowsWatcher: the watchlist scan in watch_loop_cycle ran without this.mutex while other threads append entries under it; MultiArrayList growth frees the old backing slab mid-scan and is_parent_or_equal dereferenced freed memory. Scan under the mutex, dropping it around process_watch_event_batch (mirrors INotifyWatcher's locked snapshot). - StatWatcher (fs.watchFile): switch the cwd-join to join_abs_string_buf_checked with a heap-scratch fallback so an over-long joined path polls stat (ENAMETOOLONG -> zeroed stats, like Node) instead of overflowing the pooled path buffer.
…n trade-off
The Windows watcher refuses outside-root paths ("will not be watched"),
so the test's dev.write would wait forever for a watch event there — and
the dangling-borrow bug it guards was POSIX-only (Windows always cloned).
Also note in WindowsWatcher::watch_loop_cycle why the scan cursor is not
rewound after a mid-batch flush compacts the watchlist via swap_remove.
cbc6890 to
cf75042
Compare
|
Updated CI status after the rebase (build 61112): all watcher-related lanes are green — the new bake regression test passes everywhere including linux-x64-asan, and the Windows/macOS lanes build and run their test suites against the changed code. The remaining failures are not from this diff:
One |
Fixes four crash signatures reported from 1.4 canary, all in or around
bun_watcher. None have Zig analogs: two are checked-Rust versions of silent Zig UB, two are port-introduced lifetime/locking gaps.1. BUN-3AVT — panic
int cast: TryFromIntError(NegOverflow)in the watcher loop (macOS/FreeBSD)Cause:
KEventWatcher::watch_loop_cyclecalled rawkevent(2)and didusize::try_from(count).expect("int cast")on its return value.keventreturns-1on failure (EINTR being the everyday case), so the watcher thread panicked. Zig's@intCast/@max(0, count)let the same-1degrade into an empty event batch.Fix: use the existing EINTR-retrying
bun_sys::keventwrapper. Real errors now propagate towatch_loop→on_errorinstead of panicking; the 0.1 ms coalescing re-read is best-effort (deliver what we already have, a persistent kqueue error resurfaces on the next blocking call).2. BUN-3EAD — SIGSEGV in
DevServer::on_file_update→append_dir/append_fileCause:
BundleV2::on_parse_task_completeappended watcher entries withCLONE_FILE_PATH = cfg!(windows), i.e. borrowed on POSIX. The borrow is only sound for paths interned process-lifetime inFilenameStore/DirnameStore.dupe_alloc's disjointtext/prettybranch (imports from outside the project root, whose pretty path starts with../, and bake'sssr:-prefixed pretty paths — i.e. every SSR-graph module) instead allocatespath.textin the per-build bundle arena (graph.heap), with an explicit comment that the consumingPathmust not outlive the build. DevServer frees that arena at the end of every build, while the watcher keeps the pointer for the process lifetime and reads it on the watcher thread on every event for that file (and for its auto-watched parent directory — theappend_dirflavor of the crash).Fix: always clone the path into the watchlist (
add_file::<true>); cloning only happens the first time a file is added (add_fileearly-returns on a hash hit).Repro:
test/bake/dev/bundle.test.ts"hot reload of a file imported from outside the project root". On an unfixed ASAN debug build the dev server dies withAddressSanitizer: SEGV ... READ memory accesson the FileWatcher thread, 3/3 runs (a few rebuilds first so the freed arena pages get recycled); fixed build hot-reloads the file, 3/3 runs.3. BUN-3D64 — SIGSEGV in
is_parent_or_equalfrom the Windows watcherCause:
windows_watcher::watch_loop_cyclescannedwatchlist.items_file_path()withoutthis.mutex, while the JS thread / transpiler workers / bundle thread append entries under that mutex.MultiArrayList::set_capacityfrees the old backing slab on growth, so the unlocked scan could read a freed slab and handis_parent_or_equala garbageCowpointer.INotifyWatcheralready carries the locked-snapshot divergence for exactly this race (its PORT NOTE documents the realloc crash); the Windows backend never got it. Zig scanned unlocked too — same race, but Zig's column snapshot + interned strings made it quieter.Fix: hold
this.mutexfor the watchlist scan, dropping/re-acquiring it aroundprocess_watch_event_batch(which locks the same non-recursive mutex internally). The lock is never held across a blocking wait:platform.next()(GetQueuedCompletionStatus) runs before the guard is taken, anditer.next()only parses the already-filled completion buffer. The item index is re-checked against the live length after relock sinceon_file_updatecan evict + compact.4. BUN-3C2V — panic in
sliceinsideNodeFS::watch_fileCause:
StatWatcher::init(fs.watchFile) joined the user path with the cwd via the uncheckedjoin_abs_string_bufinto a fixed pooledPathBuffer; when the joined result exceeds the buffer the join panics on a slice write. The siblingFSWatcher::init(fs.watch) was already hardened tojoin_abs_string_buf_checked+ ENAMETOOLONG for exactly this shape; watchFile was missed. (Currentfs.watchFilepre-resolves in JS and byte-validates inPathLike::from_js, which front-runs the overflow for the inputs I could construct — so this one is belt-and-braces at the exact panic-capable site rather than a reproducible-on-HEAD fix.)Fix: use the checked join; on overflow fall back to a heap scratch (sized like
join_abs_string_buf_checked's slow path) so the watcher is still created and its stat polls fail with ENAMETOOLONG → the listener observes zeroed stats, matching Node (which never throws for un-stat-able watchFile paths, verified against v24).Verification
test/bake/dev/bundle.test.ts -t "outside the project root": fails 3/3 (ASAN SEGV on the watcher thread) withsrc/stashed, passes 3/3 with the fix.test/bake/dev/bundle.test.ts(21),test/bake/dev/hot.test.ts(9),test/js/node/watch/fs.watchFile.test.ts(7+2 skip) pass. (fs.watch.test.ts's two EACCES tests fail in this container because it runs as root — pre-existing, unrelated.)cargo check/clippy -p bun_watcherclean for linux, darwin and windows-msvc targets;clippy -p bun_bundler -p bun_runtimeclean.