Skip to content

fs: reuse DirEntry in bustEntriesCache instead of orphaning it#29919

Open
robobun wants to merge 2 commits into
mainfrom
farm/cb4c44e1/fix-bustentriescache-leak
Open

fs: reuse DirEntry in bustEntriesCache instead of orphaning it#29919
robobun wants to merge 2 commits into
mainfrom
farm/cb4c44e1/fix-bustentriescache-leak

Conversation

@robobun

@robobun robobun commented Apr 29, 2026

Copy link
Copy Markdown
Collaborator

What

RealFS::bust_entries_cache() leaked a heap DirEntry (plus its EntryMap) per call and forced append-only growth of EntryStore/FilenameStore. Triggered on every HMR/watch file change and every FileSystemRouter.reload().

Repro

const router = new Bun.FileSystemRouter({ dir, style: "nextjs", fileExtensions: [".tsx"] });
for (let i = 0; i < 500; i++) router.reload();

4 subdirs × 20 files → ~50 MB RSS growth over 500 reloads.

Cause

BSSMap::remove() only drops the hash→index mapping. The backing slot — and the EntriesOption::Entries(&mut DirEntry) it holds — stays, orphaned. On the next get_or_put() the key is gone so .index = Unassigned, at_index() returns None, in_place = None, and read_directory_with_iterator / dir_info_cached_maybe_log allocate a fresh DirEntry and consume a fresh BSSMap slot. The old DirEntry (with its EntryMap hashmap) is never freed, and because prev_map is None every file in the directory gets a new Entry appended to EntryStore and (for long names) new strings appended to FilenameStore.

Fix

Don't remove the key. Flag the existing DirEntry as stale and let the next read take the in_place path that already exists for the generation-bump case — it reuses the DirEntry allocation, passes the old EntryMap as prev_map so Entry structs and filename strings are reused, and writes back to the same BSSMap slot. Err / NotFound entries (no DirEntry) still fall back to remove().

Since the orphaned slot was previously never written again (accidental RCU), callers that captured a *DirEntry and read its EntryMap across calls that could now trigger an in_place rewrite need explicit synchronization:

  • hot_reloader::on_file_update: hold entries_mutex around the dir_ent.entries reads (covers both within-event and cross-event races with a concurrent JS-thread resolve); bust after releasing the lock since bust_entries_cache takes it internally.
  • RouteLoader::load / FileSystemRouter::bust_dir_cache_recursive: snapshot subdir *Entry pointers (stable in the append-only EntryStore) before recursing, so read_dir_info(child) can safely rewrite the parent.
  • entries_at: carry over the previous DirEntry.fd through the refresh so DirInfo::get_file_descriptor keeps working.

Verification

RSS growth over 500 reload()s
before ~50 MB
after ~8 MB

(The residual is dir_cache.remove()DirInfo slots — a separate, smaller leak not addressed here because recycling those slots would invalidate DirInfo.parent index references held by children.)

All 21 filesystem_router.test.ts tests pass, including reload() works with new dirs/files which verifies the stale flag still forces a disk re-read.


Rebase note: this PR was originally implemented against the Zig resolver and re-targeted at the Rust port after #30412. The Rust reload() already deep-copies extensions/asset_prefix_path via .clone(), and the resolver sites already pattern-match on EntriesOption::Entries(_), so those follow-ups from the Zig review history are not needed here.

@coderabbitai

coderabbitai Bot commented Apr 29, 2026

Copy link
Copy Markdown
Contributor

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Directory-entry caching gains a stale flag so listings can be invalidated without evicting allocations; cache invalidation, reuse, and rebuild logic now honor stale. Router reload deep-copies extensions and asset_prefix_path into the reload arena. A regression test verifies reload does not leak directory-entry caches.

Changes

Directory-entry stale invalidation + Router reload memory fix

Layer / File(s) Summary
Data Shape
src/fs.zig
Added stale: bool = false to FileSystem.DirEntry.
Cache Invalidation
src/fs.zig
RealFS.bustEntriesCache marks cached .entries variants as entry.entries.stale = true instead of evicting the cache slot; non-.entries cases still remove the cache mapping.
Cache Reuse / Refresh
src/fs.zig
RealFS.entriesAt and readDirectoryWithIterator treat .entries cache hits as misses when entries.stale is true or generation is older, forcing an in-place rebuild; the newly built DirEntry preserves entries.fd from the previous cached entry.
Resolver Integration
src/resolver/resolver.zig
dirInfoForResolution and dirInfoCachedMaybeLog now require !cached_entry.entries.stale in addition to generation checks before reusing cached directory listings; stale entries trigger in-place update paths.
Hot Reloader Ordering
src/bun.js/hot_reloader.zig
Watcher thread reads dir_ent.entries under rfs.entries_mutex and moves ctx.bustDirCache(...) to run after releasing the lock to avoid races with in-place updates.
Router Reload Memory Handling
src/bun.js/api/filesystem_router.zig
FileSystemRouter.reload allocates a fresh []string in the reload arena and deep-copies each extension string’s bytes; asset_prefix_path is duplicated into the reload arena rather than reusing the old arena pointer.
Tests
test/js/bun/util/filesystem_router.test.ts
Added regression test reload() should not leak directory entry caches that spawns a subprocess, runs repeated FileSystemRouter.reload() with forced GC, and asserts route count, RSS delta, and empty stderr.
🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely summarizes the main change: reusing DirEntry in bustEntriesCache instead of orphaning it, which is the core fix in the PR.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description check ✅ Passed The PR description comprehensively covers the issue, root cause, fix, and verification with detailed tables and commit history.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@robobun

robobun commented Apr 29, 2026

Copy link
Copy Markdown
Collaborator Author

@github-actions

Copy link
Copy Markdown
Contributor

Found 4 issues this PR may fix:

  1. Memory Leak running Typescript server on file reload #15857 - Memory leak on file reload: RSS grows ~2-20MB per save, matching the orphaned DirEntry pattern fixed here
  2. bun --hot always leaks memory #11083 - bun --hot always leaks memory: RSS grows monotonically on each reload, consistent with bustEntriesCache leaking DirEntry slots
  3. Memory Not Freed After Running bun --hot Command #14734 - Memory not freed after bun --hot: same pattern of growing memory per file save under hot reload
  4. Intermittent Errors with Bun.build and Bun.FileSystemRouter when importing absolute paths #9517 - Intermittent "Unexpected reading file" errors with Bun.FileSystemRouter + Bun.build(): stale/corrupted directory entry state could cause these intermittent file-reading failures

If this is helpful, copy the block below into the PR description to auto-close these issues on merge.

Fixes #15857
Fixes #11083
Fixes #14734
Fixes #9517

🤖 Generated with Claude Code

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix looks correct and well-reasoned to me, but since it changes cache-invalidation semantics in the resolver/fs directory cache (shared by HMR, watch, bundler, and install), I'd like a human to confirm the stale-flag approach has no edge cases I'm missing.

Extended reasoning...

Overview

This PR fixes a memory leak in RealFS.bustEntriesCache() by replacing entries.remove(path) with a stale flag on the existing DirEntry, so the next read takes the pre-existing in_place re-scan path (reusing the DirEntry heap allocation, EntryMap, and Entry/FilenameStore slots) instead of orphaning them and allocating fresh ones. It threads the stale check through the three cache-hit sites (fs.zig:entriesAt, fs.zig:readDirectoryWithIterator, and two spots in resolver.zig). It also fixes a real use-after-free in FileSystemRouter.reload() where extension strings were shallow-duped from an arena that gets freed in the same call. A new memory-growth regression test is added.

Security risks

None. This is internal cache lifecycle management with no auth/crypto/input-handling implications.

Level of scrutiny

High. bustEntriesCache is called via Resolver.bustDirCache, which is on the hot path for HMR, --watch, --hot, bun install, and FileSystemRouter.reload(). The change is semantically equivalent to the existing generation-bump path (the in_place reuse logic already exists and is exercised), and I verified that all three re-scan sites overwrite the DirEntry via ptr.* = new_entry (from DirEntry.init()), which resets stale back to false. The .err/NotFound fallback to remove() is preserved. Still, this is core resolver/fs caching shared across every build/resolve operation on every platform, so a subtle regression here would have broad impact.

Other factors

  • The UAF fix in filesystem_router.zig (deep-copying extension strings into the new arena) is straightforward and correct.
  • The new test follows the pattern of the adjacent MatchedRoute.params does not leak test (subprocess + RSS delta), with a generous 32 MB threshold vs. the ~1 MB observed post-fix.
  • Existing test reload() works with new dirs/files covers the functional correctness of the stale flag (directory is actually re-read from disk).
  • Only a nit was flagged (explicit test timeout, which follows local file precedent).
  • No CODEOWNERS cover these files.

Comment thread test/js/bun/util/filesystem_router.test.ts
@robobun

robobun commented May 1, 2026

Copy link
Copy Markdown
Collaborator Author

Heads-up: the filesystem_router.zig drive-by here (deep-copy extensions on reload()) is superseded by #30070, which also fixes asset_prefix_path (same defect — not copied at all) and adds a dedicated regression test. When rebasing this PR to resolve the current conflicts, that hunk can be dropped.

@robobun robobun force-pushed the farm/cb4c44e1/fix-bustentriescache-leak branch from bcd3e71 to ba8181e Compare May 1, 2026 19:41
@robobun

robobun commented May 1, 2026

Copy link
Copy Markdown
Collaborator Author

Status

Latest: f3f83c30 — rebased onto current main (27d5181a). Rust implementation, single fix commit + autofix-ci rustfmt.

Rebase notes:

  • Main's 8553428 independently added the same entries_mutex lock to bust_entries_cache that this fix relies on (convergent change); resolved by merging both comment rationales and keeping the stale-flag body.
  • Everything else merged cleanly, including the test file.

Fix (.rs only)

  • src/resolver/fs.rs: DirEntry.stale: bool
  • src/resolver/lib.rs: bust_entries_cache sets stale instead of removing; entries_at / read_directory_with_iterator honor it; entries_at preserves fd
  • src/resolver/resolver.rs: both in_place sites check !stale
  • src/jsc/hot_reloader.rs: entries_mutex held around dir_ent.entries reads; bust after release
  • src/router/lib.rs + src/runtime/api/filesystem_router.rs: snapshot subdir *Entry pointers before recursing

Verification (400 reloads, 4 dirs × ~100 entries)

RSS growth
without fix ~65+ MB, eventually aborts (BSSMap overflow list exhausted)
with fix ~17 MB

Re-ran test/js/bun/util/filesystem_router.test.ts after this rebase: 23/24 pass including my leak test, and MatchedRoute.params does not leak (previously failing on plain main in debug builds) now passes — that main regression has been fixed. The single local failure is #31221's hash-collision test timing out on my slow debug box only: its pure-JS 600k-iteration wyhash search alone takes ~6.4 s there, past the 5 s default timeout, before any router code runs. It passes in CI.

Notes

  • Deferred follow-ups (pre-existing): entries_at writer-writer locking, FrameworkRouter.scanInner snapshot, closing old DirEntry.fd at the other in_place sites under store_fd.
  • Previous full CI run on this diff (build 56945) was green: 242 jobs, 0 test failures. The later run (57877) was red only on the main-wide Windows Bun.Transpiler stack overflows test breakage, which hit every PR build at the time.

CI on the rebased push (build 61139)

The only error-level failure is test/cli/install/bunx.test.ts ("should handle package that requires node 24"), which fails identically on every current PR build (61098, 61103, 61104, 61130, 61134, 61135, 61138, ...) — it depends on a live npm package's engine requirements, so it's main-wide breakage, not this diff. One flaky-then-passed retry: test/cli/hot/hot.test.ts "deleted and rewritten" got 4 reloads instead of 3 once on debian-asan. Since this PR touches the hot reloader I checked it specifically: the test passes 8/8 locally under the ASAN debug build of this branch, the same file flakes on unrelated PR builds (61096, 61117, 61127), and it has a history of deflake commits (a04817c, 1e75cd5). The extra-reload mode is fs-event timing (delete and rewrite landing in separate watcher wakeups on a slow machine), which predates this change.

No unresolved review threads; claude[bot]'s latest pass reports no new issues and defers to human sign-off on the locking invariants. The diff itself is green — ready for human review/merge.

@robobun robobun force-pushed the farm/cb4c44e1/fix-bustentriescache-leak branch from ba8181e to 72b3853 Compare May 2, 2026 00:12

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/resolver/resolver.zig (1)

2972-2978: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Guard the union tag before reading cached_entry.entries.

This stale-aware branch is missing the cached_entry.* == .entries check that Line 2266 already has. If a prior lookup cached .err, Line 2973 will read the wrong tagged-union field and trip a safety check instead of falling back to the in-place refresh path.

Suggested fix
             if (rfs.entries.atIndex(cached_dir_entry_result.index)) |cached_entry| {
-                if (!cached_entry.entries.stale and cached_entry.entries.generation >= r.generation) {
-                    dir_entries_option = cached_entry;
-                    needs_iter = false;
-                } else {
-                    in_place = cached_entry.entries;
+                if (cached_entry.* == .entries) {
+                    if (!cached_entry.entries.stale and cached_entry.entries.generation >= r.generation) {
+                        dir_entries_option = cached_entry;
+                        needs_iter = false;
+                    } else {
+                        in_place = cached_entry.entries;
+                    }
                 }
             }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/resolver/resolver.zig` around lines 2972 - 2978, The code reads
cached_entry.entries without checking the union tag, which can panic if
cached_entry contains .err; update the branch around
rfs.entries.atIndex(cached_dir_entry_result.index) to first test the tag
(cached_entry.* == .entries) before accessing cached_entry.entries: when the tag
is .entries and not stale and generation >= r.generation set dir_entries_option
and needs_iter=false; when the tag is .entries but stale set in_place =
cached_entry.entries; otherwise (tag != .entries) treat it as a cache
miss/fallback and proceed with the in-place refresh path (do not read
cached_entry.entries).
src/fs.zig (1)

621-643: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Preserve the cached directory FD when refreshing a stale entry.

Now that Line 621 routes invalidations through entriesAt(), this helper also needs the same FD lifecycle as readDirectoryWithIterator(). Today it always calls readdir(false, ...) and always closes handle, so refreshing a stale DirEntry that already had fd set will overwrite it with .invalid at Line 643 and leak the old descriptor. That breaks callers that rely on entries.fd after invalidation and turns repeated busts into FD churn.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/fs.zig` around lines 621 - 643, When refreshing a stale DirEntry you must
preserve any previously-set directory FD instead of overwriting and leaking it:
if existing.entries.fd is valid, avoid closing or discarding that descriptor
when calling bun.openDirForIteration() / readdir(), and after readdir returns
ensure the new_entry retains the original FD (assign new_entry.fd =
existing.entries.fd when readdir did not produce a new FD) before you clear/free
existing.entries.data and overwrite existing.entries.*; only close the handle if
you opened a fresh descriptor for this refresh and did not reuse the
existing.entries.fd.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/bun.js/api/filesystem_router.zig`:
- Around line 265-268: The new Router is being initialized with
.asset_prefix_path pointing at this.router.config.asset_prefix_path which lives
in the old arena that is freed right after reload; update the Router.init call
to deep-copy the asset prefix by duplicating
this.router.config.asset_prefix_path via the allocator (similar to how .dir is
created with allocator.dupe) so Router.config.asset_prefix_path owns its own
memory and won't dangle when the old arena is freed; ensure you use
allocator.dupe(u8, this.router.config.asset_prefix_path) (or equivalent) and
handle errors consistently with the existing catch/unreachable pattern.

In `@src/fs.zig`:
- Around line 783-801: bustEntriesCache currently mutates rfs.entries (setting
entry.entries.stale and calling rfs.entries.remove) without taking
entries_mutex, racing with readDirectoryWithIterator and dirInfoCachedMaybeLog
which expect the map to be mutex-protected; fix by either acquiring
entries_mutex at the start of bustEntriesCache and releasing it after the
mutation, or require/assert that the caller holds entries_mutex (e.g., add an
assertion using entries_mutex.isHeld() or similar) and document that contract;
ensure the lock covers both the lookup (rfs.entries.get) and the subsequent
mutation/removal to prevent races on entry.entries.stale and
rfs.entries.remove().

In `@test/js/bun/util/filesystem_router.test.ts`:
- Around line 613-628: The test currently asserts stderr is exactly empty
(expect(stderr).toBe("")) but debug/ASAN builds can emit a known warning from
the spawned process; update the assertion in the test that spawns proc (using
Bun.spawn with bunExe() and bunEnv) to either remove the strict stderr equality
check or filter out the ASAN warning string (e.g., lines containing "WARNING:
ASAN interferes with JSC signal handlers" or similar) from stderr before
asserting the remainder is empty so unrelated ASAN noise doesn't fail the
leak/regression assertions for proc, stdout, stderr, and exitCode.

---

Outside diff comments:
In `@src/fs.zig`:
- Around line 621-643: When refreshing a stale DirEntry you must preserve any
previously-set directory FD instead of overwriting and leaking it: if
existing.entries.fd is valid, avoid closing or discarding that descriptor when
calling bun.openDirForIteration() / readdir(), and after readdir returns ensure
the new_entry retains the original FD (assign new_entry.fd = existing.entries.fd
when readdir did not produce a new FD) before you clear/free
existing.entries.data and overwrite existing.entries.*; only close the handle if
you opened a fresh descriptor for this refresh and did not reuse the
existing.entries.fd.

In `@src/resolver/resolver.zig`:
- Around line 2972-2978: The code reads cached_entry.entries without checking
the union tag, which can panic if cached_entry contains .err; update the branch
around rfs.entries.atIndex(cached_dir_entry_result.index) to first test the tag
(cached_entry.* == .entries) before accessing cached_entry.entries: when the tag
is .entries and not stale and generation >= r.generation set dir_entries_option
and needs_iter=false; when the tag is .entries but stale set in_place =
cached_entry.entries; otherwise (tag != .entries) treat it as a cache
miss/fallback and proceed with the in-place refresh path (do not read
cached_entry.entries).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 282efbb6-841f-47da-8d06-959d3431a34e

📥 Commits

Reviewing files that changed from the base of the PR and between 1b82e1d and 72b3853.

📒 Files selected for processing (4)
  • src/bun.js/api/filesystem_router.zig
  • src/fs.zig
  • src/resolver/resolver.zig
  • test/js/bun/util/filesystem_router.test.ts

Comment thread src/bun.js/api/filesystem_router.zig
Comment thread src/resolver/fs.zig
Comment thread test/js/bun/util/filesystem_router.test.ts

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/fs.zig (2)

621-644: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Refresh-to-error still drops the reused DirEntry.

Both stale refresh branches clear the old EntryMap and then route failures through readDirectoryError(). If that replaces the cache slot with .err/NotFound, the heap *DirEntry is no longer referenced, so delete/recreate or permission-flap loops can still grow EntryStore/FilenameStore again on the next successful read.

Also applies to: 1095-1139

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/fs.zig` around lines 621 - 644, The bug is that
existing.entries.data.clearAndFree(...) is called before calling
readDirectoryError() and before the new DirEntry is successfully assigned, which
can drop the only heap reference to the old DirEntry when readDirectoryError()
replaces the cache with an error, allowing resource-growth on subsequent reads;
fix by deferring clearing/freeing of existing.entries.data until after new_entry
is successfully obtained and assigned (i.e., remove the pre-error clearAndFree
calls and only call existing.entries.data.clearAndFree(bun.default_allocator)
after existing.entries.* = new_entry), and apply the same change to the other
stale-refresh branch (lines ~1095-1139) so that the old DirEntry remains
referenced if readDirectoryError() winds up writing an error into the cache.

618-644: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Lock entriesAt() while it rewrites cached entries.

bustEntriesCache() now serializes .stale writes with entries_mutex, but entriesAt() still clears existing.entries.data and overwrites existing.entries.* without taking that mutex. That leaves the stale-refresh path open to lost invalidations and concurrent cache mutation. If callers are supposed to hold the lock already, please assert/document that contract here instead of leaving it implicit.

🔒 Suggested fix
 pub fn entriesAt(this: *RealFS, index: allocators.IndexType, generation: bun.Generation) ?*EntriesOption {
+    this.entries_mutex.lock();
+    defer this.entries_mutex.unlock();
+
     var existing = this.entries.atIndex(index) orelse return null;
     if (existing.* == .entries) {
         if (existing.entries.stale or existing.entries.generation < generation) {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/fs.zig` around lines 618 - 644, entriesAt() updates cached entries
(clearing existing.entries.data and writing existing.entries.*) without holding
entries_mutex, which races with bustEntriesCache() that serializes .stale
writes; fix by taking the same entries_mutex around the stale-refresh section in
entriesAt() (or assert the lock is already held) so cache mutation is
serialized: acquire entries_mutex before calling bun.openDirForIteration/readdir
and before clearAndFree/assignment, release after assigning new_entry, and/or
add an explicit debug assert in entriesAt() if the locking contract is that
callers must hold entries_mutex (referencing entriesAt, bustEntriesCache,
entries_mutex, existing.entries.data and existing.entries.*).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@src/fs.zig`:
- Around line 621-644: The bug is that existing.entries.data.clearAndFree(...)
is called before calling readDirectoryError() and before the new DirEntry is
successfully assigned, which can drop the only heap reference to the old
DirEntry when readDirectoryError() replaces the cache with an error, allowing
resource-growth on subsequent reads; fix by deferring clearing/freeing of
existing.entries.data until after new_entry is successfully obtained and
assigned (i.e., remove the pre-error clearAndFree calls and only call
existing.entries.data.clearAndFree(bun.default_allocator) after
existing.entries.* = new_entry), and apply the same change to the other
stale-refresh branch (lines ~1095-1139) so that the old DirEntry remains
referenced if readDirectoryError() winds up writing an error into the cache.
- Around line 618-644: entriesAt() updates cached entries (clearing
existing.entries.data and writing existing.entries.*) without holding
entries_mutex, which races with bustEntriesCache() that serializes .stale
writes; fix by taking the same entries_mutex around the stale-refresh section in
entriesAt() (or assert the lock is already held) so cache mutation is
serialized: acquire entries_mutex before calling bun.openDirForIteration/readdir
and before clearAndFree/assignment, release after assigning new_entry, and/or
add an explicit debug assert in entriesAt() if the locking contract is that
callers must hold entries_mutex (referencing entriesAt, bustEntriesCache,
entries_mutex, existing.entries.data and existing.entries.*).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: a4a7b9b2-b800-452d-a1cd-069f268a9cad

📥 Commits

Reviewing files that changed from the base of the PR and between 72b3853 and 81a1dc8.

📒 Files selected for processing (2)
  • src/bun.js/api/filesystem_router.zig
  • src/fs.zig

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/fs.zig (1)

621-648: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Don't free the cached directory path on a failed in-place refresh.

These two refresh paths now pass existing.entries.dir / existing.dir back into readdir() for reuse, but readdir() still does errdefer dir.deinit(...). If the rescan fails, that cleanup frees the borrowed path, and the subsequent readDirectoryError(dir, err) runs with a dangling slice while the cached DirEntry still points at freed memory. Since bustEntriesCache() now invalidates by setting stale, transient watch/HMR read failures will hit this path regularly.

🛠️ Suggested fix
         fn readdir(
             fs: *RealFS,
             store_fd: bool,
             prev_map: ?*DirEntry.EntryMap,
             _dir: string,
             generation: bun.Generation,
             handle: std.fs.Dir,
             comptime Iterator: type,
             iterator: Iterator,
         ) !DirEntry {
             _ = fs;

             var iter = bun.iterateDir(.fromStdDir(handle));
             var dir = DirEntry.init(_dir, generation);
             const allocator = bun.default_allocator;
-            errdefer dir.deinit(allocator);
+            errdefer {
+                dir.data.deinit(allocator);
+                if (prev_map == null) allocator.free(dir.dir);
+            }

             if (store_fd) {
                 FileSystem.setMaxFd(handle.fd);
                 dir.fd = .fromStdDir(handle);
             }

As per coding guidelines, "In Zig code, be careful with allocators and use defer for cleanup."

Also applies to: 1098-1151

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/fs.zig` around lines 621 - 648, The in-place refresh path calls
readdir(...) with an owned dir slice borrowed from existing.entries.dir but
still lets readdir() errdefer dir.deinit(...) free that slice on error, leaving
existing.entries pointing at freed memory; update the error paths so you do not
free the cached path when readdir fails: when calling readdir from the refresh
branches (the block using existing.entries.dir / existing.dir and function
readdir), remove or conditionally skip the cleanup that deallocates the borrowed
dir slice on error (i.e., avoid calling existing.entries.data.clearAndFree or
dir.deinit/free of the cached path before calling readDirectoryError), and
instead ensure readDirectoryError receives a valid borrowed slice or clone the
path into a temporary owned buffer before calling readdir so cleanup is safe;
apply the same fix to the other refresh region (around lines 1098–1151)
referencing the same symbols.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@src/fs.zig`:
- Around line 621-648: The in-place refresh path calls readdir(...) with an
owned dir slice borrowed from existing.entries.dir but still lets readdir()
errdefer dir.deinit(...) free that slice on error, leaving existing.entries
pointing at freed memory; update the error paths so you do not free the cached
path when readdir fails: when calling readdir from the refresh branches (the
block using existing.entries.dir / existing.dir and function readdir), remove or
conditionally skip the cleanup that deallocates the borrowed dir slice on error
(i.e., avoid calling existing.entries.data.clearAndFree or dir.deinit/free of
the cached path before calling readDirectoryError), and instead ensure
readDirectoryError receives a valid borrowed slice or clone the path into a
temporary owned buffer before calling readdir so cleanup is safe; apply the same
fix to the other refresh region (around lines 1098–1151) referencing the same
symbols.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 16da5bf4-cbc9-496e-9ff0-117a43b83081

📥 Commits

Reviewing files that changed from the base of the PR and between 81a1dc8 and cb3f88a.

📒 Files selected for processing (2)
  • src/fs.zig
  • src/resolver/resolver.zig

Comment thread src/resolver/fs.zig Outdated
Comment thread src/resolver/resolver.zig Outdated
Comment thread src/resolver/fs.zig Outdated
Comment thread src/bun.js/hot_reloader.zig Outdated
Jarred-Sumner pushed a commit that referenced this pull request May 3, 2026
…chedMaybeLog (#30170)

## Problem

`EntriesOption` is a tagged union:
```zig
pub const EntriesOption = union(Tag) {
    entries: *DirEntry,
    err: DirEntry.Err,  // { original_err: anyerror, canonical_error: anyerror }
};
```

When `readDirectory()` fails with a non-ENOENT error (EACCES, EMFILE,
…), `readDirectoryError` stores `.err` in `rfs.entries` (fs.zig:1002).
If the directory later becomes openable and `dirInfoCachedMaybeLog`
processes it as queue slot [0] (the target path itself, which is never
pre-checked against `rfs.entries` during queue construction), it
reached:

```zig
if (rfs.entries.atIndex(cached_dir_entry_result.index)) |cached_entry| {
    if (cached_entry.entries.generation >= r.generation) {   // no tag check
```

Reading `.entries` while `.err` is active reinterprets the two
`anyerror` values as a `*DirEntry` pointer and dereferences it. In debug
this is a safety panic; in release it's a segfault at e.g.
`0x1D401D401F8`.

## Repro

```js
chmodSync(bad, 0o000);
try { Bun.resolveSync('./bad/index.js', root); } catch {}  // caches .err for 'bad'
chmodSync(bad, 0o755);
Bun.resolveSync('./bad', root);  // dirInfoCached(bad): open OK, cache has .err -> crash
```

Before:
```
panic(main thread): access of union field 'entries' while field 'err' is active
resolver.zig:2973:33 in dirInfoCachedMaybeLog
```

## Fix

Add the same `if (cached_entry.* == .entries)` guard that
`dirInfoForResolution` already has (resolver.zig:2265). When the cached
entry is `.err`, fall through with `needs_iter = true` so the directory
is re-iterated now that it can be opened.

## Verification

Added a test to `test/js/bun/resolve/resolve.test.ts` that reproduces
the exact sequence. Runs as the current user when non-root, or drops to
`nobody` via `runuser` when root (since root bypasses DAC). Skipped on
Windows.

- Without fix: panics with `access of union field 'entries' while field
'err' is active`
- With fix: resolves to `bad/index.js`

---

Note: #29919 independently adds this same guard as part of a larger
memory-leak fix; this PR is the minimal targeted crash fix with a
dedicated regression test.

---------

Co-authored-by: robobun <robobun@users.noreply.github.com>
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
@robobun robobun force-pushed the farm/cb4c44e1/fix-bustentriescache-leak branch from c05672a to dd26f56 Compare May 3, 2026 07:17
Comment thread src/runtime/api/filesystem_router.zig Outdated
Comment thread src/router.zig Outdated
Comment thread src/router/router.zig Outdated
@Jarred-Sumner Jarred-Sumner force-pushed the farm/cb4c44e1/fix-bustentriescache-leak branch from 7328c03 to 87be2a7 Compare May 4, 2026 10:28

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No new issues found and all prior review threads are resolved — but this swaps an RCU-via-leak for explicit synchronization across the watcher/JS/bundle threads on the shared rfs.entries cache, and took several rounds of race-condition fixes to settle (with two related items deferred to follow-ups), so it's worth a human pass on the concurrency reasoning before merge.

Extended reasoning...

Overview

This PR fixes a memory leak in RealFS.bustEntriesCache() where each cache bust orphaned a heap *DirEntry (plus its EntryMap) and forced append-only growth of EntryStore/FilenameStore. The fix adds a stale: bool flag to DirEntry and has bustEntriesCache mark the existing entry stale (under entries_mutex) instead of removing the key, so the next read takes the pre-existing in_place rebuild path and reuses all allocations. Touches: src/resolver/fs.zig (core cache + entriesAt fd-preservation), src/resolver/resolver.zig (two !stale guards), src/jsc/hot_reloader.zig (mutex around affected-files reads + bust reordering), src/router/router.zig and src/runtime/api/filesystem_router.zig (snapshot subdirs before recursing; deep-copy extensions/asset_prefix_path on reload), plus a new RSS-leak regression test.

Security risks

None. No auth/crypto/permissions/parsing of untrusted input is touched. The risk surface here is memory-safety (UAF / torn reads on a shared hashmap), not security.

Level of scrutiny

High. The leak being removed was — as the PR's own review threads document — the de-facto RCU that made several unlocked reads of *DirEntry sound. Removing it required compensating synchronization that was iterated on across three separate race findings during review (within-event reader/writer race → reorder bust; cross-event race → lock the affected-files loop; same-thread reentrancy in RouteLoader.load/bustDirCacheRecursive → snapshot subdirs before recursing). rfs.entries is a process-global singleton shared across the JS-thread resolver, the watcher thread, and BundleThread, so the blast radius of a mistake here is HMR/--hot/--watch crashes under load. Two related issues were explicitly deferred to follow-ups (the pre-existing unlocked entriesAt writer-writer race, and the FrameworkRouter.scanInner snapshot). That's reasonable scoping, but it means a human should confirm the boundary drawn is acceptable.

Other factors

All seven of my prior inline review threads are resolved with specific commits and rationale; coderabbit's three findings are also resolved. CI build #50578 passes on 71 jobs including all ASAN shards; the only failures are pre-existing fleet-wide flakes unrelated to this change. The new regression test is solid (subprocess RSS-delta assertion with a generous 32 MB bound vs. ~72 MB pre-fix). The bug-hunting system found no new issues on the current revision. I'm deferring rather than approving solely because the concurrency reasoning is subtle enough — and was wrong in non-obvious ways multiple times during this review cycle — that a human sign-off on the final synchronization story is warranted.

@robobun robobun force-pushed the farm/cb4c44e1/fix-bustentriescache-leak branch from 87be2a7 to 081f6c2 Compare May 4, 2026 12:57

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All my concerns from earlier rounds have been addressed and I have nothing further to flag, but given how subtle the in-place-reuse concurrency invariants turned out to be (three separate race fixes during review), this one warrants a human pass before merge.

Extended reasoning...

Overview

This PR changes RealFS.bustEntriesCache() from removing the BSSMap key (orphaning the backing *DirEntry slot) to marking the existing DirEntry as .stale so the next read takes the existing in_place re-scan path and reuses the allocation. It touches six files: the core fs cache (src/resolver/fs.zig), the resolver's stale/generation checks (src/resolver/resolver.zig), the hot-reloader watcher thread (src/jsc/hot_reloader.zig), the route loader (src/router/router.zig), the JS FileSystemRouter API (src/runtime/api/filesystem_router.zig), and adds a leak regression test.

Security risks

None identified. The change is internal to the resolver's directory-entry cache and HMR watcher; no auth, crypto, network, or untrusted-input parsing is involved. The risk surface is memory-safety/concurrency (UAF, torn reads), not security.

Level of scrutiny

High. The previous orphaning behavior was effectively poor-man's RCU: leaked *DirEntry slots were never written again, which made several unlocked reads of entries.data (hot_reloader watcher thread, RouteLoader.load recursion, bustDirCacheRecursive) accidentally sound. Switching to in-place reuse removes that implicit guarantee, and the PR had to re-establish it explicitly via entries_mutex locking in the watcher and snapshot-before-recurse in the route loaders. Three distinct races were found and fixed during review (within-event, cross-event, and same-thread reentrancy), and two related issues were explicitly deferred to follow-ups (entriesAt unlocked writer-writer, FrameworkRouter.scanInner). Code that needed three rounds of concurrency fixes to converge is not something I should approve without a human look.

Other factors

  • All review threads (mine and coderabbit's) are resolved; the current revision has no outstanding bug findings.
  • CI passes on 71 lanes including all ASAN shards; the 3 failures are documented pre-existing fleet-wide flakes unrelated to this change.
  • The filesystem_router.zig deep-copy of extensions/asset_prefix_path fixes a real pre-existing UAF that the new test surfaced under ASAN.
  • The resolver and fs cache are on the hot path for every module resolution in Bun; a regression here would be high-impact.

@robobun robobun force-pushed the farm/cb4c44e1/fix-bustentriescache-leak branch from 081f6c2 to 9eecfda Compare May 4, 2026 20:08

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All prior threads are resolved and the bug hunter found nothing new on this revision, but this reworks resolver cache invalidation from orphan-as-RCU to in-place reuse across the watcher/JS-thread boundary (with two acknowledged follow-ups deferred), so it warrants a human sign-off.

Extended reasoning...

Overview

This PR changes RealFS.bustEntriesCache from removing the BSSMap key (orphaning the slot) to marking the existing DirEntry as stale so the next read reuses it via the in_place path. That core change ripples into fs.zig (entriesAt, readDirectoryWithIterator), resolver.zig (two in_place trigger sites), hot_reloader.zig (mutex + bust ordering on the watcher thread), and router.zig / filesystem_router.zig (snapshot-before-recurse to avoid same-thread iterator invalidation). It also fixes a pre-existing UAF in reload() (shallow-duped extensions/asset_prefix_path) and adds an RSS-growth regression test.

Security risks

None identified — no auth, crypto, network, or untrusted-input parsing is touched. The risk class here is memory safety (UAF / iterator invalidation) under concurrency, not security.

Level of scrutiny

High. The orphaned-slot leak being fixed was the de-facto RCU that made several unlocked reads of *DirEntry sound; replacing it with in-place reuse required three rounds of race-condition fixes during review (within-event, cross-event, and same-thread reentrancy via RouteLoader.load/bustDirCacheRecursive). Two related issues were explicitly deferred to follow-ups: the pre-existing entriesAt unlocked writer-writer race (can't trivially lock due to non-recursive entries_mutex reentry from dirInfoUncached), and the structurally-identical FrameworkRouter.scanInner site. This is production-critical-path code (bun --hot, --watch, Bun.FileSystemRouter, bundler watch) where concurrency bugs surface as rare, hard-to-reproduce crashes.

Other factors

All inline review threads (mine and CodeRabbit's) are resolved; the per-author status comment reports CI green (build 50578, 71/71 modulo fleet-wide flakes) including ASAN lanes; the new test exercises the leak fix and the existing reload() works with new dirs/files test covers correctness of the stale-flag re-read. None of the touched files are under CODEOWNERS. Given the iteration history and the deferred follow-ups, a human reviewer should confirm the final mutex/ordering story before merge rather than relying on auto-approval.

xhjkl pushed a commit to xhjkl/bun that referenced this pull request May 14, 2026
…chedMaybeLog (oven-sh#30170)

## Problem

`EntriesOption` is a tagged union:
```zig
pub const EntriesOption = union(Tag) {
    entries: *DirEntry,
    err: DirEntry.Err,  // { original_err: anyerror, canonical_error: anyerror }
};
```

When `readDirectory()` fails with a non-ENOENT error (EACCES, EMFILE,
…), `readDirectoryError` stores `.err` in `rfs.entries` (fs.zig:1002).
If the directory later becomes openable and `dirInfoCachedMaybeLog`
processes it as queue slot [0] (the target path itself, which is never
pre-checked against `rfs.entries` during queue construction), it
reached:

```zig
if (rfs.entries.atIndex(cached_dir_entry_result.index)) |cached_entry| {
    if (cached_entry.entries.generation >= r.generation) {   // no tag check
```

Reading `.entries` while `.err` is active reinterprets the two
`anyerror` values as a `*DirEntry` pointer and dereferences it. In debug
this is a safety panic; in release it's a segfault at e.g.
`0x1D401D401F8`.

## Repro

```js
chmodSync(bad, 0o000);
try { Bun.resolveSync('./bad/index.js', root); } catch {}  // caches .err for 'bad'
chmodSync(bad, 0o755);
Bun.resolveSync('./bad', root);  // dirInfoCached(bad): open OK, cache has .err -> crash
```

Before:
```
panic(main thread): access of union field 'entries' while field 'err' is active
resolver.zig:2973:33 in dirInfoCachedMaybeLog
```

## Fix

Add the same `if (cached_entry.* == .entries)` guard that
`dirInfoForResolution` already has (resolver.zig:2265). When the cached
entry is `.err`, fall through with `needs_iter = true` so the directory
is re-iterated now that it can be opened.

## Verification

Added a test to `test/js/bun/resolve/resolve.test.ts` that reproduces
the exact sequence. Runs as the current user when non-root, or drops to
`nobody` via `runuser` when root (since root bypasses DAC). Skipped on
Windows.

- Without fix: panics with `access of union field 'entries' while field
'err' is active`
- With fix: resolves to `bad/index.js`

---

Note: oven-sh#29919 independently adds this same guard as part of a larger
memory-leak fix; this PR is the minimal targeted crash fix with a
dedicated regression test.

---------

Co-authored-by: robobun <robobun@users.noreply.github.com>
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
@robobun robobun force-pushed the farm/cb4c44e1/fix-bustentriescache-leak branch from 9eecfda to 6a91fbd Compare May 17, 2026 02:06
Comment thread src/resolver/lib.rs
@robobun robobun force-pushed the farm/cb4c44e1/fix-bustentriescache-leak branch from 9d43bef to 539ddee Compare May 22, 2026 20:33

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No new issues on this revision, but this changes the aliasing/lifetime contract of a process-global cache (the prior leak was load-bearing as accidental RCU) and threads new entries_mutex synchronization + unsafe pointer snapshots through the hot-reloader and router — worth a human pass, and CI is currently red on build-rust across most platforms for 539ddee.

Extended reasoning...

Overview

This PR fixes a per-reload memory leak in RealFS::bust_entries_cache by replacing BSSMap::remove() (which orphaned the backing DirEntry slot) with an in-place stale flag so the next read reuses the allocation via the existing in_place path. Because the old orphaning behavior was effectively an accidental RCU that made several unlocked readers sound, the fix required compensating synchronization at every site that previously captured a *DirEntry and read it across a potential rewrite: hot_reloader::on_file_update now holds entries_mutex around the affected-files loop and busts after releasing it; RouteLoader::load and FileSystemRouter::bust_dir_cache_recursive snapshot subdir *Entry pointers before recursing; entries_at carries over the prior fd. Touches 6 Rust files across resolver/router/hot-reloader plus a regression test.

Security risks

No direct security surface (no auth/crypto/network input parsing). The risk class here is memory safety: the change rewires lifetime/aliasing invariants for a process-global cache shared across the watcher thread and JS thread, with several new unsafe { &*entry_ptr } dereferences whose validity depends on EntryStore being append-only and on the new lock ordering. The review history found and fixed three distinct UAF/iterator-invalidation races introduced by earlier revisions; the current revision looks correct to the bug hunter, but the blast radius of a missed case is a UAF in --hot/reload().

Level of scrutiny

High. This is concurrency- and aliasing-sensitive core infrastructure (module resolver directory cache, HMR watcher thread). The PR itself documents that correctness now hinges on subtle invariants ("previously never written again (accidental RCU)", "bust after releasing the lock since bust_entries_cache takes it internally", "*Entry values live in the append-only EntryStore"). Three follow-up items were explicitly deferred as pre-existing (entries_at writer-writer locking, FrameworkRouter.scanInner snapshot, fd close at other in_place sites). A human reviewer should confirm the lock ordering and the EntryStore-stability argument, and decide whether the deferred items are acceptable to ship separately.

Other factors

  • CI on the head commit (539ddee) shows build-rust/build-cpp failures across linux x64/aarch64/musl, freebsd, android, and macOS x64 — needs to go green before merge regardless.
  • The PR went through ~8 revision rounds with multiple 🔴 findings (all resolved); that history itself signals this is not a mechanical change.
  • Regression test added and passing locally per the author's status comment (20/21, the 1 failure is a pre-existing main regression).

@robobun robobun force-pushed the farm/cb4c44e1/fix-bustentriescache-leak branch from 843746d to ffb9993 Compare May 25, 2026 06:18
robobun and others added 2 commits June 6, 2026 15:36
BSSMap::remove() only drops the hash→index mapping; the backing slot
and the heap DirEntry it points at are orphaned. The next lookup
allocates a fresh slot + fresh DirEntry, which skips the in_place
re-scan path — so every bust_dir_cache (HMR/watch, FileSystemRouter
.reload()) leaked one DirEntry + EntryMap per directory and appended
N Entry/FilenameStore slots per file, unbounded.

Keep the slot and flag the DirEntry as stale instead. The existing
in_place logic in read_directory_with_iterator/entries_at/
dir_info_cached_maybe_log then re-reads the directory while reusing
the DirEntry allocation, its EntryMap, and the Entry/FilenameStore
slots via prev_map.

Since the orphaned slot was previously never written again (accidental
RCU), callers that captured a *DirEntry and read its EntryMap across
calls that could now trigger an in_place rewrite need explicit
synchronization:

- hot_reloader on_file_update: hold entries_mutex around the
  dir_ent.entries reads (covers both within-event and cross-event
  races with a concurrent JS-thread resolve); bust after releasing
  the lock since bust_entries_cache takes it internally.
- RouteLoader::load / bust_dir_cache_recursive: snapshot subdir
  *Entry pointers (stable in the append-only EntryStore) before
  recursing, so read_dir_info(child) can safely rewrite the parent.
- entries_at: carry over the previous DirEntry.fd through the refresh
  so DirInfo::get_file_descriptor keeps working.

500 reloads of 4 dirs × 20 files: ~50 MB RSS growth → ~8 MB.
@robobun robobun force-pushed the farm/cb4c44e1/fix-bustentriescache-leak branch from ffb9993 to f3f83c3 Compare June 6, 2026 15:39

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No new issues found on this revision, but given the change swaps the resolver's de-facto RCU (orphan-on-bust) for in-place DirEntry rewrites under entries_mutex — with several rounds of cross-thread race fixes already applied — this warrants a human pass over the locking/snapshot invariants before merge.

Extended reasoning...

Overview

This PR changes RealFS::bust_entries_cache from removing the BSSMap key (orphaning the heap DirEntry and its EntryMap) to flagging the existing DirEntry as stale so the next read takes the pre-existing in_place re-scan path. Touches src/resolver/{fs,lib,resolver}.rs (the stale flag, entries_at fd carry-over, both in_place guard sites), src/jsc/hot_reloader.rs (watcher-thread entries_mutex hold + bust reordering), and src/router/lib.rs + src/runtime/api/filesystem_router.rs (snapshot subdir *Entry pointers before recursing). Adds an RSS-leak regression test.

Security risks

None in the auth/crypto sense. The risk surface is memory-safety: the old orphaning behavior was an accidental RCU that made several unlocked *DirEntry reads sound; reusing the slot in place required adding explicit synchronization at three caller sites (hot_reloader affected-files loop, RouteLoader::load, bust_dir_cache_recursive). The PR went through three rounds of race-condition fixes (within-event, cross-event, and same-thread iterator-invalidation via recursive read_dir_info) and explicitly defers three pre-existing issues (entries_at writer-writer locking, FrameworkRouter.scanInner snapshot, fd close at other in_place sites).

Level of scrutiny

High. This is production-critical resolver/HMR code with unsafe raw-pointer dereferences, non-recursive mutex ordering constraints, and cross-thread invariants whose correctness depends on which caller holds entries_mutex vs. dir_cache state at each step. The fix is well-reasoned and well-documented in-source, but the class of bug it's been iterating on (captured *DirEntry rewritten under the reader) is exactly the kind that merits a maintainer's eyes on the final locking topology and the deferred follow-up list.

Other factors

All prior inline review threads are resolved; CI was green on a previous full run and the latest push is just a rebase + rustfmt. The new test exercises the single-threaded reload() path but not the --hot watcher-thread races the synchronization changes target. The author has flagged the PR as ready for human review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant