Skip to content

Fix use-after-free in AggregateError creation for multi-error module loads#31399

Closed
robobun wants to merge 1 commit into
mainfrom
farm/e1ad33a3/fix-buildmessage-aggregate-uaf
Closed

Fix use-after-free in AggregateError creation for multi-error module loads#31399
robobun wants to merge 1 commit into
mainfrom
farm/e1ad33a3/fix-buildmessage-aggregate-uaf

Conversation

@robobun

@robobun robobun commented May 25, 2026

Copy link
Copy Markdown
Collaborator

What does this PR do?

Fixes a heap-use-after-free found by fuzzing (fingerprint Address:heap-use-after-free:bun-debug+0x11afa5ac, READ of size 1 at offset 144 of a 152-byte region — BuildMessage.logged read through VirtualMachine::print_error_from_maybe_private_data while iterating an AggregateError).

When a module fails to load with multiple transpiler/resolver errors, process_fetch_log creates one BuildMessage/ResolveMessage wrapper per log message and then wraps them in an AggregateError. The Rust port staged those freshly created JSValues in a heap Vec<JSValue>:

  • JSC's conservative scan only covers the native stack and registers, so values sitting in a heap Vec are invisible to the GC.
  • Each BuildMessage::create/ResolveMessage::create allocates JS cells and can trigger a collection, so a GC landing in the middle of the loop could sweep wrappers created in earlier iterations.
  • Their finalizers free the 152-byte Box<BuildMessage> payloads, and the AggregateError ends up holding stale values. Printing it (uncaught exception / unhandled rejection / console.error) or touching error.errors[i] afterwards reads the freed payload.

The Zig implementation (VirtualMachine.zig var errors_stack: [256]JSValue) deliberately kept this staging buffer on the stack so the conservative scan roots the wrappers until createAggregateError runs. This PR restores that behavior in the Rust port by using a stack array instead of a heap Vec.

Reproduced with BUN_JSC_collectContinuously=true BUN_JSC_forceRAMSize=1000000 and a dynamic import of a module with ~250 parse errors: ASAN reports the UAF on the unfixed build (read in print_error_from_maybe_private_dataCell<bool>::get, freed by BuildMessageClass__finalize, allocated by process_fetch_logBuildMessage::create); the fixed build is clean.

How did you verify your code works?

  • New regression test AggregateError from a module with many build errors survives GC during its creation in test/js/bun/resolve/build-error.test.ts. It imports a module with 250 build errors under BUN_JSC_collectContinuously=true, then inspects and prints the resulting AggregateError. It fails on the unfixed ASAN debug build (AddressSanitizer abort in the subprocess) and passes with this fix.
  • bun bd test test/js/bun/resolve/build-error.test.ts — 3 pass.
  • Manually re-ran the standalone reproduction 6× on the fixed ASAN debug build with no reports; the unfixed build reports the use-after-free on most runs.

…eated

process_fetch_log collected the per-message BuildMessage/ResolveMessage
wrappers in a heap Vec<JSValue> before creating the AggregateError. The
conservative GC scan only covers the stack and registers, so a collection
triggered while later wrappers were still being created could sweep the
earlier ones and free their native payloads; printing or inspecting the
resulting AggregateError then read freed memory.

Use a stack array for the staging buffer, matching VirtualMachine.zig's
errors_stack, so the wrappers stay rooted until createAggregateError runs.
@coderabbitai

coderabbitai Bot commented May 25, 2026

Copy link
Copy Markdown
Contributor

Warning

Review limit reached

@robobun, we couldn't start this review because you've used your available PR reviews for now.

Your plan includes 5 reviews of capacity. Refill in 10 minutes and 56 seconds.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more review capacity refills, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than trial, open-source, and free plans. In all cases, review capacity refills continuously over time.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: f1816fd4-7e9a-4b90-b8ac-a92ba51d60c6

📥 Commits

Reviewing files that changed from the base of the PR and between a49b308 and 2f3bf67.

📒 Files selected for processing (2)
  • src/jsc/VirtualMachine.rs
  • test/js/bun/resolve/build-error.test.ts

Comment @coderabbitai help to get the list of available commands and usage tips.

@robobun

robobun commented May 25, 2026

Copy link
Copy Markdown
Collaborator Author
Updated 12:57 PM PT - May 25th, 2026

@robobun, your commit 2f3bf67 has 2 failures in Build #57993 (All Failures):


🧪   To try this PR locally:

bunx bun-pr 31399

That installs a local version of the PR into your bun-31399 executable, so you can run:

bun-31399 --bun

@github-actions

Copy link
Copy Markdown
Contributor

Found 1 issue this PR may fix:

  1. Processing dynamic imports with errors crashes Bun instead of throwing errors #23181 - Dynamic imports with errors crash Bun instead of throwing catchable errors, matching the use-after-free in process_fetch_log when multiple BuildMessage/ResolveMessage JSValues are GC'd mid-loop

If this is helpful, copy the block below into the PR description to auto-close this issue on merge.

Fixes #23181

🤖 Generated with Claude Code

@robobun

robobun commented May 25, 2026

Copy link
Copy Markdown
Collaborator Author

Checked #23181 against this fix: it's most likely a different bug, so I'm not adding a Fixes line for it.

  • That report is from Bun v1.2.23, where this code path was still the Zig implementation (VirtualMachine.zig), which already staged the wrappers in a stack array — the regression fixed here was introduced later, in the Rust port of process_fetch_log.
  • The crash signature there is also different: a Windows segfault at 0x18 inside JSC::JSObject::getNonIndexPropertySlot during property access, rather than a use-after-free of a BuildMessage payload while printing/iterating an AggregateError.

The symptom overlaps (dynamic imports of modules with errors), but the root cause in #23181 would need its own investigation.

@github-actions

Copy link
Copy Markdown
Contributor

This PR may be a duplicate of:

  1. Root BuildMessage cells on the stack in Log.to_js #30671 - Fixes the same use-after-free root cause (unrooted BuildMessage/ResolveMessage JSValues in heap Vec invisible to JSC's conservative GC) with the same stack-array fix in process_fetch_log, and additionally fixes the same issue in Log::to_js

🤖 Generated with Claude Code

@robobun

robobun commented May 25, 2026

Copy link
Copy Markdown
Collaborator Author

Confirmed — this is a duplicate of #30671, which contains the identical stack-array fix for process_fetch_log plus the same fix for the other call site that builds these AggregateErrors (Log::to_js), along with regression tests for both paths. That PR also already documents this exact module-loader/Worker manifestation (freed BuildMessage, logged flag at offset 144 of the 152-byte allocation).

Closing this one in favor of #30671 to keep a single PR for the fix.

@robobun robobun closed this May 25, 2026

const [stdout, stderr, exitCode] = await Promise.all([proc.stdout.text(), proc.stderr.text(), proc.exited]);

expect(stderr).not.toContain("AddressSanitizer");

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 nit: expect(stderr).not.toContain("AddressSanitizer") is the kind of assertion CLAUDE.md says to avoid ("NEVER write tests that check for no 'panic' or 'uncaught exception' or similar in the test output. These tests will never fail in CI."). Non-ASAN CI builds will never emit this string, and on ASAN builds a UAF aborts the subprocess so expect(exitCode).toBe(0) and expect(stdout).toContain("OK 250") already catch it — this line can be dropped.

Extended reasoning...

What this is

The new regression test asserts expect(stderr).not.toContain("AddressSanitizer") against the subprocess output. The repo's root CLAUDE.md (line 126) explicitly prohibits this style of assertion:

NEVER write tests that check for no "panic" or "uncaught exception" or similar in the test output. These tests will never fail in CI.

Checking for the absence of an AddressSanitizer diagnostic falls squarely into the "or similar" bucket — it is a crash-diagnostic-absence check, just like checking for the absence of panic: or uncaught exception.

Why it is redundant

Walk through the two CI configurations:

  1. Non-ASAN build (most CI jobs). The subprocess binary was not compiled with -fsanitize=address, so it is physically incapable of writing AddressSanitizer to stderr regardless of whether the bug is present. The assertion is a tautology and can never fail. This is exactly the failure mode the CLAUDE.md rule warns about.

  2. ASAN build with the bug present. ASAN detects the heap-use-after-free, prints its report to stderr, and aborts the process with a non-zero exit code before console.log("OK", aggregate.errors.length) runs. So:

    • expect(exitCode).toBe(0) fails (process aborted), and
    • expect(stdout).toContain("OK 250") fails (the OK line was never printed).

    The regression is already detected by the two assertions that follow; the stderr check adds no coverage.

A grep across test/ confirms no other test file uses .not.toContain("AddressSanitizer"), so this is not following an established local convention.

Impact

None functionally — the test still passes/fails correctly with this line present. It is purely dead weight that contradicts an explicit repo testing convention.

Fix

Delete line 69. The remaining expect(stdout).toContain("OK 250") and expect(exitCode).toBe(0) fully cover the regression. If a more descriptive failure message on ASAN runners is desired, prefer attaching stderr to the exit-code assertion (e.g. expect(exitCode, stderr).toBe(0)) rather than asserting on the absence of a sanitizer string.

Comment thread src/jsc/VirtualMachine.rs
Comment on lines +2561 to +2569
// Spec caps at 256 (`var errors_stack: [256]JSValue`). The array
// must stay on the stack like the Zig spec: until
// `create_aggregate_error` runs, these freshly created
// BuildMessage/ResolveMessage wrappers have no other GC-visible
// reference, and `create()` can trigger a collection mid-loop.
// JSC's conservative scan only covers the stack/registers, so a
// heap `Vec<JSValue>` would let earlier wrappers be swept (and
// their native payloads freed) before the AggregateError exists.
let mut errors_stack = [JSValue::ZERO; 256];

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟣 Heads-up: the same heap-Vec<JSValue> UAF pattern this PR fixes still exists in LogJsc::to_js at src/jsc/lib.rs:2146-2151let mut errors_stack: Vec<JSValue> = … followed by a loop pushing msg_to_js() (which calls the same BuildMessage::create/ResolveMessage::create). That path is live via Bun.Transpiler (scanImports/transformSync), web_worker, TOMLObject, JSONCObject, css_internals, and filesystem_router for any input with 2+ log messages. This is pre-existing code the PR doesn't touch, but it's the identical root cause with the identical 3-line fix ([JSValue::ZERO; 256] like src/ast_jsc/lib.rs:50 already does), so it's probably worth applying here too.

Extended reasoning...

What the bug is

This PR fixes a use-after-free in process_fetch_log (src/jsc/VirtualMachine.rs) by replacing a heap Vec<JSValue> with a stack [JSValue; 256], so JSC's conservative stack/register scan keeps freshly-created BuildMessage/ResolveMessage wrappers rooted while later loop iterations call create() (which can trigger GC). The PR's new comment at lines 2561-2568 documents exactly why a heap Vec<JSValue> is unsafe in this pattern.

However, the identical buggy pattern remains in src/jsc/lib.rs:2146-2151, in impl LogJsc for bun_ast::Log → fn to_js:

_ => {
    let mut errors_stack: Vec<JSValue> = Vec::with_capacity(count);
    for msg in &msgs[0..count] {
        errors_stack.push(msg_to_js(msg, global)?);
    }
    let out = bun_core::ZigString::init(message.as_bytes());
    global.create_aggregate_error(&errors_stack, &out)
}

The local is even named errors_stack (after the Zig spec's on-stack [256]JSValue), but it is a heap Vec. msg_to_js (lib.rs:2131-2135) dispatches to BuildMessage::create / ResolveMessage::create — the exact same GC-allocating calls that motivated this PR's fix.

Code path that triggers it

LogJsc::to_js is a live path. Verified callers (via use crate::LogJsc + .to_js(...)):

  • src/runtime/api/JSTranspiler.rs — lines 903, 1029, 1046, 1083, 1398, 1404, 1589, 1595, 1818, 1824 (e.g. transformSync / scanImports on input with multiple parse errors)
  • src/jsc/web_worker.rs:1432, 1680
  • src/runtime/api/TOMLObject.rs:25, 30, 49
  • src/runtime/api/JSONCObject.rs:27
  • src/css_jsc/css_internals.rs:226, 386
  • src/runtime/api/filesystem_router.rs:262, 297, 311, 497, 523

Any of these producing 2+ log messages enters the _ => branch and stages the wrappers in a heap Vec.

Why existing code doesn't prevent it

Quoting this PR's own new comment: "JSC's conservative scan only covers the stack/registers, so a heap Vec<JSValue> would let earlier wrappers be swept (and their native payloads freed) before the AggregateError exists." That reasoning applies verbatim to lib.rs:2146. Until create_aggregate_error runs, the only references to the per-message wrappers live in the Vec's heap buffer; the Vec struct on the stack holds a pointer/len/cap, and conservative scanning does not chase heap pointers.

For contrast, the parallel implementation in src/ast_jsc/lib.rs:49-50 already does this correctly:

// On-stack array: conservative GC stack scan keeps these JSValues alive (see PORTING.md §JSC).
let mut errors_stack: [JSValue; 256] = [JSValue::default(); 256];

and the Zig spec at src/ast_jsc/logger_jsc.zig:44 uses var errors_stack: [256]jsc.JSValue. Only the src/jsc/lib.rs port diverged.

Step-by-step proof

Concrete walk-through with new Bun.Transpiler().transformSync(src) where src has 3 parse errors, under BUN_JSC_collectContinuously=true:

  1. Transpiler hits 3 errors → log.msgs.len() == 3JSTranspiler.rs calls log.to_js(global, "...").
  2. to_js enters the _ => arm, allocates Vec<JSValue> with capacity 3 — the 24-byte buffer lives on the heap.
  3. Iteration 0: msg_to_jsBuildMessage::create allocates a JS cell wrapping a 152-byte Box<BuildMessage> payload; the resulting JSValue is written to vec_buf[0] on the heap. No stack slot or register continues to hold it after push returns.
  4. Iteration 1: msg_to_jsBuildMessage::create allocates again. With collectContinuously, this triggers a full GC. The conservative scan walks the native stack and registers; it sees the Vec's (ptr, len, cap) triple but does not dereference ptr, so the wrapper from iteration 0 has zero visible references. It is swept; BuildMessageClass__finalize runs and frees the 152-byte Box<BuildMessage>.
  5. Iteration 1's wrapper is pushed; iteration 2 likewise (potentially sweeping iteration 1's wrapper).
  6. create_aggregate_error(&errors_stack, …) reads errors_stack[0] — a stale JSValue pointing at a swept cell — and stores it in the AggregateError's errors array.
  7. The caller throws/returns the AggregateError. When user code reads e.errors[0].message / .position, or when the runtime prints the error, it dereferences the freed 152-byte payload → heap-use-after-free, identical fingerprint to the one this PR fixes.

Impact

Same as the bug this PR fixes: heap UAF reading freed BuildMessage/ResolveMessage payloads when constructing/printing an AggregateError from a multi-error log, reachable from several public APIs (Bun.Transpiler, TOML/JSONC parsing, CSS internals, FileSystemRouter, Workers). It requires a GC to land mid-loop, so it's rare in practice but deterministic under BUN_JSC_collectContinuously=true — exactly the conditions in this PR's new regression test.

How to fix

Apply the same change as this PR:

_ => {
    let mut errors_stack = [JSValue::ZERO; 256];
    for (i, msg) in msgs[..count].iter().enumerate() {
        errors_stack[i] = msg_to_js(msg, global)?;
    }
    let out = bun_core::ZigString::init(message.as_bytes());
    global.create_aggregate_error(&errors_stack[..count], &out)
}

This is pre-existing code that the PR does not touch, so I've marked it as such — but since the PR's commit title is "Keep multi-error AggregateError children GC-visible while they are created" and this is a second instance of that exact class with the same 3-line fix, it seems worth folding in here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant