Skip to content

Root BuildMessage cells on the stack in Log.to_js#30671

Closed
robobun wants to merge 8 commits into
mainfrom
farm/0003cfea/transpiler-async-error-uaf
Closed

Root BuildMessage cells on the stack in Log.to_js#30671
robobun wants to merge 8 commits into
mainfrom
farm/0003cfea/transpiler-async-error-uaf

Conversation

@robobun

@robobun robobun commented May 14, 2026

Copy link
Copy Markdown
Collaborator

What does this PR do?

Fixes a GC crash when Bun.Transpiler().transform() rejects with multiple parse errors.

Log.to_js builds an AggregateError by allocating one BuildMessage/ResolveMessage JS cell per log entry. The Rust port collected those cells in a heap Vec<JSValue>:

let mut errors_stack: Vec<JSValue> = Vec::with_capacity(count);
for msg in &msgs[0..count] {
    errors_stack.push(msg_to_js(msg, global)?);
}

Only the Vec header is on the stack, so the conservative GC scan does not see the cell pointers stored in the heap buffer. Allocating the second BuildMessage can trigger a collection that sweeps the first one, and create_aggregate_error then hands the collector a zapped StructureID (ASSERTION FAILED: decontaminate() under debug WebKit; SEGV in SlotVisitor::visitChildren on release).

The Zig spec (logger_jsc.zig:44) used a stack [256]JSValue precisely so the scan keeps every cell rooted. This change restores that.

Fuzzilli fingerprint: 824b56d2fb4455bf — the fuzzer hit a variant of this via the TransformTask.thenlog.toJS path.

How did you verify your code works?

Added test/js/bun/transpiler/transpiler-async-error-uaf.test.ts, which queues 500 failing async transform() calls and asserts the rejection carries the right nested error text.

  • Debug/ASAN before fix: ~65% of runs (13/20) hit ASSERTION FAILED: decontaminate(); after: 30/30 pass.
  • Release before fix at 20 k iterations: 9/10 runs SEGV in SlotVisitor::visitChildren; after: 10/10 pass.
  • Existing test/js/bun/transpiler/ and test/bundler/transpiler/transpiler.test.js suites still pass.

@robobun

robobun commented May 14, 2026

Copy link
Copy Markdown
Collaborator Author
Updated 1:05 PM PT - May 23rd, 2026

@robobun, your commit 8ddc9ac has 1 failures in Build #57343 (All Failures):


🧪   To try this PR locally:

bunx bun-pr 30671

That installs a local version of the PR into your bun-30671 executable, so you can run:

bun-30671 --bun

@coderabbitai

coderabbitai Bot commented May 14, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Walkthrough

This PR deep-clones buffered transpiler log messages from a temporary MimallocArena into the default allocator before the arena is freed, and adds a regression test that repeatedly calls Bun.Transpiler().transform() with parse errors and asserts the final rejection and nested error message.

Changes

Async Transpiler Log Memory Safety Fix

Layer / File(s) Summary
Deep-clone log messages and regression test
src/runtime/api/JSTranspiler.zig, test/js/bun/transpiler/transpiler-async-error-uaf.test.ts
TransformTask.run adds a defer that deep-clones each this.log.msgs entry to bun.default_allocator before the local MimallocArena is deinitialized so JS-thread then() can safely read messages. A new regression test repeatedly invokes Bun.Transpiler().transform() with invalid syntax, yields to the event loop, collects the last rejection, and asserts the top-level AggregateError message "Transform failed" and the first nested error 'Expected ";" but found "b"'.
🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Title check ✅ Passed The title 'Root BuildMessage cells on the stack in Log.to_js' directly references the technical fix implemented in the PR, which involves storing BuildMessage cells on the stack instead of in a heap Vec to ensure they remain visible to the GC.
Description check ✅ Passed The description provides both required sections: 'What does this PR do?' clearly explains the GC crash fix with technical context and code examples, and 'How did you verify your code works?' describes testing across debug/ASAN/release builds with concrete pass rates.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/js/bun/transpiler/transpiler-async-error-uaf.test.ts`:
- Around line 10-19: The test fires 500 async Bun.Transpiler().transform(...)
calls but never awaits them, causing flakiness; modify the loop to collect each
transform() Promise (e.g., push to an array) instead of just attaching a .catch
and then await Promise.all(...) on that array (after the loop and any interim
setImmediate yields) so all transform() calls and their catch handlers complete
before asserting on last.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: b18aff0b-c90d-435b-b9cc-bb40e17e3995

📥 Commits

Reviewing files that changed from the base of the PR and between b8ecc78 and 4b485ec.

📒 Files selected for processing (2)
  • src/runtime/api/JSTranspiler.zig
  • test/js/bun/transpiler/transpiler-async-error-uaf.test.ts

Comment thread test/js/bun/transpiler/transpiler-async-error-uaf.test.ts Outdated
@github-actions

Copy link
Copy Markdown
Contributor

This PR may be a duplicate of:

  1. Fix use-after-free in Bun.Transpiler.transform() error reporting #30309 - Fixes the same use-after-free of arena-allocated log messages in the async Bun.Transpiler.transform() error path by cloning before arena deallocation
  2. Fix use-after-free in Bun.Transpiler async transform() errors #30263 - Addresses the identical UAF bug where error messages reference freed MimallocArena memory when the async transform promise is settled on the main thread
  3. JSTranspiler: fix use-after-free of log messages in async transform() #30180 - Targets the same lifetime issue by deep-cloning log messages into bun.default_allocator before MimallocArena.deinit() in TransformTask.run()
  4. Fix use-after-free in Bun.Transpiler async transform() errors #29958 - Fixes the same use-after-free where async transpiler error strings are invalidated after the thread-local arena is torn down

🤖 Generated with Claude Code

Comment thread src/runtime/api/JSTranspiler.zig Outdated
Comment thread test/js/bun/transpiler/transpiler-async-error-uaf.test.ts Outdated
@robobun robobun changed the title Fix use-after-free in Bun.Transpiler().transform() error path Root BuildMessage cells on the stack in Log.to_js May 14, 2026
Comment thread src/jsc/lib.rs
Comment thread src/jsc/lib.rs
@robobun

robobun commented May 14, 2026

Copy link
Copy Markdown
Collaborator Author

CI note: the only hard failure on #54236 is test/js/web/fetch/fetch-tcp-keepalive.test.ts (Expected "02", Received "00" for the SO_KEEPALIVE timer probe) on debian-13-x64-asan. That test is unrelated to this change — nothing here touches HTTP/sockets — and the same failure is currently showing on unrelated PR builds #54230, #54231, #54232, #54225, #54228. transpiler-async-error-uaf.test.ts passed on every shard.

Comment thread src/jsc/lib.rs
Comment on lines +2121 to +2123
// Must be a stack array so the conservative GC scan keeps the freshly
// allocated BuildMessage/ResolveMessage cells alive while we allocate
// the next one; a heap `Vec<JSValue>` would leave them unrooted.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟣 FYI / pre-existing: the same heap-Vec<JSValue> GC-rooting pattern this PR fixes (and now documents as broken) also exists at src/runtime/bake/production.rs:966-985css_chunk_js_strings: Vec<JSValue> is filled with freshly-allocated JSString cells, then six create_empty_array calls plus a per-route allocation loop run before the Vec is read at lines 1109/1127/1150, so any GC in that window can sweep the unrooted strings. Unlike here, css_chunks_count is unbounded so a stack [JSValue; 256] won't work — this needs MarkedArgumentBuffer or per-cell .protect(). Not blocking (the PR doesn't touch production.rs and the Zig spec at production.zig:541 has the same heap-alloc pattern), but worth batching with the already-flagged VirtualMachine.rs:2501 follow-up. Separately, the inline comment at src/runtime/server/mod.rs:896 claiming "the conservative GC scan reaches the heap allocation as well as the stack" is now directly contradicted by the new comment here and should be reworded (that site is incidentally safe only because the values are also stack-resident).

Extended reasoning...

What the bug is

The new comment this PR adds at lib.rs:2121-2123 documents that storing freshly-allocated JS cells in a heap Vec<JSValue> leaves them invisible to JSC's conservative stack scanner. An earlier review comment already flags src/jsc/VirtualMachine.rs:2501 as a sibling instance; another unfixed instance of the same bug class — with a substantially larger GC window — exists at src/runtime/bake/production.rs:966-1150.

The specific code path

// production.rs:966
let mut css_chunk_js_strings: Vec<JSValue> = vec![JSValue::ZERO; css_chunks_count];
...
for (output_file, str) in ... .zip(css_chunk_js_strings.iter_mut()) {
    *str = BunString::create_format(format_args!("{}{}", ...))
        .to_js(global)?;   // allocates a JSString cell
}

The fill loop (971-985) writes each BunString::create_format(...).to_js(global)? result — a freshly-allocated JSString cell — into a heap Vec slot via *str = .... The loop variable str is &mut JSValue pointing into the heap buffer, not a stack copy; the BunString temporary drops at end-of-statement, so its WTF::StringImpl's sole remaining owner is the JSString cell, and the JSString cell's sole reference lives in the malloc-heap Vec buffer.

After the fill loop, lines 989-1015 perform six consecutive JSValue::create_empty_array(global, ...) allocations, followed by a per-route loop (1018-1158) containing many more JSC allocations (create_empty_array at 1086/1087, preload_bundled_module, put_index), before css_chunk_js_strings[...] is first read at lines 1109/1127/1150.

Why existing safeguards don't help

JSC's conservative scanner walks the machine stack and registers; it sees the Vec's {ptr, len, cap} triple on the stack, but ptr points at a mimalloc allocation, not a JSC MarkedBlock, so the scanner does not follow it. There is no .protect(), no MarkedArgumentBuffer, and — unlike server/mod.rs:898 — no separate stack local that incidentally roots the cells. Contrast with the safe create_array_from_iter pattern (JSValue.rs) where each cell is immediately put_index-ed into a stack-rooted JSArray before the next allocation.

Step-by-step proof

  1. A bake production / static-site build (bun build --app) produces ≥2 CSS chunks, so css_chunks_count >= 2.
  2. Fill-loop iteration 0: BunString::create_format(...).to_js(global) allocates JSString cell A; *str = A stores it at css_chunk_js_strings[0] (heap). The BunString temporary drops; nothing on the stack now references A.
  3. Fill-loop iteration 1: .to_js(global) allocates B. Even within the fill loop, this allocation can already trigger GC and sweep A. Either way, the loop completes with all cells rooted only by the heap Vec.
  4. Line 989: create_empty_array(global, navigatable_routes.len()) allocates a JSArray. Suppose this triggers a full GC.
  5. The conservative scanner walks the stack/registers, finds css_chunk_js_strings's {ptr, len, cap}, recognises ptr as non-JSC-heap, and does not follow it. A and B have no roots and are swept.
  6. Lines 990-1087 perform ~7 more allocating calls; the per-route loop runs preload_bundled_module (which evaluates JS).
  7. Line 1109: styles.put_index(global, css_file_count, css_chunk_js_strings[idx]) reads a zapped cell and installs it into a live JSArrayASSERTION FAILED: decontaminate() (debug) / SEGV in SlotVisitor::visitChildren (release) — the same crash signature this PR fixes for Log::to_js.

Impact

A bun build --app / bake static-site build with multiple CSS chunks under GC pressure can crash with a zapped-StructureID assertion or segfault when populating per-route stylesheet arrays. The window is large (six guaranteed allocations plus an unbounded per-route loop including JS evaluation between fill and read), so it is more exposed than the Log::to_js instance.

Why pre-existing, not blocking

This is flagged as pre-existing rather than normal severity because: (1) the PR does not touch, call into, or otherwise interact with src/runtime/bake/production.rs (different crate, different subsystem); (2) unlike the VirtualMachine.rs:2501 sibling, this is not a Rust-port deviation — the Zig spec at production.zig:541 also heap-allocates via try allocator.alloc(JSValue, css_chunks_count), so the Rust port faithfully reproduces a latent Zig-side hazard; (3) the directly-related, in-crate, same-consumer sibling (VirtualMachine.rs:2501) is already flagged separately. It's surfaced here so it can be batched into the same follow-up.

How to fix

css_chunks_count is unbounded, so the fixed-size stack-array fix from this PR does not directly apply. Either:

  • Use MarkedArgumentBuffer (already exposed at bun_jsc::MarkedArgumentBuffer) to hold the cells, or
  • .protect() each cell after to_js() and unprotect via a drop guard (the VirtualMachine.rs:5840 pattern), or
  • Build the strings directly into a stack-rooted JSArray via create_array_from_iter and index into that instead of a Rust Vec.

Side note: misleading comment at server/mod.rs:896

src/runtime/server/mod.rs:896-897 carries an inline comment asserting "The conservative GC scan reaches the heap allocation as well as the stack, so a small Vec is sound." That claim is directly contradicted by this PR's new lib.rs:2121-2123 comment and by the crash this PR fixes. That particular site happens to be safe — prepared.js_request and the extra_args it copies from are stack-resident — but the stated rationale is wrong and will mislead future readers into thinking heap Vec<JSValue> is generally OK. Worth correcting the comment when the follow-up lands.

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All prior feedback is addressed and the stack-array fix matches the Zig spec at both sites — looks correct to me, but since this is GC-rooting in the JSC bindings I'd like a human to give it a final glance before merge.

Extended reasoning...

Overview

Two product-code changes plus a regression test:

  • src/jsc/lib.rs (Log::to_js): replace heap Vec<JSValue> with a stack [JSValue::UNDEFINED; 256] so freshly-allocated BuildMessage/ResolveMessage cells stay visible to JSC's conservative stack scanner while the loop allocates the next one.
  • src/jsc/VirtualMachine.rs (process_fetch_log multi-error arm): identical fix, applied in 546cc6b after I flagged it as a sibling instance.
  • test/js/bun/transpiler/transpiler-async-error-uaf.test.ts: new subprocess-based regression test using BUN_JSC_useZombieMode=1 (+ collectContinuously on non-Windows) that drives 256 BuildMessages through Log::to_js 20× and verifies none are corrupted.

Both code changes are 1:1 restorations of the Zig spec's stack [256]JSValue (logger_jsc.zig / VirtualMachine.zig), reverting a Rust-port deviation; create_aggregate_error already takes &[JSValue] so the slice-of-stack-array signature lines up.

Security risks

None. This is a GC-rooting correctness fix with no auth/crypto/permissions/input-handling surface. The 2 KiB stack array is fixed-size and bounded by min(len, 256), matching the existing Zig behaviour.

Level of scrutiny

Moderate-to-high. The change itself is small and mechanical, but it lives in memory-safety-critical JSC binding code where mistakes manifest as UAF/crashes. The fact that it restores documented Zig-spec behaviour (rather than introducing new logic) lowers the risk substantially, and the bug-hunting system found nothing on the current revision — but I'd still prefer a maintainer familiar with the Rust JSC port to confirm.

Other factors

  • All earlier review threads (CodeRabbit's test-await concern; my JSTranspiler.zig leak, sibling process_fetch_log instance, and stale-description notes) are resolved. The remaining open inline is my own 🟣 FYI about production.rs, which I explicitly marked pre-existing/non-blocking and suitable for a follow-up.
  • The test was rewritten (fdd51f9) to a deterministic subprocess with GC-stress env vars and follows house style (using tempDir, stderr-on-failure guard, exitCode asserted last, explicit timeout justified by subprocess + collectContinuously).
  • github-actions flagged four potential duplicate PRs, but those targeted the pre-rewrite Zig JSTranspiler.zig arena-lifetime bug, which no longer applies after main's Rust rewrite — this PR now fixes a different (GC-rooting) bug on the same user-visible path.
  • No CODEOWNERS entry covers src/jsc/.

@robobun

robobun commented May 14, 2026

Copy link
Copy Markdown
Collaborator Author

Build #54262: same unrelated fetch-tcp-keepalive.test.ts failure on 13 x64-asan (also on #54255, #54257, #54259, #54261), plus v8-heap-snapshot.test.ts SIGKILL on 25.04 x64 (also on #54255). Neither touches the Log.to_js path; transpiler-async-error-uaf.test.ts passed on every shard.

robobun added 5 commits May 14, 2026 17:29
TransformTask.run() parses on a worker thread using an arena allocator
that is destroyed before returning. Parse error messages were allocated
in that arena but not read until then() runs on the JS thread, at which
point the memory may have been decommitted.

Clone log messages into the default allocator before the arena is freed
so the error text survives until the promise is rejected.
After merging main (which replaced the Zig runtime with Rust), the
original Zig-side arena-text UAF no longer applies. The Rust port of
Log.to_js has a different bug in the same path: it collects the
per-message BuildMessage/ResolveMessage JS cells in a heap
Vec<JSValue>, so the first cell can be swept while allocating the
second, and the resulting AggregateError visits a zapped StructureID.

Mirror the Zig spec's stack `[256]JSValue` so the conservative stack
scan keeps every cell rooted until create_aggregate_error runs.

Drop the now-dead JSTranspiler.zig change and reword the regression
test comment to describe the actual failure mode.
The module-loader multi-error path in VirtualMachine.rs had the same
heap Vec<JSValue> pattern, leaving BuildMessage/ResolveMessage cells
unrooted between allocations. Mirror the Zig spec's stack [256]JSValue
here as well.
Spawn a subprocess with BUN_JSC_useZombieMode + collectContinuously so
the collector reliably races the BuildMessage allocation loop in
Log.to_js. With 256 errors per transformSync call (capped from 300
input lines) the unrooted heap Vec<JSValue> is swept mid-loop every
run: 10/10 fail on both debug/ASAN and release before the fix, 10/10
pass after.
@robobun

robobun commented May 23, 2026

Copy link
Copy Markdown
Collaborator Author

Fuzzing hit this same root cause again, this time through the module loader: a Worker whose module failed to build with multiple errors crashed with a heap-use-after-free while the unhandled AggregateError was being printed (print_error_from_maybe_private_data reading BuildMessage.logged after the wrapper had been finalized). The process_fetch_log half of this PR is exactly what fixes it — reproduced with import() of a module with ~80 build errors under BUN_JSC_collectContinuously=1, and verified it no longer crashes with this branch.

Pushed one commit adding a regression test for that path (test/js/bun/resolve/build-error.test.ts), so both call sites that build the AggregateError are covered. Closing #31284 as a duplicate of this PR.

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All prior feedback has been addressed and the new build-error.test.ts regression looks good — deferring only because this is a GC-rooting change in src/jsc/ and memory-safety fixes there warrant a human sign-off.

Extended reasoning...

Overview

The PR replaces heap Vec<JSValue> with stack [JSValue; 256] arrays in two places — Log::to_js (src/jsc/lib.rs) and process_fetch_log (src/jsc/VirtualMachine.rs) — so freshly-allocated BuildMessage/ResolveMessage cells stay visible to JSC's conservative stack scanner while subsequent cells are allocated. Two regression tests cover both call sites (transpiler async error path and module-loader multi-error import path). Since my last review, the only new change is the added test/js/bun/resolve/build-error.test.ts case, which follows established subprocess-test conventions.

Security risks

None in the traditional sense (no auth/input-validation surface). The change fixes a memory-safety bug (use-after-free of GC'd cells); it does not introduce new unsafe blocks or widen any trust boundary. The 2 KiB stack array is fixed-size and matches the Zig spec's [256]JSValue.

Level of scrutiny

High. src/jsc/ is the core JavaScriptCore binding layer, and correctness here depends on understanding JSC's conservative GC root-scanning semantics. The fix is small (~20 net lines of product code) and faithfully restores the Zig spec's stack-allocation pattern, but GC-rooting changes in the runtime are exactly the kind of thing a human with JSC expertise should confirm rather than a bot auto-approving.

Other factors

All four of my earlier inline concerns were addressed (sibling process_fetch_log fix applied, PR description updated, test made deterministic via Promise.allSettled → later reworked into a subprocess fixture with useZombieMode/collectContinuously, stale Zig comment resolved). The remaining unresolved inline (production.rs / server/mod.rs comment) was explicitly flagged as pre-existing/non-blocking follow-up work. CI passed on all shards for the new tests; the only failures noted were pre-existing flakes unrelated to this path. No CODEOWNERS concerns surfaced. The bug-hunting system found nothing this round.

Comment on lines +35 to +40
await using proc = Bun.spawn({
cmd: [bunExe(), "index.js"],
cwd: String(dir),
env: { ...bunEnv, BUN_JSC_collectContinuously: "1" },
stdout: "pipe",
stderr: "pipe",

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Nit: this test sets BUN_JSC_collectContinuously: "1" unconditionally, but every other test in the repo that uses this env var — including this PR's own sibling test at transpiler-async-error-uaf.test.ts:38-44 — gates it behind !isWindows with the rationale "Windows + collectContinuously is prohibitively slow in CI and the code path is platform-agnostic." The workload here is small so it may not actually time out, but for consistency consider spreading the env conditionally (e.g. ...(isWindows ? {} : { BUN_JSC_collectContinuously: "1" })) to match the convention.

Extended reasoning...

What the issue is

The new process_fetch_log regression test added in commit 45e8e72 (test/js/bun/resolve/build-error.test.ts:38) passes env: { ...bunEnv, BUN_JSC_collectContinuously: "1" } to its spawned subprocess unconditionally, with no Windows guard and no explicit per-test timeout. This is inconsistent with the established repo convention and — notably — with the PR's own sibling test added earlier in the same PR.

The repo convention

I checked every test file in test/ that references BUN_JSC_collectContinuously. All eight other usages guard against Windows in one of two ways:

  • Conditional env var (if (!isWindows) gcEnv.BUN_JSC_collectContinuously = "1"): transpiler-async-error-uaf.test.ts:44 (this PR), require-esm-gc-roots.test.ts:47-49, jest-each-gc-root.test.ts:99
  • test.skipIf(isWindows) / describe.skipIf(isWindows): sourcetextmodule-link-gc.test.ts:19, module-children-concurrent-gc.test.ts:27, 29519.test.ts:16, 30205.test.ts:90/113/144, message-event-init-gc.test.ts:23

Each carries an inline comment explaining why — e.g. require-esm-gc-roots.test.ts says collectContinuously is ">60s for a single subprocess on x64-baseline" on Windows. The new build-error.test.ts test is the sole exception in the codebase.

Why this is internally inconsistent within the PR

The most telling point is that this PR's own other test, transpiler-async-error-uaf.test.ts:38-44, explicitly documents and applies the guard:

// Windows + collectContinuously is prohibitively slow in CI and the code
// path is platform-agnostic, so rely on zombie mode alone there.
const gcEnv: Record<string, string | undefined> = {
  ...bunEnv,
  BUN_JSC_useZombieMode: "1",
};
if (!isWindows) gcEnv.BUN_JSC_collectContinuously = "1";

So the author was clearly aware of the convention when writing the first test but did not apply it when adding the second test in the later commit (45e8e72).

Step-by-step proof

  1. On a Windows CI shard, build-error.test.ts runs the new test with no skipIf(isWindows).
  2. Bun.spawn launches bunExe() index.js with BUN_JSC_collectContinuously=1 in the environment (line 38).
  3. The subprocess parses bad.js (120 lines → ~80 build errors), constructs the AggregateError via process_fetch_log, and prints it as an unhandled rejection — all while JSC's collector runs continuously.
  4. Per the repo's own documented experience (">60s for a single subprocess on x64-baseline"), continuous collection on Windows is dramatically slower than on POSIX. The test has no explicit timeout, so it relies on the default per-test timeout.
  5. If the subprocess exceeds that default, the test fails as a timeout on Windows CI even though the product code is correct — i.e. a platform-specific flake on a UAF regression test.

Impact / why this is a nit

This is a test-quality / CI-reliability concern, not a product bug. The workload is much lighter than the sibling test (one subprocess, one import() of a 120-line file producing ~80 errors, then exit — vs. the sibling's 20×300-error transformSync loop that needed a 60s timeout), so it may well finish within the default timeout even on Windows. But the inconsistency with a unanimous repo convention and with the PR's own sibling test is worth fixing.

How to fix

Match the sibling test's pattern — import isWindows from harness and gate the env var:

env: { ...bunEnv, ...(isWindows ? {} : { BUN_JSC_collectContinuously: "1" }) },

Or, equivalently, build the env object the same way transpiler-async-error-uaf.test.ts does (and optionally add BUN_JSC_useZombieMode: "1" so Windows still gets deterministic dangling-access detection).

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed — the env var is now gated behind !isWindows (with useZombieMode kept unconditionally), matching the sibling test's pattern.

Match the repo convention (and this PR's sibling test): continuous
collection is prohibitively slow on Windows CI and the code path is
platform-agnostic, so rely on zombie mode alone there.
@robobun

robobun commented May 25, 2026

Copy link
Copy Markdown
Collaborator Author

Additional data point: Fuzzilli independently hit this same use-after-free through the module loader path (process_fetch_log) — a Worker whose entry module fails to transpile with multiple parse errors crashes on the worker thread when the aggregated error is printed, reading the freed native BuildMessage (logged flag at offset 144 of the 152-byte allocation).

Deterministic reproduction on an ASAN debug build (crashes every run without this fix, passes with it):

# many-errors.js: ~60 lines like `v0: 1 2 3`, each a recoverable parse error
BUN_JSC_collectContinuously=1 bun-debug -e 'new Worker("./many-errors.js").addEventListener("error", () => {});'

The direct run (BUN_JSC_collectContinuously=1 bun-debug many-errors.js) reproduces it on the main thread too. Verified the stack-array change in this PR fixes both.

robobun added a commit that referenced this pull request Jun 5, 2026
Ported from #30671, which this branch supersedes. transformSync with 256
parse errors under zombie mode and continuous collection crashes the
unfixed ASAN build (zapped StructureID while marking the errors array)
and passes with the stack-array fix.
@robobun

robobun commented Jun 5, 2026

Copy link
Copy Markdown
Collaborator Author

Superseded by #31874. This branch is conflicting with main after the comment rewrites in #31783, and the fuzzer independently re-found the same bug through the module loader path (process_fetch_log), which prompted a fresh fix on current main. #31874 carries the same two stack-array changes, includes this PR's transpiler regression test (as transpiler-error-gc-uaf.test.ts), adds an import-path regression test, and fixes one more instance of the same unrooted-heap-Vec shape in bake's production build.

@robobun robobun closed this Jun 5, 2026
Jarred-Sumner pushed a commit that referenced this pull request Jun 5, 2026
…eErrors (#31874)

### What

Fixes a GC use-after-free found by fuzzing (ASAN fingerprint
`heap-use-after-free`, read of size 1 at offset 144 inside a 152-byte
region: that is exactly `BuildMessage.logged` inside a freed
`Box<BuildMessage>`, confirmed via DWARF layout).

### Root cause

JSC roots JSValues through the conservative stack scan. Values that live
only in a malloc'd `Vec<JSValue>` buffer are invisible to the GC.

`process_fetch_log` (module fetch failure path) created one
`BuildMessage`/`ResolveMessage` wrapper per log message and pushed each
into a heap `Vec<JSValue>` before handing the slice to
`create_aggregate_error`. Every `create` call in that loop allocates JS
cells, so a collection can run mid-loop; the wrappers created in earlier
iterations have no reachable reference at that point, get swept, and
their finalizers free the native `BuildMessage`. Anything that touches
the stale cells afterwards (printing the unhandled rejection, reading
`.message` off `error.errors[i]`) reads freed memory. The original Zig
implementation used an on-stack `var errors_stack: [256]JSValue`
specifically so the conservative scan would root these; the port
replaced it with a heap `Vec`.

The fixing lines are the `Vec<JSValue>` to `[JSValue; 256]` changes in
`src/jsc/VirtualMachine.rs` (`process_fetch_log`) and `src/jsc/lib.rs`
(`LogJsc::to_js`, same pattern, also feeds `create_aggregate_error`).
`ast_jsc::log_to_js` already used the on-stack array and documents why.

Same bug shape in `bake/production.rs`: `css_chunk_js_strings` held
freshly created `JSString`s in a heap `Vec` across
`preload_bundled_module` calls (which evaluate JS modules). That one is
unbounded in length, so each value is kept `protect()`ed (RAII
`ProtectedJSValue`) until consumed. An audit of the other `Vec<JSValue>`
sites in the tree found them safe (values rooted elsewhere via
`MarkedArgumentBuffer`, `protect()`, `Strong`, or no allocation while
held).

### Reproduction

A module with 257 duplicate `const` declarations produces 256 log
messages. Importing it 16 times with
`BUN_JSC_slowPathAllocsBetweenGCs=100` crashed the unfixed ASAN debug
build 6/6 runs (ASAN UAF or `ASSERTION FAILED: isSymbol()` from
swept-and-reused cells, depending on timing). The fuzzer hit the same
window on a Worker thread with its `Bun.gc(true)` suffix.

### Tests

`test/js/bun/resolve/build-error.test.ts`: "import with many build
errors keeps AggregateError entries alive across GC". Fails on the
unfixed ASAN debug build (child aborts, 6/6), passes on the fixed build
(15/15 runs). It passes under a release build either way since the UAF
needs ASAN (or unlucky reuse) to become observable; the ASAN CI suites
are the enforcing ones.
`test/js/bun/transpiler/transpiler-error-gc-uaf.test.ts` covers the
`LogJsc::to_js` site directly: `Bun.Transpiler().transformSync` with 256
parse errors under `BUN_JSC_useZombieMode` and
`BUN_JSC_collectContinuously` crashes the unfixed ASAN build (zapped
StructureID while marking the errors array) and passes on this branch.
Existing `plugins.test.ts`, `transpiler.test.js`, `dev-and-prod.test.ts`
(5 PROD variants) and `framework-router.test.ts` pass.


### Linked issue

Fixes #23181

That report is the same scenario: a loop of dynamic imports over files
where some fail with build errors, with a try/catch around each. A
script mirroring its `extract-messages.ts` shape (sequential `await
import()` of modules with multiple build errors, catch block touching
the error) crashes the unfixed ASAN build 3/3 runs under GC pressure and
completes cleanly 3/3 on this branch. The original report was against
v1.2.23, whose implementation of this path differed, so the exact 1.2.23
mechanism can't be re-verified, but the reported workload no longer
crashes on current Bun with this change.



### Supersedes #30671

That earlier PR fixed the same two sites (`process_fetch_log`,
`LogJsc::to_js`) but is now conflicting with main after the comment
rewrites in #31783. This PR carries the same changes rebased on current
main, absorbs its transpiler regression test, and additionally fixes the
same unrooted-heap-Vec shape in bake's production build.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant