21 changes: 11 additions & 10 deletions src/jsc/VirtualMachine.rs
@@ -2574,23 +2574,24 @@ pub fn process_fetch_log(
         }

         _ => {
-            // Spec caps at 256 (`var errors_stack: [256]JSValue`). PERF(port):
-            // was inline switch — Zig stack-allocated; we heap-allocate the
-            // exact `len` since `JSValue` is a thin u64 and 256 * 8 B = 2 KiB
-            // is fine either way, but `Vec` avoids the uninit-array dance.
-            let len = log.msgs.len().min(256);
-            let mut errors: alloc::vec::Vec<JSValue> = alloc::vec::Vec::with_capacity(len);
-            for msg in log.msgs.drain(..len) {
-                let v = match msg.metadata {
+            // Spec caps at 256 (`var errors_stack: [256]JSValue`). Must be a
+            // stack array so the conservative GC scan keeps each freshly
+            // allocated BuildMessage/ResolveMessage cell alive while we
+            // allocate the next one; a heap `Vec<JSValue>` leaves them
+            // unrooted and they can be swept mid-loop.
+            let mut errors = [JSValue::UNDEFINED; 256];
+            let len = log.msgs.len().min(errors.len());
+            for (i, msg) in log.msgs.drain(..len).enumerate() {
+                errors[i] = match msg.metadata {
                     bun_ast::Metadata::Build => take(BuildMessage::create(global_this, msg)),
                     bun_ast::Metadata::Resolve(_) => take(ResolveMessage::create(
                         global_this,
                         &msg,
                         referrer_utf8.slice(),
                     )),
                 };
-                errors.push(v);
             }
+            let errors = &errors[..len];

             // C++ `Zig::toString` does `createWithoutCopying`, so the buffer
             // must outlive the AggregateError. Mark it global so JSC adopts it
@@ -2604,7 +2605,7 @@ pub fn process_fetch_log(
             message.mark_global();
             *ret = ErrorableResolvedSource::err(
                 err,
-                take(global_this.create_aggregate_error(&errors, &message)),
+                take(global_this.create_aggregate_error(errors, &message)),
             );
         }
     }
 }
13 changes: 8 additions & 5 deletions src/jsc/lib.rs
@@ -2138,17 +2138,20 @@ impl LogJsc for bun_ast::Log {
     fn to_js(&self, global: &JSGlobalObject, message: &str) -> JsResult<JSValue> {
         let msgs = &self.msgs;
         // Spec: `@min(msgs.len, errors_stack.len)` — errors_stack is `[256]JSValue`.
-        let count = msgs.len().min(256);
+        // Must be a stack array so the conservative GC scan keeps the freshly
+        // allocated BuildMessage/ResolveMessage cells alive while we allocate
+        // the next one; a heap `Vec<JSValue>` would leave them unrooted.
Comment on lines +2141 to +2143

Contributor

🟣 FYI / pre-existing: the same heap-Vec<JSValue> GC-rooting pattern this PR fixes (and now documents as broken) also exists at src/runtime/bake/production.rs:966-985. There, css_chunk_js_strings: Vec<JSValue> is filled with freshly-allocated JSString cells, then six create_empty_array calls plus a per-route allocation loop run before the Vec is read at lines 1109/1127/1150, so any GC in that window can sweep the unrooted strings. Unlike here, css_chunks_count is unbounded, so a stack [JSValue; 256] won't work; this needs MarkedArgumentBuffer or per-cell .protect(). Not blocking (the PR doesn't touch production.rs and the Zig spec at production.zig:541 has the same heap-alloc pattern), but worth batching with the already-flagged VirtualMachine.rs:2501 follow-up. Separately, the inline comment at src/runtime/server/mod.rs:896 claiming "the conservative GC scan reaches the heap allocation as well as the stack" is now directly contradicted by the new comment here and should be reworded (that site is incidentally safe only because the values are also stack-resident).

Extended reasoning...

What the bug is

The new comment this PR adds at lib.rs:2121-2123 documents that storing freshly-allocated JS cells in a heap Vec<JSValue> leaves them invisible to JSC's conservative stack scanner. An earlier review comment already flags src/jsc/VirtualMachine.rs:2501 as a sibling instance; another unfixed instance of the same bug class — with a substantially larger GC window — exists at src/runtime/bake/production.rs:966-1150.
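For contrast with the heap Vec, the bounded stack-array shape this PR switches to can be sketched in isolation. This is a minimal illustration, not code from the PR: `with_capped_stack_buffer` and `seen_count` are hypothetical names, and `u64` stands in for `JSValue`. Only the cap of 256 and the "fill an array, expose the filled prefix" structure come from the diff.

```rust
/// Cap taken from the diff (`[JSValue::UNDEFINED; 256]`).
const MAX_ERRORS: usize = 256;

/// Hypothetical helper, not part of bun_jsc: converts at most MAX_ERRORS
/// messages into a fixed-size stack array and hands the filled prefix to
/// `consume`. `u64` is a stand-in for `JSValue`.
fn with_capped_stack_buffer<R>(
    msgs: &[&str],
    mut convert: impl FnMut(&str) -> u64,
    consume: impl FnOnce(&[u64]) -> R,
) -> R {
    // The array lives in this stack frame; the converted values are never
    // stored in a heap allocation.
    let mut buf = [0u64; MAX_ERRORS];
    let len = msgs.len().min(buf.len());
    for (slot, msg) in buf.iter_mut().zip(msgs[..len].iter().copied()) {
        *slot = convert(msg);
    }
    // Only the filled prefix is exposed, mirroring `&errors[..len]`.
    consume(&buf[..len])
}

/// Returns how many converted values the consumer saw for `n` messages.
fn seen_count(n: usize) -> usize {
    let msgs = vec!["Expected \";\" but found \"b\""; n];
    with_capped_stack_buffer(&msgs, |m| m.len() as u64, |vals| vals.len())
}

fn main() {
    println!("{} {}", seen_count(3), seen_count(300));
}
```

The sketch only shows the capping and slicing; whether the real array actually stays visible to the collector depends on JSC's conservative scan of the native stack, which plain Rust cannot demonstrate.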

The specific code path

// production.rs:966
let mut css_chunk_js_strings: Vec<JSValue> = vec![JSValue::ZERO; css_chunks_count];
...
for (output_file, str) in ... .zip(css_chunk_js_strings.iter_mut()) {
    *str = BunString::create_format(format_args!("{}{}", ...))
        .to_js(global)?;   // allocates a JSString cell
}

The fill loop (971-985) writes each BunString::create_format(...).to_js(global)? result — a freshly-allocated JSString cell — into a heap Vec slot via *str = .... The loop variable str is &mut JSValue pointing into the heap buffer, not a stack copy; the BunString temporary drops at end-of-statement, so its WTF::StringImpl's sole remaining owner is the JSString cell, and the JSString cell's sole reference lives in the malloc-heap Vec buffer.

After the fill loop, lines 989-1015 perform six consecutive JSValue::create_empty_array(global, ...) allocations, followed by a per-route loop (1018-1158) containing many more JSC allocations (create_empty_array at 1086/1087, preload_bundled_module, put_index), before css_chunk_js_strings[...] is first read at lines 1109/1127/1150.

Why existing safeguards don't help

JSC's conservative scanner walks the machine stack and registers; it sees the Vec's {ptr, len, cap} triple on the stack, but ptr points at a mimalloc allocation, not a JSC MarkedBlock, so the scanner does not follow it. There is no .protect(), no MarkedArgumentBuffer, and — unlike server/mod.rs:898 — no separate stack local that incidentally roots the cells. Contrast with the safe create_array_from_iter pattern (JSValue.rs) where each cell is immediately put_index-ed into a stack-rooted JSArray before the next allocation.
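The scanner behaviour described above can be modelled with a toy sketch. Everything here is an assumption made for illustration: `ToyCollector`, the address ranges, and the idea of a stack as a slice of integers are invented, and none of it reflects JSC's actual implementation. The point is only that a word is treated as a root when it lands inside the GC heap, and a pointer to a malloc'd buffer is never followed.

```rust
use std::ops::Range;

/// Toy model only: addresses are plain integers and the "GC heap" is a
/// single address range. None of these names come from JSC or bun_jsc.
struct ToyCollector {
    gc_heap: Range<usize>,
}

impl ToyCollector {
    /// Treats every stack word that lands inside the GC heap as a root.
    /// Words pointing anywhere else (for example into a malloc'd Vec
    /// buffer) are ignored and never dereferenced.
    fn conservative_roots(&self, stack_words: &[usize]) -> Vec<usize> {
        stack_words
            .iter()
            .copied()
            .filter(|w| self.gc_heap.contains(w))
            .collect()
    }
}

/// Returns the roots found for (heap-Vec layout, stack-array layout).
fn demo() -> (Vec<usize>, Vec<usize>) {
    let gc = ToyCollector { gc_heap: 0x1000..0x2000 };
    let (cell_a, cell_b) = (0x1010usize, 0x1020usize);

    // Heap Vec: the stack holds only {ptr, len, cap}; the cells themselves
    // sit behind `ptr`, outside the GC heap.
    let vec_buffer_ptr = 0x9000usize;
    let stack_with_vec = [vec_buffer_ptr, 2, 2];

    // Stack array: the cell addresses are themselves stack words.
    let stack_with_array = [cell_a, cell_b];

    (
        gc.conservative_roots(&stack_with_vec),
        gc.conservative_roots(&stack_with_array),
    )
}

fn main() {
    let (from_vec, from_array) = demo();
    println!("vec roots: {:?}, array roots: {:?}", from_vec, from_array);
}
```

In this model the Vec layout yields no roots at all, while the array layout yields both cells, which is the distinction the review comment relies on.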

Step-by-step proof

  1. A bake production / static-site build (bun build --app) produces ≥2 CSS chunks, so css_chunks_count >= 2.
  2. Fill-loop iteration 0: BunString::create_format(...).to_js(global) allocates JSString cell A; *str = A stores it at css_chunk_js_strings[0] (heap). The BunString temporary drops; nothing on the stack now references A.
  3. Fill-loop iteration 1: .to_js(global) allocates B. Even within the fill loop, this allocation can already trigger GC and sweep A. Either way, the loop completes with all cells rooted only by the heap Vec.
  4. Line 989: create_empty_array(global, navigatable_routes.len()) allocates a JSArray. Suppose this triggers a full GC.
  5. The conservative scanner walks the stack/registers, finds css_chunk_js_strings's {ptr, len, cap}, recognises ptr as non-JSC-heap, and does not follow it. A and B have no roots and are swept.
  6. Lines 990-1087 perform ~7 more allocating calls; the per-route loop runs preload_bundled_module (which evaluates JS).
  7. Line 1109: styles.put_index(global, css_file_count, css_chunk_js_strings[idx]) reads a zapped cell and installs it into a live JSArray, which surfaces as ASSERTION FAILED: decontaminate() (debug) or a SEGV in SlotVisitor::visitChildren (release), the same crash signature this PR fixes for Log::to_js.

Impact

A bun build --app / bake static-site build with multiple CSS chunks under GC pressure can crash with a zapped-StructureID assertion or segfault when populating per-route stylesheet arrays. The window is large (six guaranteed allocations plus an unbounded per-route loop including JS evaluation between fill and read), so it is more exposed than the Log::to_js instance.

Why pre-existing, not blocking

This is flagged as pre-existing rather than normal severity because: (1) the PR does not touch, call into, or otherwise interact with src/runtime/bake/production.rs (different crate, different subsystem); (2) unlike the VirtualMachine.rs:2501 sibling, this is not a Rust-port deviation — the Zig spec at production.zig:541 also heap-allocates via try allocator.alloc(JSValue, css_chunks_count), so the Rust port faithfully reproduces a latent Zig-side hazard; (3) the directly-related, in-crate, same-consumer sibling (VirtualMachine.rs:2501) is already flagged separately. It's surfaced here so it can be batched into the same follow-up.

How to fix

css_chunks_count is unbounded, so the fixed-size stack-array fix from this PR does not directly apply. Either:

  • Use MarkedArgumentBuffer (already exposed at bun_jsc::MarkedArgumentBuffer) to hold the cells, or
  • .protect() each cell after to_js() and unprotect via a drop guard (the VirtualMachine.rs:5840 pattern), or
  • Build the strings directly into a stack-rooted JSArray via create_array_from_iter and index into that instead of a Rust Vec.
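The second option (protect each cell, unprotect via a drop guard) could look roughly like the sketch below. It is a self-contained mock under stated assumptions: `ToyValue`, `ProtectedVec`, and the counter-based `protect`/`unprotect` are hypothetical stand-ins, not the bun_jsc API, and the real guard at VirtualMachine.rs:5840 may be shaped differently.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical stand-in for a JS value handle; NOT the bun_jsc `JSValue`.
/// The shared counter mimics a GC protect count.
#[derive(Clone)]
struct ToyValue {
    protect_count: Rc<Cell<u32>>,
}

impl ToyValue {
    fn new() -> Self {
        ToyValue { protect_count: Rc::new(Cell::new(0)) }
    }
    fn protect(&self) {
        self.protect_count.set(self.protect_count.get() + 1);
    }
    fn unprotect(&self) {
        self.protect_count.set(self.protect_count.get() - 1);
    }
    fn is_protected(&self) -> bool {
        self.protect_count.get() > 0
    }
}

/// Holds values in a heap Vec and keeps each one protected until the
/// container is dropped, including on early return via `?`.
struct ProtectedVec {
    values: Vec<ToyValue>,
}

impl ProtectedVec {
    fn with_capacity(n: usize) -> Self {
        ProtectedVec { values: Vec::with_capacity(n) }
    }
    fn push(&mut self, v: ToyValue) {
        // Protect before storing, so the value is rooted while in the Vec.
        v.protect();
        self.values.push(v);
    }
}

impl Drop for ProtectedVec {
    fn drop(&mut self) {
        for v in &self.values {
            v.unprotect();
        }
    }
}

/// Returns (protected while held, protected after the guard dropped).
fn demo() -> (bool, bool) {
    let probe = ToyValue::new();
    let during;
    {
        let mut held = ProtectedVec::with_capacity(1);
        held.push(probe.clone());
        during = probe.is_protected();
    }
    (during, probe.is_protected())
}

fn main() {
    println!("{:?}", demo());
}
```

Tying unprotect to `Drop` means an error path cannot leak a protected cell, which is the main reason to prefer a guard over paired manual calls.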

Side note: misleading comment at server/mod.rs:896

src/runtime/server/mod.rs:896-897 carries an inline comment asserting "The conservative GC scan reaches the heap allocation as well as the stack, so a small Vec is sound." That claim is directly contradicted by this PR's new lib.rs:2121-2123 comment and by the crash this PR fixes. That particular site happens to be safe — prepared.js_request and the extra_args it copies from are stack-resident — but the stated rationale is wrong and will mislead future readers into thinking heap Vec<JSValue> is generally OK. Worth correcting the comment when the follow-up lands.

+        let mut errors_stack = [JSValue::UNDEFINED; 256];
+        let count = msgs.len().min(errors_stack.len());
         match count {
             0 => Ok(JSValue::UNDEFINED),
             1 => msg_to_js(&msgs[0], global),
             _ => {
-                let mut errors_stack: Vec<JSValue> = Vec::with_capacity(count);
-                for msg in &msgs[0..count] {
-                    errors_stack.push(msg_to_js(msg, global)?);
+                for (i, msg) in msgs[0..count].iter().enumerate() {
+                    errors_stack[i] = msg_to_js(msg, global)?;
                 }
                 let out = bun_core::ZigString::init(message.as_bytes());
-                global.create_aggregate_error(&errors_stack, &out)
+                global.create_aggregate_error(&errors_stack[0..count], &out)
             }
         }
     }
40 changes: 39 additions & 1 deletion test/js/bun/resolve/build-error.test.ts
@@ -1,4 +1,4 @@
-import { tempDir } from "harness";
+import { bunEnv, bunExe, isWindows, tempDir } from "harness";
 import { join } from "node:path";

 test("BuildError is modifiable", async () => {
@@ -19,6 +19,44 @@ test("BuildError is modifiable", async () => {
   expect(error!.message).not.toBe(message);
 });

+test("importing a module with many build errors does not crash while reporting them", async () => {
+  // The AggregateError for a failed module build is assembled from one
+  // BuildMessage wrapper per log message (process_fetch_log). Those wrappers
+  // used to be collected only in a heap Vec (invisible to the conservative GC
+  // scan), so a GC during the loop could finalize earlier wrappers and free
+  // their native BuildMessage before the AggregateError was created, causing
+  // a use-after-free when the unhandled rejection was printed.
+  using dir = tempDir("build-error-many", {
+    // 40 declarations + 80 redeclarations -> ~80 build errors in one module
+    "bad.js": Array.from({ length: 120 }, (_, i) => `const x${i % 40} = 1;`).join("\n"),
+    "index.js": `import("./bad.js");`,
+  });
+
+  // Windows + collectContinuously is prohibitively slow in CI and the code
+  // path is platform-agnostic, so rely on zombie mode alone there.
+  const gcEnv: Record<string, string | undefined> = {
+    ...bunEnv,
+    BUN_JSC_useZombieMode: "1",
+  };
+  if (!isWindows) gcEnv.BUN_JSC_collectContinuously = "1";
+
+  await using proc = Bun.spawn({
+    cmd: [bunExe(), "index.js"],
+    cwd: String(dir),
+    env: gcEnv,
+    stdout: "pipe",
+    stderr: "pipe",
Comment on lines +43 to +48

Contributor

🟡 Nit: this test sets BUN_JSC_collectContinuously: "1" unconditionally, but every other test in the repo that uses this env var — including this PR's own sibling test at transpiler-async-error-uaf.test.ts:38-44 — gates it behind !isWindows with the rationale "Windows + collectContinuously is prohibitively slow in CI and the code path is platform-agnostic." The workload here is small so it may not actually time out, but for consistency consider spreading the env conditionally (e.g. ...(isWindows ? {} : { BUN_JSC_collectContinuously: "1" })) to match the convention.

Extended reasoning...

What the issue is

The new process_fetch_log regression test added in commit 45e8e72 (test/js/bun/resolve/build-error.test.ts:38) passes env: { ...bunEnv, BUN_JSC_collectContinuously: "1" } to its spawned subprocess unconditionally, with no Windows guard and no explicit per-test timeout. This is inconsistent with the established repo convention and — notably — with the PR's own sibling test added earlier in the same PR.

The repo convention

I checked every test file in test/ that references BUN_JSC_collectContinuously. All eight other usages guard against Windows in one of two ways:

  • Conditional env var (if (!isWindows) gcEnv.BUN_JSC_collectContinuously = "1"): transpiler-async-error-uaf.test.ts:44 (this PR), require-esm-gc-roots.test.ts:47-49, jest-each-gc-root.test.ts:99
  • test.skipIf(isWindows) / describe.skipIf(isWindows): sourcetextmodule-link-gc.test.ts:19, module-children-concurrent-gc.test.ts:27, 29519.test.ts:16, 30205.test.ts:90/113/144, message-event-init-gc.test.ts:23

Each carries an inline comment explaining why — e.g. require-esm-gc-roots.test.ts says collectContinuously is ">60s for a single subprocess on x64-baseline" on Windows. The new build-error.test.ts test is the sole exception in the codebase.

Why this is internally inconsistent within the PR

The most telling point is that this PR's own other test, transpiler-async-error-uaf.test.ts:38-44, explicitly documents and applies the guard:

// Windows + collectContinuously is prohibitively slow in CI and the code
// path is platform-agnostic, so rely on zombie mode alone there.
const gcEnv: Record<string, string | undefined> = {
  ...bunEnv,
  BUN_JSC_useZombieMode: "1",
};
if (!isWindows) gcEnv.BUN_JSC_collectContinuously = "1";

So the author was clearly aware of the convention when writing the first test but did not apply it when adding the second test in the later commit (45e8e72).

Step-by-step proof

  1. On a Windows CI shard, build-error.test.ts runs the new test with no skipIf(isWindows).
  2. Bun.spawn launches bunExe() index.js with BUN_JSC_collectContinuously=1 in the environment (line 38).
  3. The subprocess parses bad.js (120 lines → ~80 build errors), constructs the AggregateError via process_fetch_log, and prints it as an unhandled rejection — all while JSC's collector runs continuously.
  4. Per the repo's own documented experience (">60s for a single subprocess on x64-baseline"), continuous collection on Windows is dramatically slower than on POSIX. The test has no explicit timeout, so it relies on the default per-test timeout.
  5. If the subprocess exceeds that default, the test fails as a timeout on Windows CI even though the product code is correct — i.e. a platform-specific flake on a UAF regression test.

Impact / why this is a nit

This is a test-quality / CI-reliability concern, not a product bug. The workload is much lighter than the sibling test (one subprocess, one import() of a 120-line file producing ~80 errors, then exit — vs. the sibling's 20×300-error transformSync loop that needed a 60s timeout), so it may well finish within the default timeout even on Windows. But the inconsistency with a unanimous repo convention and with the PR's own sibling test is worth fixing.

How to fix

Match the sibling test's pattern — import isWindows from harness and gate the env var:

env: { ...bunEnv, ...(isWindows ? {} : { BUN_JSC_collectContinuously: "1" }) },

Or, equivalently, build the env object the same way transpiler-async-error-uaf.test.ts does (and optionally add BUN_JSC_useZombieMode: "1" so Windows still gets deterministic dangling-access detection).

Collaborator Author

Addressed — the env var is now gated behind !isWindows (with useZombieMode kept unconditionally), matching the sibling test's pattern.

+  });
+
+  const [stderr, exitCode] = await Promise.all([proc.stderr.text(), proc.exited]);
+
+  // Every error in the AggregateError should have been printed.
+  expect(stderr).toContain('"x0" has already been declared');
+  expect(stderr).toContain('"x39" has already been declared');
+  // Unhandled rejection -> clean exit with code 1, not a crash.
+  expect(exitCode).toBe(1);
+});
+
 test("BuildMessage finalize frees with the same allocator it was created with", async () => {
   // BuildMessage.create() clones the message with the passed allocator
   // but finalize() was freeing it with bun.default_allocator and never
61 changes: 61 additions & 0 deletions test/js/bun/transpiler/transpiler-async-error-uaf.test.ts
@@ -0,0 +1,61 @@
+// Log.to_js (used by Bun.Transpiler().transform/transformSync when rejecting
+// with parse errors, and by the module loader via process_fetch_log) builds
+// an AggregateError by allocating one BuildMessage JS cell per log entry. The
+// Rust port collected those cells in a heap Vec<JSValue>, which the
+// conservative GC scan does not see, so an earlier cell could be swept while
+// allocating a later one and the AggregateError would reference a zapped
+// StructureID.
+//
+// useZombieMode scribbles 0xbadbeef0 over swept cells so the dangling access
+// manifests deterministically; collectContinuously races the collector against
+// the allocation loop so it reliably sweeps mid-loop.
+import { expect, test } from "bun:test";
+import { bunEnv, bunExe, isWindows, tempDir } from "harness";
+
+const fixture = `
+const src = Array.from({ length: 300 }, () => "a b").join("\\n");
+const t = new Bun.Transpiler();
+for (let i = 0; i < 20; i++) {
+  let err;
+  try { t.transformSync(src); } catch (e) { err = e; }
+  if (!(err instanceof AggregateError)) throw new Error("not AggregateError: " + err);
+  if (err.errors.length !== 256) throw new Error("wrong count: " + err.errors.length);
+  for (const m of err.errors) {
+    const msg = m.message;
+    if (msg !== 'Expected ";" but found "b"') {
+      throw new Error("corrupt BuildMessage: " + JSON.stringify(typeof msg) + " " + String(msg).slice(0, 80));
+    }
+  }
+}
+console.log("OK");
+`;
+
+test("Log.to_js roots BuildMessage cells across allocation", async () => {
+  using dir = tempDir("log-to-js-gc-root", {
+    "fixture.js": fixture,
+  });
+
+  // Windows + collectContinuously is prohibitively slow in CI and the code
+  // path is platform-agnostic, so rely on zombie mode alone there.
+  const gcEnv: Record<string, string | undefined> = {
+    ...bunEnv,
+    BUN_JSC_useZombieMode: "1",
+  };
+  if (!isWindows) gcEnv.BUN_JSC_collectContinuously = "1";
+
+  await using proc = Bun.spawn({
+    cmd: [bunExe(), "fixture.js"],
+    env: gcEnv,
+    cwd: String(dir),
+    stderr: "pipe",
+    stdout: "pipe",
+  });
+
+  const [stdout, stderr, exitCode] = await Promise.all([proc.stdout.text(), proc.stderr.text(), proc.exited]);
+
+  if (exitCode !== 0) {
+    expect(stderr).toBe("");
+  }
+  expect(stdout.trim()).toBe("OK");
+  expect(exitCode).toBe(0);
+}, 60_000);