-
Notifications
You must be signed in to change notification settings - Fork 4.7k
Root BuildMessage cells on the stack in Log.to_js #30671
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
0aa8a92
52b8476
49ebc04
f96bca4
434fcd9
45e8e72
637e18b
8ddc9ac
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,4 +1,4 @@ | ||
| import { tempDir } from "harness"; | ||
| import { bunEnv, bunExe, isWindows, tempDir } from "harness"; | ||
| import { join } from "node:path"; | ||
|
|
||
| test("BuildError is modifiable", async () => { | ||
|
|
@@ -19,6 +19,44 @@ test("BuildError is modifiable", async () => { | |
| expect(error!.message).not.toBe(message); | ||
| }); | ||
|
|
||
| test("importing a module with many build errors does not crash while reporting them", async () => { | ||
| // The AggregateError for a failed module build is assembled from one | ||
| // BuildMessage wrapper per log message (process_fetch_log). Those wrappers | ||
| // used to be collected only in a heap Vec (invisible to the conservative GC | ||
| // scan), so a GC during the loop could finalize earlier wrappers and free | ||
| // their native BuildMessage before the AggregateError was created, causing | ||
| // a use-after-free when the unhandled rejection was printed. | ||
| using dir = tempDir("build-error-many", { | ||
| // 40 declarations + 80 redeclarations -> ~80 build errors in one module | ||
| "bad.js": Array.from({ length: 120 }, (_, i) => `const x${i % 40} = 1;`).join("\n"), | ||
| "index.js": `import("./bad.js");`, | ||
| }); | ||
|
|
||
| // Windows + collectContinuously is prohibitively slow in CI and the code | ||
| // path is platform-agnostic, so rely on zombie mode alone there. | ||
| const gcEnv: Record<string, string | undefined> = { | ||
| ...bunEnv, | ||
| BUN_JSC_useZombieMode: "1", | ||
| }; | ||
| if (!isWindows) gcEnv.BUN_JSC_collectContinuously = "1"; | ||
|
|
||
| await using proc = Bun.spawn({ | ||
| cmd: [bunExe(), "index.js"], | ||
| cwd: String(dir), | ||
| env: gcEnv, | ||
| stdout: "pipe", | ||
| stderr: "pipe", | ||
|
Comment on lines
+43
to
+48
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🟡 Nit: this test sets Extended reasoning...What the issue isThe new The repo conventionI checked every test file in
Each carries an inline comment explaining why — e.g. Why this is internally inconsistent within the PRThe most telling point is that this PR's own other test, // Windows + collectContinuously is prohibitively slow in CI and the code
// path is platform-agnostic, so rely on zombie mode alone there.
const gcEnv: Record<string, string | undefined> = {
...bunEnv,
BUN_JSC_useZombieMode: "1",
};
if (!isWindows) gcEnv.BUN_JSC_collectContinuously = "1";So the author was clearly aware of the convention when writing the first test but did not apply it when adding the second test in the later commit (45e8e72). Step-by-step proof
Impact / why this is a nitThis is a test-quality / CI-reliability concern, not a product bug. The workload is much lighter than the sibling test (one subprocess, one How to fixMatch the sibling test's pattern — import env: { ...bunEnv, ...(isWindows ? {} : { BUN_JSC_collectContinuously: "1" }) },Or, equivalently, build the env object the same way
Collaborator
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Addressed — the env var is now gated behind |
||
| }); | ||
|
|
||
| const [stderr, exitCode] = await Promise.all([proc.stderr.text(), proc.exited]); | ||
|
|
||
| // Every error in the AggregateError should have been printed. | ||
| expect(stderr).toContain('"x0" has already been declared'); | ||
| expect(stderr).toContain('"x39" has already been declared'); | ||
| // Unhandled rejection -> clean exit with code 1, not a crash. | ||
| expect(exitCode).toBe(1); | ||
| }); | ||
|
|
||
| test("BuildMessage finalize frees with the same allocator it was created with", async () => { | ||
| // BuildMessage.create() clones the message with the passed allocator | ||
| // but finalize() was freeing it with bun.default_allocator and never | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,61 @@ | ||
| // Log.to_js (used by Bun.Transpiler().transform/transformSync when rejecting | ||
| // with parse errors, and by the module loader via process_fetch_log) builds | ||
| // an AggregateError by allocating one BuildMessage JS cell per log entry. The | ||
| // Rust port collected those cells in a heap Vec<JSValue>, which the | ||
| // conservative GC scan does not see, so an earlier cell could be swept while | ||
| // allocating a later one and the AggregateError would reference a zapped | ||
| // StructureID. | ||
| // | ||
| // useZombieMode scribbles 0xbadbeef0 over swept cells so the dangling access | ||
| // manifests deterministically; collectContinuously races the collector against | ||
| // the allocation loop so it reliably sweeps mid-loop. | ||
| import { expect, test } from "bun:test"; | ||
| import { bunEnv, bunExe, isWindows, tempDir } from "harness"; | ||
|
|
||
| const fixture = ` | ||
| const src = Array.from({ length: 300 }, () => "a b").join("\\n"); | ||
| const t = new Bun.Transpiler(); | ||
| for (let i = 0; i < 20; i++) { | ||
| let err; | ||
| try { t.transformSync(src); } catch (e) { err = e; } | ||
| if (!(err instanceof AggregateError)) throw new Error("not AggregateError: " + err); | ||
| if (err.errors.length !== 256) throw new Error("wrong count: " + err.errors.length); | ||
| for (const m of err.errors) { | ||
| const msg = m.message; | ||
| if (msg !== 'Expected ";" but found "b"') { | ||
| throw new Error("corrupt BuildMessage: " + JSON.stringify(typeof msg) + " " + String(msg).slice(0, 80)); | ||
| } | ||
| } | ||
| } | ||
| console.log("OK"); | ||
| `; | ||
|
|
||
| test("Log.to_js roots BuildMessage cells across allocation", async () => { | ||
| using dir = tempDir("log-to-js-gc-root", { | ||
| "fixture.js": fixture, | ||
| }); | ||
|
|
||
| // Windows + collectContinuously is prohibitively slow in CI and the code | ||
| // path is platform-agnostic, so rely on zombie mode alone there. | ||
| const gcEnv: Record<string, string | undefined> = { | ||
| ...bunEnv, | ||
| BUN_JSC_useZombieMode: "1", | ||
| }; | ||
| if (!isWindows) gcEnv.BUN_JSC_collectContinuously = "1"; | ||
|
|
||
| await using proc = Bun.spawn({ | ||
| cmd: [bunExe(), "fixture.js"], | ||
| env: gcEnv, | ||
| cwd: String(dir), | ||
| stderr: "pipe", | ||
| stdout: "pipe", | ||
| }); | ||
|
|
||
| const [stdout, stderr, exitCode] = await Promise.all([proc.stdout.text(), proc.stderr.text(), proc.exited]); | ||
|
|
||
| if (exitCode !== 0) { | ||
| expect(stderr).toBe(""); | ||
| } | ||
| expect(stdout.trim()).toBe("OK"); | ||
| expect(exitCode).toBe(0); | ||
| }, 60_000); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🟣 FYI / pre-existing: the same heap-
Vec<JSValue>GC-rooting pattern this PR fixes (and now documents as broken) also exists atsrc/runtime/bake/production.rs:966-985—css_chunk_js_strings: Vec<JSValue>is filled with freshly-allocatedJSStringcells, then sixcreate_empty_arraycalls plus a per-route allocation loop run before the Vec is read at lines 1109/1127/1150, so any GC in that window can sweep the unrooted strings. Unlike here,css_chunks_countis unbounded so a stack[JSValue; 256]won't work — this needsMarkedArgumentBufferor per-cell.protect(). Not blocking (the PR doesn't touchproduction.rsand the Zig spec atproduction.zig:541has the same heap-alloc pattern), but worth batching with the already-flaggedVirtualMachine.rs:2501follow-up. Separately, the inline comment atsrc/runtime/server/mod.rs:896claiming "the conservative GC scan reaches the heap allocation as well as the stack" is now directly contradicted by the new comment here and should be reworded (that site is incidentally safe only because the values are also stack-resident).Extended reasoning...
What the bug is
The new comment this PR adds at
lib.rs:2121-2123documents that storing freshly-allocated JS cells in a heapVec<JSValue>leaves them invisible to JSC's conservative stack scanner. An earlier review comment already flagssrc/jsc/VirtualMachine.rs:2501as a sibling instance; another unfixed instance of the same bug class — with a substantially larger GC window — exists atsrc/runtime/bake/production.rs:966-1150.The specific code path
The fill loop (971-985) writes each
BunString::create_format(...).to_js(global)?result — a freshly-allocatedJSStringcell — into a heap Vec slot via*str = .... The loop variablestris&mut JSValuepointing into the heap buffer, not a stack copy; theBunStringtemporary drops at end-of-statement, so itsWTF::StringImpl's sole remaining owner is the JSString cell, and the JSString cell's sole reference lives in the malloc-heap Vec buffer.After the fill loop, lines 989-1015 perform six consecutive
JSValue::create_empty_array(global, ...)allocations, followed by a per-route loop (1018-1158) containing many more JSC allocations (create_empty_arrayat 1086/1087,preload_bundled_module,put_index), beforecss_chunk_js_strings[...]is first read at lines 1109/1127/1150.Why existing safeguards don't help
JSC's conservative scanner walks the machine stack and registers; it sees the Vec's
{ptr, len, cap}triple on the stack, butptrpoints at a mimalloc allocation, not a JSCMarkedBlock, so the scanner does not follow it. There is no.protect(), noMarkedArgumentBuffer, and — unlikeserver/mod.rs:898— no separate stack local that incidentally roots the cells. Contrast with the safecreate_array_from_iterpattern (JSValue.rs) where each cell is immediatelyput_index-ed into a stack-rootedJSArraybefore the next allocation.Step-by-step proof
bun build --app) produces ≥2 CSS chunks, socss_chunks_count >= 2.BunString::create_format(...).to_js(global)allocatesJSStringcell A;*str = Astores it atcss_chunk_js_strings[0](heap). TheBunStringtemporary drops; nothing on the stack now references A..to_js(global)allocates B. Even within the fill loop, this allocation can already trigger GC and sweep A. Either way, the loop completes with all cells rooted only by the heap Vec.create_empty_array(global, navigatable_routes.len())allocates aJSArray. Suppose this triggers a full GC.css_chunk_js_strings's{ptr, len, cap}, recognisesptras non-JSC-heap, and does not follow it. A and B have no roots and are swept.preload_bundled_module(which evaluates JS).styles.put_index(global, css_file_count, css_chunk_js_strings[idx])reads a zapped cell and installs it into a liveJSArray→ASSERTION FAILED: decontaminate()(debug) / SEGV inSlotVisitor::visitChildren(release) — the same crash signature this PR fixes forLog::to_js.Impact
A
bun build --app/ bake static-site build with multiple CSS chunks under GC pressure can crash with a zapped-StructureID assertion or segfault when populating per-route stylesheet arrays. The window is large (six guaranteed allocations plus an unbounded per-route loop including JS evaluation between fill and read), so it is more exposed than theLog::to_jsinstance.Why pre-existing, not blocking
This is flagged as pre-existing rather than normal severity because: (1) the PR does not touch, call into, or otherwise interact with
src/runtime/bake/production.rs(different crate, different subsystem); (2) unlike theVirtualMachine.rs:2501sibling, this is not a Rust-port deviation — the Zig spec atproduction.zig:541also heap-allocates viatry allocator.alloc(JSValue, css_chunks_count), so the Rust port faithfully reproduces a latent Zig-side hazard; (3) the directly-related, in-crate, same-consumer sibling (VirtualMachine.rs:2501) is already flagged separately. It's surfaced here so it can be batched into the same follow-up.How to fix
css_chunks_countis unbounded, so the fixed-size stack-array fix from this PR does not directly apply. Either:MarkedArgumentBuffer(already exposed atbun_jsc::MarkedArgumentBuffer) to hold the cells, or.protect()each cell afterto_js()and unprotect via a drop guard (theVirtualMachine.rs:5840pattern), orJSArrayviacreate_array_from_iterand index into that instead of a RustVec.Side note: misleading comment at server/mod.rs:896
src/runtime/server/mod.rs:896-897carries an inline comment asserting "The conservative GC scan reaches the heap allocation as well as the stack, so a small Vec is sound." That claim is directly contradicted by this PR's newlib.rs:2121-2123comment and by the crash this PR fixes. That particular site happens to be safe —prepared.js_requestand theextra_argsit copies from are stack-resident — but the stated rationale is wrong and will mislead future readers into thinking heapVec<JSValue>is generally OK. Worth correcting the comment when the follow-up lands.