Skip to content

strings: bind StringJoiner borrowed segments to a lifetime#31493

Merged
Jarred-Sumner merged 4 commits into
mainfrom
claude/stringjoiner-lifetime
May 28, 2026
Merged

strings: bind StringJoiner borrowed segments to a lifetime#31493
Jarred-Sumner merged 4 commits into
mainfrom
claude/stringjoiner-lifetime

Conversation

@Jarred-Sumner

Copy link
Copy Markdown
Collaborator

StringJoiner stored every borrowed segment as a lifetime-erased raw fat pointer (RawSlice<u8> plus an owns_slice flag), with blanket unsafe impl Send/Sync on both the joiner and its nodes and a manual Drop to free owned slices. Nothing tied a pushed segment to the data it pointed at, so every push() call site relied on an undocumented "must outlive done()" convention.

What changed

  • StringJoiner and Node take a lifetime parameter; Node is now enum { Borrowed(&'a [u8]), Owned(Box<[u8]>) }. This deletes the manual Drop, the four unsafe impl Send/Sync, the RawSlice round-trips, and Watcher.input's &'static workaround.
  • push() / push_static() borrow segments for the joiner's lifetime, so the sourcemap join (append_source_map_chunk), the dev-server sourcemap store, and most bundler/Blob/FormData call sites are now plain borrow-checked code.
  • The bundler's deferred path stores the joiner in Chunk.intermediate_output (no lifetime parameter), so each of the four break_output_into_pieces callers (postProcessJSChunk, postProcessCSSChunk, postProcessHTMLChunk, MetafileBuilder::generate) re-tags the joiner with a new documented unsafe StringJoiner::detach_lifetime() immediately before the call, with a SAFETY comment naming exactly which storage the borrowed nodes point into and why it outlives the stored joiner.
  • Blob constructor / FormData serialization keep borrowing (not copying) buffers whose liveness comes from GC rooting, via bun_ptr::detach_lifetime with per-site SAFETY comments — including the FormData file-entry Data::Bytes site and the top-level typed-array arm, each with its own justification.
  • Fixes a latent use-after-free in postProcessJSChunk's InternalBakeDev arm: it pushed the bytes of a block-local MutableString by reference into a joiner that is stored in chunk.intermediate_output and read later. The bytes are now moved into the joiner (take_slice() + push_owned).
  • Adds unit tests for the node-storage rewrite.

Net unsafe accounting

Removed: 4 blanket unsafe impl Send/Sync (which made every push call site unauditable), the unsafe free in Node::drop, the unsafe RawSlice::from_raw in push_owned, and the lifetime-erased RawSlice storage itself.

Added: one unsafe fn detach_lifetime() on StringJoiner, 4 detach_lifetime() call sites in the bundler, and 6 bun_ptr::detach_lifetime slice re-tags in Blob/FormData — 10 narrowly-scoped expressions, each with a SAFETY comment naming the owner of the bytes and why it outlives the joiner's last read.

The trade is deliberate: instead of two type-level unsafe impls that silence the compiler for every present and future call site, the remaining unsafety is at the specific points where the borrow genuinely cannot be expressed.

Why lifetimes alone can't cover the remaining sites

  • Chunk self-reference: the joiner stored in Chunk.intermediate_output borrows from the same chunk's compile_results_for_chunk (and from linker-graph data). A struct field cannot name a lifetime that refers to a sibling field of the same struct, so expressing this with pure lifetimes would mean copying every compiled segment at join time or restructuring chunk storage.
  • GC-kept-alive Blob/FormData buffers: the pushed bytes are owned by JS-heap objects (ArrayBuffers, Blob stores, DOMFormData entries) whose liveness over the push→done() span is guaranteed by GC rooting, not by any Rust owner with a nameable lifetime. Borrow-checking those would require cloning each buffer; the prior code already borrowed them, so this keeps the no-copy behavior with the justification written down.

Testing

  • cargo check -p bun_bin, bun run rust:check-all (10/10 targets), debug build smoke test.
  • Bundler: bundler_edgecase, bundler_npm, bundler_minify, bundler_splitting — all pass.
  • Bake dev server: test/bake/dev/esm.test.ts, test/bake/dev/html.test.ts — pass.
  • FormData/Blob: FormData.test.ts, FormData-multipart-serialization.test.ts, blob.test.ts — pass except two pre-existing debug-build failures (a 5s-timeout GC test and an RSS-threshold test) that fail identically on a main debug build.
  • Sourcemaps: test/js/bun/sourcemap/, compile-sourcemap-internal.test.ts — pass. The bun-build-compile*.test.ts standalone-executable tests time out at 5s when spawning debug-build binaries on this machine; the same tests fail identically on a main debug build (0 new failures).
  • New StringJoiner unit tests compile under cargo check; cargo test -p bun_core does not link standalone on main either (pre-existing).

StringJoiner stored borrowed slices as lifetime-erased raw fat pointers
(RawSlice + owns_slice flag), so nothing tied a pushed segment to the data
it pointed at. Give the joiner a lifetime parameter instead:

- Node becomes enum { Borrowed(&'a [u8]), Owned(Box<[u8]>) }, which drops
  the manual Drop impl, the four unsafe Send/Sync impls, and the RawSlice
  round-trips; Watcher.input loses its &'static workaround.
- push()/push_static() now borrow segments for the joiner's lifetime, so
  the bundler post-processing, sourcemap join, dev-server sourcemap store,
  Blob constructor, and FormData serializer call sites are borrow-checked.
- The deferred bundler path (Chunk.intermediate_output has no lifetime
  parameter) re-tags the joiner via a new documented unsafe
  StringJoiner::detach_lifetime() right before break_output_into_pieces;
  Blob/FormData pushes of GC-kept-alive buffers use bun_ptr::detach_lifetime
  with SAFETY comments instead of copying.
- postProcessJSChunk's InternalBakeDev arm pushed a block-local
  MutableString's bytes by reference; the deferred joiner could read that
  buffer after it was freed. Move the bytes into the joiner instead.
- Add unit tests for the node storage rewrite.
@robobun

robobun commented May 28, 2026

Copy link
Copy Markdown
Collaborator
Updated 9:57 PM PT - May 27th, 2026

@Jarred-Sumner, your commit f12dabcb4899a5b10cbbf7400a1489ceef267d51 passed in Build #58565! 🎉


🧪   To try this PR locally:

bunx bun-pr 31493

That installs a local version of the PR into your bun-31493 executable, so you can run:

bun-31493 --bun

@github-actions

Copy link
Copy Markdown
Contributor

Found 2 issues this PR may fix:

  1. Dev server crash #18502 - Dev server crash with segfault in js_printer/runQuotedSourceContents during HMR — the stack trace shows a crash in the linker/source-map task thread, which is exactly the code path where the StringJoiner use-after-free in postProcessJSChunk's InternalBakeDev arm could cause a crash
  2. unsafe: InternalSourceMap.free_owned() — conditional ownership with no runtime guard #30768 - InternalSourceMap.free_owned() conditional ownership with no runtime guard — same class of unsoundness (raw-pointer-plus-flag instead of enum discriminating Borrowed/Owned) that this PR fixes for StringJoiner

If this is helpful, copy the block below into the PR description to auto-close these issues on merge.

Fixes #18502
Fixes #30768

🤖 Generated with Claude Code

@coderabbitai

coderabbitai Bot commented May 28, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: f4643683-de81-414c-aea8-c0b8b0768cf8

📥 Commits

Reviewing files that changed from the base of the PR and between 487fd28 and f12dabc.

📒 Files selected for processing (3)
  • src/bundler/Chunk.rs
  • src/runtime/webcore/Blob.rs
  • test/js/web/fetch/blob.test.ts

Walkthrough

This PR parameterizes StringJoiner by lifetime ('a), replaces RawSlice+ownership bits with Node<'a> (Borrowed/Owned), centralizes push bookkeeping, adds unsafe detach_lifetime() to retag borrows as 'static, and propagates the lifetime changes through bundler, Blob/FormData, and a few external APIs with accompanying SAFETY comments and tests.

Changes

StringJoiner lifetime parameterization and bundler integration

Layer / File(s) Summary
StringJoiner struct and node representation refactoring
src/bun_core/string/StringJoiner.rs
StringJoiner<'a> and Watcher<'a> now carry lifetime parameters. Node<'a> enum stores Borrowed(&'a [u8]) and Owned(Box<[u8]>) variants. New push(&'a [u8]) API, internal push_node, and unsafe detach_lifetime() are added. Tests validate concatenation, empty-skipping, watcher behavior, and detach round-trip.
IntermediateOutput and break_output_into_pieces contract
src/bundler/Chunk.rs, src/bundler/LinkerContext.rs
IntermediateOutput::Joiner now stores StringJoiner<'static>. LinkerContext::break_output_into_pieces requires &mut StringJoiner<'static>, establishing storage boundaries for lifetime-detached joiners.
Bundler phases with unsafe lifetime detachment
src/bundler/linker_context/MetafileBuilder.rs, src/bundler/linker_context/postProcessCSSChunk.rs, src/bundler/linker_context/postProcessHTMLChunk.rs, src/bundler/linker_context/postProcessJSChunk.rs
Metafile, CSS, HTML, and JS chunk processors call unsafe { j.detach_lifetime() } before passing joiners to break_output_into_pieces(). Each adds SAFETY comments documenting why borrowed sources outlive stored output. PostProcessJS replaces arena-backed slices with owned boxed slices in InternalBakeDev path.
External API lifetime signatures
src/runtime/bake/dev_server/source_map_store.rs, src/sourcemap/lib.rs
SourceMapStore::join_vlq() binds StringJoiner<'a> and &'a self lifetimes. append_source_map_chunk() becomes generic over 'a, accepting StringJoiner<'a> and &'a [u8] source_map_, tying parameter lifetimes together.
Blob and FormData marshalling with detach_lifetime
src/runtime/webcore/Blob.rs, test/js/web/fetch/blob.test.ts
Blob byte-joining and FormData serialization replace push_static() with joiner.push(unsafe { detach_lifetime(...) }) in contexts where pre-scan or state guarantees no user-JS re-entrancy before joiner.done(). FormDataContext becomes StringJoiner<'a> generic; multiple typed-array, DOMWrapper, and file/content-type sites are updated. A new test verifies Blob stringification order for non-Blob parts.

Possibly related PRs

  • oven-sh/bun#31379: Also modifies StringJoiner push semantics and Blob/FormData joiner handling, directly overlapping with this PR's core refactor and marshalling changes.
  • oven-sh/bun#31175: Touches bundler post-processors (postProcessCSSChunk/postProcessJSChunk) and may be related to lifetime/packaging changes in bundler phases.

Suggested reviewers

  • RiskyMH
  • dylan-conway
🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically summarizes the main change: binding StringJoiner borrowed segments to a lifetime parameter, which is the core refactoring described throughout the changeset.
Description check ✅ Passed The description comprehensively covers what changed (lifetime-parameterized StringJoiner and Node), how it was verified (cargo checks, bundler tests, FormData/Blob tests, sourcemap tests), and documents the safety implications and remaining unsafe code with clear justification.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/bun_core/string/StringJoiner.rs`:
- Around line 103-135: Add a direct unit test that constructs a StringJoiner
with at least one borrowed node and a watcher.input (use push or push_static to
create borrowed segments and set watcher.input), then call unsafe {
string_joiner.detach_lifetime() } to produce a StringJoiner<'static> and assert
the detached joiner preserves len, nodes' contents, and watcher fields
(estimated_count and needs_newline) and that borrowed bytes are still readable
(e.g., compare slices/collected output). Make the test exercise Node::Borrowed
paths and watcher.input round-trip through detach_lifetime to validate the
unsafe lifetime-retagging contract for StringJoiner::detach_lifetime.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 79f6478d-acef-4011-906c-61b9c969f700

📥 Commits

Reviewing files that changed from the base of the PR and between 4ee835b and fa66f6c.

📒 Files selected for processing (10)
  • src/bun_core/string/StringJoiner.rs
  • src/bundler/Chunk.rs
  • src/bundler/LinkerContext.rs
  • src/bundler/linker_context/MetafileBuilder.rs
  • src/bundler/linker_context/postProcessCSSChunk.rs
  • src/bundler/linker_context/postProcessHTMLChunk.rs
  • src/bundler/linker_context/postProcessJSChunk.rs
  • src/runtime/bake/dev_server/source_map_store.rs
  • src/runtime/webcore/Blob.rs
  • src/sourcemap/lib.rs

Comment thread src/bun_core/string/StringJoiner.rs

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't find any bugs, but this refactor touches memory-safety invariants across the bundler chunk lifecycle, Blob/FormData GC rooting, and introduces a new unsafe fn detach_lifetime() escape hatch — the per-site SAFETY claims look well-reasoned but deserve a human pass from someone familiar with the chunk/compile-results ownership model.

Extended reasoning...

Overview

This PR rewrites StringJoiner from a lifetime-erased RawSlice<u8> + owns_slice flag (with 4 blanket unsafe impl Send/Sync and a manual Drop) into a lifetime-parameterized enum Node<'a> { Borrowed(&'a [u8]), Owned(Box<[u8]>) }. It threads the new 'a parameter through 10 files: the core type, the bundler's Chunk/LinkerContext/postProcess{JS,CSS,HTML}Chunk/MetafileBuilder, the bake dev-server source-map store, Blob/FormData serialization, and sourcemap::append_source_map_chunk. For the cases where a lifetime can't be expressed (self-referential Chunk.intermediate_output borrowing sibling compile_results_for_chunk; GC-rooted JS buffers), it adds a new unsafe fn StringJoiner::detach_lifetime() and 6 bun_ptr::detach_lifetime slice re-tags, each with a SAFETY comment naming the owner and why it outlives the read. It also fixes a latent UAF in postProcessJSChunk's InternalBakeDev arm (block-local MutableString pushed by reference into a deferred joiner) and adds unit tests for the rewritten node storage.

Security risks

No injection/auth/data-exposure surface. The risk class here is memory safety: every detach_lifetime() call site asserts a non-local invariant about storage lifetime (e.g. "compile_results_for_chunk is filled in place before post-processing and never reassigned afterwards", "the prescan proved no remaining part can run user JS so this buffer stays attached until done()"). These are the same invariants the old code relied on implicitly, now written down — but they're still invariants that a reviewer with bundler-internals knowledge should validate, particularly the cross-thread Chunk.intermediate_output storage and the Blob constructor's parts_can_run_js prescan logic.

Level of scrutiny

High. This is production-critical code on the bundler hot path and in user-facing Blob/FormData construction. The change is a net safety improvement (blanket type-level unsafe impls → narrowly-scoped, documented call-site unsafe), and the bug-hunting system found nothing. But it's a ~300-line refactor of lifetime/ownership reasoning across 10 files, introduces a new unsafe API surface (detach_lifetime), and the correctness of each SAFETY comment depends on non-local invariants about the bundler's worker-pool join ordering and JS GC rooting. The detach_lifetime impl also does into_iter().map().collect() which reallocates the node Vec — likely fine given it runs once per chunk, but worth a glance.

Other factors

The PR description is exceptionally thorough (full unsafe accounting, rationale for why lifetimes alone can't cover the remaining sites, comprehensive test matrix). The UAF fix in the InternalBakeDev arm is a real correctness improvement. No prior reviews on the timeline. Given the scope and the design decision embodied in the detach_lifetime escape hatch, this warrants a human reviewer familiar with the chunk lifecycle rather than bot approval.

Comment thread src/bundler/Chunk.rs
Comment thread src/runtime/webcore/Blob.rs
…from an array

Also tidy a stale mention in a Chunk safety comment.
Comment thread src/runtime/webcore/Blob.rs Outdated

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing all the feedback — the Blob ordering fixes and the detach_lifetime test look good. I'm deferring to a human reviewer given the scope: this reworks unsafe lifetime handling across the bundler's deferred-joiner path and Blob/FormData GC-rooted borrows, and ships three Blob-constructor behavior changes alongside the refactor.

Extended reasoning...

Overview

This PR replaces StringJoiner's lifetime-erased RawSlice<u8> + ownership-bit storage with a lifetime-parameterized Node<'a> enum (Borrowed(&'a [u8]) | Owned(Box<[u8]>)), removing four blanket unsafe impl Send/Sync, the manual Drop, and the raw-pointer round-trips. It threads the new lifetime through ~10 call sites across the bundler (Chunk, LinkerContext, four postProcess* files, MetafileBuilder), the bake dev-server sourcemap store, append_source_map_chunk, and Blob/FormData construction. Where the borrow genuinely cannot be expressed (chunk self-reference, GC-rooted JS buffers), it introduces a documented unsafe fn detach_lifetime() plus 10 narrowly-scoped bun_ptr::detach_lifetime re-tags, each with a SAFETY comment naming the owner and lifetime justification.

Along the way it fixes one latent UAF (postProcessJSChunk's InternalBakeDev arm pushed a block-local MutableString's bytes by reference into a joiner stored on the chunk) and — in response to review feedback on this PR — three pre-existing Blob-constructor bugs in the inner array-iteration match (wrong variable + missing continue in the DOMWrapper else branch; LIFO mis-ordering in the _ arm; data loss from break in the nested-array arm). New unit tests cover StringJoiner and a JS test covers the Blob ordering fixes.

Security risks

No new attack surface. The change is a net reduction in unauditable unsafe (blanket Send/Sync impls → per-site documented re-tags), but the remaining detach_lifetime sites in the bundler and Blob/FormData are still soundness-load-bearing: each SAFETY comment encodes a non-local invariant (e.g. "compile_results_for_chunk is filled in place and never reassigned", "prescan proved no remaining part can run user JS", "DOMFormData keeps the entry's Blob alive past joiner.done()"). If any of those invariants drift, the result is a UAF in bundler output assembly or Blob construction. These are exactly the invariants the old code relied on implicitly, so risk is not increased — but verifying each SAFETY justification against the surrounding control flow is the kind of review a human should sign off on.

Level of scrutiny

High. This is unsafe-code restructuring in production-critical paths: bundler chunk assembly (runs on every bun build), Blob/FormData construction (web-facing, GC-interacting), and the HMR dev-server sourcemap path. The Blob fixes also change user-observable behavior (new Blob(["a", {}, "b"]) and nested-array parts now match Node/Chrome ordering), which is correct per spec but is a behavior change shipped alongside the refactor. The PR description's testing section is thorough and the bug-hunting pass found nothing, but the breadth (11 files, 4 subsystems) and the nature (per-site unsafe justifications) put this well outside what I'd auto-approve.

Other factors

All four review threads (one CodeRabbit nitpick, three of mine) were addressed with dedicated commits and tests; nothing is outstanding. CI shows one failure (AsyncLocalStorage-tracking.test.ts segfault on macOS x64) that appears unrelated to this diff but is worth a glance before merge. The detach_lifetime() implementation itself reallocates the nodes Vec via into_iter().collect() rather than transmuting in place — safe and simple, with negligible cost given it runs once per chunk.

@Jarred-Sumner Jarred-Sumner merged commit f20c8dc into main May 28, 2026
78 of 80 checks passed
@Jarred-Sumner Jarred-Sumner deleted the claude/stringjoiner-lifetime branch May 28, 2026 04:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants