Skip to content

fix(formdata): multipart parser corrupts parts located past 4 GiB#30609

Open
robobun wants to merge 6 commits into
mainfrom
farm/fc08f495/formdata-multipart-4gb-offsets
Open

fix(formdata): multipart parser corrupts parts located past 4 GiB#30609
robobun wants to merge 6 commits into
mainfrom
farm/fc08f495/formdata-multipart-4gb-offsets

Conversation

@robobun

@robobun robobun commented May 13, 2026

Copy link
Copy Markdown
Collaborator

Problem

req.formData() on a multipart body larger than 4 GiB returns corrupted field names/filenames for any part whose header sits past the 4 GiB mark. The part is then unreachable by name:

const form = await req.formData();
[...form.keys()]
// -> ["file", "\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000"]
form.get("description_field")
// -> null

This is the remaining half of #21490. The original symptom there — a 3.5 GB file coming through as 1.5 GB (exactly 2³¹ bytes lost) — was caused by Field.value being stored as bun.Semver.String, and was fixed as a side effect of #27483 (landed in 1.3.11) when value was switched to a raw []const u8 slice.

Root cause

forEachMultipartEntry still stored name, filename, and content_type as bun.Semver.String (Zig) / bun_semver::String (Rust), which packs a slice as Pointer { off: u32, len: u32 } bitcast into a u64 with bit 63 used as a tag. That gives:

  • len: effectively u31 (bit 31 is the tag) → silent truncation at 2 GiB
  • off: u32 via @truncate → silent wrap at 4 GiB

So once a part's Content-Disposition header sat past byte 2³², the offset wrapped and slice(buf) returned bytes from the middle of the preceding file body.

Fix

Store name, filename, and content_type as plain slices into the input buffer, the same way value already is, in both the Zig reference implementation and the Rust port:

  • FormData.zig: []const u8
  • FormData.rs: &'a [u8] on the existing lifetime-parameterized Field<'a> (no unsafe needed)

The subslicer and the buf callback parameter are no longer needed.

Also drop a leftover u32 cast in ArrayBuffer.fromBytes / ArrayBuffer::from_bytes — the target fields are already usize, so the cast just panicked on >4 GiB buffers in debug builds (hit on the client side when serializing a FormData containing a >4 GiB Bun.file()).

Rebase notes

Verification

New test in FormData.test.ts constructs a 4 GiB + 256 byte multipart body with a text field and a second file part after the big file, and asserts the trailing parts round-trip correctly. Spawned in a subprocess and skipped on Windows / machines with < 16 GiB effective memory (process.constrainedMemory() aware).

before after
keys ["big_upload", "", ""] ["big_upload", "description_field", "second_attachment"]
get("description_field") null "this part lives past the 4 GiB mark"
second file name undefined "also_past_4gb.txt"

Gate: test fails with src/ stashed, passes with the fix. cargo check / cargo clippy / cargo fmt --check clean on the touched crates; the adjacent FormData tests (control characters, duplicate numeric names) pass.

Fixes #21490
Related: #27483, #27443

@robobun

robobun commented May 13, 2026

Copy link
Copy Markdown
Collaborator Author

@coderabbitai

coderabbitai Bot commented May 13, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Walkthrough

FormData multipart parsing switches from buffer-offset bun.Semver.String slices to direct []const u8 slices for name, filename, and content_type. Public signatures are updated accordingly. ArrayBuffer.fromBytes initialization is simplified. A conditional regression test validates multipart parsing when headers are positioned past 4 GiB.

Changes

FormData multipart parsing with raw slices and large-offset support

Layer / File(s) Summary
FormData public API contracts
src/runtime/webcore/FormData.zig
FormData.Field.filename and FormData.Field.content_type change from bun.Semver.String to []const u8; forEachMultipartEntry iterator signature changes to fn (Ctx, []const u8, Field).
Entry handler and content-type checks
src/runtime/webcore/FormData.zig
Internal multipart entry handler accepts name: []const u8, reads field.filename and field.content_type as raw slices, and checks content-type presence via field.content_type.len > 0.
Multipart header parsing with raw slices
src/runtime/webcore/FormData.zig
Header parsing loop initializes name/filename as []const u8, extracts Content-Disposition params into raw slices, parses/trims content-type into a raw slice, normalizes Field, and invokes the iterator with (ctx, name, field).
ArrayBuffer direct byte initialization
src/jsc/array_buffer.zig
ArrayBuffer.fromBytes assigns .len and .byte_len directly from bytes.len without intermediate u32 casts.
Large-offset multipart parsing regression test
test/js/web/html/FormData.test.ts
Adds a conditional test (skipped on Windows or systems with <16 GiB RAM) that spawns a subprocess constructing a multipart body with headers beyond 2^32 bytes, calls request.formData(), and validates parsed keys, file metadata, and sizes.
🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed Title clearly and specifically describes the main fix: addressing multipart parser corruption for parts located past the 4 GiB boundary.
Linked Issues check ✅ Passed Changes successfully address issue #21490 by fixing corrupted field names/filenames for multipart parts with headers past 4 GiB by converting from bun.Semver.String to raw []const u8 slices.
Out of Scope Changes check ✅ Passed All changes are directly scoped to fixing the 4 GiB multipart parsing issue: FormData.zig header field refactoring, ArrayBuffer.fromBytes optimization, and comprehensive regression test.
Description check ✅ Passed The PR description comprehensively covers the problem, root cause, fix, rebase notes, and verification with detailed before/after examples.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/runtime/webcore/FormData.zig`:
- Line 332: The parsing loop in FormData.zig currently exits early when both
filename and name are found (condition using header_chunk.len > 0 and (filename
== null or name.len == 0)), which skips subsequent headers like Content-Type and
leaves field.content_type empty; update the loop(s) (the header parsing while
using header_chunk, filename, name and the similar blocks around the other
occurrences) to continue scanning all header lines until header_chunk.len == 0
(i.e., remove the early-exit dependency on filename/name), and ensure that when
a "Content-Type" header is encountered you set field.content_type appropriately
so file parts retain their content type (apply same change to the other similar
blocks referenced).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 605edc8f-a6a8-4e88-9564-83e94b6b122e

📥 Commits

Reviewing files that changed from the base of the PR and between 3bf4b33 and 82448fb.

📒 Files selected for processing (3)
  • src/jsc/array_buffer.zig
  • src/runtime/webcore/FormData.zig
  • test/js/web/html/FormData.test.ts

Comment thread src/runtime/webcore/FormData.zig
@github-actions

Copy link
Copy Markdown
Contributor

This PR may be a duplicate of:

  1. fix(formdata): prevent silent truncation of files >= 4GB in req.formData() #27443 - Also replaces bun.Semver.String with []const u8 in the multipart FormData parser to fix 4 GiB corruption; PR fix(formdata): multipart parser corrupts parts located past 4 GiB #30609 is a more complete fix covering additional fields

🤖 Generated with Claude Code

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/js/web/html/FormData.test.ts`:
- Around line 2-3: The test currently uses os.totalmem() to decide skipping
which checks installed RAM; change the skip gating to use the effective memory
ceiling instead (e.g., use v8.getHeapStatistics().heap_size_limit or process
resource limits or a new harness helper) so the FormData.test.ts skip is
deterministic under cgroup/job memory caps; locate the check that imports
totalmem and replace the logic that references totalmem with a call to
v8.getHeapStatistics().heap_size_limit (or the harness helper) and base the
skip/skipIf on that value (update any references to totalmem in the test file
accordingly).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 1c8fdad6-7d2c-4275-b642-19cc32e4e870

📥 Commits

Reviewing files that changed from the base of the PR and between 82448fb and 4b2f6ed.

📒 Files selected for processing (1)
  • test/js/web/html/FormData.test.ts

Comment thread test/js/web/html/FormData.test.ts
Comment thread src/runtime/webcore/FormData.zig

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM — mechanical bun.Semver.String[]const u8 widening that mirrors what Field.value already went through, with a gated regression test.

Extended reasoning...

Overview

This PR touches three files: src/runtime/webcore/FormData.zig (multipart parser), src/jsc/array_buffer.zig (one-line cast removal), and test/js/web/html/FormData.test.ts (new regression test). The core change converts Field.filename, Field.content_type, and the local name from bun.Semver.String (which packs offset/length into u32 with a tag bit) to raw []const u8 slices, matching what was already done for Field.value in #27483. The subslicer, the buf callback parameter, and the unused string type alias are dropped accordingly. The ArrayBuffer.fromBytes change removes a vestigial @as(u32, @intCast(...)) whose targets (.len, .byte_len) are already usize.

Security risks

The multipart parser handles untrusted request bodies, but this change is strictly a type widening — slices that previously round-tripped through a lossy u32 offset/length encoding now stay as native fat pointers into the same input buffer. No new parsing logic, no new indexing arithmetic, no allocation changes. The fromBytes change removes a debug-build panic on >4 GiB inputs without altering release behavior. I see no injection, OOB, or auth surface introduced.

Level of scrutiny

Low–medium. The transformation is mechanical (.isEmpty().len == 0, .len().len, .slice(buf) → direct slice, subslicer.sub(x).value()x) and follows the exact pattern already applied to Field.value. Grep confirms forEachMultipartEntry and FormData.Field are only referenced within FormData.zig, so the callback signature change is self-contained.

Other factors

The bug-hunting system found no issues. All inline review threads are resolved: the os.totalmem() gating concern was addressed in 8de77d5 with process.constrainedMemory(), and both CodeRabbit and my own note about the pre-existing Content-Type early-exit on line 332 were acknowledged as out of scope (the diff there is purely .len().len). The new test is appropriately gated (skipped on Windows / <16 GiB) and runs in a subprocess. The musl build failures in CI appear to be infrastructure-related rather than caused by this diff. No CODEOWNERS apply.

@robobun

robobun commented May 13, 2026

Copy link
Copy Markdown
Collaborator Author

CI status

The diff is green. Every red lane is fleet-wide flake or infra, none intersecting the multipart parser.

Build #60818 (8500af0, current HEAD — rebased onto main after #31783)

65 lanes pass, cargo clippy passes. Red lanes:

Lane Failure Evidence it's unrelated
alpine/debian test lanes test/cli/install/bunx.test.ts ("requires node 24") failing on concurrent builds #60814#60817
debian-13-x64 @duckdb/node-api segfault (native addon) also failing on #60815
alpine-x64-baseline streams-leak.test.ts (RSS threshold) streams allocator stats; this PR's from_bytes change is an identity round-trip removal for <4 GiB buffers
darwin-14-aarch64 / darwin-14-x64 / darwin-26-aarch64 Expired no agent picked up the jobs
windows-x64-baseline-verify-baseline Intel SDE download was 0 bytes ("Cannot open the file as archive") third-party tool download corruption

Previous builds (#57552: 71 pass / 0 test failures; earlier ones documented in edit history) show the same pattern: the diff's own tests pass everywhere they run.


The PR touches FormData.{zig,rs} and array_buffer.{zig,rs} plus a memory-gated regression test. After the latest rebase the Rust side uses plain &'a [u8] borrows on the lifetime-parameterized Field<'a> (no unsafe). All review threads resolved; claude[bot] LGTM'd each revision.

Ready for merge — needs a maintainer to retry the flaky lanes or override.

@robobun

robobun commented May 14, 2026

Copy link
Copy Markdown
Collaborator Author

Rebased onto main (post-Rust-rewrite) and ported:

  • src/jsc/array_buffer.rs: from_bytes/from_owned_bytes drop the u32::try_from().expect() round-trip (fields were already usize)
  • src/runtime/webcore/FormData.rs: Field.filename/content_type changed from bun_semver::String (u32 offset/len) to *const [u8]; for_each_multipart_entry iterator now passes name: &[u8] directly, drops SlicedString and the buf: &[u8] back-reference

Gate verified: without the .rs change, parts past 4 GiB parse with empty names and null values (u32 offset wrap); with it, all parts round-trip. .zig kept as reference.

(Unrelated: should increment reference count of the name string on Blob times out on my debug+ASAN container both with and without this change — 100k-iteration GC stress, not touched by this diff.)

@robobun robobun force-pushed the farm/fc08f495/formdata-multipart-4gb-offsets branch from 1dc0c33 to 31d5a69 Compare May 14, 2026 17:28
Comment thread test/js/web/html/FormData.test.ts Outdated

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues found and all prior feedback addressed — the fix is a clean mechanical swap from bun.Semver.String to native slices following the pattern Field.value already uses. Deferring to a maintainer only because this touches the multipart body parser (untrusted-input path) across both the Zig reference and the freshly-ported Rust implementation, and the latter adds new unsafe raw-pointer derefs (same contract as the pre-existing field.value, but worth a human glance).

Extended reasoning...

Overview

This PR fixes #21490: req.formData() corrupts part names/filenames for any multipart part whose header sits past byte 2³² in the body. Root cause is that forEachMultipartEntry stored name/filename/content_type as bun.Semver.String, which packs offset/length into a u32/u31 pair and silently wraps. The fix swaps these to raw []const u8 (Zig) / *const [u8] (Rust) slices into the input buffer — exactly the pattern Field.value already uses since #27483. It also drops a leftover u32 round-trip cast in ArrayBuffer.fromBytes/from_bytes whose target fields are already usize. A new memory-gated, Windows-skipped regression test in FormData.test.ts constructs a 4 GiB + 256 B body and asserts the trailing parts round-trip.

Files: src/runtime/webcore/FormData.{zig,rs}, src/jsc/array_buffer.{zig,rs}, test/js/web/html/FormData.test.ts.

Security risks

The multipart parser handles untrusted HTTP request bodies, so this is a security-adjacent code path. However, the change does not alter parsing logic, boundary handling, or header tokenization — it only changes how already-parsed subslices are stored (native fat pointer instead of u32-packed offset). The fix strictly removes a silent integer truncation; it cannot introduce new OOB reads because the slices are direct subslices of input produced by the same indexing as before. The Rust side's new unsafe { &*field.content_type } / &*field.filename derefs follow the identical SAFETY contract already in place for field.value (always either the b"" static or a subslice of the caller-owned input, valid for the iterator call). No auth/crypto/permissions code is touched.

Level of scrutiny

Moderate. The transformation is mechanical and the diff is small, but it spans (a) a parser on the untrusted-input path and (b) two parallel implementations — the Zig reference and the recently-landed Rust port. The Rust change adds three unsafe blocks. While each mirrors the pre-existing field.value: *const [u8] pattern and the TODO(port): lifetime comment explains why &'a [u8] isn't used yet, I'd prefer a maintainer confirm the Rust lifetime story rather than rubber-stamp it.

Other factors

  • forEachMultipartEntry / for_each_multipart_entry have no external callers (grep confirms both are file-local), so the iterator signature change is self-contained.
  • No CODEOWNERS apply to the touched paths.
  • All prior review feedback (CodeRabbit's cgroup-memory gate, my require("os") → static import nit, the pre-existing Content-Type early-exit) has been addressed or explicitly scoped out, and all inline threads are resolved.
  • CI was green on the previous push except for an unrelated fleet-wide Windows test-http-should-emit-close-when-connection-is-aborted flake; build #54342 for the latest commit is in progress.
  • The bug-hunting system found no issues on the current revision.

@robobun robobun force-pushed the farm/fc08f495/formdata-multipart-4gb-offsets branch from d11a1e1 to fdb922c Compare May 21, 2026 17:54

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM — mechanical Semver.String → raw-slice swap that mirrors what #27483 already did for Field.value, plus a dead u32 cast removal; all prior nits addressed.

Extended reasoning...

Overview

Touches src/runtime/webcore/FormData.{zig,rs} (multipart parser), src/jsc/array_buffer.{zig,rs}, and test/js/web/html/FormData.test.ts. The core change converts Field.{filename,content_type} and the per-part name from bun.Semver.String / bun_semver::String (which packs offset/len as u32 with bit 63 stolen as a tag) to raw []const u8 / *const [u8] slices into the input buffer. The iterator callback signature drops the now-unused buf back-reference and SlicedString subslicer. The array_buffer change removes a vestigial @as(u32, @intCast(...)) / u32::try_from().expect() round-trip on fields that were already usize. A memory-gated, subprocess-isolated regression test exercises a 4 GiB+ multipart body.

Security risks

None introduced. The multipart parser does process untrusted HTTP bodies, but this change doesn't alter parsing logic or bounds computation — it only changes how already-computed subslices are stored (native fat pointer instead of a packed u32 offset/len that silently wrapped at 4 GiB). The slices point into the same caller-owned buffer with identical bounds; the only behavioral delta is that offsets past 2³² no longer truncate. The Rust *const [u8] + unsafe { &* } pattern is the same idiom Field.value already uses (ported in #27483), so no new unsafe surface is being introduced — two more fields adopt an existing, documented invariant.

Level of scrutiny

Low–medium. The transformation is mechanical and follows an established precedent (Field.value underwent the identical conversion in #27483). I verified via grep that forEachMultipartEntry / for_each_multipart_entry and the Field struct have no consumers outside these two files, so the signature change is fully self-contained. The array_buffer hunk is a strict simplification (the target fields are usize; the cast was a debug-build panic waiting to happen and a no-op in release). No CODEOWNERS cover these paths.

Other factors

All review threads are resolved: my pre-existing Content-Type early-exit note was acknowledged as out of scope; CodeRabbit's cgroup-memory gate suggestion was applied (process.constrainedMemory?.() || totalmem()); my require("os") → static-import nit was fixed in fdb922c. The bug-hunting system found nothing this round. CI on the prior HEAD was green modulo a fleet-wide fetch-tcp-keepalive flake unrelated to this diff, and the new regression test passed on every lane where it ran.

@robobun robobun force-pushed the farm/fc08f495/formdata-multipart-4gb-offsets branch from fdb922c to 28f675b Compare May 24, 2026 10:23

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM — mechanical Semver.String → raw slice swap matching the existing Field.value pattern; all prior feedback addressed.

Extended reasoning...

Overview

This PR fixes #21490 by changing how the multipart FormData parser stores part metadata. forEachMultipartEntry previously stored name, filename, and content_type as bun.Semver.String (which packs offset/length into u32 fields, wrapping at 4 GiB), and now stores them as raw []const u8 / *const [u8] slices into the input buffer — exactly the same representation Field.value already uses since #27483. The buf callback parameter and SlicedString subslicer become unnecessary and are dropped. A separate one-line change in array_buffer.{zig,rs} removes a vestigial u32 round-trip on fields that are already usize. A new regression test in FormData.test.ts constructs a >4 GiB multipart body in a subprocess and asserts trailing parts round-trip correctly; it's gated on process.constrainedMemory()/totalmem() ≥ 16 GiB and skipped on Windows.

Security risks

None introduced. The multipart parser handles untrusted HTTP bodies, but this change strictly removes a lossy u32 packing step — the same subslices of the same caller-owned input buffer flow through the same logic, just without offset truncation. No new dereferences of attacker-controlled offsets; the Rust unsafe { &*field.filename } pattern is identical to the pre-existing unsafe { &*field.value } pattern with the same SAFETY invariant (points into the caller-owned input or b"").

Level of scrutiny

Low-to-medium. The diff is a mechanical type substitution (.slice(buf) → direct use, .isEmpty().len > 0, .len().len) following an already-landed pattern from #27483. forEachMultipartEntry / for_each_multipart_entry and FormData.Field have no callers outside the two modified files, so the signature change is fully self-contained. The array_buffer change is a strict simplification (removing @as(u32, @intCast(x)) / u32::try_from(x).expect() as usize where x is already usize and the target field is usize).

Other factors

All four review threads are resolved: the CodeRabbit loop-condition comment and my own pre-existing-bug note were correctly scoped out (the .len().len change is purely mechanical, the early-exit behavior is unchanged from main); the cgroup-memory gate was addressed with process.constrainedMemory?.(); and my require("os") nit was fixed with a static import. The author's CI summary reports the diff green across 65+ lanes with only a fleet-wide fetch-tcp-keepalive flake unrelated to this change. No CODEOWNERS cover these paths. The bug-hunting system found no issues.

@robobun robobun force-pushed the farm/fc08f495/formdata-multipart-4gb-offsets branch from 28f675b to 65dce7d Compare May 27, 2026 17:21

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues found and both earlier nits are addressed — deferring only because this touches the multipart body parser (untrusted input) across both the Zig and Rust implementations, which I'd rather a human sign off on.

Extended reasoning...

Overview

This PR fixes #21490 by changing how forEachMultipartEntry stores part metadata (name, filename, content_type): from bun.Semver.String / bun_semver::String — which packs offset/length into 32-bit fields and silently wraps past 4 GiB — to native []const u8 / *const [u8] slices. It applies the same change in parallel to src/runtime/webcore/FormData.zig and src/runtime/webcore/FormData.rs, drops the now-unused subslicer/buf plumbing, removes a redundant u32 round-trip in ArrayBuffer.fromBytes / ArrayBuffer::from_bytes (target fields are already usize), and adds a memory-gated regression test in test/js/web/html/FormData.test.ts.

The transformation is mechanical and follows the exact pattern already applied to Field.value in #27483: subslicer.sub(x).value()x, .len().len, .isEmpty().len == 0 / .is_empty(), .slice(buf) → direct use. The Rust side adds the required per-block SAFETY: comments for the *const [u8] derefs, and the Default impl keeps them initialised to b"" so they are always valid fat pointers. The control-character Content-Type guard from the recent hardening PR is preserved.

Security risks

Multipart parsing operates on untrusted HTTP request bodies, so it is security-adjacent. However, this PR does not introduce or alter any parsing logic, bounds checks, or allocations — it only widens the storage representation of already-computed subslices from u32-packed offsets to native fat pointers. The array_buffer change removes a debug-build panic on >4 GiB buffers without changing release behaviour. I see no new injection, OOB, or lifetime exposure; the Rust *const [u8] fields borrow into the caller-owned input for exactly the duration of the iterator callback, matching the existing Field.value contract.

Level of scrutiny

Medium-high. The change itself is locally verifiable and semantically equivalent modulo the bug being fixed, but it spans the production multipart parser in two coordinated language implementations and changes the public forEachMultipartEntry iterator signature. That combination is past my threshold for auto-approval of code on an untrusted-input path.

Other factors

The bug-hunting system found nothing. Both of my earlier inline comments (the pre-existing Content-Type early-exit, scoped out as orthogonal; and the require("os") → static import nit) are resolved, as is CodeRabbit's cgroup-aware memory-gate suggestion. CI is reported green across 71 lanes with only expired darwin agents outstanding. The new test is sensibly gated (process.constrainedMemory() with totalmem() fallback, skip on Windows) and runs in a subprocess. No CODEOWNERS match the touched paths.

robobun and others added 6 commits June 5, 2026 20:31
The multipart parser stored part name, filename, and content-type as
bun.Semver.String, which packs offset and length into 32-bit fields
(with bit 31 of length stolen as a tag). For any part whose header sat
past 4 GiB in the request body, the offset wrapped and the parser read
garbage — field names came back as bytes from the middle of the
preceding file body and the part was unreachable by name.

This is the remaining half of the bug behind #21490. The file body
itself (Field.value) was already switched to a raw slice in #27483,
which fixed the 2 GiB length truncation, but name/filename/content_type
were still u32-indexed.

Switch all Field slices to raw []const u8 and drop the subslicer. Also
remove a leftover @as(u32, @intcast(bytes.len)) in ArrayBuffer.fromBytes
whose target fields are usize — it panicked on >4 GiB buffers in debug
builds on the client-side FormData serialization path.

Fixes #21490
os.totalmem() reports physical RAM, so a containerized runner with a
lower cgroup cap would still try to run the ~10 GiB test and get
OOM-killed. Use process.constrainedMemory() (cgroup-aware) with a
fallback to totalmem().
Per test/CLAUDE.md, only use require() when testing require behavior.
@robobun robobun force-pushed the farm/fc08f495/formdata-multipart-4gb-offsets branch from 65dce7d to 8500af0 Compare June 5, 2026 20:39

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM — mechanical type swap (Semver.String → raw slice) mirroring the existing Field.value pattern, plus a gated regression test; all prior review threads resolved and CI green.

Extended reasoning...

Overview

This PR fixes #21490 by changing FormData.Field.filename/content_type and the multipart iterator's name parameter from bun.Semver.String / bun_semver::String (which packs offset/length as u32 and silently wraps past 4 GiB) to native []const u8 / &[u8] slices. The same fix was already applied to Field.value in #27483; this completes it for the remaining three fields. It also drops a vestigial u32 round-trip in ArrayBuffer.fromBytes / ArrayBuffer::from_bytes where the target fields are already usize. A memory-gated subprocess regression test in FormData.test.ts verifies parts past the 4 GiB mark round-trip correctly.

Security risks

The multipart parser handles untrusted HTTP bodies, but this change adds no new parsing logic — it only changes how subslices of the input buffer are represented (native fat pointer instead of u32 offset+len packed into a u64). All slices still point into the same caller-owned input buffer with the same bounds; in the Rust port the borrow checker enforces this via the 'a lifetime on Field. The pre-existing printable-ASCII guard on content_type (header-injection hardening) is preserved verbatim. Net effect is a correctness improvement: the old code could hand back bytes from the wrong offset within the buffer.

Level of scrutiny

Moderate. The transformation is mechanical and follows the exact pattern already established for Field.value in the same struct: subslicer.sub(x).value()x, .slice(buf) → direct use, .len()/.isEmpty().len/.is_empty(). forEachMultipartEntry/for_each_multipart_entry is only called from within the same file (verified via grep), so the iterator signature change has no external callers to break. The array_buffer change is a no-op simplification (removing u32::try_from(x).expect() as usize where x: usize).

Other factors

All prior review threads (CodeRabbit's Content-Type early-exit note, the cgroup-aware memory gate, and the static-import nit) are resolved — the early-exit is correctly identified as pre-existing and out of scope, and the test gate now uses process.constrainedMemory() with a totalmem() fallback. The bug-hunting pass found nothing. The robobun CI summary reports zero test failures on the current HEAD with only infra-expired darwin agents red. No CODEOWNERS cover these paths.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Bun Server Uploads Stop at 1,58 GB Despite Configured Higher Limit

1 participant