Skip to content

blob: unaligned UTF-16LE decode for odd-length/odd-address views#31006

Closed
alii wants to merge 13 commits into
mainfrom
ali/blob-utf16-odd-length
Closed

blob: unaligned UTF-16LE decode for odd-length/odd-address views#31006
alii wants to merge 13 commits into
mainfrom
ali/blob-utf16-odd-length

Conversation

@alii

@alii alii commented May 18, 2026

Copy link
Copy Markdown
Member

Reading a UTF-16LE-with-BOM JSON/text Blob/Bun.file() aborts the whole process (exit 133, uncatchable — a JS try/catch around await blob.json() does not catch it) when the byte view is odd-address — e.g. .slice(odd) of a shared byte store (shared_view_raw() returns base + offset).

to_json_with_bytes / to_string_with_bytes truncate to even length (& !1) and then do bytemuck::cast_slice::<u8, u16>(buf), which still panic!s (not a debug assert) on TargetAlignmentGreaterAndInputNotAligned when the address is odd. Stripping the 2-byte BOM doesn't change address parity. Zig's bun.reinterpretSlice(u16, buf) does an unaligned @ptrCast and never checks alignment.

At both sites: after the & !1 truncation, use bytemuck::try_cast_slice — it succeeds for the common 2-aligned case (file reads, whole byte stores) and borrows &[u16] straight into clone_utf16 with no extra allocation. Only when the view is odd-address does it fall back to chunks_exact(2).map(u16::from_le_bytes).collect::<Vec<u16>>().

Regression test spawns a subprocess (the failure is an uncatchable abort), covers odd-length .json()/.text(), even-length control, and odd-address via new Blob([pad+BOM+data]).slice(1).text(), and asserts stderr first so the panic text surfaces on regression.

@robobun

robobun commented May 18, 2026

Copy link
Copy Markdown
Collaborator
Updated 7:11 AM PT - May 27th, 2026

@robobun, your commit f6d633ad62fed625c8e65cab198fbcb271046200 passed in Build #58340! 🎉


🧪   To try this PR locally:

bunx bun-pr 31006

That installs a local version of the PR into your bun-31006 executable, so you can run:

bun-31006 --bun

@coderabbitai

coderabbitai Bot commented May 18, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Walkthrough

Blob UTF-16LE BOM branches now truncate any trailing odd byte, attempt a safe bytemuck::try_cast_slice to &[u16], and fall back to decoding with chunks_exact(2) + u16::from_le_bytes into a Vec<u16> before creating strings; a regression test verifies Blob.json()/Blob.text() do not abort on odd-length or misaligned inputs. Separately, a C assert was replaced with the WTF ASSERT macro in uv__tty_make_raw.

Changes

Blob UTF-16LE BOM Odd-Length Handling

Layer / File(s) Summary
UTF-16LE BOM conversion change in Blob methods
src/runtime/webcore/Blob.rs
Replaces unconditional bytemuck::cast_slice with even-length truncation + bytemuck::try_cast_slice and a chunks_exact(2)u16::from_le_bytesVec<u16> fallback, then uses BunString::clone_utf16 / OwnedString::new(...) to produce final strings in to_string_with_bytes and to_json_with_bytes.
Regression test for odd/misaligned UTF-16LE Blobs
test/js/web/fetch/blob.test.ts
Adds a subprocess-based test that constructs odd/even-length and misaligned UTF-16LE+BOM buffers, invokes Blob.json()/Blob.text(), checks deterministic stdout, asserts empty stderr and exit code 0.

WTF assert macro use in tty helper

Layer / File(s) Summary
Replace assert with ASSERT in wtf-bindings
src/jsc/bindings/wtf-bindings.cpp
Changes the null-pointer check in uv__tty_make_raw from assert(tio != NULL) to ASSERT(tio != NULL).
  • Possibly related PRs:
    • oven-sh/bun#30926: Similar change replacing unsafe cast-based UTF-16 conversions with safe per-chunk construction.
🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically describes the main change: fixing unaligned UTF-16LE decoding for odd-length/odd-address byte views in Blob operations.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description check ✅ Passed The PR description comprehensively explains the problem, solution, and testing approach in detail.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

Comment thread src/runtime/webcore/Blob.rs Outdated
Comment thread test/js/web/fetch/blob.test.ts
@alii

alii commented May 18, 2026

Copy link
Copy Markdown
Member Author

@robobun fix review comments

bytemuck::cast_slice panics on odd alignment as well as odd length;
.slice(odd) of a shared byte store hands an odd-address view straight to
the cast. Replace with chunks_exact(2) + from_le_bytes, which drops the
trailing odd byte and reads unaligned like Zig's reinterpretSlice. Test
now covers the .slice(1) case and asserts stderr so the panic text shows
up on regression.
@robobun robobun changed the title blob: drop trailing odd byte in UTF-16LE decode instead of aborting blob: unaligned UTF-16LE decode for odd-length/odd-address views May 18, 2026

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM — both prior review points (odd-alignment abort + stderr assertion) are addressed; chunks_exact(2) + from_le_bytes handles unaligned/odd-length input and the test now covers .slice(1).

Extended reasoning...

Overview

This PR replaces bytemuck::cast_slice::<u8, u16>(buf) with an unaligned, length-tolerant decode (buf.chunks_exact(2).map(|c| u16::from_le_bytes([c[0], c[1]])).collect::<Vec<u16>>()) at the two UTF-16LE-BOM call sites in to_string_with_bytes (Blob.rs:2674) and to_json_with_bytes (Blob.rs:2865), and adds a subprocess regression test in test/js/web/fetch/blob.test.ts covering odd-length input, even-length sanity, and an odd-address .slice(1) view.

Prior review follow-up

In the previous round I flagged (🔴) that the original & !1 length-truncation fix still left the odd-address PodCastError::TargetAlignmentGreaterAndInputNotAligned abort reachable via Blob.slice(odd), and (🟡) that the test piped stderr without reading it. The new revision adopts the exact chunks_exact/from_le_bytes approach I suggested, which sidesteps both bytemuck checks entirely (no alignment requirement, trailing odd byte dropped by chunks_exact's remainder), and the test now reads/asserts stderr first and adds a .slice(1) misaligned case. Both points are fully resolved.

Security risks

None. The change moves from a panicking reinterpret-cast to a byte-by-byte copy into a fresh Vec<u16>; there is no unsafe, no pointer arithmetic, and no externally-controlled length used for allocation beyond what clone_utf16 already did. The behavioral change for malformed input (drop trailing odd byte) matches the prior Zig implementation's @divTrunc(len, 2) semantics, so this restores parity rather than introducing new lenience.

Level of scrutiny

Low–medium. The diff is ~20 effective Rust lines across two near-identical sites plus a test. It strictly removes a JS-triggerable uncatchable abort and replaces it with a safe copy that clone_utf16 was going to perform anyway (the comment correctly notes the extra Vec is one allocation either way — chunks_exact is ExactSizeIterator, so collect allocates once). The surrounding lifetime/cleanup logic (_free, Temporary/Transfer handling, OwnedString deref) is unchanged.

Other factors

No CODEOWNERS for this path. The bug-hunting system found no issues on this revision. The regression test is hermetic (subprocess via bun -e), asserts stderr == "" first per the repo's test convention, and exercises both the .text() and .json() paths plus the odd-alignment case that motivated the second commit.

<cassert> is not included by wtf-bindings.cpp, so whether 'assert' is
visible depends on unified-source batching. Use the always-available
WTF ASSERT macro instead.

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM — both prior concerns addressed (unaligned chunks_exact decode at both sites, stderr asserted first, odd-address case covered).

Extended reasoning...

Overview

This PR fixes an uncatchable process abort when Blob.json() / Blob.text() encounter UTF-16LE+BOM bytes that are odd-length or odd-address. Two call sites in src/runtime/webcore/Blob.rs (to_json_with_bytes, to_string_with_bytes) replace bytemuck::cast_slice::<u8, u16>(buf) — which panic!s on both odd length and odd alignment — with buf.chunks_exact(2).map(|c| u16::from_le_bytes([c[0], c[1]])).collect(), which drops any trailing odd byte and reads unaligned, matching the Zig bun.reinterpretSlice semantics it ports. A subprocess regression test in test/js/web/fetch/blob.test.ts covers odd-length .json()/.text(), an even-length control, and odd-address via .slice(1). There is also a one-line drive-by in src/jsc/bindings/wtf-bindings.cpp swapping libc assert for WTF ASSERT in uv__tty_make_raw.

Security risks

None identified. The change replaces a panicking reinterpret-cast with a safe byte-wise copy into a Vec<u16>, strictly removing a JS-triggerable abort. No new unsafe blocks, no parsing of untrusted input beyond what already existed, no auth/crypto/permissions surface. The ASSERT swap is debug-only on both sides and the sole caller passes a stack address that cannot be NULL.

Level of scrutiny

Low-to-moderate. This is a narrow, well-understood crash fix in a code path (BOM-prefixed UTF-16LE blobs) that is not on the common hot path. The new code is straightforward standard-library Rust with no unsafe; clone_utf16 already copies, so the intermediate Vec does not change the allocation count. The wtf-bindings change is a cosmetic consistency tweak (the file already uses ASSERT elsewhere and does not include <cassert> directly).

Other factors

I previously flagged (1) that the original revision only fixed odd length and still aborted on odd alignment, and (2) that the test piped stderr without reading it. Both have been addressed: the chunks_exact approach handles length and alignment together, the test now asserts stderr === "" first and adds a .slice(1) odd-address case. Both inline threads are resolved. No CODEOWNERS apply to the touched files. The bug-hunting system found no issues in the current revision.

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM — both review comments addressed: chunks_exact(2) + from_le_bytes now handles odd length and odd alignment, the test covers .slice(1) and asserts stderr first, and the assertASSERT swap is a trivial consistency fix.

Extended reasoning...

Overview

This PR fixes an uncatchable process abort when Blob.json()/.text() encounters a UTF-16LE-with-BOM byte slice that is odd-length or odd-address. It touches three files: src/runtime/webcore/Blob.rs (two identical ~10-line replacements of bytemuck::cast_slice::<u8,u16> with chunks_exact(2).map(u16::from_le_bytes).collect()), test/js/web/fetch/blob.test.ts (one new subprocess regression test), and src/jsc/bindings/wtf-bindings.cpp (a one-line assertASSERT macro swap in uv__tty_make_raw, whose only caller passes a stack address, so it's a no-op consistency change with the rest of the file).

Prior review follow-up

I left two comments on the first revision: a 🔴 noting that & !1 only fixed odd length and cast_slice would still abort on odd alignment via .slice(odd), and a 🟡 nit that the test piped stderr without reading it. Both are now resolved exactly as suggested — the chunks_exact approach handles length and alignment in one go, the test adds a .slice(1) odd-address case, and stderr is asserted first so a regression surfaces the panic text. Both threads are marked resolved on the PR.

Security risks

None. This is pure data-decoding of user-supplied blob bytes; the new path is strictly safer than the old one (no unsafe, no reinterpret cast, bounded iteration over the input slice). Worst case for malformed input is a dropped trailing byte, matching Zig's @divTrunc semantics.

Level of scrutiny

Low-to-moderate. The change is small, mechanical, and replaces a panicking cast with a well-known safe idiom. clone_utf16 already copies into a fresh WTFStringImpl (verified at src/bun_core/string/mod.rs:239), so the intermediate Vec<u16> adds one allocation on a cold path (BOM-prefixed UTF-16 blobs only) — an acceptable tradeoff for eliminating an uncatchable abort. No CODEOWNERS paths are touched.

Other factors

The robobun build-status comment shows a ❌ for d6b7f87, but the follow-up commit 56d2716 is an empty CI retrigger explicitly noting "darwin-aarch64 jobs expired in queue" — an infra timeout, not a test failure. No new bugs were found by the bug-hunting system on the current revision.

@robobun

robobun commented May 19, 2026

Copy link
Copy Markdown
Collaborator

This fix already landed on main via #31417 — the identical try_cast_slice zero-copy fast path + as_chunks::<2> unaligned fallback at both to_string_with_bytes / to_json_with_bytes sites, plus a regression test covering odd-address .slice(1) .text()/.json() and the aligned case. Rebasing this branch onto current main yields an empty diff. Closing as superseded — no merge needed.

@Jarred-Sumner Jarred-Sumner left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should instead detect if it's unaligned and only do this operation when it is unaligned. Otherwise, we are duplicating data unnecessarily.

…saligned

try_cast_slice succeeds for the common aligned case (file reads, whole
byte stores) and hands &[u16] straight to clone_utf16 with no
intermediate allocation; only odd-address views from .slice(odd) fall
through to the chunks_exact copy.

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM — both prior review points are addressed: try_cast_slice fast path with chunks_exact fallback covers odd-length and odd-address, and the test now asserts stderr first.

Extended reasoning...

Overview

This PR fixes an uncatchable process abort in Blob.json()/.text() when a UTF-16LE-with-BOM byte view is odd-length or odd-address. Two identical hunks in src/runtime/webcore/Blob.rs (to_json_with_bytes, to_string_with_bytes) replace bytemuck::cast_slice::<u8, u16> — which panic!s on either condition — with: truncate to even length (& !1), then try_cast_slice for the common 2-aligned case, falling back to a chunks_exact(2) → u16::from_le_bytes → Vec<u16> copy when misaligned. A subprocess regression test in test/js/web/fetch/blob.test.ts covers odd-length .json()/.text(), an even-length control, and an odd-address .slice(1) view. The unrelated one-liner in src/jsc/bindings/wtf-bindings.cpp swaps libc assert for WTF ASSERT in uv__tty_make_raw, matching the rest of the file.

Prior review

I previously flagged (1) that the original & !1 length-only fix still aborted on odd-address views via .slice(odd), and (2) that the test piped stderr but never read it. Both inline comments are resolved: d2f4866 switched to unconditional chunks_exact, then d8d9200 restored a zero-copy try_cast_slice fast path per Jarred's request while keeping the unaligned fallback. The test now reads stderr and asserts it empty first, and exercises the odd-address path. The current bug-hunt pass found nothing on the latest revision.

Security risks

None. No untrusted-input parsing surface is widened; the change strictly removes a JS-triggerable native panic. The native-endian reinterpret in the fast path matches both the pre-PR behavior and Zig's reinterpretSlice, and Bun targets only LE platforms.

Level of scrutiny

Moderate — this is on the Blob text/JSON read path, but the change is small, mechanical, and mirrors Zig semantics (@divTrunc + unaligned read). The deferred-init let unaligned: Vec<u16>; … &unaligned pattern is idiomatic Rust for conditional allocation and borrow-checks correctly. The assert → ASSERT swap is a no-op consistency change (other assertions in the file already use ASSERT).

Other factors

No CODEOWNERS cover these paths. All review threads are resolved, and the latest commit directly implements the optimization a maintainer asked for. The regression test is deterministic, follows the established Bun.spawn subprocess pattern in the same file, and would surface the panic text on regression.

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM — both prior review comments were addressed (try_cast_slice + chunks_exact fallback covers odd length and odd alignment, and the test now asserts stderr first).

Extended reasoning...

Overview

This PR fixes an uncatchable process abort (exit 133) when Blob.json()/.text() encounter UTF-16LE+BOM bytes that are odd-length or odd-address. The two affected sites in src/runtime/webcore/Blob.rs (to_json_with_bytes, to_string_with_bytes) replace bytemuck::cast_slice::<u8, u16> — which panic!s on either condition — with: truncate to even length (& !1), then try_cast_slice for the common 2-aligned zero-copy path, falling back to chunks_exact(2).map(u16::from_le_bytes).collect() only when misaligned. A subprocess regression test in test/js/web/fetch/blob.test.ts covers odd-length .json()/.text(), an even-length control, and odd-address via .slice(1). There's also a one-line drive-by assertASSERT in wtf-bindings.cpp (the file doesn't include <cassert> directly).

Security risks

None. This is purely defensive handling of byte-slice geometry on a read path; no new inputs are trusted, no auth/crypto/permissions touched. The change strictly removes a JS-triggerable native abort and otherwise preserves the prior decode semantics (matching Zig's reinterpretSlice truncate-and-read-unaligned behavior). The Ok branch reinterprets bytes as native-endian u16 exactly as the old cast_slice did, so no behavior change on Bun's LE-only targets.

Level of scrutiny

Low–medium. Small, mechanical, two near-identical hunks at well-understood call sites, with a regression test that exercises both failure modes in a subprocess (since the original failure was an uncatchable abort). The deferred-init let unaligned: Vec<u16>; ... &unaligned pattern is borrow-checker-sound and scoped correctly relative to the _free guard. No CODEOWNERS entries cover these paths.

Other factors

I previously left two inline comments on this PR — a 🔴 for the missing odd-alignment handling and a 🟡 nit on the test's unread stderr — and both have been fully addressed in d2f4866/d8d9200 (threads resolved). The bug-hunting system found no issues on the current revision. CI passed blob.test.ts on all lanes per the build-bot summary; the one red lane was an unrelated S3/network flake. The ASSERT swap is trivial and consistent with the rest of the file.

@alii

alii commented May 22, 2026

Copy link
Copy Markdown
Member Author

@robobun there are conflicts

Main landed the & !1 odd-length truncation independently; keep that and
layer the try_cast_slice + chunks_exact fallback on top for odd-address
views. Preserve main's ZIG PARITY comment in to_string_with_bytes.

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM — both prior comments addressed (try_cast_slice + chunks_exact fallback, stderr asserted in test).

Extended reasoning...

Overview

Replaces panicking bytemuck::cast_slice::<u8, u16> with & !1 even-length truncation followed by try_cast_slice + a chunks_exact(2).map(u16::from_le_bytes) fallback at the two UTF-16LE BOM sites in Blob::to_string_with_bytes / to_json_with_bytes. Adds a subprocess regression test covering odd-length and odd-address (.slice(1)) inputs. The wtf-bindings.cpp change mentioned in the description is no longer in the diff (presumably landed on main during the merge).

Security risks

None. This is pure decode-path hardening: it removes a JS-triggerable native panic! (process abort) and replaces it with a safe borrow-or-copy. No new untrusted-input parsing, no auth/crypto/permissions surface.

Level of scrutiny

Low–medium. ~30 lines of localized logic across two near-identical sites, mirroring Zig's bun.reinterpretSlice semantics (drop trailing odd byte, read unaligned). The deferred-init let unaligned: Vec<u16>; + match pattern is idiomatic and borrow-checks correctly. The Ok arm reinterprets as native-endian (same as the pre-existing code) while the Err arm uses explicit LE — these only diverge on big-endian hosts, which Bun does not target, and it is not a behavior change from the original cast_slice.

Other factors

Both inline comments I left on the first revision are resolved: the odd-address abort is now handled via try_cast_slice fallback, and the test reads/asserts stderr first so a future panic surfaces in CI output. The test also adds an explicit .slice(1) odd-address case. No CODEOWNERS cover src/runtime/webcore/. The most recent "there are conflicts" comment was followed by merge commit 2338403, which CI is now building.

Comment thread src/runtime/webcore/Blob.rs Outdated
Const-generic chunk size lets the compiler elide per-element bounds
checks and from_le_bytes takes the [u8; 2] array directly.

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM — both prior review points (odd-address fallback, stderr assertion) and the as_chunks request are addressed, and the regression test covers all four cases.

Extended reasoning...

Overview

Two identical edits in src/runtime/webcore/Blob.rs (to_string_with_bytes and to_json_with_bytes, UTF-16LE BOM arms) replace the panicking bytemuck::cast_slice::<u8, u16> with try_cast_slice + an unaligned fallback that collects as_chunks::<2>().0 through u16::from_le_bytes into a Vec<u16>. The & !1 even-length truncation is retained, so the only remaining Err case is odd address, which the fallback handles. The happy path (aligned, even-length — file reads, whole stores) still borrows directly with no extra allocation. A new subprocess test in test/js/web/fetch/blob.test.ts exercises odd-length .json()/.text(), an even-length control, and odd-address via .slice(1), asserting stderr first.

Security risks

None. This is a defensive replacement of a process-aborting cast with a non-panicking try + copy fallback on the Blob text/JSON decode path. No new inputs are accepted and no trust boundaries change; the only behavioral difference is that previously-aborting inputs now decode (matching the Zig reinterpretSlice semantics this code ports).

Level of scrutiny

Moderate — it's runtime code on a user-reachable path, but the change is mechanical and tightly scoped to two call sites with an identical pattern. The let unaligned: Vec<u16>; ... &unaligned deferred-init shape is sound, as_chunks::<2>().0 on an already-even slice yields exactly len/2 pairs, and from_le_bytes is the correct decode for UTF-16LE. The fast-path reinterpret remains native-endian like the original cast_slice and the Zig source, so no regression on Bun's LE targets.

Other factors

All three review threads are resolved: my earlier odd-address concern (now try_cast_slice + fallback, with a .slice(1) test case), my stderr-assertion nit (now read and asserted first), and CrazyboyQCD's as_chunks request (applied in f64750c). The bug-hunting pass found nothing on the current revision, and robobun reports all 17 blob tests passing.

robobun added 2 commits May 26, 2026 04:07
…ngth

# Conflicts:
#	test/js/web/fetch/blob.test.ts
bun-debug is ASAN + debug allocator, so RSS inflates the same way the
named bun-asan CI binary does; isASAN only matches the binary name.

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM — both my earlier points (odd-address fallback, stderr assertion) and the as_chunks suggestion are addressed; the try_cast_slice + as_chunks::<2> fallback is correct.

Extended reasoning...

Overview

Two-file change: src/runtime/webcore/Blob.rs replaces the panicking bytemuck::cast_slice::<u8, u16> at the two UTF-16LE-BOM sites (to_string_with_bytes, to_json_with_bytes) with even-length truncation → try_cast_slice (zero-copy when 2-aligned) → as_chunks::<2>().0.iter().map(u16::from_le_bytes).collect() fallback for odd-address views. test/js/web/fetch/blob.test.ts adds a subprocess regression test covering odd-length, even-length, and odd-address (.slice(1)) cases, plus a small unrelated relaxation of an RSS leak threshold under debug builds.

Security risks

None. The change converts a JS-triggerable native panic! (process abort) into correct decoding, which is strictly a robustness improvement. No new untrusted-input parsing surface; the fallback path only allocates a bounded Vec<u16> proportional to the existing buffer.

Level of scrutiny

Low–medium. This is a small, mechanical fix to a narrow code path (UTF-16LE BOM branch of Blob text/json), mirroring the prior Zig semantics. The deferred-init let unaligned: Vec<u16>; … &unaligned pattern is sound. The Ok path reinterprets as native-endian and the Err path uses from_le_bytes, which are equivalent on all Bun targets (LE-only) and match pre-PR behavior.

Other factors

I reviewed an earlier revision and flagged (1) the unhandled odd-alignment case and (2) the unread stderr in the test — both are fixed in the current diff, and the inline threads are resolved. CrazyboyQCD's as_chunks request is also applied. No CODEOWNERS cover these paths. The robobun binary-size failure is a stale-baseline artifact from the branch having been 81 commits behind before merging main; this ~30-line change cannot account for +11 MB and is not a code-review concern.

Comment thread test/js/web/fetch/blob.test.ts

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM — all prior feedback addressed; the try_cast_slice + as_chunks::<2> fallback correctly handles both odd-length and odd-address views, and the regression test now covers the full matrix.

Extended reasoning...

Overview

Two near-identical hunks in src/runtime/webcore/Blob.rs (to_string_with_bytes and to_json_with_bytes) replace a panicking bytemuck::cast_slice::<u8, u16> with & !1 length truncation followed by bytemuck::try_cast_slice and an as_chunks::<2>().0.iter().map(u16::from_le_bytes).collect() fallback for odd-address views. A new subprocess-based regression test in test/js/web/fetch/blob.test.ts covers odd-length × {.json(), .text()}, an even-length control, and odd-address × {.json(), .text()} via .slice(1). There's also a small drive-by relaxation of an unrelated leak-test threshold to treat isDebug like isASAN.

Security risks

None. This is pure byte-decode logic that replaces an uncatchable process abort with a safe copy fallback. No user-controlled control flow, no auth/crypto/permissions, no new external surface.

Level of scrutiny

Low-to-medium. The change is a focused, well-understood crash fix in a Rust port of existing Zig behaviour (bun.reinterpretSlice does an unaligned reinterpret). The try_cast_sliceOk path preserves the prior zero-copy borrow for the common aligned case; only the odd-address edge case allocates a temporary Vec<u16>, which is then copied again by clone_utf16 exactly as before. The deferred-init let unaligned: Vec<u16>; pattern is sound and borrow-checked.

Other factors

This PR has been through three review rounds. My earlier 🔴 (odd-alignment still panics) and two 🟡 nits (stderr assertion, .slice(1).json() coverage) were all applied and resolved, as was a contributor suggestion to use as_chunks over chunks_exact. The bug-hunting system found nothing on the current revision (3afd712). The test follows the file's established Bun.spawn + stderr-first pattern. The isASAN || isDebug threshold tweak is a low-risk test-robustness change with an explanatory comment.

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM — all prior feedback addressed; the try_cast_slice + as_chunks::<2> fallback closes both the odd-length and odd-address abort paths and the subprocess test covers the full matrix.

Extended reasoning...

Overview

Two near-identical UTF-16LE-BOM branches in src/runtime/webcore/Blob.rs (to_string_with_bytes / to_json_with_bytes) replace bytemuck::cast_slice::<u8, u16> — which panic!s on odd length or odd address — with & !1 truncation, then bytemuck::try_cast_slice for the common 2-aligned zero-copy path, falling back to as_chunks::<2>().0.iter().map(u16::from_le_bytes).collect() only when the view is odd-address. A new subprocess regression test in test/js/web/fetch/blob.test.ts exercises odd-length .json()/.text(), an even control, and odd-address .slice(1) for both .text() and .json(), asserting stderr first so a regression surfaces the panic text. A small drive-by loosens an RSS threshold for isDebug builds in an adjacent leak test.

Security risks

None. This is a crash-hardening fix in a string-decode path; no auth, crypto, permissions, or untrusted-command surfaces are touched. The change strictly removes a JS-triggerable process abort.

Level of scrutiny

Low–medium. The diff is ~30 lines of mechanical Rust at two sites, mirroring Zig's unaligned reinterpretSlice semantics. The deferred-init let unaligned: Vec<u16>; … &unaligned pattern is standard and borrow-checks; as_chunks::<2> after & !1 has an empty remainder so no bytes are dropped beyond the intentional trailing odd byte. The Ok arm reinterprets native-endian and the Err arm decodes little-endian, but Bun only targets LE hosts and the pre-existing cast_slice already had native-endian semantics, so this is not a behavioral change.

Other factors

All three of my earlier inline comments (odd-address still panics; read/assert stderr; cover .json() odd-address) have been applied and the threads are resolved. CrazyboyQCD's as_chunks suggestion is also incorporated. No CODEOWNERS cover these paths, the bug-hunting pass found nothing new, and CI is building on the latest merge with main.

@alii

alii commented May 30, 2026

Copy link
Copy Markdown
Member Author

@robobun rebase and get this mergable

@robobun

robobun commented May 30, 2026

Copy link
Copy Markdown
Collaborator

@alii this is already fixed on main by #31417 — same try_cast_slice + as_chunks::<2> fix at both Blob UTF-16LE decode sites and an equivalent odd-offset .slice(1) .text()/.json() regression test. Rebasing onto current main gives an empty diff, so there's nothing left to merge here. Closing as superseded.

@robobun robobun closed this May 30, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants