blob: unaligned UTF-16LE decode for odd-length/odd-address views#31006
blob: unaligned UTF-16LE decode for odd-length/odd-address views#31006alii wants to merge 13 commits into
Conversation
|
Updated 7:11 AM PT - May 27th, 2026
✅ @robobun, your commit f6d633ad62fed625c8e65cab198fbcb271046200 passed in 🧪 To try this PR locally: bunx bun-pr 31006That installs a local version of the PR into your bun-31006 --bun |
WalkthroughBlob UTF-16LE BOM branches now truncate any trailing odd byte, attempt a safe ChangesBlob UTF-16LE BOM Odd-Length Handling
WTF assert macro use in tty helper
🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
|
@robobun fix review comments |
bytemuck::cast_slice panics on odd alignment as well as odd length; .slice(odd) of a shared byte store hands an odd-address view straight to the cast. Replace with chunks_exact(2) + from_le_bytes, which drops the trailing odd byte and reads unaligned like Zig's reinterpretSlice. Test now covers the .slice(1) case and asserts stderr so the panic text shows up on regression.
There was a problem hiding this comment.
LGTM — both prior review points (odd-alignment abort + stderr assertion) are addressed; chunks_exact(2) + from_le_bytes handles unaligned/odd-length input and the test now covers .slice(1).
Extended reasoning...
Overview
This PR replaces bytemuck::cast_slice::<u8, u16>(buf) with an unaligned, length-tolerant decode (buf.chunks_exact(2).map(|c| u16::from_le_bytes([c[0], c[1]])).collect::<Vec<u16>>()) at the two UTF-16LE-BOM call sites in to_string_with_bytes (Blob.rs:2674) and to_json_with_bytes (Blob.rs:2865), and adds a subprocess regression test in test/js/web/fetch/blob.test.ts covering odd-length input, even-length sanity, and an odd-address .slice(1) view.
Prior review follow-up
In the previous round I flagged (🔴) that the original & !1 length-truncation fix still left the odd-address PodCastError::TargetAlignmentGreaterAndInputNotAligned abort reachable via Blob.slice(odd), and (🟡) that the test piped stderr without reading it. The new revision adopts the exact chunks_exact/from_le_bytes approach I suggested, which sidesteps both bytemuck checks entirely (no alignment requirement, trailing odd byte dropped by chunks_exact's remainder), and the test now reads/asserts stderr first and adds a .slice(1) misaligned case. Both points are fully resolved.
Security risks
None. The change moves from a panicking reinterpret-cast to a byte-by-byte copy into a fresh Vec<u16>; there is no unsafe, no pointer arithmetic, and no externally-controlled length used for allocation beyond what clone_utf16 already did. The behavioral change for malformed input (drop trailing odd byte) matches the prior Zig implementation's @divTrunc(len, 2) semantics, so this restores parity rather than introducing new lenience.
Level of scrutiny
Low–medium. The diff is ~20 effective Rust lines across two near-identical sites plus a test. It strictly removes a JS-triggerable uncatchable abort and replaces it with a safe copy that clone_utf16 was going to perform anyway (the comment correctly notes the extra Vec is one allocation either way — chunks_exact is ExactSizeIterator, so collect allocates once). The surrounding lifetime/cleanup logic (_free, Temporary/Transfer handling, OwnedString deref) is unchanged.
Other factors
No CODEOWNERS for this path. The bug-hunting system found no issues on this revision. The regression test is hermetic (subprocess via bun -e), asserts stderr == "" first per the repo's test convention, and exercises both the .text() and .json() paths plus the odd-alignment case that motivated the second commit.
<cassert> is not included by wtf-bindings.cpp, so whether 'assert' is visible depends on unified-source batching. Use the always-available WTF ASSERT macro instead.
There was a problem hiding this comment.
LGTM — both prior concerns addressed (unaligned chunks_exact decode at both sites, stderr asserted first, odd-address case covered).
Extended reasoning...
Overview
This PR fixes an uncatchable process abort when Blob.json() / Blob.text() encounter UTF-16LE+BOM bytes that are odd-length or odd-address. Two call sites in src/runtime/webcore/Blob.rs (to_json_with_bytes, to_string_with_bytes) replace bytemuck::cast_slice::<u8, u16>(buf) — which panic!s on both odd length and odd alignment — with buf.chunks_exact(2).map(|c| u16::from_le_bytes([c[0], c[1]])).collect(), which drops any trailing odd byte and reads unaligned, matching the Zig bun.reinterpretSlice semantics it ports. A subprocess regression test in test/js/web/fetch/blob.test.ts covers odd-length .json()/.text(), an even-length control, and odd-address via .slice(1). There is also a one-line drive-by in src/jsc/bindings/wtf-bindings.cpp swapping libc assert for WTF ASSERT in uv__tty_make_raw.
Security risks
None identified. The change replaces a panicking reinterpret-cast with a safe byte-wise copy into a Vec<u16>, strictly removing a JS-triggerable abort. No new unsafe blocks, no parsing of untrusted input beyond what already existed, no auth/crypto/permissions surface. The ASSERT swap is debug-only on both sides and the sole caller passes a stack address that cannot be NULL.
Level of scrutiny
Low-to-moderate. This is a narrow, well-understood crash fix in a code path (BOM-prefixed UTF-16LE blobs) that is not on the common hot path. The new code is straightforward standard-library Rust with no unsafe; clone_utf16 already copies, so the intermediate Vec does not change the allocation count. The wtf-bindings change is a cosmetic consistency tweak (the file already uses ASSERT elsewhere and does not include <cassert> directly).
Other factors
I previously flagged (1) that the original revision only fixed odd length and still aborted on odd alignment, and (2) that the test piped stderr without reading it. Both have been addressed: the chunks_exact approach handles length and alignment together, the test now asserts stderr === "" first and adds a .slice(1) odd-address case. Both inline threads are resolved. No CODEOWNERS apply to the touched files. The bug-hunting system found no issues in the current revision.
There was a problem hiding this comment.
LGTM — both review comments addressed: chunks_exact(2) + from_le_bytes now handles odd length and odd alignment, the test covers .slice(1) and asserts stderr first, and the assert→ASSERT swap is a trivial consistency fix.
Extended reasoning...
Overview
This PR fixes an uncatchable process abort when Blob.json()/.text() encounters a UTF-16LE-with-BOM byte slice that is odd-length or odd-address. It touches three files: src/runtime/webcore/Blob.rs (two identical ~10-line replacements of bytemuck::cast_slice::<u8,u16> with chunks_exact(2).map(u16::from_le_bytes).collect()), test/js/web/fetch/blob.test.ts (one new subprocess regression test), and src/jsc/bindings/wtf-bindings.cpp (a one-line assert→ASSERT macro swap in uv__tty_make_raw, whose only caller passes a stack address, so it's a no-op consistency change with the rest of the file).
Prior review follow-up
I left two comments on the first revision: a 🔴 noting that & !1 only fixed odd length and cast_slice would still abort on odd alignment via .slice(odd), and a 🟡 nit that the test piped stderr without reading it. Both are now resolved exactly as suggested — the chunks_exact approach handles length and alignment in one go, the test adds a .slice(1) odd-address case, and stderr is asserted first so a regression surfaces the panic text. Both threads are marked resolved on the PR.
Security risks
None. This is pure data-decoding of user-supplied blob bytes; the new path is strictly safer than the old one (no unsafe, no reinterpret cast, bounded iteration over the input slice). Worst case for malformed input is a dropped trailing byte, matching Zig's @divTrunc semantics.
Level of scrutiny
Low-to-moderate. The change is small, mechanical, and replaces a panicking cast with a well-known safe idiom. clone_utf16 already copies into a fresh WTFStringImpl (verified at src/bun_core/string/mod.rs:239), so the intermediate Vec<u16> adds one allocation on a cold path (BOM-prefixed UTF-16 blobs only) — an acceptable tradeoff for eliminating an uncatchable abort. No CODEOWNERS paths are touched.
Other factors
The robobun build-status comment shows a ❌ for d6b7f87, but the follow-up commit 56d2716 is an empty CI retrigger explicitly noting "darwin-aarch64 jobs expired in queue" — an infra timeout, not a test failure. No new bugs were found by the bug-hunting system on the current revision.
|
This fix already landed on |
Jarred-Sumner
left a comment
There was a problem hiding this comment.
We should instead detect if it's unaligned and only do this operation when it is unaligned. Otherwise, we are duplicating data unnecessarily.
…saligned try_cast_slice succeeds for the common aligned case (file reads, whole byte stores) and hands &[u16] straight to clone_utf16 with no intermediate allocation; only odd-address views from .slice(odd) fall through to the chunks_exact copy.
There was a problem hiding this comment.
LGTM — both prior review points are addressed: try_cast_slice fast path with chunks_exact fallback covers odd-length and odd-address, and the test now asserts stderr first.
Extended reasoning...
Overview
This PR fixes an uncatchable process abort in Blob.json()/.text() when a UTF-16LE-with-BOM byte view is odd-length or odd-address. Two identical hunks in src/runtime/webcore/Blob.rs (to_json_with_bytes, to_string_with_bytes) replace bytemuck::cast_slice::<u8, u16> — which panic!s on either condition — with: truncate to even length (& !1), then try_cast_slice for the common 2-aligned case, falling back to a chunks_exact(2) → u16::from_le_bytes → Vec<u16> copy when misaligned. A subprocess regression test in test/js/web/fetch/blob.test.ts covers odd-length .json()/.text(), an even-length control, and an odd-address .slice(1) view. The unrelated one-liner in src/jsc/bindings/wtf-bindings.cpp swaps libc assert for WTF ASSERT in uv__tty_make_raw, matching the rest of the file.
Prior review
I previously flagged (1) that the original & !1 length-only fix still aborted on odd-address views via .slice(odd), and (2) that the test piped stderr but never read it. Both inline comments are resolved: d2f4866 switched to unconditional chunks_exact, then d8d9200 restored a zero-copy try_cast_slice fast path per Jarred's request while keeping the unaligned fallback. The test now reads stderr and asserts it empty first, and exercises the odd-address path. The current bug-hunt pass found nothing on the latest revision.
Security risks
None. No untrusted-input parsing surface is widened; the change strictly removes a JS-triggerable native panic. The native-endian reinterpret in the fast path matches both the pre-PR behavior and Zig's reinterpretSlice, and Bun targets only LE platforms.
Level of scrutiny
Moderate — this is on the Blob text/JSON read path, but the change is small, mechanical, and mirrors Zig semantics (@divTrunc + unaligned read). The deferred-init let unaligned: Vec<u16>; … &unaligned pattern is idiomatic Rust for conditional allocation and borrow-checks correctly. The assert → ASSERT swap is a no-op consistency change (other assertions in the file already use ASSERT).
Other factors
No CODEOWNERS cover these paths. All review threads are resolved, and the latest commit directly implements the optimization a maintainer asked for. The regression test is deterministic, follows the established Bun.spawn subprocess pattern in the same file, and would surface the panic text on regression.
There was a problem hiding this comment.
LGTM — both prior review comments were addressed (try_cast_slice + chunks_exact fallback covers odd length and odd alignment, and the test now asserts stderr first).
Extended reasoning...
Overview
This PR fixes an uncatchable process abort (exit 133) when Blob.json()/.text() encounter UTF-16LE+BOM bytes that are odd-length or odd-address. The two affected sites in src/runtime/webcore/Blob.rs (to_json_with_bytes, to_string_with_bytes) replace bytemuck::cast_slice::<u8, u16> — which panic!s on either condition — with: truncate to even length (& !1), then try_cast_slice for the common 2-aligned zero-copy path, falling back to chunks_exact(2).map(u16::from_le_bytes).collect() only when misaligned. A subprocess regression test in test/js/web/fetch/blob.test.ts covers odd-length .json()/.text(), an even-length control, and odd-address via .slice(1). There's also a one-line drive-by assert → ASSERT in wtf-bindings.cpp (the file doesn't include <cassert> directly).
Security risks
None. This is purely defensive handling of byte-slice geometry on a read path; no new inputs are trusted, no auth/crypto/permissions touched. The change strictly removes a JS-triggerable native abort and otherwise preserves the prior decode semantics (matching Zig's reinterpretSlice truncate-and-read-unaligned behavior). The Ok branch reinterprets bytes as native-endian u16 exactly as the old cast_slice did, so no behavior change on Bun's LE-only targets.
Level of scrutiny
Low–medium. Small, mechanical, two near-identical hunks at well-understood call sites, with a regression test that exercises both failure modes in a subprocess (since the original failure was an uncatchable abort). The deferred-init let unaligned: Vec<u16>; ... &unaligned pattern is borrow-checker-sound and scoped correctly relative to the _free guard. No CODEOWNERS entries cover these paths.
Other factors
I previously left two inline comments on this PR — a 🔴 for the missing odd-alignment handling and a 🟡 nit on the test's unread stderr — and both have been fully addressed in d2f4866/d8d9200 (threads resolved). The bug-hunting system found no issues on the current revision. CI passed blob.test.ts on all lanes per the build-bot summary; the one red lane was an unrelated S3/network flake. The ASSERT swap is trivial and consistent with the rest of the file.
|
@robobun there are conflicts |
Main landed the & !1 odd-length truncation independently; keep that and layer the try_cast_slice + chunks_exact fallback on top for odd-address views. Preserve main's ZIG PARITY comment in to_string_with_bytes.
There was a problem hiding this comment.
LGTM — both prior comments addressed (try_cast_slice + chunks_exact fallback, stderr asserted in test).
Extended reasoning...
Overview
Replaces panicking bytemuck::cast_slice::<u8, u16> with & !1 even-length truncation followed by try_cast_slice + a chunks_exact(2).map(u16::from_le_bytes) fallback at the two UTF-16LE BOM sites in Blob::to_string_with_bytes / to_json_with_bytes. Adds a subprocess regression test covering odd-length and odd-address (.slice(1)) inputs. The wtf-bindings.cpp change mentioned in the description is no longer in the diff (presumably landed on main during the merge).
Security risks
None. This is pure decode-path hardening: it removes a JS-triggerable native panic! (process abort) and replaces it with a safe borrow-or-copy. No new untrusted-input parsing, no auth/crypto/permissions surface.
Level of scrutiny
Low–medium. ~30 lines of localized logic across two near-identical sites, mirroring Zig's bun.reinterpretSlice semantics (drop trailing odd byte, read unaligned). The deferred-init let unaligned: Vec<u16>; + match pattern is idiomatic and borrow-checks correctly. The Ok arm reinterprets as native-endian (same as the pre-existing code) while the Err arm uses explicit LE — these only diverge on big-endian hosts, which Bun does not target, and it is not a behavior change from the original cast_slice.
Other factors
Both inline comments I left on the first revision are resolved: the odd-address abort is now handled via try_cast_slice fallback, and the test reads/asserts stderr first so a future panic surfaces in CI output. The test also adds an explicit .slice(1) odd-address case. No CODEOWNERS cover src/runtime/webcore/. The most recent "there are conflicts" comment was followed by merge commit 2338403, which CI is now building.
Const-generic chunk size lets the compiler elide per-element bounds checks and from_le_bytes takes the [u8; 2] array directly.
There was a problem hiding this comment.
LGTM — both prior review points (odd-address fallback, stderr assertion) and the as_chunks request are addressed, and the regression test covers all four cases.
Extended reasoning...
Overview
Two identical edits in src/runtime/webcore/Blob.rs (to_string_with_bytes and to_json_with_bytes, UTF-16LE BOM arms) replace the panicking bytemuck::cast_slice::<u8, u16> with try_cast_slice + an unaligned fallback that collects as_chunks::<2>().0 through u16::from_le_bytes into a Vec<u16>. The & !1 even-length truncation is retained, so the only remaining Err case is odd address, which the fallback handles. The happy path (aligned, even-length — file reads, whole stores) still borrows directly with no extra allocation. A new subprocess test in test/js/web/fetch/blob.test.ts exercises odd-length .json()/.text(), an even-length control, and odd-address via .slice(1), asserting stderr first.
Security risks
None. This is a defensive replacement of a process-aborting cast with a non-panicking try + copy fallback on the Blob text/JSON decode path. No new inputs are accepted and no trust boundaries change; the only behavioral difference is that previously-aborting inputs now decode (matching the Zig reinterpretSlice semantics this code ports).
Level of scrutiny
Moderate — it's runtime code on a user-reachable path, but the change is mechanical and tightly scoped to two call sites with an identical pattern. The let unaligned: Vec<u16>; ... &unaligned deferred-init shape is sound, as_chunks::<2>().0 on an already-even slice yields exactly len/2 pairs, and from_le_bytes is the correct decode for UTF-16LE. The fast-path reinterpret remains native-endian like the original cast_slice and the Zig source, so no regression on Bun's LE targets.
Other factors
All three review threads are resolved: my earlier odd-address concern (now try_cast_slice + fallback, with a .slice(1) test case), my stderr-assertion nit (now read and asserted first), and CrazyboyQCD's as_chunks request (applied in f64750c). The bug-hunting pass found nothing on the current revision, and robobun reports all 17 blob tests passing.
…ngth # Conflicts: # test/js/web/fetch/blob.test.ts
bun-debug is ASAN + debug allocator, so RSS inflates the same way the named bun-asan CI binary does; isASAN only matches the binary name.
There was a problem hiding this comment.
LGTM — both my earlier points (odd-address fallback, stderr assertion) and the as_chunks suggestion are addressed; the try_cast_slice + as_chunks::<2> fallback is correct.
Extended reasoning...
Overview
Two-file change: src/runtime/webcore/Blob.rs replaces the panicking bytemuck::cast_slice::<u8, u16> at the two UTF-16LE-BOM sites (to_string_with_bytes, to_json_with_bytes) with even-length truncation → try_cast_slice (zero-copy when 2-aligned) → as_chunks::<2>().0.iter().map(u16::from_le_bytes).collect() fallback for odd-address views. test/js/web/fetch/blob.test.ts adds a subprocess regression test covering odd-length, even-length, and odd-address (.slice(1)) cases, plus a small unrelated relaxation of an RSS leak threshold under debug builds.
Security risks
None. The change converts a JS-triggerable native panic! (process abort) into correct decoding, which is strictly a robustness improvement. No new untrusted-input parsing surface; the fallback path only allocates a bounded Vec<u16> proportional to the existing buffer.
Level of scrutiny
Low–medium. This is a small, mechanical fix to a narrow code path (UTF-16LE BOM branch of Blob text/json), mirroring the prior Zig semantics. The deferred-init let unaligned: Vec<u16>; … &unaligned pattern is sound. The Ok path reinterprets as native-endian and the Err path uses from_le_bytes, which are equivalent on all Bun targets (LE-only) and match pre-PR behavior.
Other factors
I reviewed an earlier revision and flagged (1) the unhandled odd-alignment case and (2) the unread stderr in the test — both are fixed in the current diff, and the inline threads are resolved. CrazyboyQCD's as_chunks request is also applied. No CODEOWNERS cover these paths. The robobun binary-size failure is a stale-baseline artifact from the branch having been 81 commits behind before merging main; this ~30-line change cannot account for +11 MB and is not a code-review concern.
There was a problem hiding this comment.
LGTM — all prior feedback addressed; the try_cast_slice + as_chunks::<2> fallback correctly handles both odd-length and odd-address views, and the regression test now covers the full matrix.
Extended reasoning...
Overview
Two near-identical hunks in src/runtime/webcore/Blob.rs (to_string_with_bytes and to_json_with_bytes) replace a panicking bytemuck::cast_slice::<u8, u16> with & !1 length truncation followed by bytemuck::try_cast_slice and an as_chunks::<2>().0.iter().map(u16::from_le_bytes).collect() fallback for odd-address views. A new subprocess-based regression test in test/js/web/fetch/blob.test.ts covers odd-length × {.json(), .text()}, an even-length control, and odd-address × {.json(), .text()} via .slice(1). There's also a small drive-by relaxation of an unrelated leak-test threshold to treat isDebug like isASAN.
Security risks
None. This is pure byte-decode logic that replaces an uncatchable process abort with a safe copy fallback. No user-controlled control flow, no auth/crypto/permissions, no new external surface.
Level of scrutiny
Low-to-medium. The change is a focused, well-understood crash fix in a Rust port of existing Zig behaviour (bun.reinterpretSlice does an unaligned reinterpret). The try_cast_slice → Ok path preserves the prior zero-copy borrow for the common aligned case; only the odd-address edge case allocates a temporary Vec<u16>, which is then copied again by clone_utf16 exactly as before. The deferred-init let unaligned: Vec<u16>; pattern is sound and borrow-checked.
Other factors
This PR has been through three review rounds. My earlier 🔴 (odd-alignment still panics) and two 🟡 nits (stderr assertion, .slice(1).json() coverage) were all applied and resolved, as was a contributor suggestion to use as_chunks over chunks_exact. The bug-hunting system found nothing on the current revision (3afd712). The test follows the file's established Bun.spawn + stderr-first pattern. The isASAN || isDebug threshold tweak is a low-risk test-robustness change with an explanatory comment.
There was a problem hiding this comment.
LGTM — all prior feedback addressed; the try_cast_slice + as_chunks::<2> fallback closes both the odd-length and odd-address abort paths and the subprocess test covers the full matrix.
Extended reasoning...
Overview
Two near-identical UTF-16LE-BOM branches in src/runtime/webcore/Blob.rs (to_string_with_bytes / to_json_with_bytes) replace bytemuck::cast_slice::<u8, u16> — which panic!s on odd length or odd address — with & !1 truncation, then bytemuck::try_cast_slice for the common 2-aligned zero-copy path, falling back to as_chunks::<2>().0.iter().map(u16::from_le_bytes).collect() only when the view is odd-address. A new subprocess regression test in test/js/web/fetch/blob.test.ts exercises odd-length .json()/.text(), an even control, and odd-address .slice(1) for both .text() and .json(), asserting stderr first so a regression surfaces the panic text. A small drive-by loosens an RSS threshold for isDebug builds in an adjacent leak test.
Security risks
None. This is a crash-hardening fix in a string-decode path; no auth, crypto, permissions, or untrusted-command surfaces are touched. The change strictly removes a JS-triggerable process abort.
Level of scrutiny
Low–medium. The diff is ~30 lines of mechanical Rust at two sites, mirroring Zig's unaligned reinterpretSlice semantics. The deferred-init let unaligned: Vec<u16>; … &unaligned pattern is standard and borrow-checks; as_chunks::<2> after & !1 has an empty remainder so no bytes are dropped beyond the intentional trailing odd byte. The Ok arm reinterprets native-endian and the Err arm decodes little-endian, but Bun only targets LE hosts and the pre-existing cast_slice already had native-endian semantics, so this is not a behavioral change.
Other factors
All three of my earlier inline comments (odd-address still panics; read/assert stderr; cover .json() odd-address) have been applied and the threads are resolved. CrazyboyQCD's as_chunks suggestion is also incorporated. No CODEOWNERS cover these paths, the bug-hunting pass found nothing new, and CI is building on the latest merge with main.
|
@robobun rebase and get this mergable |
|
@alii this is already fixed on main by #31417 — same |
Reading a UTF-16LE-with-BOM JSON/text Blob/
Bun.file()aborts the whole process (exit 133, uncatchable — a JStry/catcharoundawait blob.json()does not catch it) when the byte view is odd-address — e.g..slice(odd)of a shared byte store (shared_view_raw()returnsbase + offset).to_json_with_bytes/to_string_with_bytestruncate to even length (& !1) and then dobytemuck::cast_slice::<u8, u16>(buf), which stillpanic!s (not a debug assert) onTargetAlignmentGreaterAndInputNotAlignedwhen the address is odd. Stripping the 2-byte BOM doesn't change address parity. Zig'sbun.reinterpretSlice(u16, buf)does an unaligned@ptrCastand never checks alignment.At both sites: after the
& !1truncation, usebytemuck::try_cast_slice— it succeeds for the common 2-aligned case (file reads, whole byte stores) and borrows&[u16]straight intoclone_utf16with no extra allocation. Only when the view is odd-address does it fall back tochunks_exact(2).map(u16::from_le_bytes).collect::<Vec<u16>>().Regression test spawns a subprocess (the failure is an uncatchable abort), covers odd-length
.json()/.text(), even-length control, and odd-address vianew Blob([pad+BOM+data]).slice(1).text(), and asserts stderr first so the panic text surfaces on regression.