perf: use as_chunks* instead of chunks_exact* for static chunk sizes#31415
perf: use as_chunks* instead of chunks_exact* for static chunk sizes#31415robobun wants to merge 7 commits into
as_chunks* instead of chunks_exact* for static chunk sizes#31415Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (5)
WalkthroughReplaces runtime ChangesCompile-time chunk iteration optimization
🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
|
Added test coverage in Since this is a byte-identical codegen change (
Verified the byte-for-byte equivalence of all four conversions (char_freq 32-byte, codecs 4-byte RGBA, encoding 2-byte, websocket 2-element) against their |
This comment was marked as resolved.
This comment was marked as resolved.
|
Good catch — you're right that
|
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@test/js/node/buffer-utf16.test.ts`:
- Around line 41-79: Replace the manual for...of loop that iterates the encoding
array with a Jest describe.each(...) table so tests are parameterized;
specifically, change the construct that uses for (const encoding of ["utf16le",
"utf-16le", "ucs2", "ucs-2"] as const) { describe(encoding, () => { ... }) } to
describe.each([...])('encoding %s', (encoding) => { ... }) and move the current
inner tests (the empty string, single byte, high bytes, every Latin-1 byte, and
long input tests that call Buffer.from(..., encoding) and
buf.toString("utf16le")) into that callback so each encoding value runs the same
suite. Ensure no other test logic is modified.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: d024782c-50e2-4d3d-9c93-7f5c014265b6
📒 Files selected for processing (7)
src/ast/char_freq.rssrc/http_jsc/websocket_client/WebSocketUpgradeClient.rssrc/install/lockfile/Package.rssrc/runtime/image/codecs.rssrc/runtime/webcore/encoding.rstest/cli/install/bun-lockb-field-validation.test.tstest/js/node/buffer-utf16.test.ts
This comment was marked as resolved.
This comment was marked as resolved.
|
Done — converted That's the last static-size |
This comment was marked as resolved.
This comment was marked as resolved.
b23559e to
c570f1b
Compare
|
Done — converted This site came from #31385 (landed after my branch point), so I rebased onto current main to pick it up. It's already covered by that PR's With this, every convertible static-size |
This comment was marked as duplicate.
This comment was marked as duplicate.
c570f1b to
81df47a
Compare
|
Applied — thanks, that's cleaner. Switched the tail to iterate the |
This comment was marked as resolved.
This comment was marked as resolved.
81df47a to
82596da
Compare
|
Good catch — no, the remainder-based tail would not match the original when In practice the contract is |
This comment was marked as resolved.
This comment was marked as resolved.
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@test/cli/install/bun-lockb-field-validation.test.ts`:
- Around line 63-88: Add a new sibling test in the same file that mirrors the
Meta.origin test but corrupts a Bin.tag byte and asserts parseLockfile throws
the "invalid bin tag" path; use the same helpers (tempDir, installFileDep,
metaColumn, parseLockfile) to create the test fixture, locate the lockb via
metaColumn, compute the byte offset for the Bin.tag within a record (targeting
the appropriate byte in the 88-byte record), flip it to an out-of-range value
(e.g. 0x42), write the file back, and expect parseLockfile(String(dir))
toThrow("Lockfile validation failed: invalid bin tag").
- Around line 37-40: The test currently treats any fmt other than 2 as a 72-byte
resolution record which can silently corrupt tests; change the logic that
computes resolutionSize (and any downstream use of fmt) to explicitly handle
known formats and throw on unsupported ones: validate that fmt === 2 (and add
other explicit branches if you support more formats) and throw a clear error if
fmt is unexpected instead of defaulting to 72; update the resolutionSize
assignment (and any dependent variables) to only set when the format is
recognized so failures surface immediately.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: f1be0bd1-9b70-4bb6-8509-a3e23c1d53d0
📒 Files selected for processing (9)
scripts/verify-baseline-static/src/main.rssrc/ast/char_freq.rssrc/bun_core/lib.rssrc/http_jsc/websocket_client/WebSocketUpgradeClient.rssrc/install/lockfile/Package.rssrc/runtime/image/codecs.rssrc/runtime/webcore/encoding.rstest/cli/install/bun-lockb-field-validation.test.tstest/js/node/buffer-utf16.test.ts
82596da to
af6ee00
Compare
|
Agreed — the |
af6ee00 to
5ef211a
Compare
Note for maintainers on test coverageThis PR is a behavior-preserving change — it swaps Because of that, the included tests can't be written to fail on the pre-change code and pass on the post-change code — the two produce the same result by construction. Instead they pin the exact observable behavior of the converted code paths so any future regression in those loops is caught:
|
There was a problem hiding this comment.
No bugs found and all review nits are addressed, but this touches seven hot-path sites (SWAR ASCII copy, UTF-16 widening, lockfile field validation, char-freq scan, image RGBA, WS headers, aarch64 scanner) plus a new test that hardcodes bun.lockb binary offsets — worth a quick human pass, especially on the copy_ascii_prefix remainder-iteration invariant and the musl build status.
Extended reasoning...
Overview
Mechanical chunks_exact* → as_chunks* conversion across 7 Rust sites where the chunk size is a compile-time constant, so the compiler can elide per-element bounds checks. Touches src/bun_core/lib.rs (SWAR ASCII fast path), src/runtime/webcore/encoding.rs (Latin-1→UTF-16 widening), src/install/lockfile/Package.rs (Meta/Bin discriminant validation), src/ast/char_freq.rs, src/runtime/image/codecs.rs, src/http_jsc/websocket_client/WebSocketUpgradeClient.rs, and scripts/verify-baseline-static/src/main.rs. Adds two new test files and Rust unit tests for the scanner.
Security risks
None identified. The lockfile validation logic is unchanged (only let stride → const STRIDE and chunks_exact → as_chunks); discriminant range checks are byte-identical. No auth/crypto/permissions surface.
Level of scrutiny
Medium. Each conversion is individually trivial and semantics-preserving, but in aggregate this is a cross-cutting change to performance-critical runtime code. Two specifics push it past my auto-approve threshold:
copy_ascii_prefixremainder handling (bun_core/lib.rs): the PR thread had a real correctness back-and-forth — iteratingdst_remainder/src_remainderis only equivalent to the originaldst[copied..]/src[copied..]reslice whendst.len() == src.len(). That holds via thedebug_assert_eq!and the sole caller's contract, but it's an invariant a maintainer should consciously sign off on.bun-lockb-field-validation.test.tshardcodes binary-format offsets (header fields at 42/86/110,META_SIZE = 88,BIN_SIZE = 20, column layout). It throws on unknown format versions, which is good, but someone who owns the lockfile format should confirm these constants and that this test won't be a maintenance trap.
Other factors
All prior inline comments (mine and CodeRabbit's) are resolved. The issue reporter actively reviewed and shaped the final copy_ascii_prefix form. No CODEOWNERS coverage on the touched paths. The last CI snapshot in the thread (af6ee00) showed x64-musl build failures; three commits have landed since and I can't confirm green CI from here.
There was a problem hiding this comment.
I didn't find any correctness issues — each chunks_exact → as_chunks conversion is semantics-preserving and the one subtle case (copy_ascii_prefix remainder handling) is sound given the dst.len() == src.len() invariant. Deferring to a maintainer because this touches seven distinct hot paths (SWAR ASCII copy, UTF-16 widening, lockfile field validation, char-freq scanner, RGBA normalize, WS headers, ARM64 scanner) and Build #58126 is currently red.
Extended reasoning...
Overview
Mechanical refactor replacing slice.chunks_exact*(N) with slice.as_chunks*::<N>() at every call site where N is a compile-time constant, so chunk elements become &[T; N] and per-element bounds checks are elided. Seven Rust sites across src/ + scripts/, plus new test coverage (buffer-utf16.test.ts cases, bun-lockb-field-validation.test.ts, and #[cfg(test)] unit tests for scan_aarch64). The one remaining chunks_exact (Package.rs Resolution) is documented as blocked on generic_const_exprs.
Security risks
None introduced. The lockfile Meta/Bin validation loops still check the same discriminant bytes at the same offset_of!-derived offsets before reinterpreting raw bytes as #[repr(u8)] enums; the only change is let stride = size_of() → const STRIDE: usize = size_of(). No new unsafe, no parser surface widened.
Level of scrutiny
Medium-high. Each individual rewrite is trivially equivalent, but the set spans several performance-critical paths: copy_ascii_prefix (the SWAR fast path for TextEncoder/Latin-1→UTF-8), construct_from_u8's UTF-16 widening (every Buffer.from(str, 'utf16le')), scan_big (minifier char-frequency), and binary-lockfile field validation. The copy_ascii_prefix change in particular went through three iterations on this PR over whether iterating dst_remainder/src_remainder is equivalent to reslicing from copied — it is, but only because debug_assert_eq!(dst.len(), src.len()) holds (confirmed at src/bun_core/lib.rs:1973), and that reasoning is worth a maintainer's second look.
Other factors
- The issue author actively reviewed across several rounds and signed off on the final
copy_ascii_prefixshape. - All prior inline comments (mine and CodeRabbit's) are resolved; the bug-hunting pass found nothing.
- robobun reports Build #58126 failing on commit 5ef211a, with a
ci: retriggercommit (044e995) on top — CI should be green before merge. - The new
bun-lockb-field-validation.test.tshardcodes binary-lockfile layout offsets (header fields at 42/86/110,META_SIZE=88,BIN_SIZE=20); the test self-guards with a sanityexpect(...).toBeLessThanOrEqual(...)before corrupting, and throws on unknown format versions, so layout drift will fail loudly rather than silently.
Why there's no fail-before test (for reviewers)This is a behavior-preserving refactor: A consequence worth stating plainly: no test can be written that fails on the pre-change code and passes on the post-change code, because there is no observable difference to detect. The included tests instead pin the exact behavior of every converted loop so a future regression is caught:
So a fully green CI plus these equivalence tests is the strongest guarantee available for this class of change; there's nothing further I can add on the test side without introducing an artificial behavioral difference (which would defeat the refactor). |
CI statusThe diff is green — the only red lane is I've used my one CI re-roll already (the |
`slice::as_chunks::<N>()` / `as_chunks_mut::<N>()` return a slice of fixed-length arrays `&[[T; N]]` plus a remainder, whereas `chunks_exact(n)` yields runtime-length `&[T]` slices. With the compile-time array length the compiler knows each `chunk[i]` access for `i in 0..N` is in bounds and elides the per-element bounds checks that `chunks_exact` leaves in. Converted the four sites where the chunk size is a compile-time constant: - ast/char_freq.rs: `scan_big` histogram loop over 32-byte chunks. Drops the now-redundant manual `split_at` since `as_chunks` returns the tail remainder directly. - runtime/image/codecs.rs: GIF alpha normalization over 4-byte RGBA pixels. - runtime/webcore/encoding.rs: Latin-1 -> UTF-16 widening over 2-byte chunks; the array element lets `*out = ...to_ne_bytes()` replace `copy_from_slice`. - http_jsc/websocket_client/WebSocketUpgradeClient.rs: header name/value pair iteration over 2-element chunks. The `chunks_exact(stride)` sites in install/lockfile/Package.rs are left as-is: `stride` is a runtime `mem::size_of::<T>()` value, not a const generic, so it can't use `as_chunks`.
Extends the existing buffer-utf16 test with explicit coverage of the
`Buffer.from(latin1String, "utf16le" | "ucs2")` widening loop that the
`as_chunks::<2>` conversion in encoding.rs rewrote.
The string stays Latin-1-backed (8-bit) for code points <= U+00FF, which
routes through `Bun__encoding__constructFromLatin1` and the byte-pair
widening loop. Cases: empty input, single byte, high-byte zero-extension
(0x80-0xFF must become `ff 00`, not `ff ff`), the full 0x00-0xFF range,
and a 1000-char input that spans many chunks and round-trips back through
`toString("utf16le")` — the long/multi-chunk cases exercise the rewritten
loop across every chunk.
This is a behavior-preserving codegen change (as_chunks yields fixed-length
arrays so per-element bounds checks are elided), so the assertions pin the
exact output rather than a before/after behavioral delta.
044e995 to
8e52c2b
Compare
|
Rebased onto current main (the branch had fallen ~118 commits behind). Conflicts were trivial: only |
Converts the `Meta` and `Bin` raw-byte validation loops in `Package::load_fields` from `chunks_exact(stride)` to `as_chunks::<STRIDE>().0`. Both types are concrete, so `size_of` and the `offset_of!` field offsets are promoted to `const` — the chunk length and the element indices (`raw[ORIGIN_AT]`, `raw[TAG_AT]`, ...) are then all statically known and their bounds checks are elided. The adjacent `Resolution` loop stays on `chunks_exact`: `size_of::<ResolutionType<SemverIntType>>()` depends on the generic `SemverIntType`, and a const-generic argument that uses a type parameter requires the unstable `generic_const_exprs`, which this crate does not enable. A comment documents this. Adds `bun-lockb-field-validation.test.ts`, which exercises the loader offline (a `file:` tarball dependency, parsed back via `install_test_helpers.parseLockfile`, so no registry is needed): - a valid binary lockfile round-trips (the meta/bin validation loops run over real column bytes and accept them); - a lockfile with `meta.origin` flipped to an out-of-range byte is rejected with "invalid package meta" — the per-element check inside the converted `as_chunks` loop.
Replaces the 1000-iteration per-byte .toBe loop (2000 assertions per encoding) with one .toEqual Buffer comparison, per test/CLAUDE.md's preference for .toEqual over many .toBe calls. Also gives a byte-level diff on failure instead of a bare index mismatch.
Per test conventions, parameterized tests use describe.each() rather than a manual for...of loop.
`scan_aarch64` iterates `.text` in 4-byte ARM64 instruction words. Switch `chunks_exact(4)` to `as_chunks::<4>().0`, which yields `&[u8; 4]` words so the per-element byte accesses are statically in bounds; the word decode then becomes `u32::from_le_bytes(*chunk)` instead of indexing `chunk[0..4]`. The trailing-1-3-byte drop is preserved (the remainder we ignore). Adds unit tests for `scan_aarch64` covering the converted loop: a post-baseline word is flagged as a violation, each 4-byte chunk maps to `sec_addr + i*4`, and a trailing partial word is dropped.
The scalar 8-byte ASCII fast path in `copy_ascii_prefix` reconstructed each word via `u64::from_ne_bytes(s.try_into().expect(...))` over `chunks_exact_mut(8)`/`chunks_exact(8)`. Switching to `as_chunks_mut::<8>()`/`as_chunks::<8>()` yields `&[u8; 8]` words plus an equal-length remainder (the two slices are the same length, asserted at the top of the function), so: - the word load is `u64::from_ne_bytes(*s)` with no fallible `try_into`, - the store is a fixed-size array assignment `*d = word.to_ne_bytes()`, - the sub-word tail iterates the `as_chunks` remainders directly. Covered by the existing `TextEncoder latin1 ASCII fast path boundaries` tests, which encode all-ASCII strings across the SWAR thresholds and place a non-ASCII byte at every 8-byte boundary — exercising both the word loop and its early-out.
8e52c2b to
8f154c7
Compare
There was a problem hiding this comment.
I didn't find any issues — each chunks_exact → as_chunks conversion is a clean semantics-preserving rewrite with good test coverage — but since this spans seven production hot paths (core SWAR string copy, Buffer encoding, the binary lockfile loader, bundler char-freq, image/GIF, WebSocket headers) and the copy_ascii_prefix remainder handling needed an invariant discussion to settle, it's worth a maintainer sign-off.
Extended reasoning...
Overview
This PR mechanically converts every static-chunk-size chunks_exact*(N) call site in src/ and scripts/ to as_chunks*::<N>(), so the compiler sees fixed-length &[u8; N] arrays and can elide per-element bounds checks. Seven production Rust files are touched: bun_core/lib.rs (copy_ascii_prefix SWAR loop), runtime/webcore/encoding.rs (Latin-1→UTF-16 widening), install/lockfile/Package.rs (Meta/Bin discriminant validation), ast/char_freq.rs (32-byte scan), runtime/image/codecs.rs (GIF α=0 normalization), http_jsc/.../WebSocketUpgradeClient.rs (header-pair iter), and the verify-baseline-static build tool. Two test files (one new) pin the observable behavior of the converted loops.
Security risks
None introduced. The lockfile Meta/Bin validation loops parse untrusted on-disk bytes, but the conversion preserves the exact same per-record discriminant range checks (now with const STRIDE/const OFFSET instead of let); the new bun-lockb-field-validation.test.ts exercises both accept and reject paths. No new unsafe blocks; one try_into().expect() is removed in favor of a direct array deref.
Level of scrutiny
Moderate. Each individual rewrite is a textbook as_chunks upgrade and is semantics-preserving by construction (verified byte-identical per the PR thread). However, the changes fan out across several critical hot paths in the runtime — core string SWAR (copy_ascii_prefix), Buffer encoding, and the binary lockfile parser — and the copy_ascii_prefix tail-remainder change went through a correctness back-and-forth before settling on the debug_assert_eq!(dst.len(), src.len()) invariant that makes the remainder form sound. That's exactly the kind of subtlety a maintainer familiar with the callers should confirm.
Other factors
All prior review comments (my three nits, CodeRabbit's four, and the issue author's three follow-up requests) are addressed and resolved. The bug-hunting system found nothing. CI is green except an expired darwin runner (infra). The new lockfile test hardcodes binary layout offsets (42/86/110, 88-byte Meta, 20-byte Bin) but explicitly throws on unknown format versions, so future format bumps fail loudly rather than silently. None of the touched files are CODEOWNERS-covered. Given the breadth across critical paths, I'm deferring rather than auto-approving.
CI: the
|
Closes #31414.
Ask
From #31414 (@CrazyboyQCD):
chunks_exact*yields runtime-length&[T]slices, whileas_chunks*returns a slice of fixed-length arrays&[[T; N]](plus a remainder). With the compile-time array length, the compiler knows eachchunk[i]access fori in 0..Nis in bounds and can elide the per-element bounds checks thatchunks_exactleaves in.slice::as_chunks/as_chunks_mutare stable since Rust 1.88, so they're available on the pinned toolchain (nightly-2026-05-06) as ordinary APIs.Changes
Converted every site in the repo where the chunk size is a compile-time constant:
ast/char_freq.rschunks.chunks_exact(SCAN_BIG_CHUNK_SIZE)(const= 32); inner loop indexeschunk[i]fori in 0..32text.as_chunks::<SCAN_BIG_CHUNK_SIZE>()—chunk: &[u8; 32]. Drops the now-redundant manualsplit_at/unrolledsinceas_chunksreturns the tail remainder directlyruntime/image/codecs.rsd.rgba.chunks_exact_mut(4)(RGBA pixels)d.rgba.as_chunks_mut::<4>().0—px: &mut [u8; 4]runtime/webcore/encoding.rsto.chunks_exact_mut(2).zip(input_slice)+copy_from_slice(Latin-1 to UTF-16 widening)to.as_chunks_mut::<2>().0.iter_mut().zip(...)—out: &mut [u8; 2], so*out = u16::from(b).to_ne_bytes()replacescopy_from_slicebun_core/lib.rs(copy_ascii_prefix)dst.chunks_exact_mut(8).zip(src.chunks_exact(8))+u64::from_ne_bytes(s.try_into().expect(...))(SWAR ASCII fast path)dst.as_chunks_mut::<8>().0.iter_mut().zip(src.as_chunks::<8>().0)—s: &[u8; 8], sou64::from_ne_bytes(*s)drops the fallibletry_intohttp_jsc/websocket_client/WebSocketUpgradeClient.rsself.slices.chunks_exact(2).map(...)(header name/value pairs)self.slices.as_chunks::<2>().0.iter().map(...)—pair: &[Utf8Slice; 2]install/lockfile/Package.rs(Meta,Binblocks)src.chunks_exact(stride)with runtimestride/offsetssrc.as_chunks::<STRIDE>().0withconst STRIDE/const ORIGIN_AT/const TAG_AT— both the chunk length and the element indices are staticscripts/verify-baseline-static/src/main.rsbytes.chunks_exact(4)over ARM64 instruction wordsbytes.as_chunks::<4>().0—chunk: &[u8; 4], sou32::from_le_bytes(*chunk)replaces thechunk[0..4]indexingThese are semantics-preserving rewrites — same iteration, same element access, same remainder handling.
Left on
chunks_exact:Package.rsResolutionThe
Resolutionloop inPackage.rskeepschunks_exact: its stride issize_of::<ResolutionType<SemverIntType>>(), which depends on the generic type parameterSemverIntTypeofload_fields. A const-generic argument that uses a type parameter requires the unstablegeneric_const_exprs, which theinstallcrate does not enable. A comment in the code records this. This is now the only remainingchunks_exactinsrc/+scripts/.Tests
Byte-identical codegen change (fixed-length arrays elide bounds checks; same output), verified equivalent to the
chunks_exactoriginals across empty/partial/remainder/high-byte edge cases. Coverage:test/js/node/buffer-utf16.test.ts(new cases) — theBuffer.from(latin1, "utf16le"|"ucs2")widening loop (encoding.rs). Empty, single byte, high-byte zero-extension, the full0x00-0xFFrange, a 1000-char multi-chunk round-trip; parameterized over the four encoding aliases withdescribe.each.TextEncoder latin1 ASCII fast path boundaries(existing, intext-encoder.test.js) — directly exercisescopy_ascii_prefix: all-ASCII strings across the SWAR thresholds and a non-ASCII byte at every 8-byte boundary, plusencodeIntotruncation cases.test/cli/install/bun-lockb-field-validation.test.ts(new) — thePackage.rsMeta/Binvalidation loops, exercised offline (afile:tarball dependency, re-parsed viainstall_test_helpers.parseLockfile— no registry). Valid lockfile round-trips;meta.originflipped out of range is rejected with"invalid package meta".scripts/verify-baseline-static/src/main.rs(new unit tests) —scan_aarch64: a post-baseline word is flagged, each 4-byte chunk maps tosec_addr + i*4, and a trailing partial word is dropped.Verified with
bun bd(bundler_minify, image GIF-alpha, buffer-utf16, TextEncoder incl. the ASCII-fast-path block, websocket-client, migrate-bun-lockb-v2, and both new JS test files) andcargo testfor the scanner tool.