
perf: use as_chunks* instead of chunks_exact* for static chunk sizes#31415

Open
robobun wants to merge 7 commits into
mainfrom
claude/31414/as-chunks-static-size

Conversation

@robobun

@robobun robobun commented May 26, 2026

Collaborator

Closes #31414.

Ask

From #31414 (@CrazyboyQCD): chunks_exact* yields runtime-length &[T] slices, while as_chunks* returns a slice of fixed-length arrays &[[T; N]] (plus a remainder). With the compile-time array length, the compiler knows each chunk[i] access for i in 0..N is in bounds and can elide the per-element bounds checks that chunks_exact leaves in.

slice::as_chunks / as_chunks_mut are stable since Rust 1.88, so they're available on the pinned toolchain (nightly-2026-05-06) as ordinary APIs.

Changes

Converted every site in the repo where the chunk size is a compile-time constant:

| File | Before | After |
| --- | --- | --- |
| `ast/char_freq.rs` | `chunks.chunks_exact(SCAN_BIG_CHUNK_SIZE)` (const = 32); inner loop indexes `chunk[i]` for `i in 0..32` | `text.as_chunks::<SCAN_BIG_CHUNK_SIZE>()` → `chunk: &[u8; 32]`. Drops the now-redundant manual `split_at`, since `as_chunks` returns the tail remainder directly |
| `runtime/image/codecs.rs` | `d.rgba.chunks_exact_mut(4)` (RGBA pixels) | `d.rgba.as_chunks_mut::<4>().0` → `px: &mut [u8; 4]` |
| `runtime/webcore/encoding.rs` | `to.chunks_exact_mut(2).zip(input_slice)` + `copy_from_slice` (Latin-1 to UTF-16 widening) | `to.as_chunks_mut::<2>().0.iter_mut().zip(...)` → `out: &mut [u8; 2]`, so `*out = u16::from(b).to_ne_bytes()` replaces `copy_from_slice` |
| `bun_core/lib.rs` (`copy_ascii_prefix`) | `dst.chunks_exact_mut(8).zip(src.chunks_exact(8))` + `u64::from_ne_bytes(s.try_into().expect(...))` (SWAR ASCII fast path) | `dst.as_chunks_mut::<8>().0.iter_mut().zip(src.as_chunks::<8>().0)` → `s: &[u8; 8]`, so `u64::from_ne_bytes(*s)` drops the fallible `try_into` |
| `http_jsc/websocket_client/WebSocketUpgradeClient.rs` | `self.slices.chunks_exact(2).map(...)` (header name/value pairs) | `self.slices.as_chunks::<2>().0.iter().map(...)` → `pair: &[Utf8Slice; 2]` |
| `install/lockfile/Package.rs` (`Meta`, `Bin` blocks) | `src.chunks_exact(stride)` with runtime stride/offsets | `src.as_chunks::<STRIDE>().0` with `const STRIDE`/`const ORIGIN_AT`/`const TAG_AT`; both the chunk length and the element indices are static |
| `scripts/verify-baseline-static/src/main.rs` | `bytes.chunks_exact(4)` over ARM64 instruction words | `bytes.as_chunks::<4>().0` → `chunk: &[u8; 4]`, so `u32::from_le_bytes(*chunk)` replaces the `chunk[0..4]` indexing |

These are semantics-preserving rewrites — same iteration, same element access, same remainder handling.

Left on chunks_exact: Package.rs Resolution

The Resolution loop in Package.rs keeps chunks_exact: its stride is size_of::<ResolutionType<SemverIntType>>(), which depends on the generic type parameter SemverIntType of load_fields. A const-generic argument that uses a type parameter requires the unstable generic_const_exprs, which the install crate does not enable. A comment in the code records this. This is now the only remaining chunks_exact in src/ + scripts/.

Tests

Behavior-preserving change (fixed-length arrays let the compiler elide bounds checks; the output is byte-identical), verified equivalent to the chunks_exact originals across empty/partial/remainder/high-byte edge cases. Coverage:

  • test/js/node/buffer-utf16.test.ts (new cases) — the Buffer.from(latin1, "utf16le"|"ucs2") widening loop (encoding.rs). Empty, single byte, high-byte zero-extension, the full 0x00-0xFF range, a 1000-char multi-chunk round-trip; parameterized over the four encoding aliases with describe.each.
  • TextEncoder latin1 ASCII fast path boundaries (existing, in text-encoder.test.js) — directly exercises copy_ascii_prefix: all-ASCII strings across the SWAR thresholds and a non-ASCII byte at every 8-byte boundary, plus encodeInto truncation cases.
  • test/cli/install/bun-lockb-field-validation.test.ts (new) — the Package.rs Meta/Bin validation loops, exercised offline (a file: tarball dependency, re-parsed via install_test_helpers.parseLockfile — no registry). Valid lockfile round-trips; meta.origin flipped out of range is rejected with "invalid package meta".
  • scripts/verify-baseline-static/src/main.rs (new unit tests) — scan_aarch64: a post-baseline word is flagged, each 4-byte chunk maps to sec_addr + i*4, and a trailing partial word is dropped.

Verified with bun bd (bundler_minify, image GIF-alpha, buffer-utf16, TextEncoder incl. the ASCII-fast-path block, websocket-client, migrate-bun-lockb-v2, and both new JS test files) and cargo test for the scanner tool.

@coderabbitai

coderabbitai Bot commented May 26, 2026

Contributor


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 85e7810e-254f-45c9-95d0-223927439d46

📥 Commits

Reviewing files that changed from the base of the PR and between 8e52c2b and 8f154c7.

📒 Files selected for processing (5)
  • scripts/verify-baseline-static/src/main.rs
  • src/bun_core/lib.rs
  • src/install/lockfile/Package.rs
  • test/cli/install/bun-lockb-field-validation.test.ts
  • test/js/node/buffer-utf16.test.ts

Walkthrough

Replaces runtime chunks_exact* with as_chunks*/as_chunks_mut* fixed-size array iteration across scanners, image/GIF normalization, WebSocket header iteration, UTF-16 writes, and an ASCII fast-path; hardens lockfile Meta/Bin deserialization with const stride/offset checks and adds unit and integration tests.

Changes

Compile-time chunk iteration optimization

| Layer | File(s) | Summary |
| --- | --- | --- |
| Core chunk-iteration rewrites | src/ast/char_freq.rs, src/runtime/image/codecs.rs, src/bun_core/lib.rs, scripts/verify-baseline-static/src/main.rs, src/runtime/webcore/encoding.rs, src/http_jsc/websocket_client/WebSocketUpgradeClient.rs | Replaced dynamic chunks_exact* loops with as_chunks*/as_chunks_mut* across big-text scanning, GIF RGBA pixel normalization, 8-byte ASCII copy fast-path, ARM64 word decoding (now using u32::from_le_bytes(*chunk)), UTF-16 output writes, and WebSocket header pair iteration; added ARM64 scanner unit tests. |
| Lockfile package Meta and Bin validation | src/install/lockfile/Package.rs | Package deserialization now iterates raw records with const STRIDE via as_chunks::<STRIDE>(), validates discriminant bytes (Meta.origin, Meta.has_install_script, Bin.tag) against allowed ranges before copying into typed columns; expands Resolution comments on const-generic limitations. |
| Binary lockfile field validation test suite | test/cli/install/bun-lockb-field-validation.test.ts | Adds tests that install a local tarball dependency with text lockfiles disabled, assert parseLockfile succeeds, and assert parse failures when meta.origin or bin.tag bytes are corrupted. |
| Latin-1 to UTF-16 buffer widening tests | test/js/node/buffer-utf16.test.ts | Adds parameterized tests over utf16le/utf-16le/ucs2/ucs-2 validating Buffer.from(latin1, encoding) widening for empty, single-byte, high-byte zero-extension, full-range expansion, and long-input chunking/round-trip. |
  • Possibly related PRs:
    • oven-sh/bun#31385: Touches src/bun_core/lib.rs ASCII fast-path logic related to this PR's string-copy changes.
🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately and specifically describes the main change: replacing chunks_exact* with as_chunks* for static chunk sizes across the codebase. |
| Description check | ✅ Passed | The PR description is comprehensive, covering the ask, detailed changes per file, rationale for leaving chunks_exact in one case, and extensive test coverage. |
| Linked Issues check | ✅ Passed | All code changes align with issue #31414 objectives: as_chunks* replacing chunks_exact* at static-size sites, semantics preservation, justification for leaving Resolution unconverted, and test coverage across edge cases. |
| Out of Scope Changes check | ✅ Passed | All changes directly address the performance optimization objective of issue #31414 by converting static-size chunks_exact* to as_chunks* and adding supporting tests; no unrelated modifications detected. |




@robobun

robobun commented May 26, 2026

Collaborator Author

Added test coverage in test/js/node/buffer-utf16.test.ts (commit a182729).

Since this is a behavior-preserving change (as_chunks yields fixed-length arrays, so the per-element bounds checks are elided; same iteration, same output), there's no before/after behavioral delta to assert. Instead, the new cases pin the exact output of the Buffer.from(latin1, "utf16le"|"ucs2") widening loop that the encoding.rs site rewrote. That is the path reached when the source string is 8-bit/Latin-1 (code points ≤ U+00FF), which routes through Bun__encoding__constructFromLatin1:

  • empty input, single byte, high-byte zero-extension (0x80-0xFF → ff 00, not ff ff)
  • the full 0x00-0xFF range → 512-byte buffer
  • a 1000-char input spanning many 2-byte chunks that round-trips back through toString("utf16le")

Verified the byte-for-byte equivalence of all four conversions (char_freq 32-byte, codecs 4-byte RGBA, encoding 2-byte, websocket 2-element) against their chunks_exact originals across empty/partial/remainder/high-byte edge cases before landing.

@CrazyboyQCD

This comment was marked as resolved.

Comment thread test/js/node/buffer-utf16.test.ts Outdated
@robobun

robobun commented May 26, 2026

Collaborator Author

Good catch — you're right that mem::size_of is const. I looked at all three Package.rs sites:

  • Meta and Bin — concrete types, so size_of::<Meta>()/size_of::<Bin>() (and the offset_of! field offsets) are compile-time constants. Converted both to as_chunks::<STRIDE>().0 with const STRIDE/const ORIGIN_AT/const TAG_AT, so the element indices are static too.
  • Resolution (the block you linked) — its stride is size_of::<ResolutionType<SemverIntType>>(), and SemverIntType is a generic parameter of load_fields. A const-generic argument that depends on a type parameter needs the unstable generic_const_exprs, which the install crate doesn't enable:
    error: generic parameters may not be used in const operations
       |
       |     for raw in src.as_chunks::<{ mem::size_of::<T>() }>().0 {}
       |                                                 ^ cannot perform const operation using `T`
       = help: add `#![feature(generic_const_exprs)]` to allow generic const expressions
    
    So that one stays on chunks_exact (with a comment explaining why). Pushed in 8176948.

@coderabbitai coderabbitai Bot left a comment

Contributor


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/js/node/buffer-utf16.test.ts`:
- Around line 41-79: Replace the manual for...of loop that iterates the encoding
array with a Jest describe.each(...) table so tests are parameterized;
specifically, change the construct that uses for (const encoding of ["utf16le",
"utf-16le", "ucs2", "ucs-2"] as const) { describe(encoding, () => { ... }) } to
describe.each([...])('encoding %s', (encoding) => { ... }) and move the current
inner tests (the empty string, single byte, high bytes, every Latin-1 byte, and
long input tests that call Buffer.from(..., encoding) and
buf.toString("utf16le")) into that callback so each encoding value runs the same
suite. Ensure no other test logic is modified.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: d024782c-50e2-4d3d-9c93-7f5c014265b6

📥 Commits

Reviewing files that changed from the base of the PR and between 146afa3 and 8176948.

📒 Files selected for processing (7)
  • src/ast/char_freq.rs
  • src/http_jsc/websocket_client/WebSocketUpgradeClient.rs
  • src/install/lockfile/Package.rs
  • src/runtime/image/codecs.rs
  • src/runtime/webcore/encoding.rs
  • test/cli/install/bun-lockb-field-validation.test.ts
  • test/js/node/buffer-utf16.test.ts

Comment thread test/js/node/buffer-utf16.test.ts Outdated
@CrazyboyQCD

This comment was marked as resolved.

Comment thread test/cli/install/bun-lockb-field-validation.test.ts
@robobun

robobun commented May 26, 2026

Collaborator Author

Done — converted scan_aarch64 in scripts/verify-baseline-static to as_chunks::<4>().0 (chunk: &[u8; 4], so u32::from_le_bytes(*chunk) replaces the chunk[0..4] indexing; the trailing-byte drop is preserved). Added unit tests for the scanning loop — post-baseline word flagged, per-chunk address mapping, and trailing-partial-word drop. Pushed in b23559e.

That's the last static-size chunks_exact in the repo — the only remaining one is the Package.rs Resolution loop (generic stride, needs generic_const_exprs).

@CrazyboyQCD

This comment was marked as resolved.

@robobun robobun force-pushed the claude/31414/as-chunks-static-size branch from b23559e to c570f1b May 26, 2026 02:34
@robobun

robobun commented May 26, 2026

Collaborator Author

Done — converted copy_ascii_prefix in bun_core/lib.rs to as_chunks_mut::<8>() / as_chunks::<8>(). Since s is now &[u8; 8], the word load becomes u64::from_ne_bytes(*s) and the fallible .try_into().expect("infallible: size matches") goes away.

This site came from #31385 (landed after my branch point), so I rebased onto current main to pick it up. It's already covered by that PR's TextEncoder latin1 ASCII fast path boundaries tests (all-ASCII across the SWAR thresholds + a non-ASCII byte at every 8-byte boundary) — 33/33 green on my build. Pushed in c570f1b.

With this, every convertible static-size chunks_exact in src/ + scripts/ is done; the only one left is the Package.rs Resolution loop (generic stride → needs unstable generic_const_exprs).

@CrazyboyQCD

This comment was marked as duplicate.

@robobun robobun force-pushed the claude/31414/as-chunks-static-size branch from c570f1b to 81df47a May 26, 2026 02:49
@robobun

robobun commented May 26, 2026

Collaborator Author

Applied — thanks, that's cleaner. Switched the tail to iterate the as_chunks remainder (dst_remainder/src_remainder) directly instead of reslicing dst/src from copied, and used *d = word.to_ne_bytes() for the word store. Verified byte-identical across all lengths and non-ASCII positions; the TextEncoder latin1 ASCII fast path boundaries suite is still 33/33. Folded into the copy_ascii_prefix commit (81df47a).

@CrazyboyQCD

This comment was marked as resolved.

@robobun robobun force-pushed the claude/31414/as-chunks-static-size branch from 81df47a to 82596da May 26, 2026 03:04
@robobun

robobun commented May 26, 2026

Collaborator Author

Good catch: no, the remainder-based tail would not match the original when dst and src have different chunk counts. dst_remainder/src_remainder each start at their own slice's chunk boundary, so if (say) dst has more chunks than src, they begin at different offsets; the original dst[copied..].zip(&src[copied..]) instead starts both at copied and lets zip stop at the shorter tail.

In practice the contract is dst.len() == src.len() (the only caller, copy_latin1_into_utf8_stop_on_non_ascii, slices both to the same n), so the chunk counts always match and the two are equivalent (verified byte-identical across all equal-length cases). But the original's reslice-from-copied was defensively correct for free, so I reverted the tail to that form while keeping the as_chunks chunk-loop wins (array load, no try_into, array store), and added a comment explaining the offset divergence. Pushed in 82596da; TextEncoder latin1 ASCII fast path boundaries still 33/33.

@CrazyboyQCD

This comment was marked as resolved.

@coderabbitai coderabbitai Bot left a comment

Contributor


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/cli/install/bun-lockb-field-validation.test.ts`:
- Around line 63-88: Add a new sibling test in the same file that mirrors the
Meta.origin test but corrupts a Bin.tag byte and asserts parseLockfile throws
the "invalid bin tag" path; use the same helpers (tempDir, installFileDep,
metaColumn, parseLockfile) to create the test fixture, locate the lockb via
metaColumn, compute the byte offset for the Bin.tag within a record (targeting
the appropriate byte in the 88-byte record), flip it to an out-of-range value
(e.g. 0x42), write the file back, and expect parseLockfile(String(dir))
toThrow("Lockfile validation failed: invalid bin tag").
- Around line 37-40: The test currently treats any fmt other than 2 as a 72-byte
resolution record which can silently corrupt tests; change the logic that
computes resolutionSize (and any downstream use of fmt) to explicitly handle
known formats and throw on unsupported ones: validate that fmt === 2 (and add
other explicit branches if you support more formats) and throw a clear error if
fmt is unexpected instead of defaulting to 72; update the resolutionSize
assignment (and any dependent variables) to only set when the format is
recognized so failures surface immediately.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: f1be0bd1-9b70-4bb6-8509-a3e23c1d53d0

📥 Commits

Reviewing files that changed from the base of the PR and between 8176948 and 82596da.

📒 Files selected for processing (9)
  • scripts/verify-baseline-static/src/main.rs
  • src/ast/char_freq.rs
  • src/bun_core/lib.rs
  • src/http_jsc/websocket_client/WebSocketUpgradeClient.rs
  • src/install/lockfile/Package.rs
  • src/runtime/image/codecs.rs
  • src/runtime/webcore/encoding.rs
  • test/cli/install/bun-lockb-field-validation.test.ts
  • test/js/node/buffer-utf16.test.ts

Comment thread test/cli/install/bun-lockb-field-validation.test.ts Outdated
Comment thread test/cli/install/bun-lockb-field-validation.test.ts
@robobun robobun force-pushed the claude/31414/as-chunks-static-size branch from 82596da to af6ee00 May 26, 2026 03:18
@robobun

robobun commented May 26, 2026

Collaborator Author

Agreed — the debug_assert_eq!(dst.len(), src.len()) guarantees equal chunk counts, so the remainder form is sound. Restored it (iterating dst_remainder/src_remainder), and kept a one-line comment noting the equal-length invariant so the offset question is answered inline. Pushed in af6ee00; ASCII-fast-path tests still 33/33.

@robobun

robobun commented May 26, 2026

Collaborator Author

Note for maintainers on test coverage

This PR is a behavior-preserving change — it swaps chunks_exact*(N) for as_chunks*::<N>() so the compiler sees fixed-length arrays and can elide per-element bounds checks. The output is byte-for-byte identical before and after (verified across empty/partial/remainder/high-byte edge cases for every converted site).

Because of that, the included tests can't be written to fail on the pre-change code and pass on the post-change code — the two produce the same result by construction. Instead they pin the exact observable behavior of the converted code paths so any future regression in those loops is caught:

  • test/js/node/buffer-utf16.test.ts + the existing TextEncoder latin1 ASCII fast path boundaries suite exercise the encoding.rs widening loop and bun_core/lib.rs copy_ascii_prefix SWAR loop.
  • test/cli/install/bun-lockb-field-validation.test.ts exercises the Package.rs Meta/Bin validation loops (both the accept and the invalid package meta / invalid bin tag reject paths), offline via a file: tarball dep + parseLockfile.
  • scan_aarch64 unit tests cover the scanner's 4-byte-word loop.

cargo clippy, cargo miri test, and Lint JavaScript are green on the diff. Every convertible static-size chunks_exact in src/ + scripts/ is converted; only Package.rs Resolution is left (its stride depends on a generic type param → would need unstable generic_const_exprs).

@claude claude Bot left a comment

Contributor


No bugs found and all review nits are addressed, but this touches seven hot-path sites (SWAR ASCII copy, UTF-16 widening, lockfile field validation, char-freq scan, image RGBA, WS headers, aarch64 scanner) plus a new test that hardcodes bun.lockb binary offsets — worth a quick human pass, especially on the copy_ascii_prefix remainder-iteration invariant and the musl build status.

Extended reasoning...

Overview

Mechanical chunks_exact* → as_chunks* conversion across 7 Rust sites where the chunk size is a compile-time constant, so the compiler can elide per-element bounds checks. Touches src/bun_core/lib.rs (SWAR ASCII fast path), src/runtime/webcore/encoding.rs (Latin-1→UTF-16 widening), src/install/lockfile/Package.rs (Meta/Bin discriminant validation), src/ast/char_freq.rs, src/runtime/image/codecs.rs, src/http_jsc/websocket_client/WebSocketUpgradeClient.rs, and scripts/verify-baseline-static/src/main.rs. Adds two new test files and Rust unit tests for the scanner.

Security risks

None identified. The lockfile validation logic is unchanged (only let stride → const STRIDE and chunks_exact → as_chunks); discriminant range checks are byte-identical. No auth/crypto/permissions surface.

Level of scrutiny

Medium. Each conversion is individually trivial and semantics-preserving, but in aggregate this is a cross-cutting change to performance-critical runtime code. Two specifics push it past my auto-approve threshold:

  • copy_ascii_prefix remainder handling (bun_core/lib.rs): the PR thread had a real correctness back-and-forth — iterating dst_remainder/src_remainder is only equivalent to the original dst[copied..]/src[copied..] reslice when dst.len() == src.len(). That holds via the debug_assert_eq! and the sole caller's contract, but it's an invariant a maintainer should consciously sign off on.
  • bun-lockb-field-validation.test.ts hardcodes binary-format offsets (header fields at 42/86/110, META_SIZE = 88, BIN_SIZE = 20, column layout). It throws on unknown format versions, which is good, but someone who owns the lockfile format should confirm these constants and that this test won't be a maintenance trap.

Other factors

All prior inline comments (mine and CodeRabbit's) are resolved. The issue reporter actively reviewed and shaped the final copy_ascii_prefix form. No CODEOWNERS coverage on the touched paths. The last CI snapshot in the thread (af6ee00) showed x64-musl build failures; three commits have landed since and I can't confirm green CI from here.

@claude claude Bot left a comment

Contributor


I didn't find any correctness issues: each chunks_exact → as_chunks conversion is semantics-preserving, and the one subtle case (copy_ascii_prefix remainder handling) is sound given the dst.len() == src.len() invariant. Deferring to a maintainer because this touches seven distinct hot paths (SWAR ASCII copy, UTF-16 widening, lockfile field validation, char-freq scanner, RGBA normalize, WS headers, ARM64 scanner) and Build #58126 is currently red.

Extended reasoning...

Overview

Mechanical refactor replacing slice.chunks_exact*(N) with slice.as_chunks*::<N>() at every call site where N is a compile-time constant, so chunk elements become &[T; N] and per-element bounds checks are elided. Seven Rust sites across src/ + scripts/, plus new test coverage (buffer-utf16.test.ts cases, bun-lockb-field-validation.test.ts, and #[cfg(test)] unit tests for scan_aarch64). The one remaining chunks_exact (Package.rs Resolution) is documented as blocked on generic_const_exprs.

Security risks

None introduced. The lockfile Meta/Bin validation loops still check the same discriminant bytes at the same offset_of!-derived offsets before reinterpreting raw bytes as #[repr(u8)] enums; the only change is let stride = size_of() → const STRIDE: usize = size_of(). No new unsafe, no parser surface widened.

Level of scrutiny

Medium-high. Each individual rewrite is trivially equivalent, but the set spans several performance-critical paths: copy_ascii_prefix (the SWAR fast path for TextEncoder/Latin-1→UTF-8), construct_from_u8's UTF-16 widening (every Buffer.from(str, 'utf16le')), scan_big (minifier char-frequency), and binary-lockfile field validation. The copy_ascii_prefix change in particular went through three iterations on this PR over whether iterating dst_remainder/src_remainder is equivalent to reslicing from copied — it is, but only because debug_assert_eq!(dst.len(), src.len()) holds (confirmed at src/bun_core/lib.rs:1973), and that reasoning is worth a maintainer's second look.

Other factors

  • The issue author actively reviewed across several rounds and signed off on the final copy_ascii_prefix shape.
  • All prior inline comments (mine and CodeRabbit's) are resolved; the bug-hunting pass found nothing.
  • robobun reports Build #58126 failing on commit 5ef211a, with a ci: retrigger commit (044e995) on top — CI should be green before merge.
  • The new bun-lockb-field-validation.test.ts hardcodes binary-lockfile layout offsets (header fields at 42/86/110, META_SIZE=88, BIN_SIZE=20); the test self-guards with a sanity expect(...).toBeLessThanOrEqual(...) before corrupting, and throws on unknown format versions, so layout drift will fail loudly rather than silently.

@robobun

robobun commented May 26, 2026

Collaborator Author

Why there's no fail-before test (for reviewers)

This is a behavior-preserving refactor: chunks_exact*(N) → as_chunks*::<N>() at compile-time-constant N. The compiler gains a fixed-length array so it can elide per-element bounds checks, but the runtime output is byte-for-byte identical before and after (verified exhaustively across empty / partial / remainder / high-byte edge cases for all seven sites), and the two Package.rs validation loops reject the same discriminant bytes either way.

A consequence worth stating plainly: no test can be written that fails on the pre-change code and passes on the post-change code, because there is no observable difference to detect. The included tests instead pin the exact behavior of every converted loop so a future regression is caught:

  • test/js/node/buffer-utf16.test.ts + the existing TextEncoder latin1 ASCII fast path boundaries suite cover encoding.rs widening and bun_core/lib.rs copy_ascii_prefix.
  • test/cli/install/bun-lockb-field-validation.test.ts covers the Package.rs Meta/Bin accept + reject paths (invalid package meta / invalid bin tag), offline.
  • scan_aarch64 unit tests cover the scanner's 4-byte-word loop.

So a fully green CI plus these equivalence tests is the strongest guarantee available for this class of change; there's nothing further I can add on the test side without introducing an artificial behavioral difference (which would defeat the refactor). cargo clippy / cargo miri test / Lint JavaScript are green on the diff, all CodeRabbit pre-merge checks pass, and the bug-hunting review found no issues — this just needs a maintainer to make the merge call.

@robobun

robobun commented May 26, 2026

Collaborator Author

CI status

The diff is green — the only red lane is darwin-14-aarch64-test-bun, which shows "Expired" (a CI agent timeout, not a test failure) on two consecutive builds (58126 and 58158). Every other lane passes, including the sibling darwin-26-aarch64-test-bun, darwin-14-x64-test-bun, and the ASAN lane debian-13-x64-asan-test-bun — so this is a flaky/expired runner, not anything in this behavior-preserving refactor. cargo clippy, cargo miri test, and Lint JavaScript are all green.

I've used my one CI re-roll already (the ci: retrigger commit), so I won't re-push again to avoid commit spam. A maintainer re-running that one expired lane (or merging past it) should be all that's needed.

robobun added 2 commits June 5, 2026 09:30
`slice::as_chunks::<N>()` / `as_chunks_mut::<N>()` return a slice of
fixed-length arrays `&[[T; N]]` plus a remainder, whereas
`chunks_exact(n)` yields runtime-length `&[T]` slices. With the
compile-time array length the compiler knows each `chunk[i]` access for
`i in 0..N` is in bounds and elides the per-element bounds checks that
`chunks_exact` leaves in.

Converted the four sites where the chunk size is a compile-time constant:

- ast/char_freq.rs: `scan_big` histogram loop over 32-byte chunks. Drops
  the now-redundant manual `split_at` since `as_chunks` returns the tail
  remainder directly.
- runtime/image/codecs.rs: GIF alpha normalization over 4-byte RGBA pixels.
- runtime/webcore/encoding.rs: Latin-1 -> UTF-16 widening over 2-byte
  chunks; the array element lets `*out = ...to_ne_bytes()` replace
  `copy_from_slice`.
- http_jsc/websocket_client/WebSocketUpgradeClient.rs: header name/value
  pair iteration over 2-element chunks.

The `chunks_exact(stride)` sites in install/lockfile/Package.rs are left
as-is: `stride` is a runtime `mem::size_of::<T>()` value, not a const
generic, so it can't use `as_chunks`.

Extends the existing buffer-utf16 test with explicit coverage of the
`Buffer.from(latin1String, "utf16le" | "ucs2")` widening loop that the
`as_chunks::<2>` conversion in encoding.rs rewrote.

The string stays Latin-1-backed (8-bit) for code points <= U+00FF, which
routes through `Bun__encoding__constructFromLatin1` and the byte-pair
widening loop. Cases: empty input, single byte, high-byte zero-extension
(0x80-0xFF must become `ff 00`, not `ff ff`), the full 0x00-0xFF range,
and a 1000-char input that spans many chunks and round-trips back through
`toString("utf16le")` — the long/multi-chunk cases exercise the rewritten
loop across every chunk.

This is a behavior-preserving codegen change (as_chunks yields fixed-length
arrays so per-element bounds checks are elided), so the assertions pin the
exact output rather than a before/after behavioral delta.
@robobun robobun force-pushed the claude/31414/as-chunks-static-size branch from 044e995 to 8e52c2b June 5, 2026 09:39
@robobun

robobun commented Jun 5, 2026

Collaborator Author

Rebased onto current main (the branch had fallen ~118 commits behind). Conflicts were trivial: only src/ast/char_freq.rs needed a manual resolution. Main had reworded a PERF comment on the line adjacent to my split_at/chunks_exact → as_chunks change; I took main's comment and kept my code change. The other six touched files auto-merged cleanly. All seven conversion sites are intact (and Package.rs Resolution remains on chunks_exact as documented). Rebuilt with bun bd and re-ran the suites: buffer-utf16 (21), bun-lockb-field-validation (3), TextEncoder incl. the ASCII-fast-path block (42), and the scanner cargo tests (14); all green.

Comment thread test/cli/install/bun-lockb-field-validation.test.ts
robobun added 5 commits June 5, 2026 09:52
Converts the `Meta` and `Bin` raw-byte validation loops in
`Package::load_fields` from `chunks_exact(stride)` to
`as_chunks::<STRIDE>().0`. Both types are concrete, so `size_of` and the
`offset_of!` field offsets are promoted to `const` — the chunk length and
the element indices (`raw[ORIGIN_AT]`, `raw[TAG_AT]`, ...) are then all
statically known and their bounds checks are elided.

The adjacent `Resolution` loop stays on `chunks_exact`:
`size_of::<ResolutionType<SemverIntType>>()` depends on the generic
`SemverIntType`, and a const-generic argument that uses a type parameter
requires the unstable `generic_const_exprs`, which this crate does not
enable. A comment documents this.

Adds `bun-lockb-field-validation.test.ts`, which exercises the loader
offline (a `file:` tarball dependency, parsed back via
`install_test_helpers.parseLockfile`, so no registry is needed):
- a valid binary lockfile round-trips (the meta/bin validation loops run
  over real column bytes and accept them);
- a lockfile with `meta.origin` flipped to an out-of-range byte is
  rejected with "invalid package meta" — the per-element check inside the
  converted `as_chunks` loop.
Replaces the 1000-iteration per-byte .toBe loop (2000 assertions per
encoding) with one .toEqual Buffer comparison, per test/CLAUDE.md's
preference for .toEqual over many .toBe calls. Also gives a byte-level
diff on failure instead of a bare index mismatch.

Per test conventions, parameterized tests use describe.each() rather than
a manual for...of loop.

`scan_aarch64` iterates `.text` in 4-byte ARM64 instruction words. Switch
`chunks_exact(4)` to `as_chunks::<4>().0`, which yields `&[u8; 4]` words so
the per-element byte accesses are statically in bounds; the word decode then
becomes `u32::from_le_bytes(*chunk)` instead of indexing `chunk[0..4]`. A
trailing partial word of 1-3 bytes is still dropped: it falls into the
`as_chunks` remainder, which the loop ignores.

Adds unit tests for `scan_aarch64` covering the converted loop: a
post-baseline word is flagged as a violation, each 4-byte chunk maps to
`sec_addr + i*4`, and a trailing partial word is dropped.
The scalar 8-byte ASCII fast path in `copy_ascii_prefix` reconstructed each
word via `u64::from_ne_bytes(s.try_into().expect(...))` over
`chunks_exact_mut(8)`/`chunks_exact(8)`. Switching to
`as_chunks_mut::<8>()`/`as_chunks::<8>()` yields `&[u8; 8]` words plus an
equal-length remainder (the two slices are the same length, asserted at the
top of the function), so:
- the word load is `u64::from_ne_bytes(*s)` with no fallible `try_into`,
- the store is a fixed-size array assignment `*d = word.to_ne_bytes()`,
- the sub-word tail iterates the `as_chunks` remainders directly.

Covered by the existing `TextEncoder latin1 ASCII fast path boundaries`
tests, which encode all-ASCII strings across the SWAR thresholds and place
a non-ASCII byte at every 8-byte boundary — exercising both the word loop
and its early-out.
robobun force-pushed the claude/31414/as-chunks-static-size branch from 8e52c2b to 8f154c7 on June 5, 2026 09:52

@claude claude Bot left a comment

Contributor


I didn't find any issues: each chunks_exact → as_chunks conversion is a clean, semantics-preserving rewrite with good test coverage. Still, this spans seven production hot paths (core SWAR string copy, Buffer encoding, the binary lockfile loader, bundler char-freq, image/GIF, WebSocket headers), and the copy_ascii_prefix remainder handling needed an invariant discussion before it was settled, so it's worth a maintainer sign-off.

Extended reasoning...

Overview

This PR mechanically converts every static-chunk-size chunks_exact*(N) call site in src/ and scripts/ to as_chunks*::<N>(), so the compiler sees fixed-length &[u8; N] arrays and can elide per-element bounds checks. Seven production Rust files are touched: bun_core/lib.rs (copy_ascii_prefix SWAR loop), runtime/webcore/encoding.rs (Latin-1→UTF-16 widening), install/lockfile/Package.rs (Meta/Bin discriminant validation), ast/char_freq.rs (32-byte scan), runtime/image/codecs.rs (GIF α=0 normalization), http_jsc/.../WebSocketUpgradeClient.rs (header-pair iter), and the verify-baseline-static build tool. Two test files (one new) pin the observable behavior of the converted loops.

Security risks

None introduced. The lockfile Meta/Bin validation loops parse untrusted on-disk bytes, but the conversion preserves the exact same per-record discriminant range checks (now with const STRIDE/const OFFSET instead of let); the new bun-lockb-field-validation.test.ts exercises both accept and reject paths. No new unsafe blocks; one try_into().expect() is removed in favor of a direct array deref.

Level of scrutiny

Moderate. Each individual rewrite is a textbook as_chunks upgrade and is semantics-preserving by construction (verified byte-identical per the PR thread). However, the changes fan out across several critical hot paths in the runtime — core string SWAR (copy_ascii_prefix), Buffer encoding, and the binary lockfile parser — and the copy_ascii_prefix tail-remainder change went through a correctness back-and-forth before settling on the debug_assert_eq!(dst.len(), src.len()) invariant that makes the remainder form sound. That's exactly the kind of subtlety a maintainer familiar with the callers should confirm.

Other factors

All prior review comments (my three nits, CodeRabbit's four, and the issue author's three follow-up requests) are addressed and resolved. The bug-hunting system found nothing. CI is green except an expired darwin runner (infra). The new lockfile test hardcodes binary layout offsets (42/86/110, 88-byte Meta, 20-byte Bin) but explicitly throws on unknown format versions, so future format bumps fail loudly rather than silently. None of the touched files are CODEOWNERS-covered. Given the breadth across critical paths, I'm deferring rather than auto-approving.


robobun commented Jun 5, 2026

Collaborator Author

CI: the windows-x64-baseline-verify-baseline failure is unrelated to this diff

This PR's only change to verify-baseline-static is in scan_aarch64, the ARM64 fixed-width word-decode loop (chunks_exact(4) → as_chunks::<4>().0). It touches none of the x64 / iced-x86 linear-sweep path.

The failing lane is windows-x64-baseline-verify-baseline (exit status 2 = the scanner found a post-baseline instruction in the x64 Windows binary). That's determined by the Windows baseline build toolchain and the x64 code in main, not by an aarch64-only decode-loop refactor — my change cannot affect which instructions the x64 binary contains or how the x64 path scans it.

The scanner tool itself builds clean and all 14 of its unit tests pass (incl. the 3 new scan_aarch64 tests) via cargo test. So this is a real x64 baseline -march leak / unallowlisted runtime-dispatched symbol in main's Windows baseline build, surfaced here only because the rebase brought the branch up to current main. It's a separate concern from #31414 and needs whoever owns that Windows baseline regression (or an allowlist-x64-windows.txt update) — not something this refactor should carry.



Development

Successfully merging this pull request may close these issues.

Perf: avoid using chunks_exact* when size is known statically

2 participants