Optimize TextEncoder.encode: restore SIMD ASCII fast paths lost in the Rust port#31385

Merged
Jarred-Sumner merged 3 commits into
mainfrom
farm/42e4aca0/textencoder-highway-simd
May 26, 2026

@robobun robobun commented May 25, 2026

Collaborator

What does this PR do?

TextEncoder.encode's Rust port was measurably slower than the Zig implementation it replaced (4-char ASCII ~2× slower, 12 KB ASCII ~40% slower; only the large UTF-8 case was a tie). This PR restores the lost fast paths and vectorization:

Cause

  • TextEncoder__encode8/16/encodeRopeString zero-filled a 2 KB stack buffer on every call ([0u8; 2048]) where the Zig code used undefined, then copied the output a second time from that buffer into JSC memory. For a 4-byte input the memset alone is most of the regression.
  • The 12 KB ASCII benchmark case is actually JSC rope iteration ("Hello World!".repeat(1024) stays a rope): ~1024 12-byte leaves hit copy_latin1_into_utf8_stop_on_non_ascii, whose port replaced the Zig fused @Vector(16,u8)/SWAR copy with a scan (first_non_ascii) followed by a memcpy per span — two passes plus per-leaf call overhead.
  • element_length_latin1_into_utf8 was ported as a scalar span loop instead of simdutf.length.utf8.from.latin1.
  • The small UTF-16 path paid an exact utf8_length_from_utf16le pass even when the destination buffer already provably fit the worst case (the Zig out_len shortcut was dropped).
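The first cause above (zero-filling a scratch buffer that is about to be overwritten anyway) can be illustrated with a minimal sketch. The function names `copy_via_zeroed` and `copy_via_uninit` and the `STACK_BUF` constant are hypothetical and not taken from the Bun source; an optimizer may also elide the zero-fill in a toy like this, so it shows the shape of the two approaches rather than reproducing the measured regression.

```rust
use std::mem::MaybeUninit;

// Hypothetical scratch size, mirroring the 2 KB buffer mentioned in the PR.
const STACK_BUF: usize = 2048;

/// Zero-fills the whole scratch buffer before use, like `[0u8; 2048]`.
fn copy_via_zeroed(input: &[u8]) -> Vec<u8> {
    let mut buf = [0u8; STACK_BUF];
    let n = input.len().min(STACK_BUF);
    buf[..n].copy_from_slice(&input[..n]);
    buf[..n].to_vec()
}

/// Leaves the scratch buffer uninitialized and only touches the bytes it needs.
fn copy_via_uninit(input: &[u8]) -> Vec<u8> {
    let mut buf = [MaybeUninit::<u8>::uninit(); STACK_BUF];
    let n = input.len().min(STACK_BUF);
    for (slot, &b) in buf[..n].iter_mut().zip(input) {
        slot.write(b);
    }
    // SAFETY: the loop above initialized exactly the first `n` elements,
    // and `MaybeUninit<u8>` has the same layout as `u8`.
    let init: &[u8] = unsafe { std::slice::from_raw_parts(buf.as_ptr().cast::<u8>(), n) };
    init.to_vec()
}
```

Both functions return the same bytes; the difference is only whether 2 KB of stack is written before a 4-byte copy.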

Fix

  • New CopyAsciiPrefix highway kernel in highway_strings.cpp (same HWY_EXPORT + HWY_DYNAMIC_DISPATCH pattern as the existing kernels): fused scan+copy of the leading ASCII run that stops at the first byte ≥ 0x80 and writes only the bytes it reports. Exposed as bun_highway::copy_ascii_prefix, with per-target symbols added to the verify-baseline-static allowlists.
  • copy_latin1_into_utf8_stop_on_non_ascii now streams through a fused copy_ascii_prefix helper: SWAR u64 for short runs (rope leaves are usually a dozen bytes, so FFI dispatch isn't worth it), the highway kernel for runs ≥ 64 bytes. Only buf[..written] is ever written, same as before.
  • element_length_latin1_into_utf8 uses simdutf (scalar count for ≤ 32 bytes), matching the Zig original.
  • copy_utf16_into_utf8 skips the exact-length pass when buf.len() >= 3 * utf16.len() (worst case), mirroring the Zig out_len selection; a copy_utf16_into_utf8_with_utf8_len variant lets callers that already computed the length (to size the destination) avoid recomputing it.
  • TextEncoder__encode8 / __encode16 now encode straight into an exactly-sized, JSC-owned Uint8Array — no stack buffer, no second copy, and the former large-string path no longer hands JSC an external Vec. encode16 keeps a small 192-byte stack path for tiny strings and falls back to the allocating U+FFFD path when unpaired surrogates make the exact-size buffer insufficient. TextEncoder__encodeRopeString allocates the array up front and lets the rope iterator write each segment directly into it.
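The worst-case shortcut for UTF-16 input described in the list above can be sketched as follows. This is an illustration under stated assumptions, not the Bun implementation: the helper names are hypothetical, the standard library's `char::decode_utf16` stands in for the real transcoder, and the sketch returns `None` on an undersized buffer instead of performing a partial write.

```rust
/// Exact UTF-8 length of a UTF-16 sequence; an unpaired surrogate is counted
/// as U+FFFD, which takes 3 bytes in UTF-8.
fn utf8_len_from_utf16(units: &[u16]) -> usize {
    char::decode_utf16(units.iter().copied())
        .map(|r| match r {
            Ok(c) => c.len_utf8(),
            Err(_) => 3,
        })
        .sum()
}

/// A UTF-16 code unit never expands to more than 3 UTF-8 bytes, so a
/// destination of at least `3 * utf16_len` bytes always fits.
fn can_skip_exact_length_pass(dst_len: usize, utf16_len: usize) -> bool {
    utf16_len.checked_mul(3).map_or(false, |worst| dst_len >= worst)
}

/// Encodes into `dst`, paying for the exact-length pass only when the cheap
/// worst-case bound does not already prove that the output fits.
fn encode_utf16_into(units: &[u16], dst: &mut [u8]) -> Option<usize> {
    if !can_skip_exact_length_pass(dst.len(), units.len())
        && utf8_len_from_utf16(units) > dst.len()
    {
        return None; // simplification: the real API writes a partial result
    }
    let mut written = 0;
    for r in char::decode_utf16(units.iter().copied()) {
        let c = r.unwrap_or(char::REPLACEMENT_CHARACTER);
        written += c.encode_utf8(&mut dst[written..]).len();
    }
    Some(written)
}
```

The bound holds because a BMP code unit encodes to at most 3 bytes, while a surrogate pair uses two units for 4 bytes.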

Output bytes, encodeInto read/written semantics, and the "never write past written" guarantee are unchanged.
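For readers unfamiliar with the fused scan+copy idea and the "never write past written" guarantee, here is a minimal SWAR sketch in safe Rust. The name `copy_ascii_prefix_swar` is hypothetical; the code is an assumption about the general technique (test eight bytes at a time via the high bit of each byte) and is not the code merged in this PR.

```rust
/// One bit per byte: set wherever that byte is >= 0x80.
const HIGH_BITS: u64 = 0x8080_8080_8080_8080;

/// Copies the leading ASCII run of `src` into `dst` and returns its length.
/// Only `dst[..returned]` is written.
fn copy_ascii_prefix_swar(src: &[u8], dst: &mut [u8]) -> usize {
    let len = src.len().min(dst.len());
    let mut i = 0;
    while i + 8 <= len {
        let mut bytes = [0u8; 8];
        bytes.copy_from_slice(&src[i..i + 8]);
        let word = u64::from_le_bytes(bytes);
        let mask = word & HIGH_BITS;
        if mask != 0 {
            // In little-endian order the lowest set bit belongs to the first
            // non-ASCII byte; dividing its position by 8 gives the byte index.
            let ascii = (mask.trailing_zeros() / 8) as usize;
            dst[i..i + ascii].copy_from_slice(&src[i..i + ascii]);
            return i + ascii;
        }
        dst[i..i + 8].copy_from_slice(&src[i..i + 8]);
        i += 8;
    }
    // Scalar tail for the last 0..7 bytes.
    while i < len {
        if src[i] >= 0x80 {
            return i;
        }
        dst[i] = src[i];
        i += 1;
    }
    i
}
```

A caller can pre-fill `dst` with a sentinel and confirm that everything past the returned count is unchanged, which is the same property the new tests check for `encodeInto`.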

Benchmarks

bench/snippets/text-encoder.mjs, median of 3 interleaved rounds on linux x64 (AVX2/AVX-512), comparing the last Zig-based release (1.3.9), the current Rust implementation, and this PR (both built with the release profile from this tree):

| case | Zig 1.3.9 | Rust before | Rust after |
| --- | --- | --- | --- |
| 4 ascii | 40.6 ns | 61.8 ns | 40.9 ns |
| 4 utf8 | 59.4 ns | 80.2 ns | 59.6 ns |
| 12 ascii | 45.2 ns | 66.0 ns | 47.3 ns |
| 12 utf8 | 71.0 ns | 114.1 ns | 71.2 ns |
| 12288 ascii (rope) | 16.95 µs | 21.5 µs | 16.58 µs |
| 18432 utf8 | 10.54 µs | 8.60 µs | 8.37 µs |

The tiny-string cases are now bounded by shared machinery (call dispatch, Uint8Array allocation), so they land at parity with Zig instead of 1.5–2× behind; the large cases win outright. Additional paths not in the stock benchmark, same methodology:

| case | Zig 1.3.9 | Rust before | Rust after |
| --- | --- | --- | --- |
| encode 12288 ascii, resolved (non-rope) string | 1.98 µs | 1.86 µs | 1.51 µs |
| encode 12289 latin1 (1 non-ASCII byte) | 2.69 µs | 2.44 µs | 2.04 µs |
| encodeInto 12288 ascii | 838 ns | 330 ns | 201 ns |
| encode 1 MB ascii | 165 µs | 166 µs | 161 µs |

(The benchmark host is a shared/noisy container; medians of interleaved runs are reported. Ratios, not absolute numbers, are the signal.)

How did you verify your code works?

  • bun bd test test/js/web/encoding/ — all TextEncoder/TextEncoderStream/TextDecoder tests pass with the debug (ASAN) build. The only failures in that directory are the two pre-existing TextDecoder ... should not leak the output buffer RSS tests, which fail on any local ASAN debug build because the binary isn't named bun-asan (the measured delta is within the test's own ASAN allowance).
  • Added boundary coverage to test/js/web/encoding/text-encoder.test.js (10 new tests, ~10k assertions): every length around the SWAR/SIMD thresholds, a non-ASCII byte at every word/vector boundary position, encodeInto exact-fit and partial-fit behavior including "bytes past written are untouched", rope strings with >64-byte segments and with non-ASCII segments (bail path), and long UTF-16 strings with and without unpaired surrogates, all cross-checked against a pure-JS reference encoder.
  • cargo clippy -p bun_core -p bun_highway clean; websocket/inspect suites (which share the rewritten Latin-1 kernels) pass apart from tests that dial external network endpoints unavailable in the sandbox.
  • The baseline-ISA allowlist entries for the new kernel follow the existing FirstNonAscii8Impl/FillWithSkipMaskImpl ceilings; if verify-baseline-static reports different feature sets on the baseline/aarch64 CI builds I'll update the entries to match its report.
  • Note on test coverage: this is a deliberately behavior-preserving optimization, so the new tests are regression coverage for the rewritten paths (they pass before and after this change by design — there is no functional delta to assert). The before/after evidence for the optimization itself is the benchmark tables above, reproducible via bench/snippets/text-encoder.mjs.
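The boundary coverage described above (every length, a non-ASCII byte at every position, and bytes past `written` left untouched) follows a common differential-testing pattern. The actual tests are written in JavaScript against a pure-JS reference encoder; the Rust sketch below only illustrates the pattern, and `reference_ascii_prefix` and `boundary_sweep` are hypothetical names.

```rust
/// Deliberately simple reference: find the first byte >= 0x80, copy up to it.
fn reference_ascii_prefix(src: &[u8], dst: &mut [u8]) -> usize {
    let limit = src.len().min(dst.len());
    let n = src[..limit].iter().position(|&b| b >= 0x80).unwrap_or(limit);
    dst[..n].copy_from_slice(&src[..n]);
    n
}

/// Runs `candidate` against the reference for every length up to `max_len`
/// and every position of a single non-ASCII byte (plus the all-ASCII case).
/// Returns the number of cases checked; panics on the first mismatch.
fn boundary_sweep(candidate: impl Fn(&[u8], &mut [u8]) -> usize, max_len: usize) -> usize {
    const SENTINEL: u8 = 0xAA;
    let mut cases = 0;
    for len in 0..=max_len {
        // `bad == len` means the input has no non-ASCII byte at all.
        for bad in 0..=len {
            let mut src = vec![b'a'; len];
            if bad < len {
                src[bad] = 0xE9;
            }
            let mut got = vec![SENTINEL; len];
            let mut want = vec![SENTINEL; len];
            let n = candidate(&src, &mut got);
            let m = reference_ascii_prefix(&src, &mut want);
            assert_eq!(n, m, "count mismatch at len={len}, bad={bad}");
            // Comparing whole buffers also checks that bytes past the
            // reported count still hold the sentinel.
            assert_eq!(got, want, "buffer mismatch at len={len}, bad={bad}");
            cases += 1;
        }
    }
    cases
}
```

Passing an optimized implementation as `candidate` with `max_len` comfortably above the word and vector widths exercises each threshold from both sides.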

…e Rust port

The Rust port of TextEncoder.encode was slower than the Zig implementation it
replaced: every call zero-filled a 2 KB stack buffer the Zig version left
undefined, encode8/encode16 copied the output twice (stack buffer then JSC
memory), and the Latin-1 ASCII copy loop lost its fused vector/SWAR
implementation (it re-scanned with simdutf and then memcpy'd each span, which
is especially costly for the per-leaf callbacks of the rope fast path).

- Add a CopyAsciiPrefix highway kernel (fused scan+copy of the leading ASCII
  run, dynamic dispatch like the rest of highway_strings.cpp) and expose it as
  bun_highway::copy_ascii_prefix; allowlist its per-target variants for the
  baseline ISA verifier.
- Rewrite copy_latin1_into_utf8_stop_on_non_ascii around a fused copy helper:
  SWAR u64 for short runs (rope leaves), the highway kernel for long runs,
  writing only buf[..written].
- element_length_latin1_into_utf8 now uses simdutf (as the Zig original did)
  with a scalar count for short inputs.
- copy_utf16_into_utf8 skips the exact-length pass when the destination can
  hold the 3-bytes-per-unit worst case, and a _with_utf8_len variant lets
  callers that already sized the buffer reuse that length.
- TextEncoder__encode8/encode16 now encode straight into an exactly-sized,
  JSC-owned Uint8Array (no stack buffer, no double copy); encode16 keeps a
  small 192-byte stack path for tiny strings and falls back to the allocating
  U+FFFD path for unpaired surrogates. TextEncoder__encodeRopeString allocates
  the array up front and lets the rope iterator write into it directly.
- Extend text-encoder.test.js with boundary coverage for the new paths (SWAR/
  SIMD threshold lengths, non-ASCII at word boundaries, encodeInto partial
  fits, large rope segments, long UTF-16 with unpaired surrogates).
@robobun

robobun commented May 25, 2026

Collaborator Author
Updated 6:51 PM PT - May 25th, 2026

@robobun, your commit 83d3e32 has 1 failure in Build #58071 (All Failures):


🧪   To try this PR locally:

bunx bun-pr 31385

That installs a local version of the PR into your bun-31385 executable, so you can run:

bun-31385 --bun

@github-actions

Contributor

Found 1 issue this PR may fix:

  1. Switch from Zig’s @Vector to Google Highway SIMD lib #8782 - This PR adds a new Highway SIMD kernel (CopyAsciiPrefixImpl) for fused ASCII scan+copy, partially implementing the goal of adopting Google Highway for SIMD operations

If this is helpful, copy the block below into the PR description to auto-close this issue on merge.

Fixes #8782

🤖 Generated with Claude Code

@coderabbitai

coderabbitai Bot commented May 25, 2026

Contributor

Review Change Stack

Warning

Review limit reached

@robobun, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 38 minutes and 32 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: e7a7223e-230b-47f1-8f7f-e610ed1566d2

📥 Commits

Reviewing files that changed from the base of the PR and between 4f91b77 and 83d3e32.

📒 Files selected for processing (5)
  • src/bun_core/lib.rs
  • src/highway/lib.rs
  • src/jsc/bindings/highway_strings.cpp
  • src/runtime/webcore/TextEncoder.rs
  • test/js/web/encoding/text-encoder.test.js

Walkthrough

This PR adds a SIMD-accelerated CopyAsciiPrefixImpl routine that copies only the leading ASCII bytes from source to destination. The implementation is wired into the Rust encoding pipeline, refactoring UTF-16 and Latin-1 encoders to use the optimized ASCII prefix scanning. TextEncoder is refactored to allocate exact-sized buffers and use precise length computation. Symbol allowlists are updated across x64, x64-windows, and aarch64 platforms to include the new exports.

Changes

ASCII Prefix Copy SIMD Optimization

| Layer / File(s) | Summary |
| --- | --- |
| C++ SIMD implementation and dispatch: src/jsc/bindings/highway_strings.cpp | Introduces CopyAsciiPrefixImpl using Highway vector operations to scan for the first non-ASCII byte (≥ 0x80) and copies only the ASCII prefix into dst, returning the copied byte count. Includes optimized full-vector stores, tail handling to avoid overwriting, and scalar fallback. Registered in HWY dispatch table and exposed via extern "C" highway_copy_ascii_prefix wrapper. |
| Rust FFI wrapper and public API: src/highway/lib.rs | Adds FFI binding for highway_copy_ascii_prefix and provides a public safe Rust wrapper copy_ascii_prefix that bounds the operation to min(src.len(), dst.len()), returns the copied count, and includes debug assertions. |
| Core encoding logic refactoring: src/bun_core/lib.rs | Introduces internal copy_ascii_prefix helper with SWAR/u64 optimization for short runs and highway dispatch for longer inputs. Updates copy_latin1_into_utf8_stop_on_non_ascii to fuse ASCII scanning with copying via repeated copy_ascii_prefix calls, then encode non-ASCII bytes as 2-byte UTF-8. Refactors copy_utf16_into_utf8 to compute conservative bounds and delegate to new copy_utf16_into_utf8_with_utf8_len helper. Optimizes element_length_latin1_into_utf8 with scalar fast path for short inputs and simdutf-backed computation for longer inputs. |
| TextEncoder implementation refactoring: src/runtime/webcore/TextEncoder.rs | Updates TextEncoder__encode8 to allocate exact-sized UTF-8 output and use ASCII fast path for byte-for-byte copy or computed UTF-8 length for non-ASCII paths. Consolidates TextEncoder__encode16 and the C entry point into shared encode16_impl helper using stack buffer for small inputs, exact-sized allocation for typical cases, and fallback for unpaired surrogates. Optimizes TextEncoder__encodeRopeString ASCII fast path to pre-allocate output buffer and write directly into backing bytes. |
| Test coverage for encoding paths: test/js/web/encoding/text-encoder.test.js | Adds pure-JS UTF-8 reference encoder for validation. Introduces tests for Latin-1 ASCII fast path boundaries, encodeInto() buffer sizing with non-ASCII characters, rope encoding with ASCII segments and non-ASCII character placement, and UTF-16 exact-size path tests with unpaired surrogate handling. |
| Symbol verification allowlists: scripts/verify-baseline-static/allowlist-aarch64.txt, scripts/verify-baseline-static/allowlist-x64-windows.txt, scripts/verify-baseline-static/allowlist-x64.txt | Updates platform-specific symbol allowlists to include CopyAsciiPrefixImpl and CopyAsciiPrefix exports across multiple SIMD tiers: aarch64 (SVE2, SVE variants), x64-windows (AVX10_2, AVX2, AVX3, AVX3_DL, AVX3_SPR, AVX3_ZEN4), and x64 (AVX SPR19, ZEN419, and other AVX variants). |
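The Latin-1 handling summarized in the walkthrough rests on a simple rule: bytes below 0x80 are copied unchanged and every other byte becomes a 2-byte UTF-8 sequence, so the output length is the input length plus the count of non-ASCII bytes. A scalar sketch of that rule is shown here, with hypothetical function names and none of the SIMD or simdutf machinery the PR uses.

```rust
/// UTF-8 length of a Latin-1 byte string: one extra byte per byte >= 0x80.
fn latin1_utf8_len(src: &[u8]) -> usize {
    src.len() + src.iter().filter(|&&b| b >= 0x80).count()
}

/// Transcodes Latin-1 to UTF-8 into a buffer sized by `latin1_utf8_len`.
fn latin1_to_utf8(src: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(latin1_utf8_len(src));
    for &b in src {
        if b < 0x80 {
            out.push(b);
        } else {
            // U+0080..=U+00FF encode as 110000xx 10xxxxxx.
            out.push(0xC0 | (b >> 6));
            out.push(0x80 | (b & 0x3F));
        }
    }
    out
}
```

Sizing the output from the length function and then writing exactly that many bytes is the invariant the exact-sized Uint8Array allocation depends on.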

Suggested reviewers

  • Jarred-Sumner
🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately summarizes the main objective: restoring SIMD ASCII fast paths to TextEncoder.encode that were lost during the Rust port, which directly addresses the performance regressions detailed in the PR. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description check | ✅ Passed | The PR description is comprehensive and well-structured, following the required template with detailed explanations of the problem, fix, and verification. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@robobun

robobun commented May 25, 2026

Collaborator Author

Not adding Fixes #8782 — that issue tracks the broader migration of Bun's SIMD code to Highway with runtime dispatch, and this PR only adds one more kernel (CopyAsciiPrefixImpl) to the Highway infrastructure that already exists in highway_strings.cpp. It's a step in that direction, not a completion of it.

@robobun

robobun commented May 25, 2026

Collaborator Author

CI status for reviewers: 74 of the 76 checks are green, including every build, test, and verify-baseline lane that actually exercises this diff (linux x64/musl/baseline, linux+macOS aarch64, windows x64, ASAN, etc.).

The two remaining non-green entries are infrastructure, not code:

  • darwin-14-aarch64-test-bun: Expired. No agent picked the job up, so it never ran. The same commit passed darwin-26-aarch64-test-bun and the other macOS lanes.
  • buildkite/bun (umbrella) is red only because of that expired job. windows-11-aarch64-test-bun failed once earlier and already passed on automatic retry.

Retrying just the expired job in Buildkite should turn the build fully green — I'm deliberately not pushing an empty commit for it, to avoid restarting the entire pipeline and throwing away the existing green results.


Update: the expired darwin-14-aarch64-test-bun job has since been retried and passed — every job-level check on this PR is now green (75 passing). Only the umbrella buildkite/bun status for build #57900 still reads "failed" because the build finalized before that retry completed.

@Jarred-Sumner Jarred-Sumner left a comment

Collaborator

Delete all the code comments you added.

@robobun

robobun commented May 26, 2026

Collaborator Author

Done — removed the added code comments in 83d3e32 (only the // SAFETY: annotations remain, since the workspace clippy lint undocumented_unsafe_blocks = deny requires them).

@claude claude Bot left a comment

Contributor

No bugs found, but this rewrites hot-path encoding logic (new Highway SIMD kernel with overlapping tail stores, SWAR fallback, direct writes into uninitialized JSC-owned buffers sized by precomputed lengths, and a new surrogate fallback path in encode16) — worth a human pass on the buffer-sizing invariants and the rope/encode16 allocation changes.

Extended reasoning...

Overview

This PR restores SIMD fast paths in TextEncoder.encode that were lost in the Zig→Rust port. It touches eight files: a new ~50-line Highway SIMD kernel CopyAsciiPrefixImpl in highway_strings.cpp with overlapping tail-vector handling; a Rust FFI wrapper in src/highway/lib.rs; a rewrite of copy_latin1_into_utf8_stop_on_non_ascii and element_length_latin1_into_utf8 plus a new copy_utf16_into_utf8_with_utf8_len variant in src/bun_core/lib.rs; a substantial rewrite of TextEncoder__encode8/16/encodeRopeString in TextEncoder.rs to allocate exact-sized uninitialized Uint8Arrays and write directly into them; mechanical allowlist additions for the new per-target symbols; and ~170 lines of new boundary tests.

Security risks

The main risk surface is memory safety rather than classic injection/auth: the new code creates uninitialized JSC Uint8Array buffers sized by element_length_latin1_into_utf8 / element_length_utf16_into_utf8 and then relies on the encoder filling exactly that many bytes. If the length computation and the writer ever disagree, uninitialized heap bytes would be exposed to JavaScript. The Highway kernel does an overlapping StoreU at dst + (len - N) for the tail, and HWY_RESTRICT is applied to src/dst — both look correct (src and dst don't alias each other; the overlap is dst-with-dst, which restrict doesn't forbid), but this is exactly the kind of low-level invariant that benefits from a second pair of eyes. No auth, crypto, or permission code is touched.
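The overlapping tail store mentioned above is a standard way to finish a block-wise copy without a byte loop: after the full blocks, one more block is copied starting at `len - N`, rewriting a few bytes that were already copied with identical values. The sketch below is a scalar stand-in for the vector `StoreU`, with a hypothetical function name and a fixed block size of 16; it does not include the ASCII check the real kernel performs.

```rust
/// Block size standing in for the vector width.
const N: usize = 16;

/// Copies `src` into `dst` (equal lengths) in N-byte blocks, finishing with a
/// block that ends exactly at `len` and may overlap the previous one.
fn copy_with_overlapping_tail(src: &[u8], dst: &mut [u8]) {
    assert_eq!(src.len(), dst.len());
    let len = src.len();
    if len < N {
        // Too short for even one block: plain copy.
        dst.copy_from_slice(src);
        return;
    }
    let mut i = 0;
    while i + N <= len {
        dst[i..i + N].copy_from_slice(&src[i..i + N]);
        i += N;
    }
    if i < len {
        // The last block starts before `i`, so it re-stores some bytes that
        // were already written, but it never writes at or beyond `len`.
        let start = len - N;
        dst[start..].copy_from_slice(&src[start..]);
    }
}
```

The overlap is between two writes to `dst`, not between `src` and `dst`, which is why it is compatible with a restrict-style no-alias assumption on the two pointers.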

Level of scrutiny

High. TextEncoder.encode is a Web-standard hot path called from arbitrary user JS; the change introduces hand-written SIMD with manual tail handling, SWAR bit tricks (trailing_zeros()/8 on a high-bit mask), unsafe FFI, and a new control-flow branch in encode16_impl where an under-sized exact buffer triggers a second allocation via to_utf8_alloc_with_type (the first Uint8Array is abandoned to GC). encodeRopeString now always heap-allocates up front and discards the array if a non-ASCII segment is hit. These are all plausible and well-tested, but they're design/behavior changes in a critical path, not mechanical edits.

Other factors

CI is fully green across all platforms including ASAN. The PR adds thorough boundary tests (every length around the SWAR/SIMD thresholds, non-ASCII at every vector boundary, encodeInto past-written checks, rope segments, unpaired-surrogate fallback) cross-checked against a pure-JS reference encoder. The maintainer has already engaged (requested comment removal, which was done in 83d3e32). The bug-hunting system found nothing. Given the scope and the fact that a human reviewer is already in the loop, deferring rather than auto-approving is the right call.

@Jarred-Sumner Jarred-Sumner merged commit 0b1e7e5 into main May 26, 2026
76 of 78 checks passed
@Jarred-Sumner Jarred-Sumner deleted the farm/42e4aca0/textencoder-highway-simd branch May 26, 2026 01:37
Comment thread src/bun_core/lib.rs

const HIGH_BITS: u64 = 0x8080_8080_8080_8080;
let mut copied = 0usize;
for (d, s) in dst.chunks_exact_mut(8).zip(src.chunks_exact(8)) {

@CrazyboyQCD CrazyboyQCD May 26, 2026

@robobun
Avoid using chunks_exact_* methods here, see #31415

springmin pushed a commit to springmin/bun that referenced this pull request May 26, 2026
* oven/main (10 new commits):
  Optimize TextEncoder.encode: restore SIMD ASCII fast paths lost in the Rust port (oven-sh#31385)
  js_parser: sanitize auto-generated default export name for digit-named modules (oven-sh#31403)
  fetch: run checkServerIdentity before writing the request (oven-sh#31325)
  ffi: avoid copying the threadsafe callback wrapper on the calling thread (oven-sh#31332)
  install: gate the exit-callback cache teardown to the main thread (oven-sh#31376)
  fix(node:module): don't register native helpers as their own constructors (oven-sh#31393)
  css: escape custom pseudo-class/element names when printing (oven-sh#31404)
  Deepen the lots-of-for-loop fixture so the transpiler stack-overflow tests throw on Windows (oven-sh#31382)
  Hardening: input validation and bounds tightening across 36 subsystems (round 4) (oven-sh#31339)
  Speed up FormData multipart serialization (oven-sh#31379)

Auto-merged: src/install/PackageManager.rs, src/runtime/cli/upgrade_command.rs, src/runtime/webcore/Blob.rs, src/sys/lib.rs
Jarred-Sumner pushed a commit that referenced this pull request May 26, 2026
### What does this PR do?

Restores the block-comment SIMD skip that the Rust port of the JS lexer
lost, by moving it to the same Google Highway kernels the lexer already
uses for single-line comments.

**Before this PR**

- `scan_single_line_comment` kept its SIMD path in the port
(`bun_highway::index_of_newline_or_non_ascii_or_hash_or_at`).
- `scan_multi_line_comment_body` kept the Zig structure (only take the
fast path while the current code point is ASCII and ≥ 512 bytes remain),
but the inner scan — Zig's
`skipToInterestingCharacterInMultilineComment`, a 16-byte `@Vector` loop
— was ported as a scalar byte-at-a-time loop with a `TODO(port): SIMD
reimplementation` note (`src/js_parser/lexer.rs`). Large license headers
/ JSDoc blocks were scanned one byte per iteration.

**This PR**

- Adds an `IndexOfInterestingCharacterInMultilineCommentImpl` kernel to
`src/jsc/bindings/highway_strings.cpp` (same `HWY_EXPORT` +
`HWY_DYNAMIC_DISPATCH` pattern as the existing kernels). It returns the
index of the first `*` (potential `*/` terminator), `\r`, `\n` (newline
tracking for ASI), or non-ASCII byte (so U+2028/U+2029 and other
multi-byte sequences are still decoded by the scalar path) — exactly the
byte classes the Zig `@Vector` version stopped at.
- Exposes it as
`bun_highway::index_of_interesting_character_in_multiline_comment` (with
the same debug-only result validation as the neighboring wrappers).
- `skip_to_interesting_character_in_multiline_comment` in the lexer now
calls it instead of the scalar loop. The `Environment::ENABLE_SIMD` gate
at the call site is dropped: it exists for portable-vector codegen, but
Highway dispatches per-CPU at runtime, so baseline builds take the fast
path too — matching how `scan_single_line_comment` and the
string-literal scan already call Highway unconditionally. The ≥ 512-byte
threshold and the ASCII-code-point check are unchanged.
- Adds the new per-target kernel symbols to the `verify-baseline-static`
allowlists (x64, x64-windows, aarch64), with feature ceilings copied
from the structurally identical `IndexOfNewlineOrNonASCIIImpl` entries.
If the baseline CI scan reports different feature sets, I'll update the
entries to match its report.
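
As a point of reference for the byte classes listed above, a scalar version of the scan can be sketched as below. The function name and the `Option<usize>` return type are assumptions made for illustration; the Highway kernel's actual signature is not shown in this description.

```rust
/// Index of the first byte that the block-comment scanner must look at:
/// `*` (possible terminator), `\r` or `\n` (newline tracking), or any
/// non-ASCII byte (left for the scalar decoder, e.g. U+2028 / U+2029).
fn index_of_interesting_in_block_comment(bytes: &[u8]) -> Option<usize> {
    bytes
        .iter()
        .position(|&b| matches!(b, b'*' | b'\r' | b'\n') || b >= 0x80)
}
```

A vectorized kernel checks the same four conditions on a whole vector of bytes at once and reports the lowest matching lane.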

### Benchmarks

`Bun.Transpiler.transformSync` on linux x64 (AVX2), comparing release
builds of main (`19dd34df3`) and this branch (`84fe8aa26`) built
back-to-back from this tree. Interleaved runs, median of 9 inner rounds,
two outer rounds per binary (values were stable to ~1%); transpiled
output hashes are identical between the two binaries for every case.
Shared container, so ratios are the signal.

| case | main | this PR | Δ |
| --- | --- | --- | --- |
| 200 × ~2 KB JSDoc block comments + 200 small fns (385 KB) | 1.071 ms | 0.834 ms | **1.28× faster** |
| 1 MB block comment, ~100-char lines | 1.331 ms | 0.822 ms | **1.62× faster** |
| 1 MB block comment, ~1 KB lines | 1.300 ms | 0.681 ms | **1.91× faster** |
| 16 KB license header + 200 fns (31 KB total) | 0.476 ms | 0.464 ms | ~2% (run time dominated by the code, not the comment) |
| control: same 200 fns, no comments | 0.443 ms | 0.438 ms | parity |
| control: 200 fns each behind a `//` line comment | 0.439 ms | 0.439 ms | parity |

The two controls confirm the delta is attributable to the block-comment
path (nothing else in the lexer changed between the two revisions).

### How did you verify your code works?

All of the following ran against the debug (ASAN) build with this
change:

- `bun bd test test/bundler/transpiler/transpiler.test.js` — 165 pass /
0 fail, including a new `multi-line comment scanning` group:
comment-size sweep across the 512-byte threshold and vector-width
boundaries, a lone `*` at every offset in the first 80 bytes of a large
comment, all-`*` bodies, `\n` / `\r` / `\r\n` / U+2028 / U+2029 inside
large comments observable through ASI, non-ASCII bodies (2-byte, 4-byte,
mixed), comments ending exactly at EOF, and unterminated large comments
producing `Expected "*/" to terminate multi-line comment`.
- `bun bd test test/bundler/bundler_comments.test.ts` — 45 pass / 0
fail, including new end-to-end bundles: a large `/*! ... */` legal
comment (CRLF + non-ASCII) preserved verbatim with the code after it
intact, and ASI behavior across a large block comment verified by
running the bundled output.
- `bun bd test test/js/bun/transpiler/transpiler-truncated-utf8.test.ts`
— the guard-page fixture now also places ≥ 512-byte block comments
(terminated, unterminated, trailing `*`, trailing truncated UTF-8 lead)
so that the input ends exactly at a `PROT_NONE` page; any read past the
end of the source by the new kernel faults deterministically.
- `cargo clippy -p bun_highway -p bun_js_parser` clean, `cargo fmt`
applied, `highway_strings.cpp` is clang-format clean.
- The expected outputs for every new test were cross-checked against the
current scalar implementation before making the change, so this is
verified behavior-preserving. Like #31385, the new tests are regression
coverage for the rewritten path rather than a
failing-before/passing-after proof — there is no functional delta to
assert for a pure performance restoration; the before/after evidence for
the optimization itself is the benchmark table above.