Optimize TextEncoder.encode: restore SIMD ASCII fast paths lost in the Rust port by robobun · Pull Request #31385 · oven-sh/bun

robobun · 2026-05-25T10:01:55Z

What does this PR do?

TextEncoder.encode's Rust port was measurably slower than the Zig implementation it replaced (4-char ASCII ~2× slower, 12 KB ASCII ~40% slower; only the large UTF-8 case was a tie). This PR restores the lost fast paths and vectorization:

Cause

TextEncoder__encode8/16/encodeRopeString zero-filled a 2 KB stack buffer on every call ([0u8; 2048]) where the Zig code used undefined, then copied the output a second time from that buffer into JSC memory. For a 4-byte input the memset alone is most of the regression.
The 12 KB ASCII benchmark case is actually JSC rope iteration ("Hello World!".repeat(1024) stays a rope): ~1024 12-byte leaves hit copy_latin1_into_utf8_stop_on_non_ascii, whose port replaced the Zig fused @Vector(16,u8)/SWAR copy with a scan (first_non_ascii) followed by a memcpy per span — two passes plus per-leaf call overhead.
element_length_latin1_into_utf8 was ported as a scalar span loop instead of simdutf.length.utf8.from.latin1.
The small UTF-16 path paid an exact utf8_length_from_utf16le pass even when the destination buffer already provably fit the worst case (the Zig out_len shortcut was dropped).

Fix

New CopyAsciiPrefix highway kernel in highway_strings.cpp (same HWY_EXPORT + HWY_DYNAMIC_DISPATCH pattern as the existing kernels): fused scan+copy of the leading ASCII run that stops at the first byte ≥ 0x80 and writes only the bytes it reports. Exposed as bun_highway::copy_ascii_prefix, with per-target symbols added to the verify-baseline-static allowlists.
copy_latin1_into_utf8_stop_on_non_ascii now streams through a fused copy_ascii_prefix helper: SWAR u64 for short runs (rope leaves are usually a dozen bytes, so FFI dispatch isn't worth it), the highway kernel for runs ≥ 64 bytes. Only buf[..written] is ever written, same as before.
element_length_latin1_into_utf8 uses simdutf (scalar count for ≤ 32 bytes), matching the Zig original.
copy_utf16_into_utf8 skips the exact-length pass when buf.len() >= 3 * utf16.len() (worst case), mirroring the Zig out_len selection; a copy_utf16_into_utf8_with_utf8_len variant lets callers that already computed the length (to size the destination) avoid recomputing it.
TextEncoder__encode8 / __encode16 now encode straight into an exactly-sized, JSC-owned Uint8Array — no stack buffer, no second copy, and the former large-string path no longer hands JSC an external Vec. encode16 keeps a small 192-byte stack path for tiny strings and falls back to the allocating U+FFFD path when unpaired surrogates make the exact-size buffer insufficient. TextEncoder__encodeRopeString allocates the array up front and lets the rope iterator write each segment directly into it.

Output bytes, encodeInto read/written semantics, and the "never write past written" guarantee are unchanged.

Benchmarks

bench/snippets/text-encoder.mjs, median of 3 interleaved rounds on linux x64 (AVX2/AVX-512), comparing the last Zig-based release (1.3.9), the current Rust implementation, and this PR (both built with the release profile from this tree):

case	Zig 1.3.9	Rust before	Rust after
4 ascii	40.6 ns	61.8 ns	40.9 ns
4 utf8	59.4 ns	80.2 ns	59.6 ns
12 ascii	45.2 ns	66.0 ns	47.3 ns
12 utf8	71.0 ns	114.1 ns	71.2 ns
12288 ascii (rope)	16.95 µs	21.5 µs	16.58 µs
18432 utf8	10.54 µs	8.60 µs	8.37 µs

The tiny-string cases are now bounded by shared machinery (call dispatch, Uint8Array allocation), so they land at parity with Zig instead of 1.5–2× behind; the large cases win outright. Additional paths not in the stock benchmark, same methodology:

case	Zig 1.3.9	Rust before	Rust after
encode 12288 ascii, resolved (non-rope) string	1.98 µs	1.86 µs	1.51 µs
encode 12289 latin1 (1 non-ASCII byte)	2.69 µs	2.44 µs	2.04 µs
encodeInto 12288 ascii	838 ns	330 ns	201 ns
encode 1 MB ascii	165 µs	166 µs	161 µs

(The benchmark host is a shared/noisy container; medians of interleaved runs are reported. Ratios, not absolute numbers, are the signal.)

How did you verify your code works?

bun bd test test/js/web/encoding/ — all TextEncoder/TextEncoderStream/TextDecoder tests pass with the debug (ASAN) build. The only failures in that directory are the two pre-existing TextDecoder ... should not leak the output buffer RSS tests, which fail on any local ASAN debug build because the binary isn't named bun-asan (the measured delta is within the test's own ASAN allowance).
Added boundary coverage to test/js/web/encoding/text-encoder.test.js (10 new tests, ~10k assertions): every length around the SWAR/SIMD thresholds, a non-ASCII byte at every word/vector boundary position, encodeInto exact-fit and partial-fit behavior including "bytes past written are untouched", rope strings with >64-byte segments and with non-ASCII segments (bail path), and long UTF-16 strings with and without unpaired surrogates, all cross-checked against a pure-JS reference encoder.
cargo clippy -p bun_core -p bun_highway clean; websocket/inspect suites (which share the rewritten Latin-1 kernels) pass apart from tests that dial external network endpoints unavailable in the sandbox.
The baseline-ISA allowlist entries for the new kernel follow the existing FirstNonAscii8Impl/FillWithSkipMaskImpl ceilings; if verify-baseline-static reports different feature sets on the baseline/aarch64 CI builds I'll update the entries to match its report.
Note on test coverage: this is a deliberately behavior-preserving optimization, so the new tests are regression coverage for the rewritten paths (they pass before and after this change by design — there is no functional delta to assert). The before/after evidence for the optimization itself is the benchmark tables above, reproducible via bench/snippets/text-encoder.mjs.

…e Rust port The Rust port of TextEncoder.encode was slower than the Zig implementation it replaced: every call zero-filled a 2 KB stack buffer the Zig version left undefined, encode8/encode16 copied the output twice (stack buffer then JSC memory), and the Latin-1 ASCII copy loop lost its fused vector/SWAR implementation (it re-scanned with simdutf and then memcpy'd each span, which is especially costly for the per-leaf callbacks of the rope fast path). - Add a CopyAsciiPrefix highway kernel (fused scan+copy of the leading ASCII run, dynamic dispatch like the rest of highway_strings.cpp) and expose it as bun_highway::copy_ascii_prefix; allowlist its per-target variants for the baseline ISA verifier. - Rewrite copy_latin1_into_utf8_stop_on_non_ascii around a fused copy helper: SWAR u64 for short runs (rope leaves), the highway kernel for long runs, writing only buf[..written]. - element_length_latin1_into_utf8 now uses simdutf (as the Zig original did) with a scalar count for short inputs. - copy_utf16_into_utf8 skips the exact-length pass when the destination can hold the 3-bytes-per-unit worst case, and a _with_utf8_len variant lets callers that already sized the buffer reuse that length. - TextEncoder__encode8/encode16 now encode straight into an exactly-sized, JSC-owned Uint8Array (no stack buffer, no double copy); encode16 keeps a small 192-byte stack path for tiny strings and falls back to the allocating U+FFFD path for unpaired surrogates. TextEncoder__encodeRopeString allocates the array up front and lets the rope iterator write into it directly. - Extend text-encoder.test.js with boundary coverage for the new paths (SWAR/ SIMD threshold lengths, non-ASCII at word boundaries, encodeInto partial fits, large rope segments, long UTF-16 with unpaired surrogates).

robobun · 2026-05-25T10:02:07Z

^{Updated 6:51 PM PT - May 25th, 2026}

❌ @robobun, your commit 83d3e32 has 1 failures in Build #58071 (All Failures):

test/bundler/transpiler/transpiler.test.js - code 1 on 🪟 2019 x64
test/bundler/transpiler/transpiler.test.js - code 1 on 🪟 2019 x64-baseline
test/bundler/transpiler/transpiler.test.js - code 1 on 🪟 11 aarch64

🧪 To try this PR locally:

bunx bun-pr 31385

That installs a local version of the PR into your bun-31385 executable, so you can run:

bun-31385 --bun

github-actions · 2026-05-25T10:06:14Z

Found 1 issue this PR may fix:

Switch from Zig’s @Vector to Google Highway SIMD lib #8782 - This PR adds a new Highway SIMD kernel (CopyAsciiPrefixImpl) for fused ASCII scan+copy, partially implementing the goal of adopting Google Highway for SIMD operations

If this is helpful, copy the block below into the PR description to auto-close this issue on merge.

Fixes #8782

🤖 Generated with Claude Code

coderabbitai · 2026-05-25T10:06:28Z

Warning

Review limit reached

@robobun, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 38 minutes and 32 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: e7a7223e-230b-47f1-8f7f-e610ed1566d2

📥 Commits

Reviewing files that changed from the base of the PR and between 4f91b77 and 83d3e32.

📒 Files selected for processing (5)

src/bun_core/lib.rs
src/highway/lib.rs
src/jsc/bindings/highway_strings.cpp
src/runtime/webcore/TextEncoder.rs
test/js/web/encoding/text-encoder.test.js

Walkthrough

This PR adds a SIMD-accelerated CopyAsciiPrefixImpl routine that copies only the leading ASCII bytes from source to destination. The implementation is wired into the Rust encoding pipeline, refactoring UTF-16 and Latin-1 encoders to use the optimized ASCII prefix scanning. TextEncoder is refactored to allocate exact-sized buffers and use precise length computation. Symbol allowlists are updated across x64, x64-windows, and aarch64 platforms to include the new exports.

Changes

ASCII Prefix Copy SIMD Optimization

Layer / File(s)	Summary
C++ SIMD implementation and dispatch `src/jsc/bindings/highway_strings.cpp`	Introduces `CopyAsciiPrefixImpl` using Highway vector operations to scan for the first non-ASCII byte (≥ 0x7F) and copies only the ASCII prefix into `dst`, returning the copied byte count. Includes optimized full-vector stores, tail handling to avoid overwriting, and scalar fallback. Registered in HWY dispatch table and exposed via `extern "C"` `highway_copy_ascii_prefix` wrapper.
Rust FFI wrapper and public API `src/highway/lib.rs`	Adds FFI binding for `highway_copy_ascii_prefix` and provides a public safe Rust wrapper `copy_ascii_prefix` that bounds the operation to `min(src.len(), dst.len())`, returns the copied count, and includes debug assertions.
Core encoding logic refactoring `src/bun_core/lib.rs`	Introduces internal `copy_ascii_prefix` helper with SWAR/u64 optimization for short runs and highway dispatch for longer inputs. Updates `copy_latin1_into_utf8_stop_on_non_ascii` to fuse ASCII scanning with copying via repeated `copy_ascii_prefix` calls, then encode non-ASCII bytes as 2-byte UTF-8. Refactors `copy_utf16_into_utf8` to compute conservative bounds and delegate to new `copy_utf16_into_utf8_with_utf8_len` helper. Optimizes `element_length_latin1_into_utf8` with scalar fast path for short inputs and simdutf-backed computation for longer inputs.
TextEncoder implementation refactoring `src/runtime/webcore/TextEncoder.rs`	Updates `TextEncoder__encode8` to allocate exact-sized UTF-8 output and use ASCII fast path for byte-for-byte copy or computed UTF-8 length for non-ASCII paths. Consolidates `TextEncoder__encode16` and the C entry point into shared `encode16_impl` helper using stack buffer for small inputs, exact-sized allocation for typical cases, and fallback for unpaired surrogates. Optimizes `TextEncoder__encodeRopeString` ASCII fast path to pre-allocate output buffer and write directly into backing bytes.
Test coverage for encoding paths `test/js/web/encoding/text-encoder.test.js`	Adds pure-JS UTF-8 reference encoder for validation. Introduces tests for Latin-1 ASCII fast path boundaries, `encodeInto()` buffer sizing with non-ASCII characters, rope encoding with ASCII segments and non-ASCII character placement, and UTF-16 exact-size path tests with unpaired surrogate handling.
Symbol verification allowlists `scripts/verify-baseline-static/allowlist-aarch64.txt`, `scripts/verify-baseline-static/allowlist-x64-windows.txt`, `scripts/verify-baseline-static/allowlist-x64.txt`	Updates platform-specific symbol allowlists to include `CopyAsciiPrefixImpl` and `CopyAsciiPrefix` exports across multiple SIMD tiers: aarch64 (SVE2, SVE variants), x64-windows (AVX10\_2, AVX2, AVX3, AVX3\_DL, AVX3\_SPR, AVX3\_ZEN4), and x64 (AVX SPR19, ZEN419, and other AVX variants).

Suggested reviewers

Jarred-Sumner

🚥 Pre-merge checks | ✅ 4

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately summarizes the main objective: restoring SIMD ASCII fast paths to TextEncoder.encode that were lost during the Rust port, which directly addresses the performance regressions detailed in the PR.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Description check	✅ Passed	The PR description is comprehensive and well-structured, following the required template with detailed explanations of the problem, fix, and verification.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

robobun · 2026-05-25T10:07:56Z

Not adding Fixes #8782 — that issue tracks the broader migration of Bun's SIMD code to Highway with runtime dispatch, and this PR only adds one more kernel (CopyAsciiPrefixImpl) to the Highway infrastructure that already exists in highway_strings.cpp. It's a step in that direction, not a completion of it.

robobun · 2026-05-25T11:08:09Z

CI status for reviewers: 74 of the 76 checks are green, including every build, test, and verify-baseline lane that actually exercises this diff (linux x64/musl/baseline, linux+macOS aarch64, windows x64, ASAN, etc.).

The two remaining non-green entries are infrastructure, not code:

darwin-14-aarch64-test-bun — Expired: no agent picked the job up, so it never ran. The same commit passed darwin-26-aarch64-test-bun and the other macOS lanes.
buildkite/bun (umbrella) is red only because of that expired job. windows-11-aarch64-test-bun failed once earlier and already passed on automatic retry.

Retrying just the expired job in Buildkite should turn the build fully green — I'm deliberately not pushing an empty commit for it, to avoid restarting the entire pipeline and throwing away the existing green results.

Update: the expired darwin-14-aarch64-test-bun job has since been retried and passed — every job-level check on this PR is now green (75 passing). Only the umbrella buildkite/bun status for build #57900 still reads "failed" because the build finalized before that retry completed.

Jarred-Sumner

Delete all the code comments you added.

Jarred-Sumner · 2026-05-26T00:04:23Z

Delete all the code comments you added.

robobun · 2026-05-26T00:19:15Z

Done — removed the added code comments in 83d3e32 (only the // SAFETY: annotations remain, since the workspace clippy lint undocumented_unsafe_blocks = deny requires them).

claude

No bugs found, but this rewrites hot-path encoding logic (new Highway SIMD kernel with overlapping tail stores, SWAR fallback, direct writes into uninitialized JSC-owned buffers sized by precomputed lengths, and a new surrogate fallback path in encode16) — worth a human pass on the buffer-sizing invariants and the rope/encode16 allocation changes.

Extended reasoning...

Overview

This PR restores SIMD fast paths in TextEncoder.encode that were lost in the Zig→Rust port. It touches eight files: a new ~50-line Highway SIMD kernel CopyAsciiPrefixImpl in highway_strings.cpp with overlapping tail-vector handling; a Rust FFI wrapper in src/highway/lib.rs; a rewrite of copy_latin1_into_utf8_stop_on_non_ascii and element_length_latin1_into_utf8 plus a new copy_utf16_into_utf8_with_utf8_len variant in src/bun_core/lib.rs; a substantial rewrite of TextEncoder__encode8/16/encodeRopeString in TextEncoder.rs to allocate exact-sized uninitialized Uint8Arrays and write directly into them; mechanical allowlist additions for the new per-target symbols; and ~170 lines of new boundary tests.

Security risks

The main risk surface is memory safety rather than classic injection/auth: the new code creates uninitialized JSC Uint8Array buffers sized by element_length_latin1_into_utf8 / element_length_utf16_into_utf8 and then relies on the encoder filling exactly that many bytes. If the length computation and the writer ever disagree, uninitialized heap bytes would be exposed to JavaScript. The Highway kernel does an overlapping StoreU at dst + (len - N) for the tail, and HWY_RESTRICT is applied to src/dst — both look correct (src and dst don't alias each other; the overlap is dst-with-dst, which restrict doesn't forbid), but this is exactly the kind of low-level invariant that benefits from a second pair of eyes. No auth, crypto, or permission code is touched.

Level of scrutiny

High. TextEncoder.encode is a Web-standard hot path called from arbitrary user JS; the change introduces hand-written SIMD with manual tail handling, SWAR bit tricks (trailing_zeros()/8 on a high-bit mask), unsafe FFI, and a new control-flow branch in encode16_impl where an under-sized exact buffer triggers a second allocation via to_utf8_alloc_with_type (the first Uint8Array is abandoned to GC). encodeRopeString now always heap-allocates up front and discards the array if a non-ASCII segment is hit. These are all plausible and well-tested, but they're design/behavior changes in a critical path, not mechanical edits.

Other factors

CI is fully green across all platforms including ASAN. The PR adds thorough boundary tests (every length around the SWAR/SIMD thresholds, non-ASCII at every vector boundary, encodeInto past-written checks, rope segments, unpaired-surrogate fallback) cross-checked against a pure-JS reference encoder. The maintainer has already engaged (requested comment removal, which was done in 83d3e32). The bug-hunting system found nothing. Given the scope and the fact that a human reviewer is already in the loop, deferring rather than auto-approving is the right call.

CrazyboyQCD · 2026-05-26T01:47:03Z

+
+        const HIGH_BITS: u64 = 0x8080_8080_8080_8080;
+        let mut copied = 0usize;
+        for (d, s) in dst.chunks_exact_mut(8).zip(src.chunks_exact(8)) {


@robobun
Avoid using chunks_exact_* methods here, see #31415

* oven/main (10 new commits): Optimize TextEncoder.encode: restore SIMD ASCII fast paths lost in the Rust port (oven-sh#31385) js_parser: sanitize auto-generated default export name for digit-named modules (oven-sh#31403) fetch: run checkServerIdentity before writing the request (oven-sh#31325) ffi: avoid copying the threadsafe callback wrapper on the calling thread (oven-sh#31332) install: gate the exit-callback cache teardown to the main thread (oven-sh#31376) fix(node:module): don't register native helpers as their own constructors (oven-sh#31393) css: escape custom pseudo-class/element names when printing (oven-sh#31404) Deepen the lots-of-for-loop fixture so the transpiler stack-overflow tests throw on Windows (oven-sh#31382) Hardening: input validation and bounds tightening across 36 subsystems (round 4) (oven-sh#31339) Speed up FormData multipart serialization (oven-sh#31379) Auto-merged: src/install/PackageManager.rs, src/runtime/cli/upgrade_command.rs, src/runtime/webcore/Blob.rs, src/sys/lib.rs

### What does this PR do? Restores the block-comment SIMD skip that the Rust port of the JS lexer lost, by moving it to the same Google Highway kernels the lexer already uses for single-line comments. **Before this PR** - `scan_single_line_comment` kept its SIMD path in the port (`bun_highway::index_of_newline_or_non_ascii_or_hash_or_at`). - `scan_multi_line_comment_body` kept the Zig structure (only take the fast path while the current code point is ASCII and ≥ 512 bytes remain), but the inner scan — Zig's `skipToInterestingCharacterInMultilineComment`, a 16-byte `@Vector` loop — was ported as a scalar byte-at-a-time loop with a `TODO(port): SIMD reimplementation` note (`src/js_parser/lexer.rs`). Large license headers / JSDoc blocks were scanned one byte per iteration. **This PR** - Adds an `IndexOfInterestingCharacterInMultilineCommentImpl` kernel to `src/jsc/bindings/highway_strings.cpp` (same `HWY_EXPORT` + `HWY_DYNAMIC_DISPATCH` pattern as the existing kernels). It returns the index of the first `*` (potential `*/` terminator), `\r`, `\n` (newline tracking for ASI), or non-ASCII byte (so U+2028/U+2029 and other multi-byte sequences are still decoded by the scalar path) — exactly the byte classes the Zig `@Vector` version stopped at. - Exposes it as `bun_highway::index_of_interesting_character_in_multiline_comment` (with the same debug-only result validation as the neighboring wrappers). - `skip_to_interesting_character_in_multiline_comment` in the lexer now calls it instead of the scalar loop. The `Environment::ENABLE_SIMD` gate at the call site is dropped: it exists for portable-vector codegen, but Highway dispatches per-CPU at runtime, so baseline builds take the fast path too — matching how `scan_single_line_comment` and the string-literal scan already call Highway unconditionally. The ≥ 512-byte threshold and the ASCII-code-point check are unchanged. - Adds the new per-target kernel symbols to the `verify-baseline-static` allowlists (x64, x64-windows, aarch64), with feature ceilings copied from the structurally identical `IndexOfNewlineOrNonASCIIImpl` entries. If the baseline CI scan reports different feature sets, I'll update the entries to match its report. ### Benchmarks `Bun.Transpiler.transformSync` on linux x64 (AVX2), comparing release builds of main (`19dd34df3`) and this branch (`84fe8aa26`) built back-to-back from this tree. Interleaved runs, median of 9 inner rounds, two outer rounds per binary (values were stable to ~1%); transpiled output hashes are identical between the two binaries for every case. Shared container, so ratios are the signal. | case | main | this PR | Δ | | --- | --- | --- | --- | | 200 × ~2 KB JSDoc block comments + 200 small fns (385 KB) | 1.071 ms | 0.834 ms | **1.28× faster** | | 1 MB block comment, ~100-char lines | 1.331 ms | 0.822 ms | **1.62× faster** | | 1 MB block comment, ~1 KB lines | 1.300 ms | 0.681 ms | **1.91× faster** | | 16 KB license header + 200 fns (31 KB total) | 0.476 ms | 0.464 ms | ~2% (run time dominated by the code, not the comment) | | control: same 200 fns, no comments | 0.443 ms | 0.438 ms | parity | | control: 200 fns each behind a `//` line comment | 0.439 ms | 0.439 ms | parity | The two controls confirm the delta is attributable to the block-comment path (nothing else in the lexer changed between the two revisions). ### How did you verify your code works? All of the following ran against the debug (ASAN) build with this change: - `bun bd test test/bundler/transpiler/transpiler.test.js` — 165 pass / 0 fail, including a new `multi-line comment scanning` group: comment-size sweep across the 512-byte threshold and vector-width boundaries, a lone `*` at every offset in the first 80 bytes of a large comment, all-`*` bodies, `\n` / `\r` / `\r\n` / U+2028 / U+2029 inside large comments observable through ASI, non-ASCII bodies (2-byte, 4-byte, mixed), comments ending exactly at EOF, and unterminated large comments producing `Expected "*/" to terminate multi-line comment`. - `bun bd test test/bundler/bundler_comments.test.ts` — 45 pass / 0 fail, including new end-to-end bundles: a large `/*! ... */` legal comment (CRLF + non-ASCII) preserved verbatim with the code after it intact, and ASI behavior across a large block comment verified by running the bundled output. - `bun bd test test/js/bun/transpiler/transpiler-truncated-utf8.test.ts` — the guard-page fixture now also places ≥ 512-byte block comments (terminated, unterminated, trailing `*`, trailing truncated UTF-8 lead) so that the input ends exactly at a `PROT_NONE` page; any read past the end of the source by the new kernel faults deterministically. - `cargo clippy -p bun_highway -p bun_js_parser` clean, `cargo fmt` applied, `highway_strings.cpp` is clang-format clean. - The expected outputs for every new test were cross-checked against the current scalar implementation before making the change, so this is verified behavior-preserving. Like #31385, the new tests are regression coverage for the rewritten path rather than a failing-before/passing-after proof — there is no functional delta to assert for a pure performance restoration; the before/after evidence for the optimization itself is the benchmark table above.

github-actions Bot added the claude label May 25, 2026

[autofix.ci] apply automated fixes

4f91b77

Jarred-Sumner requested changes May 25, 2026

View reviewed changes

Remove added code comments

83d3e32

claude Bot reviewed May 26, 2026

View reviewed changes

Jarred-Sumner merged commit 0b1e7e5 into main May 26, 2026
76 of 78 checks passed

Jarred-Sumner deleted the farm/42e4aca0/textencoder-highway-simd branch May 26, 2026 01:37

CrazyboyQCD reviewed May 26, 2026

View reviewed changes

robobun mentioned this pull request May 26, 2026

perf: use as_chunks* instead of chunks_exact* for static chunk sizes #31415

Open

This was referenced May 26, 2026

Strip comment blocks longer than 3 lines from .rs files #31201

Closed

js_parser: restore SIMD block comment skipping via Highway #31433

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimize TextEncoder.encode: restore SIMD ASCII fast paths lost in the Rust port#31385

Optimize TextEncoder.encode: restore SIMD ASCII fast paths lost in the Rust port#31385
Jarred-Sumner merged 3 commits into
mainfrom
farm/42e4aca0/textencoder-highway-simd

robobun commented May 25, 2026 •

edited

Loading

Uh oh!

robobun commented May 25, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented May 25, 2026

Uh oh!

coderabbitai Bot commented May 25, 2026 •

edited

Loading

Review limit reached

Uh oh!

robobun commented May 25, 2026

Uh oh!

robobun commented May 25, 2026 •

edited

Loading

Uh oh!

Jarred-Sumner left a comment

Uh oh!

Jarred-Sumner commented May 26, 2026

Uh oh!

robobun commented May 26, 2026

Uh oh!

claude Bot left a comment

Uh oh!

Uh oh!

CrazyboyQCD May 26, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

robobun commented May 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What does this PR do?

Benchmarks

How did you verify your code works?

Uh oh!

robobun commented May 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot commented May 25, 2026

Uh oh!

coderabbitai Bot commented May 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review limit reached

Walkthrough

Changes

Suggested reviewers

Uh oh!

robobun commented May 25, 2026

Uh oh!

robobun commented May 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Jarred-Sumner left a comment

Choose a reason for hiding this comment

Uh oh!

Jarred-Sumner commented May 26, 2026

Uh oh!

robobun commented May 26, 2026

Uh oh!

claude Bot left a comment

Choose a reason for hiding this comment

Overview

Security risks

Level of scrutiny

Other factors

Uh oh!

Uh oh!

CrazyboyQCD May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

robobun commented May 25, 2026 •

edited

Loading

robobun commented May 25, 2026 •

edited

Loading

coderabbitai Bot commented May 25, 2026 •

edited

Loading

robobun commented May 25, 2026 •

edited

Loading

CrazyboyQCD May 26, 2026 •

edited

Loading