Improve performance of toUpperCase and toLowerCase #194
Merged
Fast-path unchanged ASCII runs inside translateCodePoints so case conversion no longer decodes every ASCII byte after crossing into the generic non-ASCII path. This specifically improves mixed ASCII/non-ASCII inputs such as öhello/ÖHELLO, where a non-ASCII code point is followed by a long ASCII tail that is already in the target case. In the targeted JMH benchmark with repeatCount=1024, mixed_non_ascii_ascii_noop improved from 13.9 us/op to 6.4 us/op for both toLowerCase and toUpperCase, about a 53% speedup, while mixed_non_ascii_ascii_change stayed roughly flat at around 20.6 us/op.
electrum
approved these changes
Apr 3, 2026
The prior ASCII fast-path optimization was only applied to toLowerCase; this PR extends it to toUpperCase. Additionally, for mixed ASCII and non-ASCII input, a similar optimization skips the expensive per-character decode for the ASCII sequences. For inputs like öhello this results in a ~50% speedup.
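The idea of skipping the decode for ASCII runs can be illustrated with a minimal sketch. This is not the code from this PR: the class `AsciiRunSketch` and its helpers `unchangedAsciiRun` and `sequenceLength` are hypothetical names, the sketch works on a plain `byte[]` rather than the library's own types, and it assumes the input is well-formed UTF-8. It only shows the shape of the technique: copy ASCII bytes that are already lowercase in bulk, and fall back to a full decode only for non-ASCII code points.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not the translateCodePoints implementation from the PR.
final class AsciiRunSketch {

    // Length of the run starting at 'from' made of ASCII bytes that
    // lower-casing would leave unchanged (anything ASCII except 'A'..'Z').
    static int unchangedAsciiRun(byte[] utf8, int from) {
        int i = from;
        while (i < utf8.length) {
            byte b = utf8[i];
            if (b < 0 || (b >= 'A' && b <= 'Z')) {
                break; // non-ASCII byte, or an ASCII letter that must change
            }
            i++;
        }
        return i - from;
    }

    // Byte length of a UTF-8 sequence, derived from its lead byte.
    // Assumes well-formed input; anything unexpected is treated as length 1.
    static int sequenceLength(byte lead) {
        if ((lead & 0xE0) == 0xC0) return 2;
        if ((lead & 0xF0) == 0xE0) return 3;
        if ((lead & 0xF8) == 0xF0) return 4;
        return 1;
    }

    static byte[] toLowerCaseUtf8(byte[] utf8) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(utf8.length);
        int pos = 0;
        while (pos < utf8.length) {
            // Fast path: bulk-copy ASCII bytes that are already in the target case.
            int run = unchangedAsciiRun(utf8, pos);
            if (run > 0) {
                out.write(utf8, pos, run);
                pos += run;
                continue;
            }
            byte b = utf8[pos];
            if (b >= 0) {
                // ASCII uppercase letter: convert without decoding.
                out.write(b + ('a' - 'A'));
                pos++;
                continue;
            }
            // Generic path: decode one code point, convert it, re-encode it.
            int len = Math.min(sequenceLength(b), utf8.length - pos);
            String decoded = new String(utf8, pos, len, StandardCharsets.UTF_8);
            int lower = Character.toLowerCase(decoded.codePointAt(0));
            byte[] encoded = new String(Character.toChars(lower)).getBytes(StandardCharsets.UTF_8);
            out.write(encoded, 0, encoded.length);
            pos += len;
        }
        return out.toByteArray();
    }
}
```

In this sketch, an input such as öhello takes the generic path once for ö and then copies the five ASCII bytes in a single `write` call, which corresponds to the no-op case described above. An input such as ÖHELLO still handles each ASCII letter individually, which is consistent with the changed-case benchmark staying roughly flat.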