Fix-6509 Improve collator strength benchmarking performance#7589
Open
Jayant-kernel wants to merge 4 commits intounicode-org:mainfrom
Open
Fix-6509 Improve collator strength benchmarking performance#7589Jayant-kernel wants to merge 4 commits intounicode-org:mainfrom
Jayant-kernel wants to merge 4 commits intounicode-org:mainfrom
Conversation
Compress short benchmark entries to single lines per cargo fmt
Renamed all_strength to _all_strength to satisfy clippy deny(warnings)
Contributor
Author
|
@hsivonen @sffc @echeran @Manishearth @robertbastian |
Member
|
Please don't cause GitHub to re-send notication emails. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes #6509
The Problem
Right now, the collator benchmarks run through all 5 strength levels (Primary, Secondary, Tertiary, Quaternary, Identical) on every test. The issue is that our test data (like the Polish and Latin names) only really differs at the Primary level, so we're basically running the same comparison 5 times and getting the same result each time. This makes the benchmark suite slow without giving us useful information about the higher strength levels.
What I Changed
1. Made existing benchmarks faster
all_strengtharray definition in the code (in case it's needed elsewhere), but it's no longer used in the benchmark loops2. Added realistic tests for higher strengths
Created three new test data files where the different strength levels actually matter:
Each file has its own dedicated benchmark that runs at the appropriate strength level, so we're actually testing scenarios where those strength differences matter.
Testing
cargo bench -p icu_collatorlocally and everything compiles and runs correctlyall_strengthas an unused variable, which confirms our changes are working as intended