Fix-6509 Improve collator strength benchmarking performance by Jayant-kernel · Pull Request #7589 · unicode-org/icu4x

Jayant-kernel · 2026-02-05T05:54:05Z

The Problem

Right now, the collator benchmarks run every test at all 5 strength levels (Primary, Secondary, Tertiary, Quaternary, Identical). The issue is that our test data (such as the Polish and Latin names) differs only at the Primary level, so every comparison is decided at the first level and we end up running the same comparison 5 times with the same result. This makes the benchmark suite slow without telling us anything about the higher strength levels.
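To make the problem concrete, here is a toy model of level-by-level comparison. It is not ICU4X's collator: the `Level` enum, the `weights` function and `toy_compare` are hypothetical, and the weights only cover base letters, two accented vowels and case. It illustrates that a pair differing at Primary returns the same ordering at every strength, while an accent-only pair is the kind of input where the strength actually changes the result.

```rust
use std::cmp::Ordering;

/// Hypothetical stand-in for the five collation strengths (not ICU4X's `Strength`).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Level {
    Primary = 1,
    Secondary = 2,
    Tertiary = 3,
    Quaternary = 4,
    Identical = 5,
}

const ALL_LEVELS: [Level; 5] = [
    Level::Primary,
    Level::Secondary,
    Level::Tertiary,
    Level::Quaternary,
    Level::Identical,
];

/// Toy weights per character: (base letter, accent flag, uppercase flag).
fn weights(c: char) -> (char, u8, u8) {
    let (base, accent) = match c {
        'é' | 'É' => ('e', 1),
        'ó' | 'Ó' => ('o', 1),
        other => (other.to_ascii_lowercase(), 0),
    };
    (base, accent, c.is_uppercase() as u8)
}

/// Compares level by level, stopping at the first difference or at `strength`.
fn toy_compare(a: &str, b: &str, strength: Level) -> Ordering {
    let wa: Vec<(char, u8, u8)> = a.chars().map(weights).collect();
    let wb: Vec<(char, u8, u8)> = b.chars().map(weights).collect();
    let per_level = [
        wa.iter().map(|w| w.0).cmp(wb.iter().map(|w| w.0)), // primary: base letters
        wa.iter().map(|w| w.1).cmp(wb.iter().map(|w| w.1)), // secondary: accents
        wa.iter().map(|w| w.2).cmp(wb.iter().map(|w| w.2)), // tertiary: case
    ];
    for (i, &ord) in per_level.iter().enumerate() {
        if ord != Ordering::Equal || i + 1 == strength as usize {
            return ord;
        }
    }
    // The toy model has no quaternary weights; Identical falls back to code points.
    if strength == Level::Identical { a.cmp(b) } else { Ordering::Equal }
}

fn main() {
    for (a, b) in [("Kowalski", "Nowak"), ("José", "Jose")] {
        let results: Vec<Ordering> = ALL_LEVELS.iter().map(|&s| toy_compare(a, b, s)).collect();
        println!("{a} vs {b}: {results:?}");
    }
}
```

Running it prints `[Less, Less, Less, Less, Less]` for the Primary-differing pair and `[Equal, Greater, Greater, Greater, Greater]` for the accent-only pair.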

What I Changed

1. Made existing benchmarks faster

Updated the general benchmarks to use only Tertiary strength (the default configuration) instead of all 5 levels
This should make them roughly 5x faster, since each comparison now runs once instead of five times
I kept the all_strength array definition in the code (in case it's needed elsewhere), but it's no longer used in the benchmark loops
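The roughly 5x estimate follows from the loop structure: every strength in the list triggers a full pass over the same data. The sketch below uses a hypothetical `Strength` enum and a hypothetical `comparator_calls` helper in place of the real benchmark loop, with plain `str` ordering standing in for a collator, and counts comparator invocations for both configurations. The ratio of comparator calls is exactly 5; the wall-clock speedup may differ somewhat, because a real collator's cost per comparison depends on how many levels it has to examine.

```rust
use std::cmp::Ordering;

/// Hypothetical strength enum for this sketch (not ICU4X's type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Strength {
    Primary,
    Secondary,
    Tertiary,
    Quaternary,
    Identical,
}

/// Sorts a fresh copy of `names` once per entry in `strengths` and returns how
/// many times the comparator ran. `compare` stands in for a collator that has
/// been configured with the given strength.
fn comparator_calls<F>(names: &[&str], strengths: &[Strength], mut compare: F) -> usize
where
    F: FnMut(&str, &str, Strength) -> Ordering,
{
    let mut calls = 0;
    for &strength in strengths {
        let mut sorted = names.to_vec();
        sorted.sort_by(|a, b| {
            calls += 1;
            compare(*a, *b, strength)
        });
    }
    calls
}

fn main() {
    let names = ["Nowak", "Kowalski", "Wiśniewski", "Wójcik", "Kamiński"];
    let all_strength = [
        Strength::Primary,
        Strength::Secondary,
        Strength::Tertiary,
        Strength::Quaternary,
        Strength::Identical,
    ];
    // Plain `str` ordering keeps the sketch dependency-free.
    let before = comparator_calls(&names, &all_strength, |a, b, _| a.cmp(b));
    let after = comparator_calls(&names, &[Strength::Tertiary], |a, b, _| a.cmp(b));
    println!("comparator calls: {before} across all strengths, {after} at Tertiary only");
}
```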

2. Added realistic tests for higher strengths
Created three new test data files where the different strength levels actually matter:

TestNames_Secondary.txt – Names with accent differences (José vs Jose, Café vs Cafe, René vs Rene)
TestNames_Tertiary.txt – Names with case differences (McDonald vs MCDONALD, banana vs Banana)
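TestNames_Quaternary.txt – Names with punctuation differences (can't vs cant, co-op vs coop, e-mail vs email); these only differ at the Quaternary level when alternate handling is set to shifted, because with the default non-ignorable setting punctuation already differs at Primary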

Each file has its own dedicated benchmark that runs at the appropriate strength level, so we're actually testing scenarios where those strength differences matter.
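One way to confirm that the new data files really exercise their target levels is to check, for each pair, the first level at which the two strings differ. The sketch below does this with a toy model, not ICU4X's collator: `first_difference`, the `Level` enum and the tiny accent table are hypothetical. It assumes ASCII punctuation is treated as variable and shifted out of levels 1 to 3, which is the alternate-handling setting under which punctuation-only differences surface at Quaternary.

```rust
/// Hypothetical collation levels for this sketch (not ICU4X's `Strength`).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Level {
    Primary,
    Secondary,
    Tertiary,
    Quaternary,
    Identical,
}

/// Assumption: ASCII punctuation is "variable" and ignored on levels 1-3.
fn is_variable(c: char) -> bool {
    c.is_ascii_punctuation()
}

fn lower(c: char) -> char {
    c.to_lowercase().next().unwrap_or(c)
}

/// Toy accent folding covering only the letters used in the examples.
fn base(c: char) -> char {
    match lower(c) {
        'é' => 'e',
        'ó' => 'o',
        other => other,
    }
}

/// Level 4 view: the shifted punctuation, keeping its position in the string.
fn shifted(s: &str) -> Vec<Option<char>> {
    s.chars().map(|c| is_variable(c).then_some(c)).collect()
}

/// Returns the first level at which `a` and `b` differ, or `None` if identical.
fn first_difference(a: &str, b: &str) -> Option<Level> {
    let na: Vec<char> = a.chars().filter(|&c| !is_variable(c)).collect();
    let nb: Vec<char> = b.chars().filter(|&c| !is_variable(c)).collect();
    // Level 1: base letters only.
    if !na.iter().map(|&c| base(c)).eq(nb.iter().map(|&c| base(c))) {
        return Some(Level::Primary);
    }
    // Level 2: where accents appear.
    if !na.iter().map(|&c| base(c) != lower(c)).eq(nb.iter().map(|&c| base(c) != lower(c))) {
        return Some(Level::Secondary);
    }
    // Level 3: case.
    if !na.iter().map(|c| c.is_uppercase()).eq(nb.iter().map(|c| c.is_uppercase())) {
        return Some(Level::Tertiary);
    }
    if shifted(a) != shifted(b) {
        return Some(Level::Quaternary);
    }
    if a != b { Some(Level::Identical) } else { None }
}

fn main() {
    let pairs = [
        ("TestNames_Secondary.txt", "José", "Jose"),
        ("TestNames_Tertiary.txt", "McDonald", "MCDONALD"),
        ("TestNames_Quaternary.txt", "co-op", "coop"),
    ];
    for (file, a, b) in pairs {
        println!("{file}: {a} vs {b} first differs at {:?}", first_difference(a, b));
    }
}
```

A check like this could run over every adjacent pair in each file, flagging any pair that is already decided at a lower level than the file's benchmark strength.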

Testing

Ran cargo bench -p icu_collator locally and everything compiles and runs correctly
The compiler now reports all_strength as an unused variable, which confirms that the benchmark loops no longer reference it
The benchmark suite runs noticeably faster while providing better coverage of the different strength levels
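The later rename to _all_strength relies on a standard rustc convention: a local binding whose name starts with an underscore is exempt from the `unused_variables` lint, so it no longer trips `deny(warnings)`. A minimal sketch, with a hypothetical `Strength` enum and a hypothetical `general_bench_strengths` function in place of the actual benchmark code:

```rust
/// Hypothetical strength enum for this sketch (not ICU4X's type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Strength {
    Primary,
    Secondary,
    Tertiary,
    Quaternary,
    Identical,
}

/// Returns the strengths the general benchmarks iterate over.
fn general_bench_strengths() -> Vec<Strength> {
    // Kept for reference but no longer iterated. Without the leading underscore,
    // rustc would emit `unused_variables` here, which `deny(warnings)` turns into an error.
    let _all_strength = [
        Strength::Primary,
        Strength::Secondary,
        Strength::Tertiary,
        Strength::Quaternary,
        Strength::Identical,
    ];
    // Only the default (Tertiary) strength is benchmarked.
    vec![Strength::Tertiary]
}

fn main() {
    println!("{:?}", general_bench_strengths());
}
```

Deleting the array, or moving it to where a benchmark actually iterates over it, would remove the dead code rather than only silencing the lint.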

Jayant-kernel · 2026-02-15T19:14:28Z

@hsivonen @sffc @echeran @Manishearth @robertbastian
please review the solution when you are free.

hsivonen · 2026-02-16T09:23:09Z

Please don't cause GitHub to re-send notification emails.

Jayant-kernel added 2 commits February 5, 2026 11:17

bench: Add targeted strength test data

27b333c

Fix unicode-org#6509: Improve collator strength benchmarking

17deb0b

Jayant-kernel requested review from echeran and hsivonen as code owners February 5, 2026 05:54

Jayant-kernel added 2 commits February 5, 2026 12:00

Fix formatting in bench.rs for CI

a02232e

Compress short benchmark entries to single lines per cargo fmt

Fix: Silence unused variable warning for CI

96bad6e

Renamed all_strength to _all_strength to satisfy clippy deny(warnings)

Jayant-kernel wants to merge 4 commits into unicode-org:main from Jayant-kernel:fix-6509-collator-strength
