8382482: Remove and reorder instructions in x86 scalar floating point min/max reduction loops by missa-prime · Pull Request #30806 · openjdk/jdk

missa-prime · 2026-04-17T23:46:59Z

With AVX10, we can branch on the more common cases (e.g., "==") before checking parity, which is not possible with earlier instruction sets. These changes split the code paths between AVX10 and non-AVX10 so that the basic blocks are ordered differently in the scalar floating point min/max reduction rules. Additionally, some instructions that are unnecessary for zero and non-zero input values in the "==" case are removed. The JTREG tests listed below were used to verify correctness, run with the recommended JVM options given in the corresponding source files. All modifications and tests used OpenJDK v27-b18 as the baseline build.

jtreg:test/hotspot/jtreg/compiler/igvn/TestMinMaxIdentity.java
jtreg:test/hotspot/jtreg/compiler/intrinsics/math/TestFpMinMaxIntrinsics.java
jtreg:test/hotspot/jtreg/compiler/intrinsics/math/TestFpMinMaxReductions.java
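For context, a scalar floating point min/max reduction is a loop that folds `Math.max` or `Math.min` into a single accumulator. The sketch below is only an illustration: the class and method names are hypothetical and are not taken from the tests or the benchmark. It shows the two cases that make a plain compare insufficient: a NaN input, and equal-comparing zeroes of opposite sign.

```java
public class FpMaxReductionSketch {
    // Hypothetical helper: scalar max reduction over a float array.
    static float maxReduce(float[] values) {
        float acc = Float.NEGATIVE_INFINITY;
        for (float v : values) {
            acc = Math.max(acc, v); // NaN if either operand is NaN; +0.0f beats -0.0f
        }
        return acc;
    }

    // Hypothetical helper: scalar min reduction over a float array.
    static float minReduce(float[] values) {
        float acc = Float.POSITIVE_INFINITY;
        for (float v : values) {
            acc = Math.min(acc, v); // NaN if either operand is NaN; -0.0f beats +0.0f
        }
        return acc;
    }

    public static void main(String[] args) {
        System.out.println(maxReduce(new float[] {1.0f, 3.0f, 2.0f}));
        System.out.println(maxReduce(new float[] {1.0f, Float.NaN, 2.0f}));
        System.out.println(maxReduce(new float[] {-0.0f, 0.0f}));
        System.out.println(minReduce(new float[] {0.0f, -0.0f}));
    }
}
```

The "==" case mentioned in the description covers inputs such as `0.0f` and `-0.0f`, which compare equal but must still produce the correctly signed zero.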

To observe the performance uplift, the benchmarks in the FpMinMaxIntrinsics.java JMH source should be run with identical values in the arrays. The table below contains data captured under these conditions on an Intel® Xeon 6767P. Only the benchmarks affected by the code changes are included. Overall, the geomean speedup is 1.20x when the changes are applied (the geomean runtime is about 17% lower).

Benchmark	Baseline runtime (ns/op)	Target runtime (ns/op)	Speedup
dMaxReduceGlobalAccumulator	1038.222	796.514	1.30x
dMaxReduceInOuterLoop	3166.615	3130.784	1.01x
dMaxReduceNonCounted	1039.919	796.675	1.31x
dMinReduceGlobalAccumulator	1040.935	796.212	1.31x
dMinReduceInOuterLoop	3094.092	3053.189	1.01x
dMinReduceNonCounted	1039.488	797.242	1.30x
fMaxReduceGlobalAccumulator	1038.396	795.881	1.30x
fMaxReduceInOuterLoop	3183.612	3123.759	1.02x
fMaxReduceNonCounted	1039.947	797.281	1.30x
fMinReduceGlobalAccumulator	1040.677	795.979	1.31x
fMinReduceInOuterLoop	3138.846	3113.044	1.01x
fMinReduceNonCounted	1040.162	797.090	1.30x
Geomean	1503.76	1253.678	1.20x
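The Geomean row can be cross-checked from the per-benchmark rows. The sketch below copies the baseline and target runtimes from the table above and computes the geometric mean through the mean of logarithms; the class and method names are hypothetical. The results should land close to the Geomean row.

```java
public class GeomeanSketch {
    // Runtimes in ns/op, copied from the table rows in order.
    static final double[] BASELINE = {
        1038.222, 3166.615, 1039.919, 1040.935, 3094.092, 1039.488,
        1038.396, 3183.612, 1039.947, 1040.677, 3138.846, 1040.162
    };
    static final double[] TARGET = {
        796.514, 3130.784, 796.675, 796.212, 3053.189, 797.242,
        795.881, 3123.759, 797.281, 795.979, 3113.044, 797.090
    };

    // Geometric mean computed as exp(mean(log(x))).
    static double geomean(double[] xs) {
        double logSum = 0.0;
        for (double x : xs) {
            logSum += Math.log(x);
        }
        return Math.exp(logSum / xs.length);
    }

    public static void main(String[] args) {
        double baseline = geomean(BASELINE);
        double target = geomean(TARGET);
        System.out.printf("baseline=%.3f target=%.3f speedup=%.2fx%n",
                          baseline, target, baseline / target);
    }
}
```

A speedup of 1.20x corresponds to a target runtime of roughly 0.83 times the baseline.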

Note that performance does not regress with more varied data. The changes in this PR update the array values in FpMinMaxIntrinsics.java to include random and structured patterns: 50% are random, 20% are zeroes, 10% are descending, 10% are ascending, and 10% are NaNs. The entries are interspersed uniformly throughout the arrays. The table below shows results collected with this new scheme. Overall, the geomean runtime remains flat when the changes are applied.

Benchmark	Baseline runtime (ns/op)	Target runtime (ns/op)	Speedup
dMaxReduceGlobalAccumulator	668.804	668.761	1.00x
dMaxReduceInOuterLoop	3036.979	2987.393	1.02x
dMaxReduceNonCounted	673.008	672.938	1.00x
dMinReduceGlobalAccumulator	668.403	668.937	1.00x
dMinReduceInOuterLoop	2986.121	2987.771	1.00x
dMinReduceNonCounted	673.293	672.864	1.00x
fMaxReduceGlobalAccumulator	668.324	668.465	1.00x
fMaxReduceInOuterLoop	3141.699	3138.469	1.00x
fMaxReduceNonCounted	669.225	669.407	1.00x
fMinReduceGlobalAccumulator	668.584	668.199	1.00x
fMinReduceInOuterLoop	3139.533	3114.516	1.01x
fMinReduceNonCounted	672.990	672.786	1.00x
Geomean	1113.838	1111.488	1.00x
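One way to picture the mixed array composition described above is the sketch below. It is not the code from FpMinMaxIntrinsics.java: the class name, the method name, the use of `i % 10` to interleave the categories, and the value ranges are all assumptions made for illustration.

```java
import java.util.Random;

public class MixedFpArraySketch {
    // Hypothetical generator: in every run of 10 slots, 5 are random,
    // 2 are zero, 1 is descending, 1 is ascending and 1 is NaN.
    static float[] mixedFloats(int size, long seed) {
        Random rnd = new Random(seed);
        float[] a = new float[size];
        for (int i = 0; i < size; i++) {
            int slot = i % 10;
            if (slot < 5) {
                a[i] = 1.0f + rnd.nextFloat(); // random value in [1, 2) (assumed range)
            } else if (slot < 7) {
                a[i] = 0.0f;                   // zero
            } else if (slot == 7) {
                a[i] = size - i;               // shrinks as i grows: descending
            } else if (slot == 8) {
                a[i] = i;                      // grows with i: ascending
            } else {
                a[i] = Float.NaN;              // NaN
            }
        }
        return a;
    }

    public static void main(String[] args) {
        float[] a = mixedFloats(1000, 42L);
        int nans = 0;
        int zeroes = 0;
        for (float v : a) {
            if (Float.isNaN(v)) nans++;
            if (v == 0.0f) zeroes++;
        }
        System.out.println("NaNs=" + nans + " zeroes=" + zeroes);
    }
}
```

Interleaving by index keeps each category spread evenly through the array, so every stretch of the reduction loop sees the same mix of inputs.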

I confirm that I make this contribution in accordance with the OpenJDK Interim AI Policy.

Progress

Change must not contain extraneous whitespace
Commit message must refer to an issue
Change must be properly reviewed (2 reviews required, with at least 1 Reviewer, 1 Author)

Issue

JDK-8382482: Remove and reorder instructions in x86 scalar floating point min/max reduction loops (Enhancement - P4)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/30806/head:pull/30806
$ git checkout pull/30806

Update a local copy of the PR:
$ git checkout pull/30806
$ git pull https://git.openjdk.org/jdk.git pull/30806/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 30806

View PR using the GUI difftool:
$ git pr show -t 30806

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/30806.diff

Using Webrev

Link to Webrev Comment


bridgekeeper · 2026-04-17T23:48:04Z

👋 Welcome back missa! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk · 2026-04-17T23:48:25Z

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

openjdk · 2026-04-17T23:49:34Z

@missa-prime The following label will be automatically applied to this pull request:

hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

openjdk · 2026-04-17T23:49:36Z

The total number of required reviews for this PR has been set to 2 based on the presence of this label: hotspot-compiler. This can be overridden with the /reviewers command.

mlbridge · 2026-04-18T00:20:36Z

Webrevs

galderz

In #30844, I'm expanding the MinMaxVector benchmark to test fp values. Some of the benchmarks target reduction use cases. Maybe you could run the fp benchmarks there and see what impact your changes have on them when using avx10? IIRC you might have to disable superword to get the branching assembly to kick in.


missa-prime · 2026-04-27T09:43:05Z

> In #30844, I'm expanding the MinMaxVector benchmark to test fp values. Some of the benchmarks target reduction use cases. Maybe you could run the fp benchmarks there and see what impact your changes have on them when using avx10? IIRC you might have to disable superword to get the branching assembly to kick in.

I ran MinMaxVector with the default benchmark parameters and the -XX:-UseSuperWord VM argument on builds with and without the changes. Overall, I don't see much change in performance apart from some outliers in the non-AVX10 and AVX10 paths. I believe that most of the time -XX:+UseSuperWord will be in effect and loop auto-vectorization will work as expected, so these scalar floating point min/max reduction rules won't be used very often.

Adjust instruction sequence used by scalar floating point min/max reduction rules.

bd86cbe

openjdk Bot added the hotspot-compiler hotspot-compiler-dev@openjdk.org label Apr 17, 2026

Add missing character at the beginning of function definition.

5a8543e

missa-prime marked this pull request as ready for review April 18, 2026 00:14

openjdk Bot added the rfr Pull request is ready for review label Apr 18, 2026

Extend dashed line to cover all commented text above emit_fp_min_max.

b7b2ed9

galderz reviewed Apr 24, 2026


missa-prime added 2 commits April 24, 2026 15:55

Split AVX10.2 and non-AVX10.2 min/max floating point reduction into separate code paths.

23ae199

Change the composition of the float and double arrays in the FpMinMaxIntrinsics.java file.

29e7018

missa-prime wants to merge 5 commits into openjdk:master from missa-prime:user/missa-prime/avx10_2
