8382482: Remove and reorder instructions in x86 scalar floating point min/max reduction loops#30806
8382482: Remove and reorder instructions in x86 scalar floating point min/max reduction loops#30806missa-prime wants to merge 5 commits intoopenjdk:masterfrom
Conversation
|
👋 Welcome back missa! A progress list of the required criteria for merging this PR into |
|
❗ This change is not yet ready to be integrated. |
|
@missa-prime The following label will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command. |
|
The total number of required reviews for this PR has been set to 2 based on the presence of this label: |
Webrevs
|
galderz
left a comment
There was a problem hiding this comment.
In #30844, I'm expanding the MinMaxVector benchmark to test fp values. Some of the benchmarks target reduction use cases. Maybe you could run the fp benchmarks there and see what impact your changes have on them when using avx10? IIRC you might have to disable superword to get the branching assembly to kick in.
…eparate code paths.
…Intrinsics.java file.
I ran |
With AVX10, we can branch on more common cases (e.g., "==") before parity that's currently not possible with previous instruction sets. These changes split the code paths between AVX10 and non-AVX10, so that basic blocks are ordered differently in the scalar floating point min/max reduction rules. Additionally, some unnecessary instructions for zero and non-zero input values in the "==" case are now removed. The JTREG tests listed below were used to verify correctness with the recommended JVM options mentioned in corresponding source files. All modifications and tests used OpenJDK v27-b18 as the baseline build.
jtreg:test/hotspot/jtreg/compiler/igvn/TestMinMaxIdentity.javajtreg:test/hotspot/jtreg/compiler/intrinsics/math/TestFpMinMaxIntrinsics.javajtreg:test/hotspot/jtreg/compiler/intrinsics/math/TestFpMinMaxReductions.javaTo observe the performance uplift, the benchmarks in the
FpMinMaxIntrinsics.javaJMH source should be run with identical values in the arrays. The table below contains data captured under these conditions with an Intel® Xeon 6767P. Only the benchmarks affected by the code changes are included. Overall, there is a 20% improvement in the geomean runtime when the changes are applied.It's important to note that performance doesn't regress with more varied data though. The changes in this PR update the array values in
FpMinMaxIntrinsics.javato include random and structured patterns. Specifically, 50% are random, 20% are zeroes, 10% are descending, 10% are ascending, and 10% are NaNs. The entries are interspersed throughout the arrays in uniform fashion. The table below shows results collected with this new scheme. Overall, the geomean runtime remains flat when the changes are applied.Progress
Issue
Reviewing
Using
gitCheckout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/30806/head:pull/30806$ git checkout pull/30806Update a local copy of the PR:
$ git checkout pull/30806$ git pull https://git.openjdk.org/jdk.git pull/30806/headUsing Skara CLI tools
Checkout this PR locally:
$ git pr checkout 30806View PR using the GUI difftool:
$ git pr show -t 30806Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/30806.diff
Using Webrev
Link to Webrev Comment