[SPARK-56508][SQL][TESTS][FOLLOWUP] Stabilize VectorizedRleValuesReaderBenchmark Group `runBooleanBenchmark/runIntegerBenchmark` with a pre-warm pass by LuciferYang · Pull Request #55497 · apache/spark

LuciferYang · 2026-04-23T02:34:26Z

What changes were proposed in this pull request?

Add a pre-warm pass (3 iterations of fresh-reader + initFromPage + decode) before the cold-reader benchmark.addCase call in runBooleanBenchmark and runIntegerBenchmark of VectorizedRleValuesReaderBenchmark.

Why are the changes needed?

Reviewer feedback on SPARK-56522 (PR #55386) flagged run-to-run variance in the first case of Groups runBooleanBenchmark/runIntegerBenchmark, where Best Time(ms) is reported as 0: the first case in each group pays for tiered-compilation transitions on sub-millisecond iterations, producing inconsistent baseline numbers between re-runs.

The runNullableBatchBenchmark groups don't show this because their setup reuses a pre-warmed reader before each addCase. The cold-reader variants in Groups runBooleanBenchmark/runIntegerBenchmark instantiate a fresh reader per iteration, so the shared pre-warm (warmReader.readBooleans / warmReader.readIntegers) doesn't fully cover the allocation + initFromPage path that the cold reader case exercises. Running the cold-reader code path explicitly 3 times before addCase is intended to let HotSpot settle on C2 before measurement.
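The pre-warm idea described above can be sketched in isolation as follows. This is a minimal illustration, not the code in this PR: `ToyRleReader`, `coldReaderIteration`, and `preWarm` are hypothetical stand-ins, and the toy run-length format is an assumption made only so the example is self-contained. The one detail taken from the description is the shape of the pass: 3 iterations of fresh reader + `initFromPage` + decode before the measured case.

```java
public class PreWarmSketch {

    // Hypothetical stand-in for a reader; NOT Spark's VectorizedRleValuesReader.
    static final class ToyRleReader {
        private int[] runs; // flat pairs of (runLength, value)

        void initFromPage(int[] page) {
            this.runs = page.clone();
        }

        // Expands every run into `out` and returns the sum of decoded values.
        long decode(int[] out) {
            int pos = 0;
            long sum = 0;
            for (int i = 0; i + 1 < runs.length; i += 2) {
                int length = runs[i];
                int value = runs[i + 1];
                for (int j = 0; j < length; j++) {
                    out[pos++] = value;
                    sum += value;
                }
            }
            return sum;
        }
    }

    // One "cold reader" iteration: allocate a fresh reader, init it, decode.
    static long coldReaderIteration(int[] page, int[] out) {
        ToyRleReader reader = new ToyRleReader();
        reader.initFromPage(page);
        return reader.decode(out);
    }

    // Runs the exact cold-reader path several times before measurement, so the
    // allocation + initFromPage + decode path has already been executed.
    // The accumulated result is returned so the work is not trivially dead code.
    static long preWarm(int[] page, int[] out, int iterations) {
        long sink = 0;
        for (int i = 0; i < iterations; i++) {
            sink += coldReaderIteration(page, out);
        }
        return sink;
    }

    public static void main(String[] args) {
        int[] page = {4, 1, 3, 0, 2, 7}; // 4 x 1, 3 x 0, 2 x 7 (toy data)
        int[] out = new int[9];

        long sink = preWarm(page, out, 3); // 3 iterations, as in the description

        // Only now would the measured case be registered and timed.
        long start = System.nanoTime();
        sink += coldReaderIteration(page, out);
        long elapsedNs = System.nanoTime() - start;

        System.out.println("sink=" + sink + " elapsedNs=" + elapsedNs);
    }
}
```

The key design point is that the warm-up calls the same method the measured case calls, rather than a shared reader's read path. Whether a given number of passes is enough for the JIT to reach its top tier depends on the workload and JVM settings.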

Does this PR introduce any user-facing change?

No. Benchmark-only change.

How was this patch tested?

Pass GitHub Actions

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Opus 4.7

…erBenchmark Group A/B with a pre-warm pass

### What changes were proposed in this pull request?

Add a pre-warm pass (3 iterations of fresh-reader + initFromPage + decode) before the cold-reader `benchmark.addCase` call in `runBooleanBenchmark` (Group A) and `runIntegerBenchmark` (Group B) of `VectorizedRleValuesReaderBenchmark`.

### Why are the changes needed?

Reviewer feedback on SPARK-56522 (PR apache#55386) flagged first-case `Best Time(ms) = 0` variance in Groups A/B: the first case in each group pays for tiered-compilation transitions on sub-millisecond iterations, producing inconsistent baseline numbers between re-runs.

Groups C/D don't show this because their setup reuses a pre-warmed reader before each `addCase`. The cold-reader variants in Groups A/B instantiate a fresh reader per iteration, so the shared pre-warm (`warmReader.readBooleans` / `warmReader.readIntegers`) doesn't fully cover the allocation + `initFromPage` path that `cold reader` exercises. Running the cold-reader code path explicitly 3 times before `addCase` lets HotSpot settle on C2 before measurement.

### Does this PR introduce _any_ user-facing change?

No. Benchmark-only change.

### How was this patch tested?

- Compile: `build/sbt sql/Test/compile` clean
- Will regenerate result files on GHA to verify reduced first-case variance

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Opus 4.7

LuciferYang and others added 2 commits April 22, 2026 22:51

Benchmark results for org.apache.spark.sql.execution.datasources.parq…

0589994

…uet.VectorizedRleValuesReaderBenchmark (JDK 17, Scala 2.13, split 1 of 1)

LuciferYang marked this pull request as draft April 23, 2026 02:34

LuciferYang added 3 commits April 23, 2026 02:55

Benchmark results for org.apache.spark.sql.execution.datasources.parq…

8dce8a0

…uet.VectorizedRleValuesReaderBenchmark (JDK 25, Scala 2.13, split 1 of 1)

Benchmark results for org.apache.spark.sql.execution.datasources.parq…

93ddd72

…uet.VectorizedRleValuesReaderBenchmark (JDK 21, Scala 2.13, split 1 of 1)

Benchmark results for org.apache.spark.sql.execution.datasources.parq…

d16e8c2

…uet.VectorizedRleValuesReaderBenchmark (JDK 17, Scala 2.13, split 1 of 1)

LuciferYang closed this Apr 23, 2026

LuciferYang deleted the SPARK-56508-prewarm branch April 23, 2026 03:04

LuciferYang wants to merge 5 commits into apache:master from LuciferYang:SPARK-56508-prewarm
