[SPARK-56508][SQL][TESTS][FOLLOWUP] Stabilize VectorizedRleValuesReaderBenchmark Group runBooleanBenchmark/runIntegerBenchmark with a pre-warm pass #55497

Closed
LuciferYang wants to merge 5 commits into apache:master from LuciferYang:SPARK-56508-prewarm

Conversation

@LuciferYang
Contributor

What changes were proposed in this pull request?

Add a pre-warm pass (3 iterations of fresh-reader + initFromPage + decode) before the cold-reader benchmark.addCase call in runBooleanBenchmark and runIntegerBenchmark of VectorizedRleValuesReaderBenchmark.

Why are the changes needed?

Reviewer feedback on SPARK-56522 (PR #55386) flagged variance in the first-case rows of the runBooleanBenchmark and runIntegerBenchmark groups, which show Best Time(ms) = 0: the first case in each group pays for tiered-compilation transitions on sub-millisecond iterations, producing inconsistent baseline numbers between re-runs.
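To illustrate why sub-millisecond iterations are so sensitive, here is a minimal sketch that assumes the Best Time(ms) column is printed with zero decimal places. `BestTimeRounding` and `bestTimeMs` are hypothetical names for illustration, not part of Spark's benchmark utility. Under that assumption, a best time that drifts across the half-millisecond mark flips the printed value between 0 and 1.

```java
import java.util.Locale;

public class BestTimeRounding {
    // Hypothetical helper: formats a best time in nanoseconds as whole milliseconds.
    // The zero-decimal format is an assumption about how the column is rendered.
    static String bestTimeMs(long bestNanos) {
        return String.format(Locale.ROOT, "%.0f", bestNanos / 1_000_000.0);
    }

    public static void main(String[] args) {
        // 0.4 ms and 0.6 ms differ by only 200 microseconds but print differently.
        System.out.println(bestTimeMs(400_000L)); // prints "0"
        System.out.println(bestTimeMs(600_000L)); // prints "1"
    }
}
```

At this scale, a single tiered-compilation transition inside the measured window can plausibly be enough to change the reported baseline.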

The runNullableBatchBenchmark groups don't show this because their setup reuses a pre-warmed reader before each addCase. The cold-reader variants in the runBooleanBenchmark and runIntegerBenchmark groups instantiate a fresh reader per iteration, so the shared pre-warm (warmReader.readBooleans / warmReader.readIntegers) doesn't fully cover the allocation + initFromPage path that the cold-reader case exercises. Running the cold-reader code path explicitly 3 times before addCase lets HotSpot settle on C2 before measurement.
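The ordering described above can be sketched roughly as follows. This is not the PR's code: `ToyReader`, `makePage`, `coldReaderIteration`, and `preWarm` are hypothetical stand-ins, and the toy decode loop only mimics the shape of the real allocate + initFromPage + decode path. The point is that the pre-warm pass runs the same code path as the measured case, rather than a reused warm reader.

```java
public class PreWarmSketch {
    // Hypothetical stand-in for the real reader; not Spark's API.
    static final class ToyReader {
        private byte[] page;
        private int pos;

        void initFromPage(byte[] page) {
            this.page = page;
            this.pos = 0;
        }

        // Copies up to `total` values into `out` and returns how many were read.
        int readIntegers(int total, int[] out) {
            int n = Math.min(total, page.length - pos);
            for (int i = 0; i < n; i++) {
                out[i] = page[pos++];
            }
            return n;
        }
    }

    // Builds a deterministic page of small values for the sketch.
    static byte[] makePage(int size) {
        byte[] page = new byte[size];
        for (int i = 0; i < size; i++) {
            page[i] = (byte) (i % 7);
        }
        return page;
    }

    // One cold-reader iteration: fresh reader, initFromPage, then decode.
    // Returns a checksum so the JIT cannot discard the work.
    static long coldReaderIteration(byte[] page, int[] out) {
        ToyReader reader = new ToyReader();
        reader.initFromPage(page);
        int n = reader.readIntegers(out.length, out);
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += out[i];
        }
        return sum;
    }

    // Pre-warm pass: run the exact cold-reader path before any measurement.
    static long preWarm(int iterations, byte[] page, int[] out) {
        long sink = 0;
        for (int i = 0; i < iterations; i++) {
            sink += coldReaderIteration(page, out);
        }
        return sink;
    }

    public static void main(String[] args) {
        byte[] page = makePage(1024);
        int[] out = new int[page.length];

        long sink = preWarm(3, page, out); // 3 iterations, as in the PR description

        long start = System.nanoTime();
        long checksum = coldReaderIteration(page, out); // the "measured" case
        long elapsedNanos = System.nanoTime() - start;

        System.out.println("sink=" + sink + " checksum=" + checksum
            + " elapsedNanos=" + elapsedNanos);
    }
}
```

In the real benchmark the measured body would be registered through `addCase` rather than timed by hand. Whether three iterations are enough for compilation to settle depends on the workload and JVM, so the toy timing here should not be read as evidence either way.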

Does this PR introduce any user-facing change?

No. Benchmark-only change.

How was this patch tested?

  • Pass GitHub Actions

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Opus 4.7

LuciferYang and others added 2 commits April 22, 2026 22:51
…erBenchmark Group A/B with a pre-warm pass

### What changes were proposed in this pull request?

Add a pre-warm pass (3 iterations of fresh-reader + initFromPage + decode)
before the cold-reader `benchmark.addCase` call in `runBooleanBenchmark`
(Group A) and `runIntegerBenchmark` (Group B) of
`VectorizedRleValuesReaderBenchmark`.

### Why are the changes needed?

Reviewer feedback on SPARK-56522 (PR apache#55386) flagged first-case `Best
Time(ms) = 0` variance in Groups A/B: the first case in each group
pays for tiered-compilation transitions on sub-millisecond iterations,
producing inconsistent baseline numbers between re-runs.

Groups C/D don't show this because their setup reuses a pre-warmed
reader before each `addCase`. The cold-reader variants in Groups A/B
instantiate a fresh reader per iteration, so the shared pre-warm
(`warmReader.readBooleans` / `warmReader.readIntegers`) doesn't
fully cover the allocation + `initFromPage` path that `cold reader`
exercises. Running the cold-reader code path explicitly 3 times
before `addCase` lets HotSpot settle on C2 before measurement.

### Does this PR introduce _any_ user-facing change?

No. Benchmark-only change.

### How was this patch tested?

  - Compile: `build/sbt sql/Test/compile` clean
  - Will regenerate result files on GHA to verify reduced first-case variance

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Opus 4.7
…uet.VectorizedRleValuesReaderBenchmark (JDK 17, Scala 2.13, split 1 of 1)
@LuciferYang LuciferYang marked this pull request as draft April 23, 2026 02:34
…uet.VectorizedRleValuesReaderBenchmark (JDK 25, Scala 2.13, split 1 of 1)
…uet.VectorizedRleValuesReaderBenchmark (JDK 21, Scala 2.13, split 1 of 1)
…uet.VectorizedRleValuesReaderBenchmark (JDK 17, Scala 2.13, split 1 of 1)
@LuciferYang LuciferYang deleted the SPARK-56508-prewarm branch April 23, 2026 03:04