feat(bench): add CPU prover benchmark harness and baseline profiling by tamirhemo · Pull Request #2726 · succinctlabs/sp1

tamirhemo · 2026-04-20T18:44:52Z

Summary

Add bench/ directory with evaluation infrastructure for measuring CPU prover latency
bench/run.sh — one-command benchmark (N=3 median, leaderboard logging, optional samply profiling)
bench/fixtures/fetch.sh — S3 fixture downloader for fib/keccak/big workloads
bench/compare.py — delta table with Wilcoxon rank-sum p-values
Add --json output mode to sp1-perf (PerfSummary serialization)
Baseline profiling results in bench/hotspots.md — reveals ~30% crossbeam overhead, ~24% extension field mul, ~8.5% Poseidon2
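
The p-values mentioned for bench/compare.py can be illustrated with a minimal sketch of an exact two-sided Wilcoxon rank-sum test. This is not the script's actual code: the function names and the sample timings are hypothetical, and the real script may use a library routine or a normal approximation instead.

```python
from itertools import combinations
from statistics import median


def average_ranks(values):
    """Return 1-based ranks for values, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_p_value(a, b):
    """Exact two-sided rank-sum p-value, enumerating every possible split."""
    ranks = average_ranks(list(a) + list(b))
    n, total = len(a), len(a) + len(b)
    expected = n * (total + 1) / 2  # mean rank sum of `a` under the null
    observed = abs(sum(ranks[:n]) - expected)
    sums = [sum(c) for c in combinations(ranks, n)]
    extreme = sum(1 for s in sums if abs(s - expected) >= observed - 1e-9)
    return extreme / len(sums)


def delta_row(name, old_ms, new_ms):
    """Summarize one workload: medians, percent change, and p-value."""
    old_med, new_med = median(old_ms), median(new_ms)
    pct = (new_med - old_med) / old_med * 100.0
    return name, old_med, new_med, pct, rank_sum_p_value(old_ms, new_ms)


# Hypothetical prove_ms samples for two SHAs (not measured data).
row = delta_row("fib", [1000.0, 1010.0, 990.0], [900.0, 910.0, 905.0])
print("%s  old=%.1f  new=%.1f  delta=%+.2f%%  p=%.3f" % row)
```

With three runs per side there are only 20 ways to split the six ranks, so the smallest two-sided p-value an exact test can report is 2/20 = 0.1. A delta table built from N=3 samples therefore cannot reach p < 0.05 under this test, however large the speedup.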

Context

This is the first PR in a stack for CPU prover optimization (see AUTORESEARCH.md). It builds the evaluation harness needed to measure and validate the optimization experiments in subsequent PRs.

Stack

This PR — bench infra + baselines
tamir/perf-physical-cores — E1: rayon defaults to physical cores (-19.8% on big)
tamir/perf-mimalloc — E2: optional mimalloc allocator (-7.2% fib)

Test plan

bench/fixtures/fetch.sh downloads fixtures from S3
bench/run.sh fib produces a median prove_ms and appends to leaderboard
bench/compare.py prints delta table for two SHAs
sp1-perf --json emits valid JSON to stdout
Back-to-back fib runs show <2% variance
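
The JSON and variance checks in the test plan lend themselves to a small script. The sketch below assumes each `sp1-perf --json` run prints one JSON object to stdout with a numeric `prove_ms` field. That field name, the sample outputs, and the reading of "variance" as (max - min) / median are all assumptions, not details confirmed by this PR.

```python
import json
from statistics import median


def parse_prove_ms(stdout_text):
    """Parse one run's stdout as JSON and return its prove time in ms.

    Raises ValueError if the text is not valid JSON or lacks the
    (assumed) `prove_ms` field.
    """
    try:
        summary = json.loads(stdout_text)
    except json.JSONDecodeError as err:
        raise ValueError("stdout is not valid JSON: %s" % err)
    if not isinstance(summary, dict) or "prove_ms" not in summary:
        raise ValueError("missing assumed field 'prove_ms'")
    return float(summary["prove_ms"])


def summarize(runs_ms):
    """Return (median, relative spread in percent) for a list of timings."""
    med = median(runs_ms)
    spread_pct = (max(runs_ms) - min(runs_ms)) / med * 100.0
    return med, spread_pct


# Hypothetical stdout captured from three back-to-back runs.
outputs = ['{"prove_ms": 1003.0}', '{"prove_ms": 998.0}', '{"prove_ms": 1010.0}']
med, spread = summarize([parse_prove_ms(o) for o in outputs])
print("median prove_ms = %.1f, spread = %.2f%%" % (med, spread))
print("within 2% budget" if spread < 2.0 else "too noisy")
```

Taking the median rather than the mean keeps a single slow outlier run from shifting the reported number.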

🤖 Generated with Claude Code

Add bench/ directory with evaluation infrastructure for measuring CPU prover latency. Includes run.sh (N=3 median with leaderboard logging), fixtures/fetch.sh (S3 downloader for fib/keccak/big workloads), compare.py (delta table with rank-sum p-values), and --json output mode on sp1-perf.

Baseline profiling on AMD Threadripper 7970X reveals ~30% of prove time is crossbeam/rayon work-stealing overhead from SMT over-subscription, ~24% is BinomialExtensionField::mul, and ~8.5% is Poseidon2 (already AVX-512 optimized). See bench/hotspots.md.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

github-actions · 2026-04-21T17:06:24Z

Test	Old	New	Diff
bn_test_bn_test_fq_inverse_100	834812	834812	0.0000 %
rust_crypto_rsa_test_pkcs_verify_100	28986711	29098191	0.3846 %
secp256k1_program_test_verify_v0_30_0_rand_lte_100	17124656	17143004	0.1071 %
bn_test_bn_test_g1_msm_edge	406721	406721	0.0000 %
k256_test_recover_rand_lte_100	4452707	4452794	0.0020 %
bls12_381_tests_test_sqrt_fp_100	955479	979778	2.5431 %
curve25519_dalek_test_ed25519_verify	13288440	13289504	0.0080 %
bls12_381_tests_test_inverse_fp_100	1416558	1416558	0.0000 %
k256_test_recover_high_hash_high_recid	2021243	1971051	-2.4832 %
bn_test_bn_test_fr_inverse_100	851812	851812	0.0000 %
bls12_381_tests_test_sqrt_fp2_100	1762854	1826037	3.5841 %
rustcrypto_bigint_test_bigint_mul_add_residue	1736512	1736514	0.0001 %
bls12_381_tests_test_bls_add_100	10502108	10502108	0.0000 %
p256_test_recover_rand_lte_100	15962610	15947840	-0.0925 %
secp256k1_program_test_recover_rand_lte_100	5496779	5498240	0.0266 %
curve25519_dalek_test_zero_mul	72086	72086	0.0000 %
k256_test_schnorr_verify	5737756	5751054	0.2318 %
curve25519_dalek_test_decompressed_noncanonical	7660	7660	0.0000 %
curve25519_dalek_test_add_then_multiply	2770084	2906156	4.9122 %
bn_test_bn_test_fq_partial_ord	184120	184120	0.0000 %
curve25519_dalek_test_decompressed_expected_value	4583403	4503210	-1.7496 %
p256_test_recover_pubkey_infinity	102285	102285	0.0000 %
secp256k1_program_test_verify_rand_lte_100	17155126	17152207	-0.0170 %
k256_test_point_ops_edge_cases	33843	33843	0.0000 %
bls12_381_tests_test_bls_double_100	6348821	6348821	0.0000 %
bn_test_bn_test_fq_sqrt_100	833212	833212	0.0000 %
bn_test_bn_test_g1_mul_zero	46120	46120	0.0000 %
k256_test_recover_pubkey_infinity	98274	98274	0.0000 %
k256_test_verify_rand_lte_100	11899083	11898872	-0.0018 %
p256_test_recover_high_hash_high_recid	5840384	5836152	-0.0725 %
bn_test_bn_test_g1_double_100	727516	727495	-0.0029 %
secp256k1_program_test_recover_v0_30_0_rand_lte_100	5494549	5491809	-0.0499 %
curve25519_dalek_ng_test_zero_msm	125560	125560	0.0000 %
sha_test_sha2_v0_10_6_expected_digest_lte_100_times	1761518	1766884	0.3046 %
bls12_381_tests_test_inverse_fp2_100	2766653	2766653	0.0000 %
curve25519_dalek_ng_test_zero_mul	108069	108069	0.0000 %
sha_test_sha2_v0_10_8_expected_digest_lte_100_times	1768972	1766989	-0.1121 %
sha_test_sha3_expected_digest_lte_100_times	1609079	1609294	0.0134 %
p256_test_verify_rand_lte_100	11895117	11878814	-0.1371 %
sha_test_sha2_v0_10_9_expected_digest_lte_100_times	1767261	1766103	-0.0655 %
bn_test_bn_test_g1_add_neg	306879	306879	0.0000 %
curve25519_dalek_ng_test_decompressed_noncanonical	195590	195590	0.0000 %
bn_test_bn_test_g1_add_100	986830	986816	-0.0014 %
rustcrypto_bigint_test_bigint_mul_mod_special	1753913	1753913	0.0000 %
curve25519_dalek_test_zero_msm	83636	83636	0.0000 %
curve25519_dalek_ng_test_add_then_multiply	3974090	3799796	-4.3858 %
sha_test_sha2_v0_9_9_expected_digest_lte_100_times	1264866	1265391	0.0415 %
keccack_test_expected_digest_lte_100	1715113	1713263	-0.1079 %
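
The Diff column in the table above is consistent with (New - Old) / Old * 100. A minimal sketch that recomputes it from tab-separated rows and flags entries that moved by more than a chosen threshold follows. The 2% threshold and the function names are hypothetical; the three sample rows are copied from the table.

```python
def percent_diff(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


def flag_changes(tsv_text, threshold_pct=2.0):
    """Yield (test, diff_pct) for rows whose |diff| exceeds the threshold."""
    for line in tsv_text.strip().splitlines():
        name, old, new = line.split("\t")[:3]
        diff = percent_diff(int(old), int(new))
        if abs(diff) > threshold_pct:
            yield name, diff


rows = "\n".join([
    "rust_crypto_rsa_test_pkcs_verify_100\t28986711\t29098191\t0.3846 %",
    "bls12_381_tests_test_sqrt_fp_100\t955479\t979778\t2.5431 %",
    "k256_test_recover_high_hash_high_recid\t2021243\t1971051\t-2.4832 %",
])
for name, diff in flag_changes(rows):
    print("%-45s %+.4f %%" % (name, diff))
```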

This was referenced Apr 20, 2026

perf(prover): default rayon to available_parallelism #2727 (Open)

perf(bench): add optional mimalloc allocator for sp1-perf (-7.2% fib) #2728 (Open)

perf(sumcheck): replace par_bridge with par_chunks/par_iter (-7.1% big) #2729 (Open)

tamirhemo force-pushed the tamir/bench-infra branch from 1ca8644 to 377d9f6 Compare April 21, 2026 16:34

tamirhemo wants to merge 1 commit into main from tamir/bench-infra
