
feat(bench): add CPU prover benchmark harness and baseline profiling#2726

Open
tamirhemo wants to merge 1 commit into main from tamir/bench-infra

@tamirhemo
Contributor

Summary

  • Add bench/ directory with evaluation infrastructure for measuring CPU prover latency
  • bench/run.sh: one-command benchmark (median of N=3 runs, leaderboard logging, optional samply profiling)
  • bench/fixtures/fetch.sh: S3 fixture downloader for the fib/keccak/big workloads
  • bench/compare.py: delta table with Wilcoxon rank-sum p-values
  • Add --json output mode to sp1-perf (PerfSummary serialization)
  • Baseline profiling results in bench/hotspots.md: ~30% of prove time is crossbeam/rayon work-stealing overhead, ~24% is extension-field multiplication (BinomialExtensionField::mul), and ~8.5% is Poseidon2
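compare.py's internals aren't shown in this description, so the following is only a minimal stdlib sketch of the kind of row it prints: the median change between two SHAs plus an exact two-sided Wilcoxon rank-sum p-value. It assumes per-SHA lists of prove_ms samples with no tied values; the function names (rank_sum_p_value, delta_row) and the sample numbers are hypothetical, not compare.py's API or real measurements.

```python
from itertools import combinations
from statistics import median


def rank_sum_p_value(old, new):
    """Exact two-sided Wilcoxon rank-sum p-value for two small samples.

    Assumes no tied values; ties would need mid-ranks, which this sketch omits.
    """
    pooled = sorted(old + new)
    if len(set(pooled)) != len(pooled):
        raise ValueError("tied samples are not handled in this sketch")
    rank = {value: i + 1 for i, value in enumerate(pooled)}  # ranks 1..n
    n, k = len(pooled), len(old)
    observed = sum(rank[v] for v in old)
    expected = k * (n + 1) / 2  # mean rank sum of "old" under the null hypothesis
    # Enumerate every assignment of k ranks to "old" and count those at least
    # as far from the expected rank sum as the observed assignment.
    as_extreme = 0
    total = 0
    for subset in combinations(range(1, n + 1), k):
        total += 1
        if abs(sum(subset) - expected) >= abs(observed - expected):
            as_extreme += 1
    return as_extreme / total


def delta_row(name, old, new):
    """One line of a delta table: median old/new, percent change, p-value."""
    m_old, m_new = median(old), median(new)
    delta_pct = (m_new - m_old) / m_old * 100
    p = rank_sum_p_value(old, new)
    return f"{name}: {m_old:.0f} -> {m_new:.0f} ms ({delta_pct:+.2f}%), p={p:.3f}"


if __name__ == "__main__":
    # Hypothetical prove_ms samples for one workload at two SHAs.
    print(delta_row("fib", [1000.0, 1010.0, 1005.0], [900.0, 910.0, 905.0]))
    # prints: fib: 1005 -> 905 ms (-9.95%), p=0.100
```

One consequence of N=3 worth knowing: under the exact test there are only C(6,3) = 20 rank assignments, so even complete separation (as in the example) gives a two-sided p-value of 2/20 = 0.10. A normal-approximation implementation (for example scipy.stats.ranksums) can report smaller values at this sample size, so p-values computed from three runs per SHA should be read with that in mind.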

Context

This is the first PR in a stack for CPU prover optimization (see AUTORESEARCH.md). It builds the evaluation harness needed to measure and validate the optimization experiments in subsequent PRs.

Stack

  1. This PR: bench infra + baselines
  2. tamir/perf-physical-cores (E1): rayon defaults to physical cores (-19.8% on big)
  3. tamir/perf-mimalloc (E2): optional mimalloc allocator (-7.2% on fib)

Test plan

  • bench/fixtures/fetch.sh downloads the fixtures from S3
  • bench/run.sh fib produces a median prove_ms and appends it to the leaderboard
  • bench/compare.py prints a delta table comparing two commit SHAs
  • sp1-perf --json emits valid JSON to stdout
  • Back-to-back fib runs differ by less than 2%
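The run.sh and stability items in the test plan reduce to three small computations: a median over one batch of runs, a leaderboard append, and a relative-spread threshold between two back-to-back batches. run.sh itself is a shell script whose internals aren't shown here, so this is only a Python sketch; the function names, the CSV leaderboard columns (sha, workload, prove_ms), and reading the 2% bar as the relative difference between two batch medians are all assumptions.

```python
import csv
import io
from statistics import median


def median_prove_ms(samples):
    """Median of one batch of prove_ms samples (run.sh uses N=3)."""
    if not samples:
        raise ValueError("need at least one sample")
    return median(samples)


def append_leaderboard(fh, sha, workload, prove_ms):
    """Append one row to a leaderboard; these CSV columns are hypothetical."""
    csv.writer(fh).writerow([sha, workload, f"{prove_ms:.1f}"])


def relative_spread(a, b):
    """|a - b| as a fraction of the mean of a and b."""
    return abs(a - b) / ((a + b) / 2)


def back_to_back_stable(batch1, batch2, tolerance=0.02):
    """True if two back-to-back batch medians differ by less than `tolerance`."""
    return relative_spread(median_prove_ms(batch1), median_prove_ms(batch2)) < tolerance


if __name__ == "__main__":
    board = io.StringIO()  # stands in for a leaderboard file
    first = [1005.0, 998.0, 1012.0]   # hypothetical prove_ms samples
    second = [1011.0, 1003.0, 1020.0]
    append_leaderboard(board, "abc123", "fib", median_prove_ms(first))
    print(board.getvalue().strip())            # abc123,fib,1005.0
    print(back_to_back_stable(first, second))  # True (medians about 0.6% apart)
```

Comparing batch medians rather than individual runs keeps a single noisy run from failing the stability check, which matches run.sh reporting a median rather than a mean.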

🤖 Generated with Claude Code

Add bench/ directory with evaluation infrastructure for measuring CPU
prover latency. Includes run.sh (N=3 median with leaderboard logging),
fixtures/fetch.sh (S3 downloader for fib/keccak/big workloads),
compare.py (delta table with rank-sum p-values), and --json output
mode on sp1-perf.

Baseline profiling on AMD Threadripper 7970X reveals ~30% of prove
time is crossbeam/rayon work-stealing overhead from SMT
over-subscription, ~24% is BinomialExtensionField::mul, and ~8.5% is
Poseidon2 (already AVX-512 optimized). See bench/hotspots.md.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
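The baseline profile also puts rough ceilings on what each optimization can win. By Amdahl's law, if a fraction f of prove time T is sped up by a factor s and the rest is unaffected, the new time and overall speedup are as follows (assuming the profiled shares carry over to the workload being measured):

```latex
T' = (1 - f)\,T + \frac{f\,T}{s},
\qquad
S = \frac{T}{T'} = \frac{1}{(1 - f) + f/s}

% Eliminating the ~30% work-stealing overhead entirely (s \to \infty):
S_{\max} = \frac{1}{1 - 0.30} \approx 1.43 \quad\Rightarrow\quad T' \ge 0.70\,T

% Halving the cost of the ~24% extension-field multiplication (s = 2):
\frac{T'}{T} = (1 - 0.24) + \frac{0.24}{2} = 0.88
```

So removing the crossbeam overhead can cut prove time by at most about 30%, and a 2x faster BinomialExtensionField::mul would be worth roughly 12%. E1's reported -19.8% on big sits within the first ceiling, provided the profile's shares are similar on that workload.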
@github-actions
Contributor

Test Old New Diff
bn_test_bn_test_fq_inverse_100 834812 834812 0.0000 %
rust_crypto_rsa_test_pkcs_verify_100 28986711 29098191 0.3846 %
secp256k1_program_test_verify_v0_30_0_rand_lte_100 17124656 17143004 0.1071 %
bn_test_bn_test_g1_msm_edge 406721 406721 0.0000 %
k256_test_recover_rand_lte_100 4452707 4452794 0.0020 %
bls12_381_tests_test_sqrt_fp_100 955479 979778 2.5431 %
curve25519_dalek_test_ed25519_verify 13288440 13289504 0.0080 %
bls12_381_tests_test_inverse_fp_100 1416558 1416558 0.0000 %
k256_test_recover_high_hash_high_recid 2021243 1971051 -2.4832 %
bn_test_bn_test_fr_inverse_100 851812 851812 0.0000 %
bls12_381_tests_test_sqrt_fp2_100 1762854 1826037 3.5841 %
rustcrypto_bigint_test_bigint_mul_add_residue 1736512 1736514 0.0001 %
bls12_381_tests_test_bls_add_100 10502108 10502108 0.0000 %
p256_test_recover_rand_lte_100 15962610 15947840 -0.0925 %
secp256k1_program_test_recover_rand_lte_100 5496779 5498240 0.0266 %
curve25519_dalek_test_zero_mul 72086 72086 0.0000 %
k256_test_schnorr_verify 5737756 5751054 0.2318 %
curve25519_dalek_test_decompressed_noncanonical 7660 7660 0.0000 %
curve25519_dalek_test_add_then_multiply 2770084 2906156 4.9122 %
bn_test_bn_test_fq_partial_ord 184120 184120 0.0000 %
curve25519_dalek_test_decompressed_expected_value 4583403 4503210 -1.7496 %
p256_test_recover_pubkey_infinity 102285 102285 0.0000 %
secp256k1_program_test_verify_rand_lte_100 17155126 17152207 -0.0170 %
k256_test_point_ops_edge_cases 33843 33843 0.0000 %
bls12_381_tests_test_bls_double_100 6348821 6348821 0.0000 %
bn_test_bn_test_fq_sqrt_100 833212 833212 0.0000 %
bn_test_bn_test_g1_mul_zero 46120 46120 0.0000 %
k256_test_recover_pubkey_infinity 98274 98274 0.0000 %
k256_test_verify_rand_lte_100 11899083 11898872 -0.0018 %
p256_test_recover_high_hash_high_recid 5840384 5836152 -0.0725 %
bn_test_bn_test_g1_double_100 727516 727495 -0.0029 %
secp256k1_program_test_recover_v0_30_0_rand_lte_100 5494549 5491809 -0.0499 %
curve25519_dalek_ng_test_zero_msm 125560 125560 0.0000 %
sha_test_sha2_v0_10_6_expected_digest_lte_100_times 1761518 1766884 0.3046 %
bls12_381_tests_test_inverse_fp2_100 2766653 2766653 0.0000 %
curve25519_dalek_ng_test_zero_mul 108069 108069 0.0000 %
sha_test_sha2_v0_10_8_expected_digest_lte_100_times 1768972 1766989 -0.1121 %
sha_test_sha3_expected_digest_lte_100_times 1609079 1609294 0.0134 %
p256_test_verify_rand_lte_100 11895117 11878814 -0.1371 %
sha_test_sha2_v0_10_9_expected_digest_lte_100_times 1767261 1766103 -0.0655 %
bn_test_bn_test_g1_add_neg 306879 306879 0.0000 %
curve25519_dalek_ng_test_decompressed_noncanonical 195590 195590 0.0000 %
bn_test_bn_test_g1_add_100 986830 986816 -0.0014 %
rustcrypto_bigint_test_bigint_mul_mod_special 1753913 1753913 0.0000 %
curve25519_dalek_test_zero_msm 83636 83636 0.0000 %
curve25519_dalek_ng_test_add_then_multiply 3974090 3799796 -4.3858 %
sha_test_sha2_v0_9_9_expected_digest_lte_100_times 1264866 1265391 0.0415 %
keccack_test_expected_digest_lte_100 1715113 1713263 -0.1079 %
