bench: arithmetic operation micro-benchmarks #805

Open
FBumann wants to merge 2 commits into master from bench/arithmetic-ops

FBumann commented Jul 1, 2026

Collaborator

I really want to make sure the v1 convention doesn't introduce regressions in linopy. And I think as soon as we decide what to do with the multiindex stuff, there will be room for improvement. This should lay a nice foundation for that.

The CodSpeed cost is really small, but coverage (and granularity) improves a lot.

Note

AI-assisted (Claude Code): implementation and this description; reviewed by me.

Adds an op-level benchmark tier to benchmarks/, alongside the whole-model build benchmarks. One benchmark per operation, with operands built outside the measured region so a run isolates a single arithmetic op.

Why. Whole-build benchmarks catch a regression but can't attribute it — a build says "kvl got heavier", an op benchmark says "expr+expr broadcast got heavier". (Motivated by a real regression hunt where attribution needed exactly this granularity.)

What.

  • benchmarks/ops.py — op registry (OpSpec) + a single 3-D grid size profile (dims 3×4×1000; the asymmetric shape also catches dim-order/transpose bugs) + 35 ops across scaling, var/expr arithmetic, quadratic, reductions, masking, groupby, merge, and constraint construction. Binary labelled ops carry match/broadcast variants — the alignment-path axis where the interesting regressions live.
  • benchmarks/drivers/test_ops.py — parametrized driver, one benchmark per op.
  • conftest.py: test_ops added to CODSPEED_MODULES (tracked; memory advisory).

Cost. 35 benchmarks; the memory run stays ~2–2.5 min including the model builds that dominate the job — cheap.

Signal validated. broadcast ≈ 5× match on the alignment axis (the §9 cross-product) — well above the noise floor.

Memory is report-only to start (op-scale memory can be noisy); op-time is the natural gate candidate once the signal proves stable.

An op-level tier alongside the whole-model builds: one benchmark per
(operation, size profile), operands built outside the measured region so a run
isolates a single op rather than a whole build. This attributes perf changes to
a specific arithmetic path — a build benchmark says "kvl got heavier", an op
benchmark says "expr+expr broadcast got heavier".

- benchmarks/ops.py: op registry (OpSpec) + size profiles (small 1D×2000;
  large 3D×3×4×1000 — differ in element count *and* dim count; the asymmetric
  shape also catches dim-order bugs) + ~30 ops across scaling, var/expr
  arithmetic, quadratic, reductions and constraint construction. Binary labelled
  ops carry match/broadcast variants — the alignment-path axis where the
  interesting regressions live.
- benchmarks/drivers/test_ops.py: parametrized driver, one benchmark per
  (op, profile).
- conftest: add test_ops to CODSPEED_MODULES (tracked; memory advisory).

60 benchmarks, ~80s/run with memory. Signal validates: large ≈ 6× small,
broadcast ≈ 5× match (the §9 cross-product).

codspeed-hq Bot commented Jul 1, 2026


Merging this PR will not alter performance

✅ 138 untouched benchmarks
🆕 35 new benchmarks
⏩ 138 skipped benchmarks¹

Performance Changes

| | Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|---|
| 🆕 | Memory | `test_op[con_eq_expr]` | N/A | 1.1 MB | N/A |
| 🆕 | Memory | `test_op[con_le_array]` | N/A | 480.5 KB | N/A |
| 🆕 | Memory | `test_op[con_le_scalar]` | N/A | 480.5 KB | N/A |
| 🆕 | Memory | `test_op[expr_add_array_bcast]` | N/A | 1 MB | N/A |
| 🆕 | Memory | `test_op[expr_add_array_match]` | N/A | 281.3 KB | N/A |
| 🆕 | Memory | `test_op[expr_add_expr_bcast]` | N/A | 4.8 MB | N/A |
| 🆕 | Memory | `test_op[expr_add_expr_match]` | N/A | 985.9 KB | N/A |
| 🆕 | Memory | `test_op[expr_add_masked]` | N/A | 985.9 KB | N/A |
| 🆕 | Memory | `test_op[expr_add_scalar]` | N/A | 187.5 KB | N/A |
| 🆕 | Memory | `test_op[expr_add_var]` | N/A | 1.1 MB | N/A |
| 🆕 | Memory | `test_op[expr_fillna]` | N/A | 468.8 KB | N/A |
| 🆕 | Memory | `test_op[expr_groupby_sum]` | N/A | 606.6 KB | N/A |
| 🆕 | Memory | `test_op[expr_mul_array_bcast]` | N/A | 1.6 MB | N/A |
| 🆕 | Memory | `test_op[expr_mul_array_match]` | N/A | 468.8 KB | N/A |
| 🆕 | Memory | `test_op[expr_mul_scalar]` | N/A | 468.8 KB | N/A |
| 🆕 | Memory | `test_op[expr_mul_var]` | N/A | 891.6 KB | N/A |
| 🆕 | Memory | `test_op[expr_sub_expr_match]` | N/A | 1.1 MB | N/A |
| 🆕 | Memory | `test_op[expr_sum_all]` | N/A | 212.3 KB | N/A |
| 🆕 | Memory | `test_op[expr_sum_dim]` | N/A | 212.3 KB | N/A |
| 🆕 | Memory | `test_op[expr_where]` | N/A | 294.4 KB | N/A |
| ... | ... | ... | ... | ... | ... |

ℹ️ Only the first 20 benchmarks are displayed. Go to the app to view all benchmarks.
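
The broadcast-versus-match gap mentioned in the PR description can be sanity-checked against the memory figures in the table above. The sketch below copies three `bcast`/`match` pairs from that table and computes their ratios. It assumes 1 MB = 1024 KB, which the report does not state, and the inputs are rounded display values, so the results are approximate. The description's "≈ 5×" may refer to time rather than memory; this only shows that the memory figures point the same way.

```python
# Memory figures copied from the CodSpeed table above, expressed in KB.
# Assumption: 1 MB = 1024 KB (the report does not state its unit convention).
KB_PER_MB = 1024

memory_kb = {
    "expr_add_expr_bcast": 4.8 * KB_PER_MB,
    "expr_add_expr_match": 985.9,
    "expr_add_array_bcast": 1.0 * KB_PER_MB,
    "expr_add_array_match": 281.3,
    "expr_mul_array_bcast": 1.6 * KB_PER_MB,
    "expr_mul_array_match": 468.8,
}


def bcast_over_match(op: str) -> float:
    """Ratio of broadcast-variant memory to match-variant memory for one op."""
    return memory_kb[f"{op}_bcast"] / memory_kb[f"{op}_match"]


ratios = {
    op: round(bcast_over_match(op), 1)
    for op in ("expr_add_expr", "expr_add_array", "expr_mul_array")
}

for op, ratio in ratios.items():
    print(f"{op}: broadcast uses about {ratio}x the memory of match")
```

Under these assumptions the expr+expr pair comes out at roughly 5×, and the two array pairs at roughly 3.5× to 3.6×.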


Comparing bench/arithmetic-ops (a99bcb4) with master (fe798b1)

Footnotes

  1. 138 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

Collapse to one 3-D profile (3×4×1000, ~12 K elements) — CodSpeed records time
*and* memory per benchmark, so a second size wasn't buying a separate signal;
one multi-dim profile keeps broadcast/alignment coverage with MB-scale ops above
the noise floor, and halves the matrix. Benchmark ids drop the size suffix.

Add three categories: absence/masking (expr.where / fillna / absence
propagation — §4–§7, the semantics-heavy surface), groupby.sum, and an N-way
merge (constraint-assembly cost). 35 ops, ~45 s/run with memory.

FBumann commented Jul 1, 2026

Collaborator Author

Note

The following content was generated by AI.

CodSpeed cost check. This PR adds 35 arithmetic micro-benchmarks (one GRID
profile each, dims 3×4×1000). Under CodSpeed's per-PR memory instrument the
full benchmarks/ run (build + ops) stays at ~2.5 min — no cost blow-up:

| Run | Instrument | Duration |
|---|---|---|
| fork PR (ops only) | memory | 2m36s |
| fork PR #40 (build + ops) | memory | 2m44s |

The bare-metal walltime job remains gated to master + the trigger:benchmark
label, so PRs don't incur it. Verified on the fork: fluxopt#40 ran the ops
under CodSpeed and the memory comparison worked without inflating cost.

@FBumann FBumann marked this pull request as ready for review July 1, 2026 14:37
@FBumann FBumann requested a review from FabianHofmann July 1, 2026 14:37