perf(radial): vectorize get_radial_zernikes via shared _zernike_scores (~2x) by timtreis · Pull Request #75 · afermg/cp_measure

timtreis · 2026-06-06T15:57:38Z

Stacked on #74 (perf/zernike-vectorize). Base will retarget to main once #74 merges. Review the radial-only diff.

What

get_radial_zernikes already built the Zernike basis on the masked foreground vectors (cheap), but reduced it with 2·K separate scipy.ndimage.sum_labels calls (one per moment × real/imag) and re-gathered pixels[ijv] inside every one. Profiling showed the reduction — not the basis — dominated.

This delegates the intensity-weighted moment sums to the shared cp_measure.utils._zernike_scores (added in #74 for get_zernike) with the pixel image as the per-pixel weight. The helper keeps the basis on the foreground vectors and segment-sums each moment by label with a single numpy.bincount, collapsing the whole reduction into one vectorised pass. The caller then normalises by each object's pixel count (radial Zernikes divide by pixel count, not the enclosing-circle area) and forms magnitude/phase.
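To make the reduction step concrete, here is a minimal sketch of a label-wise segment sum built on `numpy.bincount`. It is not the actual `_zernike_scores` code: the function names, the argument layout, and the use of one `bincount` per real/imaginary component are assumptions made for illustration.

```python
import numpy as np


def weighted_moment_sums(basis, weight, rows, n_objects):
    """Segment-sum weighted complex moments per object (hypothetical helper).

    basis  : (N, K) complex basis values at the N foreground pixels
    weight : (N,) per-pixel weight (here, the pixel intensities)
    rows   : (N,) dense row index (0..n_objects-1) of each pixel's object
    """
    n_moments = basis.shape[1]
    weighted = basis * weight[:, None]
    # Give every (object, moment) pair its own bin so one bincount call
    # per component covers all K moments at once.
    bins = (rows[:, None] + n_objects * np.arange(n_moments)).ravel()
    size = n_objects * n_moments
    real = np.bincount(bins, weights=weighted.real.ravel(), minlength=size)
    imag = np.bincount(bins, weights=weighted.imag.ravel(), minlength=size)
    counts = np.bincount(rows, minlength=n_objects)
    # Bins are laid out moment-major, so reshape to (K, n) and transpose.
    real = real.reshape(n_moments, n_objects).T
    imag = imag.reshape(n_moments, n_objects).T
    return real, imag, counts


def magnitude_per_pixel(real, imag, counts):
    """Normalise by each object's pixel count and take the modulus."""
    return np.hypot(real, imag) / counts[:, None]
```

Compared with looping over moments and calling a labelled sum for each one, this touches the foreground vectors once and leaves the per-object work to `bincount`.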

No new dependencies. This cashes in the weight= reuse hook deliberately left on _zernike_scores.

Performance

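| tier (image size, density) | foreground fraction | before | after | speedup |
|---|---|---|---|---|
| 1080² medium | 15.6% | 315 ms | 119 ms | 2.6× |
| 1080² sparse | 4.1% | 74 ms | 42 ms | 1.8× |
| 2160² sparse | 2.5% | 270 ms | 149 ms | 1.8× |
| 256² dense | 43% | 33 ms | 17 ms | 1.9× |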

Divergence vs the centrosome path: ~1e-14 (near bit-exact; the residual is summation order).

Note: this is a ~2× reduction win, not the 8× of the get_zernike PR. Unlike plain zernike, radial_zernikes never had the full-image (H,W,K) scatter — its basis was already on the masked vectors — so the win comes only from the reduction step.

Bonus: fixes a latent crash

The previous ij[label - 1] indexing assumed labels were 1..n and raised IndexError on non-contiguous label sets (e.g. {1, 3, 7}). The shared helper maps each label to its own row, so non-contiguous labels now work — covered by a regression test.
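The failure mode and the fix can be illustrated with a small lookup table from label ID to dense row index. This is a sketch only: `label_rows` is a hypothetical name, and it is keyed on the largest label ID, unlike the `find_objects`-based `label_to_idx_lut` the PR reuses.

```python
import numpy as np


def label_rows(labels):
    """Map arbitrary positive label IDs to dense rows 0..n-1 (hypothetical)."""
    ids = np.unique(labels[labels > 0])
    max_id = int(ids[-1]) if ids.size else 0
    # -1 marks IDs that do not occur in the image (including background 0).
    lut = np.full(max_id + 1, -1, dtype=np.intp)
    lut[ids] = np.arange(ids.size)
    return ids, lut


labels = np.array([[1, 1, 0],
                   [0, 3, 3],
                   [7, 0, 0]])
ids, lut = label_rows(labels)
rows = lut[labels[labels > 0]]  # dense row per foreground pixel

# The old pattern indexes a length-n array with `label - 1`, which only
# works when the labels are exactly 1..n.
per_object = np.zeros(ids.size)
try:
    per_object[labels.max() - 1]  # index 6 into an array of length 3
except IndexError:
    old_pattern_failed = True
```

With the lookup table, label 7 lands in row 2 instead of indexing past the end of a three-element array.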

Reuse primitives.segment.label_to_idx_lut for the label->row map (correct sizing, find_objects-based) instead of a hand-rolled reverse map keyed on masks.max(); derive labels internally so get_zernike no longer needs its own unique() pass.

Single foreground gather, skip the identically-zero imaginary segment-sum for m==0 moments, and precompute the azimuthal powers once.

Return (real_sums, imag_sums, radii, counts): radii feeds get_zernike's pi*r**2 normalisation, and counts feeds the intensity-weighted radial Zernikes (PR #75), which reuse this via the restored `weight` arg.

Add weighted + count golden tests vs centrosome so no path ships untested.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…s (~2x)

`get_radial_zernikes` built the Zernike basis on the masked foreground vectors (already cheap) but then reduced it with 2*K separate `scipy.ndimage.sum_labels` calls (one per moment, for real and imaginary parts) and re-gathered `pixels[ijv]` inside every one of those calls. The reduction, not the basis, dominated the runtime.

Delegate the intensity-weighted moment sums to the shared `cp_measure.utils._zernike_scores` (introduced for `get_zernike`) with the pixel image as the per-pixel `weight`. It keeps the basis on the foreground vectors and segment-sums each moment by label with `numpy.bincount`, so the whole reduction is a single vectorised pass. The caller then normalises by each object's pixel count (radial Zernikes divide by pixel count, not the enclosing-circle area) and forms magnitude/phase.

Measured: ~1.8-2.6x (315->119 ms on a 1080^2 tier), divergence ~1e-14 vs the centrosome path (near bit-exact). No new deps.

Also fixes a latent bug: the previous `ij[label - 1]` indexing assumed labels were 1..n and raised IndexError on non-contiguous label sets (e.g. {1, 3, 7}). The shared helper maps each label to its own row, so non-contiguous labels now work.

Adds golden + edge tests (empty, single-pixel r=0, non-contiguous labels, edge-touching, non-default degree, 3D) matching a direct centrosome reference.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The intensity-weighted Zernike scatter indexes the pixel image at the foreground mask, so it requires pixels.shape == labels.shape. Reject a mismatch with a clear ValueError instead of an opaque boolean-index IndexError. (The pre-vectorization code's silent out-of-bounds clip is intentionally not carried forward: co-shaped is the contract every other cp_measure measurement uses.)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
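A guard of this kind might look like the following sketch. The function name and the error wording are assumptions; only the contract (equal shapes, `ValueError` on mismatch) comes from the commit message.

```python
import numpy as np


def check_co_shaped(pixels, labels):
    """Raise a clear error when the image and label mask shapes differ."""
    if pixels.shape != labels.shape:
        raise ValueError(
            f"pixels shape {pixels.shape} must match labels shape {labels.shape}"
        )


labels = np.zeros((4, 4), dtype=int)
check_co_shaped(np.zeros((4, 4)), labels)  # passes silently

try:
    check_co_shaped(np.zeros((4, 5)), labels)
except ValueError as exc:
    message = str(exc)
```

Without the guard, `pixels[labels > 0]` on mismatched shapes fails inside NumPy's boolean indexing with an error that does not name either argument.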

`python -m cp_measure._bench.targets --base <ref> --head <ref>` resolves a PR diff to exactly the measurement functions it changes, for the benchmark action.

Resolution is SYMBOL-level, not file-level: it builds a static symbol-reference graph over the package (AST, resolving intra-package imports incl. submodule and relative imports) and selects a feature iff its call graph transitively reaches a changed symbol. So a shared-helper edit (e.g. utils._zernike_scores) selects only the features that actually use it. This was verified on the real PRs: #74 -> {zernike}, #75 -> {radial_zernikes}, where file-closure would have over-selected the ~6 features whose modules merely import utils.

- Rooted at an explicit entry-point table (the get_* registry) so bulk.py's lazy numba/multimask imports can't cause an entry-point to be missed; a test cross-checks the table against the live registries by function identity.
- Reads everything from git refs (git show), diffing against the merge-base, so it matches CI and is correct for stacked PRs given the PR's real base.
- Three distinct states: benchmarked / skipped-unsupported (multimask, numba) / empty, so a multimask-only PR is never mistaken for "no measurement change".
- Tolerates the get_ferret->get_feret cross-branch rename via name candidates.

Hermetic tests build a throwaway git repo + mini package to prove symbol-level precision and the three states; a guarded test checks the real #74/#75 refs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
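The symbol-level selection described above can be sketched for a single module with the standard `ast` module. This is a simplified illustration, not the mapper itself: it ignores imports, attributes, and classes, and the function names and sample source are invented for the example.

```python
import ast


def reference_graph(source):
    """Map each top-level function to the bare names its body references."""
    graph = {}
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            graph[node.name] = {
                n.id for n in ast.walk(node) if isinstance(n, ast.Name)
            }
    return graph


def reaches(graph, start, changed):
    """True if `start` transitively references any symbol in `changed`."""
    seen, stack = set(), [start]
    while stack:
        name = stack.pop()
        if name in changed:
            return True
        if name in seen:
            continue
        seen.add(name)
        stack.extend(graph.get(name, ()))
    return False


# Hypothetical mini package: two features share a helper, one does not.
SOURCE = """
def _zernike_scores(x):
    return x

def get_zernike(x):
    return _zernike_scores(x)

def get_radial_zernikes(x):
    return _zernike_scores(x)

def get_intensity(x):
    return x
"""

graph = reference_graph(SOURCE)
entry_points = ["get_zernike", "get_radial_zernikes", "get_intensity"]
selected = [f for f in entry_points if reaches(graph, f, {"_zernike_scores"})]
```

A file-level rule would flag all three features because they share a module; the reachability check keeps only the two that actually call the changed helper.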

* feat(synth): deterministic synthetic cell-image generator for benchmarking Add `cp_measure.synth.generate(image_size, n_objects, n_channels, seed)` — the shipped, importable generator for the PR-benchmark action (build step 1). Produces a cell-like contiguous label mask (organic star-shaped cells placed by gap-respecting dart-throwing, log-normal sizes, no degenerate ~1px objects) plus intensity channels built from a shared smooth envelope + shared/independent multi-scale Gaussian splats, so area, intensity, texture AND colocalisation features all carry real signal. Output is a pure function of the inputs (version stamped via `__version__`); placement is capacity-checked and raises loudly rather than silently under-placing. test/test_synth.py replaces the design's "eyeball the examples" gate with programmatic acceptance asserts at the matrix corners (min-size×max-count, max-size×min-count): determinism, contiguous exact count, no degenerate objects, shape/texture/intensity signal, and a controlled sub-unity channel correlation. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * refactor(synth): review hardening — single cell-extent, sturdier tests, determinism Apply the "fix now" set from the max-effort review of the generator (no behavioural bugs were found; these harden maintainability, the test net, and cross-version reproducibility): - Extract `_cell_extent(base_r, amps)` as the single definition of a cell's radial reach, used by both the packing radius (worst-case amps) and the rasterisation window (actual amps). Removes the reach-vs-bulge drift risk that could silently break the no-overlap guarantee if one formula were edited. - Strengthen the two toothless tests: texture now asserts median per-object std is well ABOVE the read-noise floor (a splat-removed regression collapses to ~noise and fails); organic-shape now asserts a boundary radial-roughness CV that plain disks fail (the old solidity<0.99 passed for pixelated disks). 
Both verified to fail on their intended regressions. - Determinism: stable sort for tied radii; replace rng.choice(p=...) with inverse-CDF sampling on rng.random (version-stable draw count) so two separately-installed envs can't diverge. Bump __version__ 0.1.0 -> 0.2.0. - Widen the brittle seed-averaged correlation band (0.4-0.7 -> 0.35-0.8) so a legitimate constant re-tune doesn't flip it. - Per decisions: keep realistic PSF splat bleed; drop the unimplemented "clusters" docstring claim. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(bench): symbol-level PR target mapper (build step 2) `python -m cp_measure._bench.targets --base <ref> --head <ref>` resolves a PR diff to exactly the measurement functions it changes, for the benchmark action. Resolution is SYMBOL-level, not file-level: it builds a static symbol-reference graph over the package (AST, resolving intra-package imports incl. submodule and relative imports) and selects a feature iff its call graph transitively reaches a changed symbol. So a shared-helper edit (e.g. utils._zernike_scores) selects only the features that actually use it — verified on the real PRs: #74 -> {zernike}, #75 -> {radial_zernikes}, where file-closure would have over-selected the ~6 features whose modules merely import utils. - Rooted at an explicit entry-point table (the get_* registry) so bulk.py's lazy numba/multimask imports can't cause an entry-point to be missed; a test cross-checks the table against the live registries by function identity. - Reads everything from git refs (git show), diffing against the merge-base, so it matches CI and is correct for stacked PRs given the PR's real base. - Three distinct states: benchmarked / skipped-unsupported (multimask, numba) / empty — a multimask-only PR is never mistaken for "no measurement change". - Tolerates the get_ferret->get_feret cross-branch rename via name candidates. 
Hermetic tests build a throwaway git repo + mini package to prove symbol-level precision and the three states; a guarded test checks the real #74/#75 refs. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * revert(bench): drop the change-detection mapper; benchmark all exposed functions Per design decision: the benchmark compares at the main exposed-function level — run every public get_* feature base-vs-head and let the speedup table show what changed (~1.0x = untouched). This removes the static AST symbol-graph mapper (build step 2) entirely, along with its edge-case surface; benchmark cost is controlled by the matrix size / per-function budget instead of pre-selection. - Remove src/cp_measure/_bench/targets.py and test/test_targets.py (keep the _bench package for the upcoming runner). - Remove accidentally-committed __pycache__/*.pyc and add a .gitignore for Python bytecode (the repo had none). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(bench): fixture/runner/comparator — benchmark all get_* head-vs-main Build step 2 (v3): the benchmark core, three composable pieces. - fixtures.py: build the (image_size x object_count x seed) matrix once from the pinned synth generator, serialise to .npz with a manifest + per-array sha256 (stamps synth.__version__). Both envs load identical, checksum-verified inputs. - run.py: `python -m cp_measure._bench.run` times EVERY public get_* function (core arity-1, correlation arity-2, plus a [legacy] variant where a `legacy` param exists) over the fixtures in one environment -> JSON. Channels normalised to [0,1] (the pipeline convention; get_texture requires it). Per-call warmup + reps (min), SIGALRM per-call timeout, thread-pinning set before numpy import. Functions enumerated from the live registry at HEAD; a function that errors on synth input is recorded, not fatal. - compare.py: `python -m cp_measure._bench.compare` diffs two run JSONs into a speedup table. 
speedup = main/head (>1 faster); per cell takes the min then the median across seeds; classifies faster/slower/within-noise/new/removed/no-data. Untouched functions land at ~1.0x — the "what changed" signal, no mapper needed. Validated end-to-end on a smoke matrix (all 12 functions time ok incl. texture; self-compare is 1.00x). The two-worktree/two-env orchestration is step 3 (workflow). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(bench): two-job benchmark workflow + orchestration driver (build step 3) Wires fixtures -> run(head) + run(main) -> compare -> sticky PR comment. - .github/workflows/benchmark.yml: triggered by the `benchmark` label (labels need write access, so the trigger is maintainer-gated) or workflow_dispatch. Two-job split: `build` runs untrusted PR code with `permissions: {}` (no token to steal, persist-credentials off); `report` holds pull-requests/issues:write but never checks out PR code — it only renders the artifact into a sticky `` comment and removes the label. fetch-depth: 0 so `main` is present; concurrency cancels superseded runs. - .github/scripts/run_benchmark.sh: installs head + main in two isolated uv envs, VENDORS head's synth.py + _bench/ into the main worktree so the generator and tooling are identical across both runs (only cp_measure.core.* differs), builds the fixtures once, runs both, compares. - fixtures.py: add CI_MATRIX (bounded for hosted-runner limits, the workflow default; full DEFAULT via dispatch) + a `python -m cp_measure._bench.fixtures` build CLI. Validated locally: script bash-syntax, YAML structure (tokenless build, gated report), fixtures CLI, full test suite. End-to-end CI run is via workflow_dispatch. 
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * refactor(bench): review cleanup — leaner comments/docstrings + small fixes Elegance/LOC pass over the benchmark PR (net ~-50 lines, mostly verbose docstrings) plus the real findings from the review: Fixes: - workflow: `gh api --paginate | head` SIGPIPE under pipefail could abort the comment post on PRs with many comments — use a single `?per_page=100` page + `--jq 'first'` instead. Add `if: always()` to upload-artifact so a failed run still surfaces partial output. Drop the redundant matrix default + useless cat. - run.py: build call-args INSIDE the guarded path so an input a function can't handle (e.g. a 1-channel fixture) is recorded per-cell, not fatal. Record the matrix + fixture count in meta so the comment shows which sweep ran; note the shared-fn JIT caveat for [legacy] variants. - compare.py: label the status column (was a blank header); guard head_t==0; surface the matrix scope in the header. - run_benchmark.sh: trap-based cleanup of the temp dir/worktree/venvs (was leaked). - .gitignore: ignore local benchmark artifacts (bench-out/, *.npz). Cleanup: trim the synth/bench/test module docstrings and synth's per-constant comments to their load-bearing facts; collapse generate()'s numpydoc block; drop the unused load_fixture(verify=...) flag; de-clever _norm01's constant-image path. Kept the _cell_extent single-source helper (an earlier review's no-overlap fix). 31 tests pass, ruff clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test(bench): trim to a lean regression set Consolidate the over-exhaustive acceptance tests (356 -> 110 lines, 7 tests): - synth: one invariants test (shape/dtype/contiguous count, no degenerate objects, shape+size variety, intensity/texture/coloc signal) + determinism + edges, all at a single representative config instead of parametrising every check over both matrix corners. 
Drop the radial-roughness disk-vs-organic discriminator (eccentricity spread + size variety still catch a broken gen). - bench: merge the fixture build/load/determinism cases, fold enumerate into the run integration test, and collapse the compare classification/render cases into one. Drop the trivial _norm01 and standalone CLI tests. * style: ruff-format with current ruff (88-col line wraps) * chore: stop tracking scratch tasks/ and .claude/ (added by mistake) * refactor(bench): report raw main/head timings, drop noise-band classification Per single-function resolution: show each function's main vs head time as mean (min-max) over reps x seeds plus the raw main/head ratio, and let the maintainer read their function directly. Removes the faster/slower/within-noise band (which a noisy/sequential run could mislabel) and any normalisation; run.py now stores just the rep times. * Revert "Merge pull request #80 from afermg/feat/synth-bench-generator" This reverts commit 3809218, reversing changes made to 7f67606. * feat(bench): synthetic-image PR performance benchmark action Single revertable unit; re-introduces the benchmark mechanic (reverted #80) with the harness-source fix folded in. - cp_measure/synth.py: deterministic synthetic cell-image generator. - cp_measure/_bench/{fixtures,run,compare}.py: build the (size x count x seed) fixture matrix, time every get_* head-vs-main, report raw mean (min-max) timings. - .github/workflows/benchmark.yml + scripts/run_benchmark.sh: label-triggered two-job workflow. The harness is checked out from main (not the PR head, which a perf PR does not carry); the PR head is fetched as a worktree, main's synth.py + _bench/ vendored in, and only cp_measure.core differs between the timed runs. * Revert "feat(bench): synthetic-image PR performance benchmark action" * demo: self-contained PR benchmark action (simplified) Everything lives on this branch (nothing on main). 
on: pull_request runs the workflow from the PR branch on every commit, times every public get_* on the PR head vs main, and posts a sticky comment with the timing table. - synth.py: minimal generator — n ellipses on a regular grid + a few random Gaussian blobs per channel. - _bench/{fixtures,run,compare}.py: build fixtures, time all get_* head-vs-main, raw timings table. - .github: single-job pull_request workflow (no label, no pull_request_target) + head-based driver that vendors the tooling into a main worktree. - includes the granularity speedup (#76) so the demo table shows a real delta. * demo: move the whole benchmark into .github/scripts (no package module) Remove src/cp_measure/{synth.py,_bench/} and their tests. Everything now lives in .github/scripts/benchmark.py — a single self-contained script (generator + runner + comparator); each env regenerates the same seeded inputs, so nothing is shared or vendored. Table now references the commit and emits one grid per affected function (speedup >= 1.1x) with image size as rows and object count as columns. * demo: extend benchmark matrix to 4 sizes x 2 counts (256–2048) Grid now spans image sizes 256/512/1024/2048 (rows) x object counts 16/64 (cols); bump the job timeout to 45m for the larger sizes. * demo: median per cell, 3 seeds x 3 counts, dynamic affected-threshold caption - per-cell aggregate is now the median (over seeds x reps); speedup = median/median - matrix: sizes 256-2048 (rows) x counts 16/64/256 (cols) x 3 seeds = 36 cells - caption derives the cutoff from AFFECTED (≥1.1x) instead of hardcoding >1 - job timeout 60m for the larger matrix * demo: drop 256px image size (unrealistically small) Sizes now 512/1024/2048 x counts 16/64/256 x 3 seeds = 27 cells (3x3 grid). * demo: shift matrix down to 256-1024 (drop slow 2048) Sizes 256/512/1024 x counts 16/64/256 x 3 seeds — 2048 was too slow per commit. 
* demo: report regressions too: flag functions that moved >=1.05x either way Was speedup>=1.1x only (regression-blind: a slowdown reported 'no change'). Now a function is shown if any cell is >=1.05x faster OR <=1/1.05x slower; header notes >1 faster / <1 slower. * demo: slim benchmark.py, drop unused bits - remove the n_channels param (always 2: ch0 for core, ch0+ch1 for coloc) - drop 'from __future__ import annotations' (unneeded on the 3.12 runner) - .gitignore: drop *.npz (no fixture files are written anymore) * revert(granularity): it has an independent PR, was used as test --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Co-authored-by: Alán F. Muñoz <afer.mg@gmail.com>
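The reporting rule from the last benchmark commits (median per cell, speedup = median/median, show a function if any cell moved at least 1.05x either way) can be sketched as below. The function names and the sample timings are illustrative assumptions, not code from `benchmark.py`.

```python
from statistics import median


def cell_speedup(main_times, head_times):
    """Speedup for one matrix cell: median(main) / median(head); >1 is faster."""
    return median(main_times) / median(head_times)


def is_affected(speedups, threshold=1.05):
    """Show a function if any cell is >=threshold faster or <=1/threshold slower."""
    return any(s >= threshold or s <= 1 / threshold for s in speedups)


# Made-up timings in seconds, for illustration only.
faster = cell_speedup([0.30, 0.32, 0.31], [0.12, 0.11, 0.13])
unchanged = cell_speedup([0.20, 0.21, 0.20], [0.20, 0.20, 0.21])
slower = cell_speedup([0.10, 0.10, 0.10], [0.12, 0.12, 0.12])
```

Checking both directions is what fixes the regression blindness the commit describes: a slowdown yields a ratio below 1 and is flagged by the `<= 1 / threshold` branch.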

timtreis force-pushed the perf/radial-zernike-vectorize branch from 6d79c10 to 3082e2c on June 6, 2026 16:12

timtreis mentioned this pull request Jun 6, 2026

perf(sizeshape): vectorize get_zernike on foreground pixels (~8x) #74

Open

timtreis added the numpy label Jun 9, 2026

timtreis force-pushed the perf/radial-zernike-vectorize branch from 3082e2c to b4a8d5f on June 9, 2026 19:30

timtreis and others added 2 commits June 10, 2026 00:58

timtreis force-pushed the perf/radial-zernike-vectorize branch from b4a8d5f to ee7ed1e on June 9, 2026 23:01

timtreis mentioned this pull request Jun 18, 2026

feat(sanitize): central non-contiguous label-ID sanitation #89

Open

timtreis wants to merge 2 commits into perf/zernike-vectorize from perf/radial-zernike-vectorize
