feat(accelerator): numba costes colocalization#62

Draft
timtreis wants to merge 4 commits into feat/numba-coloc from feat/numba-coloc-costes

timtreis commented Jun 3, 2026

Collaborator

Stacked on #60. The 5th and final colocalization feature.

Per-object port of get_correlation_costes + bisection_costes/linear_costes into core/numba/_costes.py::costes_per_object, reusing #60's grouped layout (labels_to_offsets + flatten_pairs_grouped) — no sort. All three fast_costes modes (M_FASTER bisection, M_FAST/M_ACCURATE linear), control flow bit-reproduced. The reference's dead calculate_threshold is skipped; thr accepted for parity but unused.
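For readers who have not seen #60, a grouped layout can be built without sorting by counting pixels per label, prefix-summing the counts into offsets, and scattering each pixel pair into its object's slot. The sketch below is only an illustration of that idea in plain Python: `offsets_from_labels` and `group_pixel_pairs` are hypothetical stand-ins for `labels_to_offsets` and `flatten_pairs_grouped`, and their signatures, the flat-list inputs, and the "label 0 is background" convention are all assumptions, not the actual kernel code.

```python
def offsets_from_labels(labels, n_objects):
    """Hypothetical stand-in: pixels of object k occupy [offsets[k-1], offsets[k])."""
    counts = [0] * n_objects
    for lab in labels:
        if lab > 0:  # assumption: label 0 marks background
            counts[lab - 1] += 1
    offsets = [0] * (n_objects + 1)
    for k in range(n_objects):
        offsets[k + 1] = offsets[k] + counts[k]  # running prefix sum
    return offsets


def group_pixel_pairs(labels, p1, p2, offsets):
    """Hypothetical stand-in: scatter (p1, p2) pairs into per-object runs, no sort."""
    n_objects = len(offsets) - 1
    cursor = offsets[:n_objects]  # next free slot per object (slice copies the list)
    fi = [0.0] * offsets[-1]
    si = [0.0] * offsets[-1]
    for lab, a, b in zip(labels, p1, p2):
        if lab > 0:
            pos = cursor[lab - 1]
            fi[pos] = a
            si[pos] = b
            cursor[lab - 1] = pos + 1
    return fi, si


# Illustrative data: two objects interleaved with background pixels.
labels = [0, 2, 1, 2, 0, 1, 2]
p1 = [9.0, 1.0, 2.0, 3.0, 9.0, 4.0, 5.0]
p2 = [9.0, 10.0, 20.0, 30.0, 9.0, 40.0, 50.0]
offsets = offsets_from_labels(labels, 2)
fi, si = group_pixel_pairs(labels, p1, p2, offsets)
```

Each object's pixels end up contiguous (`fi[offsets[k-1]:offsets[k]]`) in two linear passes, which is what makes a per-object kernel possible without a sort.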

Pearson-on-subset matches scipy.stats.pearsonr's op order; error_model="numpy" so a constant subset → NaN (not ZeroDivisionError), matching scipy.
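To make the operation order concrete, here is a minimal plain-Python sketch of the sequence the commit message lists (centre, normalise each vector, accumulate, clamp). The function name `pearson_subset_sketch` is hypothetical and this is not the numba kernel. Plain Python raises `ZeroDivisionError` on `0.0 / 0.0`, so the sketch returns NaN explicitly where the kernel relies on `error_model="numpy"`.

```python
import math


def pearson_subset_sketch(x, y):
    """Pearson r in centre -> normalise -> accumulate -> clamp order (illustrative)."""
    n = len(x)  # assumption: non-empty inputs of equal length
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    xm = [v - mean_x for v in x]  # centre each vector
    ym = [v - mean_y for v in y]
    norm_x = math.sqrt(sum(v * v for v in xm))
    norm_y = math.sqrt(sum(v * v for v in ym))
    if norm_x == 0.0 or norm_y == 0.0:
        # Constant input: return NaN instead of dividing by zero.
        return math.nan
    # Normalise each vector, then accumulate the dot product.
    r = sum((a / norm_x) * (b / norm_y) for a, b in zip(xm, ym))
    # Clamp to absorb floating-point overshoot beyond [-1, 1].
    return max(-1.0, min(1.0, r))


r_pos = pearson_subset_sketch([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
r_neg = pearson_subset_sketch([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
r_const = pearson_subset_sketch([5.0, 5.0, 5.0], [1.0, 2.0, 3.0])
```

Here `r_pos` is 1 and `r_neg` is -1 up to rounding, and `r_const` is NaN because the first vector has zero norm.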

Exact vs numpy on float pixels (scale=1); integer-dtype diverges by design (reference overflows z = fi + si). bzyx via to_bzyx-twice. Speedup 41.8× (1080², 144 obj).
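As a small worked example of why integer dtypes diverge: a fixed-width unsigned sum keeps only the low bits, so `z = fi + si` wraps once the true sum exceeds the dtype maximum. The snippet uses `ctypes` fixed-width integers purely to illustrate the wraparound arithmetic. The pixel values are made up, and it is not a reproduction of the reference code path.

```python
import ctypes

fi, si = 200, 100  # illustrative bright pixel values that each fit in uint8

# An 8-bit unsigned sum keeps only the low 8 bits: 300 mod 256.
z_uint8 = ctypes.c_uint8(fi + si).value  # -> 44

# The same sum in floating point is exact.
z_float = float(fi) + float(si)  # -> 300.0

# The 16-bit case wraps the same way: 70000 mod 65536.
z_uint16 = ctypes.c_uint16(40000 + 30000).value  # -> 4464
```

A threshold search driven by `z` sees 44 rather than 300 for this pixel, so its subsets, and therefore its thresholds, can differ from a float computation.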

Tests: kernel control-flow vs the real reference search at scale=255 (exercises the multi-iteration path), pearson vs scipy, regression vs reference, end-to-end golden 2D/3D/batch × 3 modes. Full suite 145 passed, lint clean. Stack: #59 → #60 → this.

timtreis and others added 2 commits June 4, 2026 00:02
Per-object port of get_correlation_costes + bisection_costes/linear_costes into
core/numba/_costes.py::costes_per_object, reusing the #60 grouped layout
(labels_to_offsets + flatten_pairs_grouped) — no sort. All three fast_costes
modes: M_FASTER (bisection), M_FAST/M_ACCURATE (linear), control flow
bit-reproduced (window math, num_true recompute cache, > vs >= threshold
asymmetry). The reference's dead calculate_threshold call is skipped; thr is
accepted for parity but unused.

Pearson-on-subset matches scipy.stats.pearsonr's order (centre, normalise each
vector, accumulate, clamp). error_model="numpy" so a constant subset yields NaN
(not ZeroDivisionError), matching scipy's ConstantInputWarning -> nan.

Exact vs numpy on float pixels (scale=1); integer-dtype diverges by design (the
reference overflows z = fi + si in uint8/uint16). bzyx via to_bzyx-twice like the
other coloc features. Speedup 41.8x (1080^2, 144 obj, float).

Tests: kernel control-flow vs the real reference search at scale=255 (exercises
the multi-iteration path), pearson vs scipy, regression vs reference, end-to-end
golden 2D/3D/batch x 3 modes. Full suite 145 passed, lint clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Extract _flatten_image() (mask-contiguity + labels_to_offsets +
  flatten_pairs_grouped), shared by _run and the costes runner — removes the
  duplicated per-image prep chain.
- Drop the dead any_fi/any_si flags in costes_per_object: tot_* is read only
  when n_comb > 0, which already guarantees a pixel strictly above each
  threshold, so the reference's any(>thr) guard is always true there.

Behaviour-preserving; 57 coloc/costes tests green, lint clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
timtreis changed the title from "feat(accelerator): numba costes colocalization (all 3 modes)" to "feat(accelerator): numba costes colocalization" Jun 3, 2026
bisection_costes called _count_combt then _pearson_combt for each visited
threshold, but _pearson_combt's first pass recomputes the exact same subset
count. Fuse them into _count_pearson_combt -> (cnt, r): one count pass, and the
Pearson passes only when cnt > 2 (else r is nan and unused).

Bit-identical to the previous kernel (same predicate, same accumulation order;
verified array_equal across correlated AND anti-correlated objects, i.e. both
search directions). ~9% faster (23.0 -> 20.9 ms, 1080^2/144 obj). 31 costes
tests green. (linear/accurate modes keep _count_combt + the num_true cache.)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
timtreis commented Jun 4, 2026

Collaborator Author

Follow-up perf (commit 8b4ee9e): the bisection search called _count_combt then _pearson_combt for each visited threshold, but _pearson_combt's first pass recomputes the exact same subset count. Fused into _count_pearson_combt -> (cnt, r): one count pass, Pearson passes only when cnt > 2. Bit-identical (verified array_equal across correlated AND anti-correlated objects — both search directions), ~9% faster (23.0→20.9 ms, 1080²/144 obj). 31 costes tests green. (linear/accurate modes keep the separate count + num_true cache.)
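A rough plain-Python sketch of the fused shape described above: one count pass, and the Pearson passes only when `cnt > 2`. The name `count_pearson_sketch` is hypothetical, and the subset predicate (a pixel is selected when either channel is below its threshold) is an assumption made for illustration. The real kernel's predicate and memory layout may differ.

```python
import math


def _subset(fi, si, thr_fi, thr_si):
    """Yield pixel pairs selected by the ASSUMED predicate (either channel below)."""
    for a, b in zip(fi, si):
        if a < thr_fi or b < thr_si:
            yield a, b


def count_pearson_sketch(fi, si, thr_fi, thr_si):
    """Return (cnt, r); the count is computed once and reused by the Pearson passes."""
    cnt = sum(1 for _ in _subset(fi, si, thr_fi, thr_si))
    if cnt <= 2:
        return cnt, math.nan  # too few pixels: r is NaN and unused by the caller

    mean_f = sum(a for a, _ in _subset(fi, si, thr_fi, thr_si)) / cnt
    mean_s = sum(b for _, b in _subset(fi, si, thr_fi, thr_si)) / cnt
    norm_f = math.sqrt(sum((a - mean_f) ** 2 for a, _ in _subset(fi, si, thr_fi, thr_si)))
    norm_s = math.sqrt(sum((b - mean_s) ** 2 for _, b in _subset(fi, si, thr_fi, thr_si)))
    if norm_f == 0.0 or norm_s == 0.0:
        return cnt, math.nan  # constant subset
    r = sum(
        ((a - mean_f) / norm_f) * ((b - mean_s) / norm_s)
        for a, b in _subset(fi, si, thr_fi, thr_si)
    )
    return cnt, max(-1.0, min(1.0, r))


fi = [1.0, 2.0, 3.0, 10.0]
si = [2.0, 4.0, 6.0, 10.0]
cnt_a, r_a = count_pearson_sketch(fi, si, 5.0, 5.0)  # three pixels selected
cnt_b, r_b = count_pearson_sketch(fi, si, 2.0, 3.0)  # one pixel selected
```

The caller gets the count and the correlation from a single call, so a search loop that needs both no longer pays for a second counting pass per visited threshold.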

…y features

The five public get_correlation_* functions each re-ran the whole fused
coloc_per_object kernel (+ flatten), so computing several coloc features
paid ~5x redundant work and the numba backend lost to merged numpy on
manders/overlap/rwc at the large benchmark size.

Add get_correlation_all(p1, p2, masks, features=None): one flatten + one
kernel pass returns the requested coloc groups (None = all). The cheap
block (Pearson+slope, Manders, Overlap, K) is one pass; RWC's rank sort and
the Costes kernel are gated to the requested set. Stateless — fusion happens
by requesting the set in ONE call, not via any cache. It's the efficient
entry point for any caller (not only featurize).

The five single-feature functions become thin gated wrappers over it (single
source; each now computes only its tier, so even one direct call is minimal).
The numba correlation registry KEEPS per-group keys so the featurizer's
per-group selection (_collect_correlation_features) keeps working — an
earlier single-entry registry broke featurize with KeyError 'pearson'.

large 1080^2/142obj: all-coloc 38ms (fused) vs 103ms (5 separate) vs 140ms
(numpy) = 2.7x / 3.7x. Tests: subset/gating, bit-identity vs the wrappers,
empty/batch/3D, unknown-group error, and featurize-runs-under-numba.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
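The gating described in this commit message can be pictured with a small stateless dispatcher. This is a sketch under stated assumptions, not the actual `get_correlation_all`: the function names, every group name except `'pearson'`, the `ValueError` type, and the `trace` argument (added only so the example can show which tiers ran) are hypothetical, and the per-group results are placeholder strings.

```python
CHEAP_GROUPS = ("pearson", "manders", "overlap")  # assumed names for the one-pass block
GATED_GROUPS = ("rwc", "costes")                  # assumed names for the costly tiers
ALL_GROUPS = CHEAP_GROUPS + GATED_GROUPS


def correlation_all_sketch(p1, p2, masks, features=None, trace=None):
    """Return only the requested groups; None means all. Keeps no state between calls."""
    requested = ALL_GROUPS if features is None else tuple(features)
    unknown = [g for g in requested if g not in ALL_GROUPS]
    if unknown:
        raise ValueError(f"unknown correlation group(s): {unknown}")
    trace = [] if trace is None else trace

    trace.append("flatten")  # one shared flatten per call
    results = {}
    if any(g in CHEAP_GROUPS for g in requested):
        trace.append("cheap_pass")  # the cheap block runs as a single pass
        for g in CHEAP_GROUPS:
            if g in requested:
                results[g] = f"{g}-per-object"  # placeholder result
    for g in GATED_GROUPS:
        if g in requested:
            trace.append(f"{g}_kernel")  # costly tiers run only on request
            results[g] = f"{g}-per-object"  # placeholder result
    return results


def pearson_only_sketch(p1, p2, masks):
    """Thin gated wrapper: asks the fused entry point for a single group."""
    return correlation_all_sketch(p1, p2, masks, features=("pearson",))["pearson"]


trace_two, trace_all = [], []
out_two = correlation_all_sketch(None, None, None, ("pearson", "manders"), trace_two)
out_all = correlation_all_sketch(None, None, None, None, trace_all)
```

Requesting two cheap groups records only `flatten` and `cheap_pass` in `trace_two`, while `features=None` also records the RWC and Costes tiers in `trace_all`.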
timtreis force-pushed the feat/numba-coloc-costes branch from 29dbe9e to b0fe736 June 6, 2026 23:28
timtreis added the numba label Jun 9, 2026