Skip to content

feat(accelerator): numba radial_distribution (+ Issue #22 fix)#63

Draft
timtreis wants to merge 3 commits into
feat/numba-zernikefrom
feat/numba-radial-distribution
Draft

feat(accelerator): numba radial_distribution (+ Issue #22 fix)#63
timtreis wants to merge 3 commits into
feat/numba-zernikefrom
feat/numba-radial-distribution

Conversation

@timtreis

@timtreis timtreis commented Jun 3, 2026

Copy link
Copy Markdown
Collaborator

Stacked on #58 (the numba measureobjectintensitydistribution.py module lives there). Completes the core radial module.

Per-object cropped processing (each object on its bbox + 1px pad) replaces the whole-image centrosome geometry — this fixes Issue #22 (per-object results no longer depend on other labels in the field) and runs the geometry on small arrays.

  • core/numba/_radial.py:
    • geodesic_chamfer_fifo — 1/√2 chamfer shortest-path from the centre seed (FIFO Bellman-Ford). Bit-exact to centrosome propagate(zeros, seed, mask, 1) (shortest-path is algorithm-independent), ~35× faster — replaces the 80% bottleneck.
    • radial_reduce — per-(object,bin) histograms + 8-wedge RadialCV, replacing the repeated scipy.sparse.coo_matrix builds + numpy.ma CV.
  • Exact-Euclidean distance_transform_edt and the centre (maximum_position_of_labels) stay host-side (scipy/centrosome).
  • get_radial_distribution: to_bzyx, 2D-only (Z>1 → {}), single/batch.

⚠️ Intentional divergence (the one lane that doesn't bit-match main): diverges from the current numpy baseline on multi-object fields BY DESIGN — it equals the baseline run on each object in isolation (the #22 fix). The only other difference is on objects with a symmetric centre plateau, where the reference's scipy.ndimage.maximum_position tie-break is field/layout-dependent; the per-object crop picks a different (equally valid) centre — numba's is deterministic and field-independent. @afermg raised #22 upstream to scipy — propagate is here replaced by a bit-exact numba kernel, so no behaviour change beyond the #22 fix.

~21× end-to-end (1080², 144 obj; propagate 735→~12 ms). Tests: geodesic vs propagate (shape battery, bit-exact), radial_reduce vs numpy, golden numba(multi)[k] == numpy(isolated k), #22 independence, 2D-only/empty/batch. Full suite 118 passed, lint clean. Stack: #59#58 → this.

timtreis and others added 2 commits June 4, 2026 01:30
Per-object cropped processing replaces the whole-image centrosome geometry:
each object is handled on its own bbox + 1px-pad sub-image, which FIXES Issue #22
(per-object results no longer depend on other labels in the field) and runs the
geometry on small arrays.

- core/numba/_radial.py:
  - geodesic_chamfer_fifo: 1/sqrt(2) chamfer shortest-path from the centre seed
    (FIFO Bellman-Ford). BIT-EXACT to centrosome propagate(zeros, seed, mask, 1)
    (shortest-path is algorithm-independent), ~35x faster — replaces the 80% cost.
  - radial_reduce: per-(object,bin) intensity/count histograms + per-(object,bin,
    8-wedge) RadialCV, replacing the repeated scipy.sparse.coo_matrix builds and
    numpy.ma masked CV. error_model="numpy" so empty-bin 0/0 -> nan as in the ref.
- Exact-Euclidean distance_transform_edt (d_to_edge) and the centre
  (maximum_position_of_labels) stay host-side (scipy/centrosome).
- get_radial_distribution wrapper: to_bzyx, 2D-only (Z>1 -> {}), single/batch.

Diverges from the current numpy baseline on multi-object fields BY DESIGN (the #22
fix); equals the baseline run on each object in ISOLATION. The only other
divergence is on objects with a symmetric centre plateau, where the reference's
scipy.ndimage.maximum_position tie-break is field/layout-dependent and a per-object
crop picks a different (equally valid) centre — numba's is deterministic and
field-independent. ~9.5x end-to-end (1080^2, 144 obj); geodesic alone 35x.

Tests: geodesic vs propagate (shape battery, bit-exact), radial_reduce vs numpy,
golden numba(multi)[k] == numpy(isolated k) on unique-centre objects, #22
independence, 2D-only/empty/batch. Full suite 118 passed, lint clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The chamfer geodesic was already 35x, but end-to-end was Amdahl-capped at 9.5x by
the per-object host glue and the 144 scipy.ndimage.maximum_position calls (a
profile showed: host numpy 33%, maxpos 25%, EDT 19%, geodesic only 9%).

Fold the centre (argmax of d_to_edge, first-max C-order — deterministic and
field-independent), the geodesic, the per-bin histograms and the 8-wedge RadialCV
into a single radial_object() kernel per crop. The host loop now does only the
scipy EDT (exact Euclidean, kept) + the kernel call — no per-object mgrid /
np.where / maximum_position / concatenate. Replaces radial_reduce.

1004ms -> 48ms (21.0x, was 9.5x); scipy EDT (~23ms, correctly not reimplemented)
is now the floor. Exactness unchanged: golden numba(multi)[k] == numpy(isolated k)
on unique-centre objects, #22 independence, kernels all green; full suite 118.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@timtreis

timtreis commented Jun 3, 2026

Copy link
Copy Markdown
Collaborator Author

Follow-up perf (commit 89f79e9): the shipped 9.5× was Amdahl-capped — a profile showed the numba geodesic is only ~9% of the function, while the per-object host glue (33%) and 144 scipy.ndimage.maximum_position calls (25%) dominated. Folded the centre (argmax d_to_edge), geodesic, histograms and 8-wedge CV into one radial_object kernel per crop (no per-object mgrid/np.where/maximum_position/concatenate).

9.5× → 21.0× (1004 → 48 ms; scipy exact-Euclidean EDT ~23ms is now the floor, correctly not reimplemented). Exactness unchanged — golden numba(multi)[k] == numpy(isolated k), #22 independence, kernels all green; full suite 118.

- Fix stale docstrings: _radial.py referenced the removed radial_reduce and
  "maximum_position_of_labels stay host-side"; the centre is now computed in the
  fused radial_object kernel. Updated both module docstrings.
- Dedup the result-key generation: the (MF_*, OF_*) triple + bin loop appeared 3x
  (in _radial_keys, the cols alias dict, and the assembly loop). Replaced with a
  single _radial_features() generator yielding (col, name, bin), used by both the
  n==0 empty-dict path and the main assembly; dropped the 6-key->3-array cols dict.
- Drop redundant numpy.ascontiguousarray on numpy.pad output (pad is already
  C-contiguous); document the assumption.
- Document the contiguous-labels (1..n) assumption for the object count.

Behaviour-preserving; 20 radial tests + dispatch green, lint clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant