feat(accelerator): numba radial_distribution (+ Issue #22 fix)#63
Draft
timtreis wants to merge 3 commits into
Draft
feat(accelerator): numba radial_distribution (+ Issue #22 fix)#63timtreis wants to merge 3 commits into
timtreis wants to merge 3 commits into
Conversation
Per-object cropped processing replaces the whole-image centrosome geometry: each object is handled on its own bbox + 1px-pad sub-image, which FIXES Issue #22 (per-object results no longer depend on other labels in the field) and runs the geometry on small arrays. - core/numba/_radial.py: - geodesic_chamfer_fifo: 1/sqrt(2) chamfer shortest-path from the centre seed (FIFO Bellman-Ford). BIT-EXACT to centrosome propagate(zeros, seed, mask, 1) (shortest-path is algorithm-independent), ~35x faster — replaces the 80% cost. - radial_reduce: per-(object,bin) intensity/count histograms + per-(object,bin, 8-wedge) RadialCV, replacing the repeated scipy.sparse.coo_matrix builds and numpy.ma masked CV. error_model="numpy" so empty-bin 0/0 -> nan as in the ref. - Exact-Euclidean distance_transform_edt (d_to_edge) and the centre (maximum_position_of_labels) stay host-side (scipy/centrosome). - get_radial_distribution wrapper: to_bzyx, 2D-only (Z>1 -> {}), single/batch. Diverges from the current numpy baseline on multi-object fields BY DESIGN (the #22 fix); equals the baseline run on each object in ISOLATION. The only other divergence is on objects with a symmetric centre plateau, where the reference's scipy.ndimage.maximum_position tie-break is field/layout-dependent and a per-object crop picks a different (equally valid) centre — numba's is deterministic and field-independent. ~9.5x end-to-end (1080^2, 144 obj); geodesic alone 35x. Tests: geodesic vs propagate (shape battery, bit-exact), radial_reduce vs numpy, golden numba(multi)[k] == numpy(isolated k) on unique-centre objects, #22 independence, 2D-only/empty/batch. Full suite 118 passed, lint clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The chamfer geodesic was already 35x, but end-to-end was Amdahl-capped at 9.5x by the per-object host glue and the 144 scipy.ndimage.maximum_position calls (a profile showed: host numpy 33%, maxpos 25%, EDT 19%, geodesic only 9%). Fold the centre (argmax of d_to_edge, first-max C-order — deterministic and field-independent), the geodesic, the per-bin histograms and the 8-wedge RadialCV into a single radial_object() kernel per crop. The host loop now does only the scipy EDT (exact Euclidean, kept) + the kernel call — no per-object mgrid / np.where / maximum_position / concatenate. Replaces radial_reduce. 1004ms -> 48ms (21.0x, was 9.5x); scipy EDT (~23ms, correctly not reimplemented) is now the floor. Exactness unchanged: golden numba(multi)[k] == numpy(isolated k) on unique-centre objects, #22 independence, kernels all green; full suite 118. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Collaborator
Author
|
Follow-up perf (commit 89f79e9): the shipped 9.5× was Amdahl-capped — a profile showed the numba geodesic is only ~9% of the function, while the per-object host glue (33%) and 144 9.5× → 21.0× (1004 → 48 ms; scipy exact-Euclidean EDT ~23ms is now the floor, correctly not reimplemented). Exactness unchanged — golden |
- Fix stale docstrings: _radial.py referenced the removed radial_reduce and "maximum_position_of_labels stay host-side"; the centre is now computed in the fused radial_object kernel. Updated both module docstrings. - Dedup the result-key generation: the (MF_*, OF_*) triple + bin loop appeared 3x (in _radial_keys, the cols alias dict, and the assembly loop). Replaced with a single _radial_features() generator yielding (col, name, bin), used by both the n==0 empty-dict path and the main assembly; dropped the 6-key->3-array cols dict. - Drop redundant numpy.ascontiguousarray on numpy.pad output (pad is already C-contiguous); document the assumption. - Document the contiguous-labels (1..n) assumption for the object count. Behaviour-preserving; 20 radial tests + dispatch green, lint clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Stacked on #58 (the numba
measureobjectintensitydistribution.pymodule lives there). Completes the core radial module.Per-object cropped processing (each object on its bbox + 1px pad) replaces the whole-image centrosome geometry — this fixes Issue #22 (per-object results no longer depend on other labels in the field) and runs the geometry on small arrays.
core/numba/_radial.py:geodesic_chamfer_fifo— 1/√2 chamfer shortest-path from the centre seed (FIFO Bellman-Ford). Bit-exact to centrosomepropagate(zeros, seed, mask, 1)(shortest-path is algorithm-independent), ~35× faster — replaces the 80% bottleneck.radial_reduce— per-(object,bin) histograms + 8-wedgeRadialCV, replacing the repeatedscipy.sparse.coo_matrixbuilds +numpy.maCV.distance_transform_edtand the centre (maximum_position_of_labels) stay host-side (scipy/centrosome).get_radial_distribution:to_bzyx, 2D-only (Z>1 → {}), single/batch.scipy.ndimage.maximum_positiontie-break is field/layout-dependent; the per-object crop picks a different (equally valid) centre — numba's is deterministic and field-independent. @afermg raised #22 upstream to scipy —propagateis here replaced by a bit-exact numba kernel, so no behaviour change beyond the #22 fix.~21× end-to-end (1080², 144 obj;
propagate735→~12 ms). Tests: geodesic vs propagate (shape battery, bit-exact),radial_reducevs numpy, goldennumba(multi)[k] == numpy(isolated k), #22 independence, 2D-only/empty/batch. Full suite 118 passed, lint clean. Stack: #59 → #58 → this.