feat(sanitize): central non-contiguous label-ID sanitation by timtreis · Pull Request #89 · afermg/cp_measure

timtreis · 2026-06-18T12:11:03Z

Relabel non-contiguous / arbitrary mask IDs (e.g. {1,5,17}) to 1..N so measurement code can assume clean labels. Input is never mutated; results map back to original IDs.

How

cp_measure._sanitize.sanitize_masks(masks) -> (clean, ids) via skimage.segmentation.relabel_sequential (existing dep). Single unique pass; no-copy fast path when already contiguous; bool masks = one object; raises on negative/non-integer.
Featurizer sanitizes once at loop entry, emits original IDs as rows.
@sanitize_labels on every core/correlation/numba get_* for direct callers.
short-circuits when mask is already contiguous

Relabel arbitrary or non-contiguous mask label IDs (e.g. {1,5,17}) to contiguous 1..N as early as possible in the pipeline, so every measurement function downstream can assume clean labels and focus on the math. The caller's array is never mutated and results are reported against the original IDs. - new cp_measure._sanitize: sanitize_masks() built on skimage.segmentation.relabel_sequential (already a dependency), plus a thin @sanitize_labels decorator that finds the label argument by name - featurizer sanitizes once at the top of the per-mask loop and emits original IDs as row labels (was range(1, n+1)) - @sanitize_labels applied to every core/correlation/numba get_* entry so direct callers get the same guarantee (cheap idempotent guard) - raises ValueError on negative / non-integer labels - tests: gapped {1,5,17} == contiguous baseline per function, featurizer end-to-end with original-ID rows, no-mutation, validation, 3D, registry decoration meta-test (partial-aware, multimask excluded) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions · 2026-06-18T12:18:59Z

Benchmark — `7ba1d0f` vs `main`

speedup = main/head (>1 faster, <1 slower) · median per cell · showing functions that moved ≥1.05× either way

`costes`

size \ objects	16	64	256
256	0.98×	1.00×	1.01×
512	0.96×	0.98×	1.02×
1024	0.93×	0.96×	0.99×

`feret`

size \ objects	16	64	256
256	0.98×	0.97×	0.96×
512	0.95×	0.96×	0.97×
1024	0.93×	0.97×	0.93×

`manders_fold`

size \ objects	16	64	256
256	0.82×	0.93×	0.98×
512	0.78×	0.82×	0.96×
1024	0.77×	0.81×	0.84×

`pearson`

size \ objects	16	64	256
256	0.94×	0.99×	0.99×
512	0.91×	0.94×	1.01×
1024	0.88×	0.91×	0.96×

`rwc`

size \ objects	16	64	256
256	0.96×	0.97×	0.98×
512	0.94×	0.96×	0.99×
1024	0.94×	0.95×	0.95×

`texture`

size \ objects	16	64	256
256	1.00×	0.99×	0.98×
512	1.01×	0.99×	0.99×
1024	1.11×	1.03×	1.00×

`zernike`

size \ objects	16	64	256
256	1.00×	0.99×	0.97×
512	0.94×	0.95×	0.99×
1024	1.00×	1.01×	0.98×

Review-driven cleanup of the sanitation layer: - collapse sanitize_masks to a single numpy.unique pass: derive the contiguity test, negative-value guard and original IDs from one sorted array; drop the separate _is_contiguous helper and the redundant max()/min()/second-unique passes - accept boolean masks (treated as one object) instead of raising — restores the pre-existing behaviour of the raw measurement functions - decorate get_correlation_overlap (the one public correlation entry that was missed), so direct callers get correct results on gapped labels - trim the over-long module/function docstrings and the featurizer comment - tests: parametrize the policy/edge cases, exercise the rank remap with non-monotonic labels, assert real multimask functions stay unsanitized, add a bool-mask case (net leaner, 109 passing) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The decorator ran numpy.unique over the whole image on every call (~10 ms on 1024²), a fixed cost that dominated cheap functions and large/few-object images (manders_fold fell to 0.40×). Replace it with a max() gate plus a bincount presence check (~4x faster), falling back to unique only for pathologically large labels. relabel_sequential still handles the rare gapped path. Measured on 1024²: manders_fold 0.74→0.91×, intensity 0.97×, feret 0.99×; expensive functions (zernike/sizeshape/texture) were already ~1.0×. Negatives still raise; bit-identical results. 109 tests pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

featurize() sanitized each mask up front and then called the @sanitize_labels- decorated functions, so every feature re-ran the contiguity scan on the already-clean mask (~12x per object-type). Unwrap the decorator (via the __wrapped__ the wrapper already sets, partial-aware for the legacy path) so the featurizer pays the sanitation cost exactly once per mask. Direct callers still get the decorator's guarantee. Test asserts sanitize_masks is called once per object-type mask. 110 pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

timtreis · 2026-06-18T13:32:36Z

of note: the faster the actual runtime of a given function, the more the overhead of this hits, even when short-circuiting, f.e. in manders

The per-function equivalence tests parametrized over all ~13 measurement functions were redundant: the decorator is uniform, so the coverage meta-test (every public function wrapped) + one representative direct call (zernike, which crashes on gaps without sanitation) + the featurizer end-to-end test cover the same ground. Also merged the mutation/copy cases, dropped the separate numba meta-test, and removed a dead second monkeypatch. 178 -> 102 lines. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

With sanitize_masks guaranteeing contiguous 1..N at every entry, the gap-tolerant label bookkeeping inside the feature functions is dead weight. Remove it: - texture/granularity/radial_distribution/radial_zernikes: drop the internal `numpy.unique(masks)` enumeration (a full-image sort, redundant after the decorator's pass). Use len(props) / range_.size / arange(1, max+1) instead. - numba intensity: labels are 1..N, so the label->index map is just `label - 1`. Drop `lut` from flatten_numba (and the edge path) and compute it inline; nobjects = masks.max(). - delete primitives/segment.py: label_to_idx_lut had no remaining consumers. All outputs verified bit-identical on contiguous input (numpy + numba), numba gapped input maps correctly, backend-correctness + full suite (91) pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…tring flatten_numba (in this module) builds the flat arrays now; segment.py is gone. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Sanitation relabels arbitrary IDs to 1..N; drop the relabel_sequential caveat. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

timtreis and others added 3 commits June 18, 2026 14:34

timtreis and others added 2 commits June 18, 2026 15:40

timtreis force-pushed the feat/sanitize-noncontiguous-labels branch from ac9950c to b21be05 Compare June 18, 2026 13:58

docs(primitives): fix stale segment.py cross-ref in numba kernel docs…

01279d4

…tring flatten_numba (in this module) builds the flat arrays now; segment.py is gone. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

timtreis requested a review from afermg June 18, 2026 14:41

docs(readme): non-contiguous labels now handled internally

7ba1d0f

Sanitation relabels arbitrary IDs to 1..N; drop the relabel_sequential caveat. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(sanitize): central non-contiguous label-ID sanitation#89

feat(sanitize): central non-contiguous label-ID sanitation#89
timtreis wants to merge 8 commits into
mainfrom
feat/sanitize-noncontiguous-labels

timtreis commented Jun 18, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Jun 18, 2026 •

edited

Loading

Uh oh!

timtreis commented Jun 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

timtreis commented Jun 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot commented Jun 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Benchmark — 7ba1d0f vs main

costes

feret

manders_fold

pearson

rwc

texture

zernike

Uh oh!

timtreis commented Jun 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

timtreis commented Jun 18, 2026 •

edited

Loading

github-actions Bot commented Jun 18, 2026 •

edited

Loading

Benchmark — `7ba1d0f` vs `main`

`costes`

`feret`

`manders_fold`

`pearson`

`rwc`

`texture`

`zernike`