Skip to content

feat(sanitize): central non-contiguous label-ID sanitation#89

Open
timtreis wants to merge 8 commits into
mainfrom
feat/sanitize-noncontiguous-labels
Open

feat(sanitize): central non-contiguous label-ID sanitation#89
timtreis wants to merge 8 commits into
mainfrom
feat/sanitize-noncontiguous-labels

Conversation

@timtreis

@timtreis timtreis commented Jun 18, 2026

Copy link
Copy Markdown
Collaborator

Relabel non-contiguous / arbitrary mask IDs (e.g. {1,5,17}) to 1..N so measurement code can assume clean labels. Input is never mutated; results map back to original IDs.

How

  • cp_measure._sanitize.sanitize_masks(masks) -> (clean, ids) via skimage.segmentation.relabel_sequential (existing dep). Single unique pass; no-copy fast path when already contiguous; bool masks = one object; raises on negative/non-integer.
  • Featurizer sanitizes once at loop entry, emits original IDs as rows.
  • @sanitize_labels on every core/correlation/numba get_* for direct callers.
  • short-circuits when mask is already contiguous

Relabel arbitrary or non-contiguous mask label IDs (e.g. {1,5,17}) to
contiguous 1..N as early as possible in the pipeline, so every measurement
function downstream can assume clean labels and focus on the math. The caller's
array is never mutated and results are reported against the original IDs.

- new cp_measure._sanitize: sanitize_masks() built on
  skimage.segmentation.relabel_sequential (already a dependency), plus a thin
  @sanitize_labels decorator that finds the label argument by name
- featurizer sanitizes once at the top of the per-mask loop and emits original
  IDs as row labels (was range(1, n+1))
- @sanitize_labels applied to every core/correlation/numba get_* entry so
  direct callers get the same guarantee (cheap idempotent guard)
- raises ValueError on negative / non-integer labels
- tests: gapped {1,5,17} == contiguous baseline per function, featurizer
  end-to-end with original-ID rows, no-mutation, validation, 3D, registry
  decoration meta-test (partial-aware, multimask excluded)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@github-actions

github-actions Bot commented Jun 18, 2026

Copy link
Copy Markdown

Benchmark — 7ba1d0f vs main

speedup = main/head (>1 faster, <1 slower) · median per cell · showing functions that moved ≥1.05× either way

costes

size \ objects 16 64 256
256 0.98× 1.00× 1.01×
512 0.96× 0.98× 1.02×
1024 0.93× 0.96× 0.99×

feret

size \ objects 16 64 256
256 0.98× 0.97× 0.96×
512 0.95× 0.96× 0.97×
1024 0.93× 0.97× 0.93×

manders_fold

size \ objects 16 64 256
256 0.82× 0.93× 0.98×
512 0.78× 0.82× 0.96×
1024 0.77× 0.81× 0.84×

pearson

size \ objects 16 64 256
256 0.94× 0.99× 0.99×
512 0.91× 0.94× 1.01×
1024 0.88× 0.91× 0.96×

rwc

size \ objects 16 64 256
256 0.96× 0.97× 0.98×
512 0.94× 0.96× 0.99×
1024 0.94× 0.95× 0.95×

texture

size \ objects 16 64 256
256 1.00× 0.99× 0.98×
512 1.01× 0.99× 0.99×
1024 1.11× 1.03× 1.00×

zernike

size \ objects 16 64 256
256 1.00× 0.99× 0.97×
512 0.94× 0.95× 0.99×
1024 1.00× 1.01× 0.98×

timtreis and others added 3 commits June 18, 2026 14:34
Review-driven cleanup of the sanitation layer:

- collapse sanitize_masks to a single numpy.unique pass: derive the
  contiguity test, negative-value guard and original IDs from one sorted
  array; drop the separate _is_contiguous helper and the redundant
  max()/min()/second-unique passes
- accept boolean masks (treated as one object) instead of raising — restores
  the pre-existing behaviour of the raw measurement functions
- decorate get_correlation_overlap (the one public correlation entry that was
  missed), so direct callers get correct results on gapped labels
- trim the over-long module/function docstrings and the featurizer comment
- tests: parametrize the policy/edge cases, exercise the rank remap with
  non-monotonic labels, assert real multimask functions stay unsanitized, add
  a bool-mask case (net leaner, 109 passing)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The decorator ran numpy.unique over the whole image on every call (~10 ms on
1024²), a fixed cost that dominated cheap functions and large/few-object images
(manders_fold fell to 0.40×). Replace it with a max() gate plus a bincount
presence check (~4x faster), falling back to unique only for pathologically
large labels. relabel_sequential still handles the rare gapped path.

Measured on 1024²: manders_fold 0.74→0.91×, intensity 0.97×, feret 0.99×;
expensive functions (zernike/sizeshape/texture) were already ~1.0×. Negatives
still raise; bit-identical results. 109 tests pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
featurize() sanitized each mask up front and then called the @sanitize_labels-
decorated functions, so every feature re-ran the contiguity scan on the
already-clean mask (~12x per object-type). Unwrap the decorator (via the
__wrapped__ the wrapper already sets, partial-aware for the legacy path) so the
featurizer pays the sanitation cost exactly once per mask. Direct callers still
get the decorator's guarantee.

Test asserts sanitize_masks is called once per object-type mask. 110 pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@timtreis

Copy link
Copy Markdown
Collaborator Author

of note: the faster the actual runtime of a given function, the more the overhead of this hits, even when short-circuiting, f.e. in manders

timtreis and others added 2 commits June 18, 2026 15:40
The per-function equivalence tests parametrized over all ~13 measurement
functions were redundant: the decorator is uniform, so the coverage meta-test
(every public function wrapped) + one representative direct call (zernike, which
crashes on gaps without sanitation) + the featurizer end-to-end test cover the
same ground. Also merged the mutation/copy cases, dropped the separate numba
meta-test, and removed a dead second monkeypatch. 178 -> 102 lines.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
With sanitize_masks guaranteeing contiguous 1..N at every entry, the gap-tolerant
label bookkeeping inside the feature functions is dead weight. Remove it:

- texture/granularity/radial_distribution/radial_zernikes: drop the internal
  `numpy.unique(masks)` enumeration (a full-image sort, redundant after the
  decorator's pass). Use len(props) / range_.size / arange(1, max+1) instead.
- numba intensity: labels are 1..N, so the label->index map is just `label - 1`.
  Drop `lut` from flatten_numba (and the edge path) and compute it inline;
  nobjects = masks.max().
- delete primitives/segment.py: label_to_idx_lut had no remaining consumers.

All outputs verified bit-identical on contiguous input (numpy + numba), numba
gapped input maps correctly, backend-correctness + full suite (91) pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@timtreis timtreis force-pushed the feat/sanitize-noncontiguous-labels branch from ac9950c to b21be05 Compare June 18, 2026 13:58
…tring

flatten_numba (in this module) builds the flat arrays now; segment.py is gone.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@timtreis timtreis requested a review from afermg June 18, 2026 14:41
Sanitation relabels arbitrary IDs to 1..N; drop the relabel_sequential caveat.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant