Skip to content

[3/3] Add make_stitched_labels (materialise stitched labels)#1207

Merged
timtreis merged 10 commits into
scverse:mainfrom
timtreis:feature/stitched-labels
Jun 23, 2026
Merged

[3/3] Add make_stitched_labels (materialise stitched labels)#1207
timtreis merged 10 commits into
scverse:mainfrom
timtreis:feature/stitched-labels

Conversation

@timtreis

@timtreis timtreis commented Jun 14, 2026

Copy link
Copy Markdown
Member

Completes the tile-cut stitching arc ([1/3] #1188, [2/3] #1193) with the materialisation step.

Usage

sq.experimental.tl.calculate_tiling_qc(sdata, labels_key="labels")
sq.experimental.tl.assign_stitch_groups(sdata, labels_key="labels")
sq.experimental.im.make_stitched_labels(sdata, labels_key="labels")
# sdata.labels["labels_stitched"]      -> pieces of a cut cell share one ID (original untouched)
# sdata.tables["labels_stitched_table"] -> one row per stitched cell

Knobs: merge_strategy (how additive user features aggregate across pieces; default "sum"), join_labels (close the seam so a group is one connected component; stays lazy on dask), write_table, inplace.

Also in this PR

  • De-bloat of the merged arc: inlined the single-use utils/_geometry.py helpers into their one caller (module deleted) and routed _resolve_qc_params through the shared resolve_params. No behaviour change.

Closes the #1170 arc.

timtreis and others added 4 commits June 12, 2026 16:35
Lock the validation-sweep outcome: at min_confidence=0.5 the deterministic
fixture recovers >=50% of cut pieces with no intact false-merges. min_confidence
default stays 0.7 (full attainable recall, zero false merges); gap_proximity
kept in the 5-feature score.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…bels)

PR-C of the scverse#1170 split. Adds sq.experimental.im.make_stitched_labels: reads
the stitch_group_id mapping written by assign_stitch_groups and registers a new
labels element where each stitched group shares one ID (original labels left
untouched), plus an optional collapsed AnnData (one row per group) with
configurable merge_strategy.

Stacks on PR-B (feature/tiling-stitch-algo) and reuses its full-pipeline tests.

Scale rework (must handle 100k x 100k, never materialise the full array):
- LUT remap stays lazy; labels present in the image but absent from the QC
  table (e.g. min_area-filtered cells) pass through instead of indexing OOB.
- join_labels is now chunk-aware via dask map_overlap (depth = close_radius+2),
  never computing the whole image or running a full-frame regionprops.
- group aggregation vectorised (argsort+split, pandas groupby) instead of the
  O(cells x groups) per-group np.where scan.
- X aggregation is sparse-safe: sum/mean via sparse matmul, first via sparse
  gather, other reducers per-group-bounded; the full matrix is never densified.
- validate label_id/is_stitched up front; squeeze a leading singleton dim.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…our change

- inline equivalent_diameter + largest_contour (single consumer _tiling_stitch)
  into their call site and delete utils/_geometry.py; keep both guards (1px pad
  precondition + empty-contour skip).
- route _resolve_qc_params through the shared resolve_params helper instead of a
  duplicated Mapping-validation block; drop the now-dead _QC_FIELDS and unused
  fields import. utils/_params.py now has two genuine callers.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…le + validation

Addresses a max-effort review of the stitched-labels path:

- collapsed table no longer sums QC-contract columns: centroid_y/x take the
  mean of the pieces (a merged cell's position), the per-piece cut-artifact
  scores take the max (worst piece); merge_strategy now applies only to genuine
  additive user features. Drops the dtype-restore that re-narrowed/truncated
  promoted sums and means.
- _aggregate_X: sum no longer treated as integer-preserving, so an integer .X
  promotes to float64 instead of wrapping (uint16 200000 -> 3392); a callable
  strategy is now applied to singleton groups too (obs/X parity).
- multiscale: parse the new element with the resolved scale's transform (not the
  DataTree base/Identity), so the output overlays correctly; documented as a
  single-scale element at the QC scale's resolution.
- validate label_id (positive, unique, non-NaN) and merge_strategy up front, so
  bad input fails fast even with write_table=False.
- preserve varm (var axis is unchanged); correct the join_labels docstring (it is
  lazy via map_overlap and bridges seams up to 2*close_radius px, not "forces
  materialisation").
- add TestReviewFixes regression locks (each fails on the pre-fix code).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@timtreis timtreis requested a review from selmanozleyen June 14, 2026 17:25
@codecov

codecov Bot commented Jun 14, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 84.68468% with 34 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.81%. Comparing base (9f9d500) to head (228f721).

Files with missing lines Patch % Lines
src/squidpy/experimental/im/_stitched_labels.py 84.57% 13 Missing and 20 partials ⚠️
src/squidpy/experimental/tl/_tiling_stitch.py 80.00% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1207      +/-   ##
==========================================
+ Coverage   76.53%   76.81%   +0.27%     
==========================================
  Files          63       63              
  Lines        9070     9267     +197     
  Branches     1523     1565      +42     
==========================================
+ Hits         6942     7118     +176     
- Misses       1544     1547       +3     
- Partials      584      602      +18     
Files with missing lines Coverage Δ
src/squidpy/experimental/tl/_tiling_qc.py 70.66% <100.00%> (-0.67%) ⬇️
src/squidpy/experimental/tl/_tiling_stitch.py 75.29% <80.00%> (ø)
src/squidpy/experimental/im/_stitched_labels.py 84.57% <84.57%> (ø)

... and 1 file with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

contour = largest_contour(mask)
if contour is None:
contours = find_contours(mask, 0.5)
if not contours: # degenerate mask traces nothing; skip it

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if not contours is ambigious if it means it's 0 the check should be to if contours != 0 or something like this.

@timtreis timtreis Jun 22, 2026

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if contours != 0 wouldn't work here because upon failure, it'd return an empty list

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah then len(contours) == 0 then right? that's what I mean by being unclear

Comment thread src/squidpy/experimental/im/_stitched_labels.py
Comment thread docs/release/notes-dev.md Outdated
Comment thread tests/experimental/test_stitched_labels.py Outdated
Key for the new labels element. Defaults to
``"{labels_key}_stitched"``. Existing element at this key is
overwritten with a warning.
table_key_added

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might be wrong on this, but can't we notice if table_key_added is not None, write_table would be automatically true? I think it's a default at the price of adding a bool argument.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost 🤔 We have these states:

A) write_table=True + table_key_added=None --> write table with default key ({out_key}_table)
B) write_table=True + table_key_added="foo" --> write table with key (foo)
C) write_table=False + table_key_added=None --> no table
D) write_table=False + table_key_added="foo" --> error

For that granularity of "I don't care about the name, just use defaults", we need to enable the permutations

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah thats what I mean, we add one argument write_table for just supporting: "I don't care about the name, just use defaults". This is a preference thing so it's fine I guess

@selmanozleyen selmanozleyen left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some nitpicks but it seems quite good!

- enforce table_key_added != labels element key before any sdata mutation
  (SpatialData names are unique across element types; avoids a half-written
  sdata and a cryptic late KeyError)
- reject the contradictory write_table=False + table_key_added combo instead
  of silently ignoring the key
- replace dead pytest.skip guards with asserts (sdata_tile_boundary is seeded,
  so stitched cells are a deterministic precondition)
- add regression cases for both new validation errors
- drop the make_stitched_labels release note (handled at release time)
@timtreis timtreis merged commit ae5ceb4 into scverse:main Jun 23, 2026
14 of 15 checks passed
@timtreis

Copy link
Copy Markdown
Member Author

yey! Will send a tutorial to squidpy_notebooks next

@timtreis timtreis deleted the feature/stitched-labels branch June 23, 2026 10:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants