Add an Example column to the taxa list to verify species presence one row at a time by mihow · Pull Request #1365 · RolnickLab/antenna

mihow · 2026-07-02T20:45:58Z

Summary

This adds a presence-verification workflow to the taxa list. Curators reviewing a project's species often need to answer a simple question one taxon at a time: is this really here? Today that means leaving the list, hunting down an occurrence, and verifying it in a separate view. This change surfaces one representative occurrence per taxon directly in the list as an Example thumbnail. Clicking it opens the existing occurrence identification modal right over the list, on the Identification tab, so the reviewer can confirm the taxon and move to the next row without losing their place. The Last-seen and Best-score cells link into the same modal, verified rows are dimmed and marked, and confirming an identification updates the row in place.

The whole feature is opt-in behind a query parameter, so the default taxa list keeps its current latency budget. When the parameter is off, the new fields serialize as null and no extra queries run.

Closes #1320.

List of Changes

| # | What the user gets | How it works |
|---|---|---|
| 1 | An Example column showing a thumbnail of one occurrence per taxon; clicking it opens the occurrence identification modal over the taxa list, on the Identification tab. | New non-sortable column in the taxa table. The occurrence modal was extracted into a reusable `OccurrenceDetailsDialog` (route-driven via `?verifyOccurrence=<id>`) rather than forked. Backend returns a nested `example_occurrence {id, detection_id, image_url, score, verified}`. |
| 2 | The Last-seen and Best-score cells deep-link to the same modal, so a reviewer can jump to the most recent or highest-confidence occurrence. | Serializer also returns `best_scoring_occurrence_id` and `last_detected_occurrence_id`; the cells render as links to `?verifyOccurrence=`. |
| 3 | Already-verified taxa are dimmed and flagged with a shield marker; confirming an identification updates the row live. | Row styling keyed off the verified count; the identification mutation now invalidates the taxa query cache. |
| 4 | The default taxa list is unchanged in speed and shape. | The three fields are gated behind `?with_example_occurrences=true`; off by default they annotate `NULL` and run no subqueries. |
| 5 | The example shown is the useful one: for an unverified taxon, the best-scoring occurrence that hasn't been verified yet (a fast clean ID); for a verified taxon, the latest occurrence (is it still showing up?). | `TaxonQuerySet.with_example_occurrence_ids` picks between three correlated subqueries using a precomputed verified-taxon set. |
| 6 | No pagination slowdown as the feature is added. | `TaxonPagination.get_count` strips annotations before the COUNT so the example subqueries are never pulled into the count query by the tag filter's `.distinct()`. |
| 7 | (internal) The verified-occurrence rollup is computed once instead of twice. | `verified_taxon_counts()` was extracted so both the verified-count annotation and the example dispatch share a single pass. |

Hardening from a takeaway review (commit `fd5ac069`)

A structural review after the first pass surfaced three things, now fixed:

Draft-project visibility. The taxa list annotates observed-occurrence data — the per-taxon counts and, with the flag on, the example occurrence ids and detection crop URLs — through subqueries that were not visibility-gated. A non-member could read it for a draft project. TaxonViewSet.get_queryset now refuses a project the requester cannot see with a 404, the same way the other project-scoped taxa endpoints already do. (The counts were exposed before this feature; the example adds object ids and image URLs, so the gate matters more now.)
One declaration of the example shape. The nested {id, detection_id, image_url, score, verified} object is now defined once in an ExampleOccurrenceSerializer (also typing the field for the OpenAPI schema) instead of being hand-built as a dict in the view.
No drift between the example's id and its image. with_best_detection() now also returns best_detection_id, chosen from the same detection as the image path, so the returned detection_id cannot point at a different detection than the thumbnail.

New tests cover each: a non-member is refused a draft project's taxa list while a member still sees examples; a higher-rank taxon used directly for identifications gets a pinned example; and the ?collection= path draws the example from the same set the count reports.
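The second and third fixes can be pictured with a small framework-free sketch. It is not the PR's `ExampleOccurrenceSerializer` or its view code: the `ExampleOccurrence` dataclass, `hydrate_examples`, the `fetch_many` callable, and the row keys (`best_detection`, `image_url`) are hypothetical names chosen for illustration. It shows one declaration of the nested shape, one lookup per page, and a detection id and image taken from the same detection record.

```python
from dataclasses import asdict, dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class ExampleOccurrence:
    """Single declaration of the nested example shape (illustrative only)."""
    id: int
    detection_id: Optional[int]
    image_url: Optional[str]
    score: Optional[float]
    verified: bool


def hydrate_examples(
    example_ids: list[Optional[int]],
    fetch_many: Callable[[set[int]], Iterable[dict]],
) -> list[Optional[dict]]:
    """Turn one page of example ids into nested objects with a single lookup."""
    wanted = {i for i in example_ids if i is not None}
    # One call per page, however many rows the page has (no N+1).
    rows = fetch_many(wanted) if wanted else []
    by_id = {row["id"]: row for row in rows}

    result: list[Optional[dict]] = []
    for example_id in example_ids:
        row = by_id.get(example_id)
        if row is None:
            result.append(None)  # taxon without an example
            continue
        # Hypothetical row layout: one detection record supplies both the id
        # and the image, so the two cannot point at different detections.
        best = row["best_detection"]
        result.append(asdict(ExampleOccurrence(
            id=row["id"],
            detection_id=best["id"],
            image_url=best["image_url"],
            score=row["score"],
            verified=row["verified"],
        )))
    return result
```

In the real backend the lookup would be a database query and the shape would live in a serializer; the sketch only mirrors the structure described above.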

Detailed Description

Selection semantics (hybrid, exact-determination)

Unverified row → best-scoring occurrence that has no non-withdrawn identification (the quickest to confirm).
Verified row → latest occurrence by detection timestamp (has it appeared again since?).
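Only verified_count rolls up to ancestors, so a higher-rank row (genus, family) that is reached only through roll-up gets a NULL example: the example is exact-determination, not rolled up. A higher-rank taxon used directly for identifications still gets its own example.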

One gotcha worth recording: verifying an occurrence overrides its determination_score, so verified occurrences tie on score. The test fixture accounts for this by marking a taxon verified through a separate occurrence rather than relying on score ordering.
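The selection rules above can be restated as a minimal in-memory sketch. The real selection runs as correlated subqueries in `TaxonQuerySet.with_example_occurrence_ids`; the `Occurrence` dataclass, its field names, and `pick_example` below are hypothetical stand-ins, and the verified-taxon set is passed in precomputed, as the description says.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Occurrence:
    """Hypothetical stand-in for an occurrence row; field names are illustrative."""
    id: int
    taxon_id: int          # exact determination, not an ancestor
    score: float
    last_detected: datetime
    verified: bool         # True if it has a non-withdrawn identification


def pick_example(
    taxon_id: int,
    occurrences: list[Occurrence],
    verified_taxon_ids: set[int],
) -> Optional[int]:
    """Return the id of the occurrence to show as the Example, or None."""
    # Exact determination only: occurrences of descendant taxa are ignored.
    own = [o for o in occurrences if o.taxon_id == taxon_id]
    if not own:
        return None  # e.g. an ancestor that is reached only through roll-up

    if taxon_id in verified_taxon_ids:
        # Verified row: the latest occurrence (has it appeared again since?).
        return max(own, key=lambda o: o.last_detected).id

    # Unverified row: the best-scoring occurrence not yet verified.
    candidates = [o for o in own if not o.verified]
    if not candidates:
        return None
    return max(candidates, key=lambda o: o.score).id
```

Because the verified branch orders by timestamp rather than score, the score tie mentioned above does not affect which occurrence a verified row shows in this sketch.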

What is gated behind query parameters

Opt-in (default off) — this feature's cost gate:

?with_example_occurrences=true enables example_occurrence, best_scoring_occurrence_id, last_detected_occurrence_id. Off: all three annotate Value(None) — no subqueries, no hydration query, stable response shape. On: three subqueries fold into the page SELECT plus one hydration query per page. Parsed strictly (?with_example_occurrences=abc → 400).
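A minimal sketch of the gate's two properties, strict parsing and a stable response shape, follows. It is not the PR's code: `parse_flag`, `BadQueryParam`, `example_fields`, and the set of accepted spellings are assumptions made for illustration; only the three field names and the "abc is rejected" behaviour come from the description.

```python
from typing import Optional

# Assumption: which spellings count as true/false is not stated in the PR.
TRUE_VALUES = {"true", "1"}
FALSE_VALUES = {"false", "0"}

EXAMPLE_FIELDS = (
    "example_occurrence",
    "best_scoring_occurrence_id",
    "last_detected_occurrence_id",
)


class BadQueryParam(ValueError):
    """A view would translate this into an HTTP 400 response."""


def parse_flag(raw: Optional[str], default: bool = False) -> bool:
    """Parse a boolean query parameter strictly: unknown values are an error."""
    if raw is None:
        return default  # parameter absent: feature stays off
    value = raw.strip().lower()
    if value in TRUE_VALUES:
        return True
    if value in FALSE_VALUES:
        return False
    raise BadQueryParam(f"expected a boolean, got {raw!r}")


def example_fields(enabled: bool, computed: dict) -> dict:
    """Always emit the three keys; they are None when the flag is off."""
    return {
        name: (computed.get(name) if enabled else None)
        for name in EXAMPLE_FIELDS
    }
```

Rejecting unknown values instead of treating them as false keeps a typo in the flag from silently turning the feature off.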

Dispatch parameters (change which query shape runs, pre-existing):

?collection=<id> switches the observation counts from correlated subqueries to conditional aggregation over the detections join. This is why the example subqueries are opt-in: they degrade to per-row scans on this path.
?include_unobserved=true drops the observed-only restriction.
?verified=true|false filters rows to the verified / unverified set.
?apply_defaults=false bypasses the default score-threshold and taxa include/exclude filters.

Always-on: occurrences_count, events_count, last_detected, best_determination_score, verified_count.

Performance

Measured on the real DRF path (queryset + pagination COUNT + example hydration + full serialization), 25 taxa per page, cold (query cache flushed) vs warm, on three large projects. Projects are anonymised; sizes are what matter.

| Project | occurrences / taxa | warm, flag off | warm, flag on | cold, flag off | cold, flag on | queries off → on |
|---|---|---|---|---|---|---|
| A | ~93k / ~1.7k | 61 ms | 73 ms | 156–420 ms | 178–465 ms | 8 → 9 |
| B | ~70k / ~3.2k | 58 ms | 70 ms | 108–209 ms | 129–312 ms | 8 → 9 |
| C (verification-heavy) | ~179k / ~2.3k | 770 ms | ~775–877 ms | ~990–1100 ms | ~966–1970 ms | 8 → 9 |

Findings:

The flag adds exactly one query, fixed. Off = 8 queries, on = 9, on every project and every page offset. The three example subqueries fold into the existing page SELECT; the extra query is the single per-page hydration. Query count does not scale with page size or offset.
Warm cost is about +12 ms (~20% of a 60 ms page) on normal projects — negligible in absolute terms.
Cold cost is +20–180 ms, larger on deeper pages because the subqueries evaluate over more offset rows.
The feature's cost does not grow with project size. Projects A and B stay double-digit-ms warm despite 70k–93k occurrences.
Project C is slow with the flag off too (~770 ms warm). Isolating it, the always-on verified-occurrence rollup (verified_taxon_counts) alone accounts for ~743 ms on that project versus ~25 ms on the others; it is a Python pass whose cost grows with the number of verified taxa. That rollup predates this change; this feature adds only ~12 ms on top of it. It is called out here as a separate, pre-existing hotspot, not introduced by this PR.

These are wall-clock measurements from a long-running local stack (PostgreSQL shared buffers warm); "cold" isolates the query-cache miss.
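The warm-cost figure quoted in the findings can be reproduced from the table with a few lines of arithmetic. The numbers are the warm timings for projects A and B as reported; the helper name `flag_overhead` is illustrative, and project C is left out because its flag-on timing is given as a range.

```python
# Warm timings in milliseconds, copied from the table: (flag off, flag on).
warm_ms = {"A": (61, 73), "B": (58, 70)}


def flag_overhead(off_ms: float, on_ms: float) -> tuple[float, float]:
    """Return the absolute overhead in ms and as a percentage of the flag-off page."""
    delta = on_ms - off_ms
    return delta, 100 * delta / off_ms


for project, (off_ms, on_ms) in warm_ms.items():
    delta, pct = flag_overhead(off_ms, on_ms)
    print(f"Project {project}: +{delta} ms ({pct:.1f}% of the flag-off page)")
```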

Testing

Backend: feature tests for the selection dispatch (unverified → best-scoring-unverified, verified → latest), the ancestor-NULL case, deployment scoping, the ?collection= fan-out, and the 400-on-bad-flag case; plus query-count tests asserting the hydration does not scale with page size and that the example subqueries are stripped from the pagination COUNT. Broader taxa regression suite passes.
makemigrations --check is clean (query-only change, no migration).
Linters clean (black / isort / flake8; tsc / eslint for the frontend).
Manual browser end-to-end: the Example column renders crops, the deep link opens the modal over the taxa list, confirming an identification creates it and the row updates in place (count, marker, dimming, and the example rolling to the next occurrence).

Possible follow-up (out of scope)

The always-on verification rollup is an O(verified occurrences) Python pass. On verification-heavy projects that is the dominant cost of the taxa list, independent of this feature. A denormalised per-(project, taxon) aggregate refreshed on the cached-count pattern would remove it. Worth a separate ticket.
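To make the cost argument concrete, here is a hedged sketch of what a per-request rollup pass and a write-time alternative could look like. It is not the implementation of `verified_taxon_counts()`: the function names, the `parent_of` mapping, and the assumption that the pass visits each verified occurrence once and walks up its ancestor chain are all illustrative.

```python
from collections import Counter
from typing import Iterable, Mapping, Optional


def rollup_verified_counts(
    verified_occurrence_taxa: Iterable[int],
    parent_of: Mapping[int, Optional[int]],
) -> Counter:
    """Count verified occurrences per taxon, rolled up to every ancestor.

    Runs once per request in this sketch, so its cost grows with the number
    of verified occurrences times the depth of the taxonomy.
    """
    counts: Counter = Counter()
    for taxon_id in verified_occurrence_taxa:  # one step per verified occurrence
        node: Optional[int] = taxon_id
        while node is not None:
            counts[node] += 1
            node = parent_of.get(node)
    return counts


def apply_verification(
    counts: Counter,
    taxon_id: int,
    parent_of: Mapping[int, Optional[int]],
    delta: int = 1,
) -> None:
    """Hypothetical write-time update of a stored aggregate.

    Touches only one ancestor chain per new (or withdrawn) verification,
    so reads no longer need the full pass.
    """
    node: Optional[int] = taxon_id
    while node is not None:
        counts[node] += delta
        node = parent_of.get(node)
```

With a stored aggregate, the list endpoint would read precomputed counts and the per-verification work would move to the write path; how that aggregate is refreshed in practice is left to the follow-up ticket.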

…fication

Add the backend for issue #1320. The taxa list can now return, per taxon, one example occurrence to verify: the best-scoring unverified occurrence for unverified rows (fastest clean ID) and the latest occurrence for already-verified rows (is it still showing up?). It also returns the source occurrence ids behind the Last-seen and Best-score cells so the frontend can deep-link each to the identification modal.

The whole selection is gated behind ?with_example_occurrences=true so the default list keeps its latency budget, especially on the ?collection= (detections-join) path. When off, the three fields serialize as null.

- TaxonQuerySet.with_example_occurrence_ids: three correlated subqueries (index-served on the default path), with the verified-vs-unverified branch chosen by a precomputed verified-taxon set.
- Extract verified_taxon_counts() so the verified rollup is computed once and shared by with_verification_counts and the example dispatch.
- TaxonViewSet hydrates the chosen example ids into {id, detection_id, image_url, score, verified} in one query per page (no N+1).
- TaxonPagination.get_count strips annotations before the COUNT (mirrors ProjectPagination) so the example subqueries are not evaluated for every taxon in the project via TagInverseFilter's .distinct().

Co-Authored-By: Claude <noreply@anthropic.com>

… list

Frontend for issue #1320. Each taxon row gains a non-sortable Example column showing a thumbnail of the occurrence to verify; clicking it opens the existing occurrence identification modal (Agree / Suggest ID) over the taxa list, so a user can sweep unverified taxa and confirm presence one row at a time. The Last-seen and Best-score cells link to the same modal for their source occurrence.

- Extract OccurrenceDetailsDialog from the occurrences page and reuse it on the taxa list, keyed off a ?verifyOccurrence= search param so the list stays behind.
- Species model: verificationExample / bestScoringOccurrenceId / lastDetectedOccurrenceId getters over the new API fields; useSpecies requests ?with_example_occurrences=true.
- Dim already-verified rows (new optional rowClassName hook on the Table) and mark them with a verified icon.
- Invalidate the taxa list query after an identification so verified counts and the example thumbnail refresh without a reload.

Co-Authored-By: Claude <noreply@anthropic.com>

…taxa list

Entering the occurrence modal from the taxa list is a verification action, so it now opens on the Identification tab (Agree / Suggest ID) instead of Fields. Added an optional defaultTab prop to OccurrenceDetailsDialog; the occurrences list keeps its Fields default.

Co-Authored-By: Claude <noreply@anthropic.com>

netlify · 2026-07-02T20:46:03Z

✅ Deploy Preview for antenna-ssec canceled.

Name	Link
🔨 Latest commit	`fd5ac06`
🔍 Latest deploy log	https://app.netlify.com/projects/antenna-ssec/deploys/6a46dfdfd09a0400089726f3

netlify · 2026-07-02T20:46:03Z

✅ Deploy Preview for antenna-preview canceled.

Name	Link
🔨 Latest commit	`fd5ac06`
🔍 Latest deploy log	https://app.netlify.com/projects/antenna-preview/deploys/6a46dfdf1e6f2e00085dd076

coderabbitai · 2026-07-02T20:46:06Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

coderabbitai · 2026-07-02T21:10:42Z

Note

Unit test generation is a beta feature. Expect some limitations and changes as we gather feedback and continue to improve it.

Generating unit tests... This may take up to 20 minutes.

coderabbitai · 2026-07-02T21:25:51Z

Request timed out after 900000ms (requestId=af2a3ee8-73a5-45e3-bdbd-af9b43319e2a)

Follow-up to the presence-verification Example column (#1320), applying the structural fixes surfaced by a takeaway review.

Gate draft-project data behind visibility. The taxa list annotates observed-occurrence data (per-taxon counts and, under ?with_example_occurrences, example occurrence ids plus detection crop URLs) via subqueries that are not visibility-gated, so a non-member could read it for a draft project. TaxonViewSet.get_queryset now refuses a project the user cannot see with a 404, matching the sibling project-scoped taxa endpoints (top-identifiers, model-agreement).

Single-source the nested example shape. ExampleOccurrenceSerializer is now the one declaration of {id, detection_id, image_url, score, verified}; the view hydrates a page of occurrences and serializes them through it instead of hand-building a dict, and it types the field for the OpenAPI schema.

Remove the duplicate best-detection subquery. with_best_detection() now also annotates best_detection_id, picked from the same detection as best_detection_path, so the example's detection_id and its image can no longer drift apart.

Tests: a higher-rank taxon used directly for identifications now has a pinned example (only pure roll-up ancestors are NULL); the collection path asserts the example is drawn from the same set occurrences_count reports; a draft project hides examples from a non-member while a member still sees them.

Co-Authored-By: Claude <noreply@anthropic.com>

mihow and others added 3 commits July 1, 2026 10:19

mihow wants to merge 4 commits into main from feat/taxa-presence-verification
