
Add an Example column to the taxa list to verify species presence one row at a time#1365

Draft
mihow wants to merge 4 commits into main from feat/taxa-presence-verification

mihow commented Jul 2, 2026

Collaborator

Summary

This adds a presence-verification workflow to the taxa list. Curators reviewing a project's species often need to answer a simple question one taxon at a time: is this really here? Today that means leaving the list, hunting down an occurrence, and verifying it in a separate view. This change surfaces one representative occurrence per taxon directly in the list as an Example thumbnail. Clicking it opens the existing occurrence identification modal right over the list, on the Identification tab, so the reviewer can confirm the taxon and move to the next row without losing their place. The Last-seen and Best-score cells link into the same modal, verified rows are dimmed and marked, and confirming an identification updates the row in place.

The whole feature is opt-in behind a query parameter, so the default taxa list keeps its current latency budget. When the parameter is off, the new fields serialize as null and no extra queries run.

Closes #1320.

List of Changes

| # | What the user gets | How it works |
|---|---|---|
| 1 | An Example column showing a thumbnail of one occurrence per taxon; clicking it opens the occurrence identification modal over the taxa list, on the Identification tab. | New non-sortable column in the taxa table. The occurrence modal was extracted into a reusable OccurrenceDetailsDialog (route-driven via ?verifyOccurrence=<id>) rather than forked. Backend returns a nested example_occurrence {id, detection_id, image_url, score, verified}. |
| 2 | The Last-seen and Best-score cells deep-link to the same modal, so a reviewer can jump to the most recent or highest-confidence occurrence. | Serializer also returns best_scoring_occurrence_id and last_detected_occurrence_id; the cells render as links to ?verifyOccurrence=. |
| 3 | Already-verified taxa are dimmed and flagged with a shield marker; confirming an identification updates the row live. | Row styling is keyed off the verified count; the identification mutation now invalidates the taxa query cache. |
| 4 | The default taxa list is unchanged in speed and shape. | The three fields are gated behind ?with_example_occurrences=true. The flag is off by default, and when it is off they annotate NULL and run no subqueries. |
| 5 | The example shown is the useful one: for an unverified taxon, the best-scoring occurrence that hasn't been verified yet (a fast clean ID); for a verified taxon, the latest occurrence (is it still showing up?). | TaxonQuerySet.with_example_occurrence_ids picks between three correlated subqueries using a precomputed verified-taxon set. |
| 6 | Adding the feature does not slow down pagination. | TaxonPagination.get_count strips annotations before the COUNT, so the tag filter's .distinct() never pulls the example subqueries into the count query. |
| 7 | (internal) The verified-occurrence rollup is computed once instead of twice. | verified_taxon_counts() was extracted so the verified-count annotation and the example dispatch share a single pass. |
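Rows 1 and 2 keep the open dialog in the URL, so the taxa list (with its filters and page) stays mounted behind it. The frontend is TypeScript; this Python sketch only illustrates the query-string handling, and the helper names `verify_link` and `close_link` and the example path are hypothetical:

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def _with_params(url: str, params: dict) -> str:
    """Rebuild url with a new query string, keeping path and fragment."""
    parts = urlsplit(url)
    query = urlencode(params, doseq=True)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, parts.fragment))


def verify_link(list_url: str, occurrence_id: int) -> str:
    """Add ?verifyOccurrence=<id> to the current list URL, keeping other params."""
    params = parse_qs(urlsplit(list_url).query)  # dict[str, list[str]]
    params["verifyOccurrence"] = [str(occurrence_id)]
    return _with_params(list_url, params)


def close_link(url: str) -> str:
    """Drop verifyOccurrence, which closes the dialog and leaves the list as-is."""
    params = parse_qs(urlsplit(url).query)
    params.pop("verifyOccurrence", None)
    return _with_params(url, params)
```

Because `parse_qs` drops blank values by default, an empty parameter such as `?q=` would not survive the round trip; pass `keep_blank_values=True` if that matters.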

Hardening from a takeaway review (commit fd5ac069)

A structural review after the first pass surfaced three things, now fixed:

  1. Draft-project visibility. The taxa list annotates observed-occurrence data — the per-taxon counts and, with the flag on, the example occurrence ids and detection crop URLs — through subqueries that were not visibility-gated. A non-member could read it for a draft project. TaxonViewSet.get_queryset now answers with a 404 for a project the requester cannot see, the same way the other project-scoped taxa endpoints already do. (The counts were exposed before this feature; the example adds object ids and image URLs, so the gate matters more now.)
  2. One declaration of the example shape. The nested {id, detection_id, image_url, score, verified} object is now defined once in an ExampleOccurrenceSerializer (also typing the field for the OpenAPI schema) instead of being hand-built as a dict in the view.
  3. No drift between the example's id and its image. with_best_detection() now also returns best_detection_id, chosen from the same detection as the image path, so the returned detection_id cannot point at a different detection than the thumbnail.
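Item 3 comes down to choosing the best detection once and reading both fields off that single choice. A pure-Python sketch of the idea, with a hypothetical `Detection` record; the real `with_best_detection()` works in SQL, and the ordering used here (highest score first, lowest id as tie-break) is an assumption:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Detection:
    id: int
    score: float
    path: str


def best_detection(detections: list[Detection]) -> Optional[tuple[int, str]]:
    """Return (best_detection_id, best_detection_path) taken from ONE chosen row."""
    if not detections:
        return None
    # Assumed ordering: highest score wins; the lowest id breaks ties deterministically.
    best = min(detections, key=lambda d: (-d.score, d.id))
    # Both values come from the same object, so they cannot point at different detections.
    return best.id, best.path
```

Two independent "pick the best" lookups only agree if they use exactly the same ordering, tie-break included; deriving both values from one choice removes that coupling.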

New tests cover each: a non-member is refused a draft project's taxa list while a member still sees examples; a higher-rank taxon used directly for identifications gets a pinned example; and the ?collection= path draws the example from the same set the count reports.
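A framework-free sketch of the visibility rule from item 1: a draft project is visible only to its members, and an invisible project is answered with a 404, as the other project-scoped taxa endpoints do. `Project`, `NotFound` and `check_project_visible` are hypothetical stand-ins for the Django model, the 404 response and the check inside `TaxonViewSet.get_queryset`; real permission rules (staff, roles) are not modelled:

```python
from dataclasses import dataclass, field
from typing import Optional


class NotFound(Exception):
    """Stand-in for an HTTP 404 response."""


@dataclass
class Project:
    id: int
    draft: bool
    member_ids: set[int] = field(default_factory=set)


def check_project_visible(project: Project, user_id: Optional[int]) -> Project:
    """Return the project if the requester may see it, else raise NotFound."""
    if not project.draft:
        return project  # non-draft projects are visible to everyone
    if user_id is not None and user_id in project.member_ids:
        return project  # members can see their own draft project
    raise NotFound(f"project {project.id}")
```

The test cases in the paragraph above map directly onto this: a member of a draft project gets it back, a non-member gets `NotFound`.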

Detailed Description

Selection semantics (hybrid, exact-determination)

  • Unverified row → best-scoring occurrence that has no non-withdrawn identification (the quickest to confirm).
  • Verified row → latest occurrence by detection timestamp (has it appeared again since?).
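  • Only verified_count rolls up to ancestors, so higher-rank rows (genus, family) that appear only as roll-up ancestors get a NULL example; a higher-rank taxon used directly for identifications gets its own example. The example is exact-determination, not rolled up.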

One gotcha worth recording: verifying an occurrence overrides its determination_score, so verified occurrences tie on score. The test fixture accounts for this by marking a taxon verified through a separate occurrence rather than relying on score ordering.
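The dispatch can be sketched in plain Python over in-memory rows. This illustrates the rule, not the queryset code: `Occ`, `verified_taxon_counts` and `pick_example` are simplified stand-ins, an occurrence counts as verified if it has a non-withdrawn identification, ancestor roll-up is omitted, and the id tie-breaks are assumptions. The counts are computed once and reused for the dispatch, mirroring row 7 of the table above:

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Occ:
    id: int
    taxon_id: int
    score: float
    last_seen: int  # detection timestamp, e.g. epoch seconds
    verified: bool  # has a non-withdrawn identification


def verified_taxon_counts(occs: list[Occ]) -> Counter:
    """One pass: taxon_id -> number of verified occurrences (exact taxon only)."""
    return Counter(o.taxon_id for o in occs if o.verified)


def pick_example(taxon_id: int, occs: list[Occ], verified_counts: Counter) -> Optional[int]:
    """Return the example occurrence id for one taxon row, or None."""
    mine = [o for o in occs if o.taxon_id == taxon_id]
    if verified_counts[taxon_id] > 0:
        # Verified row: latest occurrence (has it appeared again since?).
        best = max(mine, key=lambda o: (o.last_seen, o.id), default=None)
    else:
        # Unverified row: best-scoring occurrence without an identification.
        pool = [o for o in mine if not o.verified]
        best = max(pool, key=lambda o: (o.score, -o.id), default=None)
    return best.id if best is not None else None
```

Because the verified branch orders by timestamp rather than score, the score ties mentioned above do not affect which occurrence it picks.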

What is gated behind query parameters

Opt-in (default off) — this feature's cost gate:

  • ?with_example_occurrences=true enables example_occurrence, best_scoring_occurrence_id, last_detected_occurrence_id. Off: all three annotate Value(None) — no subqueries, no hydration query, stable response shape. On: three subqueries fold into the page SELECT plus one hydration query per page. Parsed strictly (?with_example_occurrences=abc → 400).
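Strict parsing means only recognised spellings are accepted and anything else is a client error, instead of being silently treated as false. A minimal sketch; the accepted spellings and the `BadRequest` stand-in for a 400 are assumptions, not the project's actual parser:

```python
from typing import Mapping, Optional


class BadRequest(ValueError):
    """Stand-in for an HTTP 400 response."""


_TRUE = {"true", "1"}
_FALSE = {"false", "0"}


def parse_flag(params: Mapping[str, str], name: str, default: bool = False) -> bool:
    """Parse an opt-in boolean query parameter strictly."""
    raw: Optional[str] = params.get(name)
    if raw is None:
        return default  # absent: the feature stays off
    value = raw.strip().lower()
    if value in _TRUE:
        return True
    if value in _FALSE:
        return False
    raise BadRequest(f"{name} must be true or false, got {raw!r}")
```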

Dispatch parameters (change which query shape runs, pre-existing):

  • ?collection=<id> switches the observation counts from correlated subqueries to conditional aggregation over the detections join. This is why the example subqueries are opt-in: they degrade to per-row scans on this path.
  • ?include_unobserved=true drops the observed-only restriction.
  • ?verified=true|false filters rows to the verified / unverified set.
  • ?apply_defaults=false bypasses the default score-threshold and taxa include/exclude filters.
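The two count shapes behind ?collection= can be compared on a toy schema. This sketch uses SQLite and made-up tables (`occurrence`, `detection`, `collection_image`) only to show the SQL shapes; it is neither the project's schema nor the ORM-generated SQL. Both queries return the same per-taxon counts: the first runs one correlated subquery per taxon row, the second computes everything in one grouped pass over the join:

```python
import sqlite3

SCHEMA = """
CREATE TABLE taxon (id INTEGER PRIMARY KEY);
CREATE TABLE occurrence (id INTEGER PRIMARY KEY, taxon_id INTEGER);
CREATE TABLE detection (id INTEGER PRIMARY KEY, occurrence_id INTEGER, image_id INTEGER);
CREATE TABLE collection_image (collection_id INTEGER, image_id INTEGER);
INSERT INTO taxon VALUES (1), (2), (3);
INSERT INTO occurrence VALUES (10, 1), (11, 1), (12, 2), (13, 3);
INSERT INTO detection VALUES (100, 10, 1000), (101, 10, 1001), (102, 11, 1002),
                             (103, 12, 1000), (104, 13, 1003);
INSERT INTO collection_image VALUES (1, 1000), (1, 1001), (2, 1002), (2, 1003);
"""

# Shape 1: a correlated subquery evaluated for each taxon row.
CORRELATED = """
SELECT t.id,
       (SELECT COUNT(DISTINCT o.id)
          FROM occurrence o
          JOIN detection d ON d.occurrence_id = o.id
          JOIN collection_image ci ON ci.image_id = d.image_id
         WHERE o.taxon_id = t.id AND ci.collection_id = :coll)
  FROM taxon t
 ORDER BY t.id
"""

# Shape 2: conditional aggregation; non-matching rows yield NULL, which COUNT ignores.
CONDITIONAL = """
SELECT t.id,
       COUNT(DISTINCT CASE WHEN ci.collection_id = :coll THEN o.id END)
  FROM taxon t
  LEFT JOIN occurrence o ON o.taxon_id = t.id
  LEFT JOIN detection d ON d.occurrence_id = o.id
  LEFT JOIN collection_image ci ON ci.image_id = d.image_id
 GROUP BY t.id
 ORDER BY t.id
"""


def counts(sql: str, collection_id: int) -> list[tuple[int, int]]:
    """Run one of the query shapes against a fresh in-memory database."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SCHEMA)
        return conn.execute(sql, {"coll": collection_id}).fetchall()
    finally:
        conn.close()
```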

Always-on: occurrences_count, events_count, last_detected, best_determination_score, verified_count.

Performance

Measured on the real DRF path (queryset + pagination COUNT + example hydration + full serialization), 25 taxa per page, cold (query cache flushed) vs warm, on three large projects. Projects are anonymised; sizes are what matter.

| Project | Occurrences / taxa | Warm, flag off | Warm, flag on | Cold, flag off | Cold, flag on | Queries (off → on) |
|---|---|---|---|---|---|---|
| A | ~93k / ~1.7k | 61 ms | 73 ms | 156–420 ms | 178–465 ms | 8 → 9 |
| B | ~70k / ~3.2k | 58 ms | 70 ms | 108–209 ms | 129–312 ms | 8 → 9 |
| C (verification-heavy) | ~179k / ~2.3k | 770 ms | ~775–877 ms | ~990–1100 ms | ~966–1970 ms | 8 → 9 |

Findings:

  1. The flag adds exactly one query, fixed. Off = 8 queries, on = 9, on every project and every page offset. The three example subqueries fold into the existing page SELECT; the extra query is the single per-page hydration. Query count does not scale with page size or offset.
  2. Warm cost is about +12 ms (~20% of a 60 ms page) on normal projects — negligible in absolute terms.
  3. Cold cost is +20–180 ms, larger on deeper pages because the subqueries evaluate over more offset rows.
  4. The feature's cost does not grow with project size: projects A and B stay in double-digit milliseconds warm despite 70k–93k occurrences.
  5. Project C is slow with the flag off too (~770 ms warm). Measured in isolation, the always-on verified-occurrence rollup (verified_taxon_counts) accounts for ~743 ms on that project versus ~25 ms on the others; it is a Python pass whose cost grows with the number of verified taxa. That rollup predates this change and this feature adds only ~12 ms on top of it, so it is called out here as a separate, pre-existing hotspot. These are wall-clock measurements from a long-running local stack (PostgreSQL shared buffers warm); "cold" isolates the query-cache miss.

Testing

  • Backend: feature tests for the selection dispatch (unverified → best-scoring-unverified, verified → latest), the ancestor-NULL case, deployment scoping, the ?collection= fan-out, and the 400-on-bad-flag case; plus query-count tests asserting the hydration does not scale with page size and that the example subqueries are stripped from the pagination COUNT. Broader taxa regression suite passes.
  • makemigrations --check is clean (query-only change, no migration).
  • Linters clean (black / isort / flake8; tsc / eslint for the frontend).
  • Manual browser end-to-end: the Example column renders crops, the deep link opens the modal over the taxa list, confirming an identification creates it and the row updates in place (count, marker, dimming, and the example rolling to the next occurrence).
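The "one hydration query per page" property and the query-count tests can be illustrated with SQLite's trace hook (`sqlite3.Connection.set_trace_callback`). In this sketch the page rows already carry their chosen example ids, as the annotated queryset would; hydration fetches them all with a single `IN (...)` query, and a frozen dataclass stands in for `ExampleOccurrenceSerializer` as the one declaration of the nested shape. The table layout and the names `ExampleOccurrence`, `hydrate_examples` and `count_hydration_queries` are hypothetical:

```python
import sqlite3
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class ExampleOccurrence:
    """Single declaration of the nested shape (stand-in for the serializer)."""
    id: int
    detection_id: Optional[int]
    image_url: Optional[str]
    score: Optional[float]
    verified: bool


def make_db(n: int) -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE occurrence (id INTEGER PRIMARY KEY, detection_id INTEGER,"
                 " image_url TEXT, score REAL, verified INTEGER)")
    conn.executemany("INSERT INTO occurrence VALUES (?, ?, ?, ?, ?)",
                     [(i, i * 10, f"/crops/{i}.jpg", 0.5, i % 2) for i in range(1, n + 1)])
    return conn


def hydrate_examples(conn: sqlite3.Connection, page: list[dict]) -> list[dict]:
    """Attach example_occurrence to every row on the page using ONE query."""
    ids = [row["example_occurrence_id"] for row in page if row["example_occurrence_id"]]
    by_id: dict[int, ExampleOccurrence] = {}
    if ids:
        marks = ", ".join(["?"] * len(ids))
        sql = ("SELECT id, detection_id, image_url, score, verified"
               f" FROM occurrence WHERE id IN ({marks})")
        for occ_id, det_id, url, score, verified in conn.execute(sql, ids):
            by_id[occ_id] = ExampleOccurrence(occ_id, det_id, url, score, bool(verified))
    for row in page:
        example = by_id.get(row["example_occurrence_id"])
        row["example_occurrence"] = asdict(example) if example is not None else None
    return page


def count_hydration_queries(page_size: int) -> int:
    """Count SELECT statements issued while hydrating a page of the given size."""
    conn = make_db(100)
    page = [{"taxon_id": t, "example_occurrence_id": t} for t in range(1, page_size + 1)]
    statements: list[str] = []
    conn.set_trace_callback(statements.append)
    hydrate_examples(conn, page)
    conn.set_trace_callback(None)
    conn.close()
    return sum(1 for s in statements if s.lstrip().upper().startswith("SELECT"))
```

Asserting the same count for two page sizes is the shape of the "does not scale with page size" test described above.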

Possible follow-up (out of scope)

The always-on verification rollup is a Python pass that is O(verified occurrences). On verification-heavy projects that pass is the dominant cost of the taxa list, independent of this feature. A denormalised per-(project, taxon) aggregate refreshed on the cached-count pattern would remove it. Worth a separate ticket.
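One possible shape for that follow-up, sketched in SQLite with hypothetical table names: a per-(project, taxon) table rebuilt from the raw rows on refresh, so the list reads a precomputed verified_count instead of recomputing it per request. When and how the refresh is triggered (the cached-count pattern) is not modelled; this only illustrates the aggregate itself:

```python
import sqlite3

SETUP = """
CREATE TABLE occurrence (id INTEGER PRIMARY KEY, project_id INTEGER,
                         taxon_id INTEGER, verified INTEGER);
CREATE TABLE taxon_verification_agg (
    project_id INTEGER, taxon_id INTEGER, verified_count INTEGER,
    PRIMARY KEY (project_id, taxon_id)
);
"""

REBUILD = """
INSERT INTO taxon_verification_agg (project_id, taxon_id, verified_count)
SELECT project_id, taxon_id, COUNT(*)
  FROM occurrence
 WHERE verified = 1 AND project_id = :project
 GROUP BY project_id, taxon_id
"""


def refresh(conn: sqlite3.Connection, project_id: int) -> None:
    """Recompute one project's aggregate rows in a single transaction."""
    with conn:  # commits on success, rolls back on error
        # Delete first so taxa that lost all verified occurrences disappear.
        conn.execute("DELETE FROM taxon_verification_agg WHERE project_id = ?", (project_id,))
        conn.execute(REBUILD, {"project": project_id})


def verified_counts(conn: sqlite3.Connection, project_id: int) -> dict[int, int]:
    """Cheap read path: taxon_id -> verified_count for one project."""
    rows = conn.execute("SELECT taxon_id, verified_count FROM taxon_verification_agg"
                        " WHERE project_id = ?", (project_id,))
    return dict(rows)
```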

mihow and others added 3 commits July 1, 2026 10:19
…fication

Add the backend for issue #1320. The taxa list can now return, per taxon, one
example occurrence to verify — the best-scoring unverified occurrence for
unverified rows (fastest clean ID) and the latest occurrence for already-verified
rows (is it still showing up?) — plus the source occurrence ids behind the
Last-seen and Best-score cells so the frontend can deep-link each to the
identification modal.

The whole selection is gated behind ?with_example_occurrences=true so the default
list keeps its latency budget, especially on the ?collection= (detections-join)
path. When off, the three fields serialize as null.

- TaxonQuerySet.with_example_occurrence_ids: three correlated subqueries
  (index-served on the default path), with the verified-vs-unverified branch
  chosen by a precomputed verified-taxon set.
- Extract verified_taxon_counts() so the verified rollup is computed once and
  shared by with_verification_counts and the example dispatch.
- TaxonViewSet hydrates the chosen example ids into {id, detection_id, image_url,
  score, verified} in one query per page (no N+1).
- TaxonPagination.get_count strips annotations before the COUNT (mirrors
  ProjectPagination) so the example subqueries are not evaluated for every taxon
  in the project via TagInverseFilter's .distinct().
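The COUNT fix in the last bullet rests on the row count not depending on annotated columns, so it can be taken over the distinct primary keys alone. A SQLite sketch with hypothetical tables: the page query carries a correlated example subquery, while the count query keeps the same tag join and DISTINCT but selects only t.id, so the subquery never appears in it. This approximates what stripping annotations before the COUNT achieves; it does not reproduce the exact ORM calls in TaxonPagination.get_count:

```python
import sqlite3

SETUP = """
CREATE TABLE taxon (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE taxon_tag (taxon_id INTEGER, tag TEXT);
CREATE TABLE occurrence (id INTEGER PRIMARY KEY, taxon_id INTEGER, score REAL);
INSERT INTO taxon VALUES (1, 'a'), (2, 'b'), (3, 'c');
INSERT INTO taxon_tag VALUES (1, 'moth'), (1, 'night'), (2, 'moth');
INSERT INTO occurrence VALUES (10, 1, 0.9), (11, 1, 0.4), (12, 2, 0.7);
"""

# Page query: tag join + DISTINCT (the join duplicates taxon 1), plus an example subquery.
PAGE_SQL = """
SELECT DISTINCT t.id,
       (SELECT o.id FROM occurrence o WHERE o.taxon_id = t.id
         ORDER BY o.score DESC, o.id LIMIT 1) AS example_occurrence_id
  FROM taxon t JOIN taxon_tag tt ON tt.taxon_id = t.id
 WHERE tt.tag IN ('moth', 'night')
 ORDER BY t.id LIMIT :limit OFFSET :offset
"""

# Count query: same filter and DISTINCT, but no annotation columns at all.
COUNT_SQL = """
SELECT COUNT(*) FROM (
    SELECT DISTINCT t.id
      FROM taxon t JOIN taxon_tag tt ON tt.taxon_id = t.id
     WHERE tt.tag IN ('moth', 'night')
)
"""


def page_and_count(limit: int, offset: int) -> tuple[list[tuple[int, int]], int]:
    """Return one page of (taxon_id, example_occurrence_id) rows and the total count."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SETUP)
        rows = conn.execute(PAGE_SQL, {"limit": limit, "offset": offset}).fetchall()
        (total,) = conn.execute(COUNT_SQL).fetchone()
        return rows, total
    finally:
        conn.close()
```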

Co-Authored-By: Claude <noreply@anthropic.com>
… list

Frontend for issue #1320. Each taxon row gains a non-sortable Example column
showing a thumbnail of the occurrence to verify; clicking it opens the existing
occurrence identification modal (Agree / Suggest ID) over the taxa list, so a user
can sweep unverified taxa and confirm presence one row at a time. The Last-seen and
Best-score cells link to the same modal for their source occurrence.

- Extract OccurrenceDetailsDialog from the occurrences page and reuse it on the
  taxa list, keyed off a ?verifyOccurrence= search param so the list stays behind.
- Species model: verificationExample / bestScoringOccurrenceId /
  lastDetectedOccurrenceId getters over the new API fields; useSpecies requests
  ?with_example_occurrences=true.
- Dim already-verified rows (new optional rowClassName hook on the Table) and mark
  them with a verified icon.
- Invalidate the taxa list query after an identification so verified counts and the
  example thumbnail refresh without a reload.

Co-Authored-By: Claude <noreply@anthropic.com>
…taxa list

Entering the occurrence modal from the taxa list is a verification action, so it
now opens on the Identification tab (Agree / Suggest ID) instead of Fields. Added
an optional defaultTab prop to OccurrenceDetailsDialog; the occurrences list keeps
its Fields default.

Co-Authored-By: Claude <noreply@anthropic.com>

netlify Bot commented Jul 2, 2026

Deploy Preview for antenna-ssec canceled.

| Name | Link |
|---|---|
| 🔨 Latest commit | fd5ac06 |
| 🔍 Latest deploy log | https://app.netlify.com/projects/antenna-ssec/deploys/6a46dfdfd09a0400089726f3 |


netlify Bot commented Jul 2, 2026

Deploy Preview for antenna-preview canceled.

| Name | Link |
|---|---|
| 🔨 Latest commit | fd5ac06 |
| 🔍 Latest deploy log | https://app.netlify.com/projects/antenna-preview/deploys/6a46dfdf1e6f2e00085dd076 |


coderabbitai Bot commented Jul 2, 2026

Contributor

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.



coderabbitai Bot commented Jul 2, 2026

Contributor

Note

Unit test generation is a beta feature. Expect some limitations and changes as we gather feedback and continue to improve it.


Generating unit tests... This may take up to 20 minutes.


coderabbitai Bot commented Jul 2, 2026

Contributor

Request timed out after 900000ms (requestId=af2a3ee8-73a5-45e3-bdbd-af9b43319e2a)

Follow-up to the presence-verification Example column (#1320), applying the
structural fixes surfaced by a takeaway review.

Gate draft-project data behind visibility. The taxa list annotates observed-
occurrence data — per-taxon counts and, under ?with_example_occurrences, example
occurrence ids plus detection crop URLs — via subqueries that are not visibility-
gated, so a non-member could read it for a draft project. TaxonViewSet.get_queryset
now refuses a project the user cannot see with a 404, matching the sibling
project-scoped taxa endpoints (top-identifiers, model-agreement).

Single-source the nested example shape. ExampleOccurrenceSerializer is now the one
declaration of {id, detection_id, image_url, score, verified}; the view hydrates a
page of occurrences and serializes them through it instead of hand-building a dict,
and it types the field for the OpenAPI schema.

Remove the duplicate best-detection subquery. with_best_detection() now also
annotates best_detection_id, picked from the same detection as best_detection_path,
so the example's detection_id and its image can no longer drift apart.

Tests: a higher-rank taxon used directly for identifications now has a pinned
example (only pure roll-up ancestors are NULL); the collection path asserts the
example is drawn from the same set occurrences_count reports; a draft project hides
examples from a non-member while a member still sees them.

Co-Authored-By: Claude <noreply@anthropic.com>

Development

Successfully merging this pull request may close these issues.

Presence verification workflow from the taxa view