Fix null detections in exports & API. Don't mark images as processed too soon #1312

Merged
mihow merged 11 commits into main from fix/premptive-processed-marker
Jun 23, 2026

Conversation

@mihow

@mihow mihow commented May 20, 2026

Collaborator

Summary

Fixes #1310.

Null detections (empty-bbox sentinels marking "image processed, nothing found") were being created before the downstream save steps inside save_results. Two consequences:

  1. Preemptive processed marker. If any downstream step raised, the image was already flagged as processed via the null marker. filter_processed_images would then skip it on retry, leaving the image permanently stuck as "processed, zero detections." Observed in production where several hundred captures had only null detections and no real ones.
  2. Phantom occurrences. create_and_update_occurrences_for_detections iterated every detection including nulls, so each null marker spawned an Occurrence with determination=NULL. Those leaked through OccurrenceQuerySet.valid() (which only excluded occurrences with zero detections, not occurrences whose only detection is a null).
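
The first consequence can be illustrated with a toy model in plain Python. This is a minimal sketch, not the project's code: FakeStore, save_results_old, save_results_new and broker_down are hypothetical stand-ins, and the real save_results and filter_processed_images do far more than shown here.

```python
from dataclasses import dataclass, field


@dataclass
class FakeStore:
    """In-memory stand-in for persisted null markers (hypothetical)."""
    null_markers: set = field(default_factory=set)


def filter_processed_images(image_ids, store):
    # Simplified: an image counts as processed once it has a null marker.
    return [i for i in image_ids if i not in store.null_markers]


def save_results_old(image_id, store, downstream):
    store.null_markers.add(image_id)  # marker written first (the bug)
    downstream()                      # may raise after the marker exists


def save_results_new(image_id, store, downstream):
    downstream()                      # all downstream steps run first
    store.null_markers.add(image_id)  # marker written as the final step


def broker_down():
    raise RuntimeError("simulated broker outage")


def retry_queue_after_failure(save_fn):
    store = FakeStore()
    try:
        save_fn("img-1", store, broker_down)
    except RuntimeError:
        pass
    return filter_processed_images(["img-1"], store)


old_retry_queue = retry_queue_after_failure(save_results_old)
new_retry_queue = retry_queue_after_failure(save_results_new)
```

With the old ordering the failed image drops out of the retry queue; with the new ordering it is yielded again.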

Reviewer heads-up — silent semantic change to OccurrenceQuerySet.valid()

valid() changes meaning from "has any detection" to "has at least one Detection.objects.valid() row AND determination is not null." Three call sites pick this up without any line change at the call site:

  • OccurrenceViewSet.get_queryset (ami/main/api/views.py) — intended target of the fix. Phantom occurrences stop appearing in the list endpoint.
  • project summary stats occurrences_count (ami/main/api/views.py) — will silently decrease on any deployment that has accumulated phantoms. No-op on clean deployments.
  • DwC-A export (ami/exports/format_types.py) — null-determination occurrences will be excluded from exports. Probably correct (DwC requires taxonID) but not validated against an actual export run in this PR.

If any of those three are load-bearing in a way I'm missing, flag it.
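
To make the semantic change concrete, here is a minimal sketch of the old and new predicates over plain Python objects. Det, Occ, valid_old and valid_new are illustrative names, not the Django models or queryset methods, and the "Noctuidae" value is made up for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Det:
    bbox: Optional[list]  # None or [] marks "processed, nothing found"


@dataclass
class Occ:
    name: str
    determination: Optional[str]
    detections: List[Det] = field(default_factory=list)


def is_valid_detection(det: Det) -> bool:
    # Same idea as Detection.objects.valid(): drop both null-marker forms.
    return det.bbox is not None and det.bbox != []


def valid_old(occ: Occ) -> bool:
    # Old meaning: the occurrence has any detection row at all.
    return len(occ.detections) > 0


def valid_new(occ: Occ) -> bool:
    # New meaning: at least one valid detection AND a non-null determination.
    return occ.determination is not None and any(
        is_valid_detection(d) for d in occ.detections
    )


occurrences = [
    Occ("real", "Noctuidae", [Det([0.1, 0.1, 0.5, 0.5])]),
    Occ("phantom", None, [Det(None)]),
    Occ("undetermined", None, [Det([0.2, 0.2, 0.4, 0.4])]),
]

old_names = [o.name for o in occurrences if valid_old(o)]
new_names = [o.name for o in occurrences if valid_new(o)]
```

Any consumer that counted on the old predicate would see its totals shrink by the phantom and undetermined rows, which is the effect described for occurrences_count above.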

Changes

  1. test(ml) — RED test for the broker-outage path: asserts the null marker is never persisted if create_detection_images.delay raises and filter_processed_images re-yields the image.
  2. fix(ml) — move null persistence to the absolute final step in save_results. Null markers now run after the source_image.save() loop, create_detection_images.delay(), update_calculated_fields_for_events, and Deployment.update_calculated_fields(save=True). Closes the silent-bug window the prior reorder left open.
  3. refactor(main) — null-marker abstraction on Detection.
    • Detection.NULL_BBOX = None — canonical sentinel value for new writes.
    • Detection.is_null_marker property — recognises both bbox=None and legacy bbox=[].
    • Detection.build_null_marker(source_image, detection_algorithm) classmethod — single construction point.
    • DetectionQuerySet.valid() — consumer default (excludes null markers).
    • DetectionQuerySet.null_markers() — narrow, for "has this image been processed?" checks.
  4. refactor(main) — sweep inline NULL_DETECTIONS_FILTER call sites to the new manager methods across ami/main/models.py, ami/main/api/views.py, ami/ml/models/pipeline.py, plus a null_detections_q(prefix) helper for relation-prefixed Q expressions.
  5. fix(main) — tighten OccurrenceQuerySet.valid() to require at least one valid detection AND a non-null determination. Closes the phantom-Occurrence leak. See the reviewer heads-up above for the consumers that pick up the new semantic.
  6. feat(main) — cleanup_null_only_occurrences management command for per-project cleanup of the field bug. Dry-run by default. Deletes phantom occurrences (no valid detections OR null determination) and dangling null-marker Detection rows on source images that have no real detection. Idempotent.
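
The null-marker abstraction in item 3 could look roughly like the following. This is a sketch using a dataclass in place of the Django model: the attribute and method names follow the PR description, but the bodies are assumptions, and valid() / null_markers() are written as plain functions over a list rather than queryset methods.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Detection:
    source_image: Any
    detection_algorithm: Any
    bbox: Optional[list] = None

    # Canonical sentinel for new writes (unannotated, so not a dataclass field).
    NULL_BBOX = None

    @property
    def is_null_marker(self) -> bool:
        # Recognises the canonical None and the legacy empty list.
        return self.bbox is None or self.bbox == []

    @classmethod
    def build_null_marker(cls, source_image, detection_algorithm):
        # Single construction point for "processed, nothing found" rows.
        return cls(source_image, detection_algorithm, bbox=cls.NULL_BBOX)


def valid(detections):
    return [d for d in detections if not d.is_null_marker]


def null_markers(detections):
    return [d for d in detections if d.is_null_marker]


rows = [
    Detection.build_null_marker("img-1", "detector-a"),
    Detection("img-2", "detector-a", bbox=[]),  # legacy form
    Detection("img-3", "detector-a", bbox=[0.1, 0.2, 0.3, 0.4]),
]
valid_images = [d.source_image for d in valid(rows)]
marker_images = [d.source_image for d in null_markers(rows)]
```

The two helpers partition the rows, which is the disjointness property the test plan checks.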

Test plan

  • test_null_marker_not_persisted_when_broker_dispatch_fails — RED, then GREEN after move-to-end.
  • TestDetectionNullMarker — is_null_marker for None / [] / real bbox, build_null_marker field setup, valid() / null_markers() disjointness.
  • TestOccurrenceValidQuerySet — fixture with real / null-only-detection / null-determination occurrences; asserts valid() returns only the real, fully-determined one.
  • TestCleanupNullOnlyOccurrencesCommand — dry-run reports without deleting; --commit deletes phantoms (both the no-real-detection arm and the null-determination arm) while preserving valid rows and null markers on images that also have a real detection; idempotent on second run.
  • Full ami/main/tests.py + ami/ml/tests.py + ami/jobs/tests/ pass locally.

Manual e2e (dev deployment)

  1. Happy path async_api job — a small collection through an ML pipeline. No new phantom occurrences.
  2. Broker-outage simulation — patched create_detection_images.delay to raise mid-job. 0 null markers persisted, 0 phantoms; image stays in the filter_processed_images yield list.
  3. Calc-field DB error — patched update_calculated_fields_for_events to raise. Same result: 0 null markers persisted.
  4. Cleanup command — dry-run on a dev deployment reported the expected phantom + dangling-null-marker counts; --commit removed exactly those rows and left valid() counts unchanged; a second dry-run reported 0 / 0 (idempotent). Running it against any affected production deployment is a post-merge ops step.

Out of scope — deferred follow-up

transaction.atomic() wrap. The persistence block (real detections → classifications → occurrences → calc-fields → null marker) can still partially commit if a mid-block step raises. This PR closes the ordering window (null marker writing before downstream steps); it does not close the within-block partial-commit window. A narrow transaction.atomic() wrap with transaction.on_commit for celery dispatch is the structural fix, deferred to a separate PR because transaction changes carry concurrency risk (see the select_for_update + ATOMIC_REQUESTS contention introduced by #1261) and need their own multi-worker e2e.
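
The deferred structural fix rests on two ideas: writes inside the block become visible only if the whole block succeeds, and side effects such as task dispatch run only after commit. Django provides these as transaction.atomic() and transaction.on_commit(). The sketch below is a toy re-implementation in plain Python to show the behaviour, not Django's API; FakeTransaction, atomic and persist are hypothetical.

```python
from contextlib import contextmanager


class FakeTransaction:
    """Toy unit of work: staged writes plus deferred callbacks."""

    def __init__(self):
        self.staged = []
        self.callbacks = []

    def write(self, row):
        self.staged.append(row)

    def on_commit(self, fn):
        self.callbacks.append(fn)


@contextmanager
def atomic(db):
    tx = FakeTransaction()
    yield tx  # an exception in the block propagates here and skips the commit
    db.extend(tx.staged)
    for fn in tx.callbacks:
        fn()


db, dispatched = [], []


def persist(tx, fail):
    tx.write("detection")
    tx.on_commit(lambda: dispatched.append("create_detection_images"))
    if fail:
        raise RuntimeError("simulated calc-field error")
    tx.write("null_marker")


try:
    with atomic(db) as tx:
        persist(tx, fail=True)
except RuntimeError:
    pass
after_failure = (list(db), list(dispatched))

with atomic(db) as tx:
    persist(tx, fail=False)
after_success = (list(db), list(dispatched))
```

In the failing run nothing is committed and nothing is dispatched, which is the partial-commit window this PR leaves open and the follow-up would close.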

Dual-form bbox=None vs bbox=[]. New writes go through Detection.NULL_BBOX = None; legacy rows still carry bbox=[]. .null_markers() / .is_null_marker / null_detections_q() all recognise both, so no consumer breaks, but the dual form persists until a data migration backfills legacy rows. Worth a follow-up ticket.
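
A backfill would amount to mapping the legacy form onto the canonical one. The sketch below shows that normalisation over plain dictionaries; normalize_bbox and the sample rows are hypothetical, and a real data migration would operate on Detection rows instead.

```python
def normalize_bbox(bbox):
    """Map both null-marker forms to None; leave real boxes untouched."""
    return None if bbox is None or bbox == [] else bbox


legacy_rows = [
    {"id": 1, "bbox": []},                    # legacy null marker
    {"id": 2, "bbox": None},                  # canonical null marker
    {"id": 3, "bbox": [0.1, 0.2, 0.3, 0.4]},  # real detection
]

backfilled = [{**row, "bbox": normalize_bbox(row["bbox"])} for row in legacy_rows]
```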

Re-classification gap. Adjacent: filter_processed_images currently reprocesses from scratch because there is no mechanism to reclassify existing detections. Worth a separate ticket.

Summary by CodeRabbit

  • New Features

    • Added management command to clean up phantom detection records from existing data.
  • Bug Fixes

    • Improved detection pipeline handling of images with no detections to prevent orphan records.
    • Fixed API detection filtering to consistently exclude empty-detection markers for accurate results.
  • Tests

    • Added regression test coverage for empty detection scenarios and cleanup operations.

Copilot AI review requested due to automatic review settings May 20, 2026 00:42
@netlify

netlify Bot commented May 20, 2026


Deploy Preview for antenna-ssec canceled.

Name Link
🔨 Latest commit a87133d
🔍 Latest deploy log https://app.netlify.com/projects/antenna-ssec/deploys/6a39ee6697c58700083ac648

@netlify

netlify Bot commented May 20, 2026


Deploy Preview for antenna-preview canceled.

Name Link
🔨 Latest commit 4e33f96
🔍 Latest deploy log https://app.netlify.com/projects/antenna-preview/deploys/6a0d035a479bdb000845b3ed

@netlify

netlify Bot commented May 20, 2026


Deploy Preview for antenna-preview canceled.

Name Link
🔨 Latest commit a87133d
🔍 Latest deploy log https://app.netlify.com/projects/antenna-preview/deploys/6a39ee66c7e5f50008e964f6

@coderabbitai

coderabbitai Bot commented May 20, 2026

Contributor

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 630187bf-91aa-4b28-be84-09d32f6847d8

📥 Commits

Reviewing files that changed from the base of the PR and between 8c019e4 and a87133d.

📒 Files selected for processing (1)
  • ami/ml/models/pipeline.py
📝 Walkthrough

Walkthrough

This PR fixes issue #1310 by deferring null-marker detection creation in save_results to after all real detections, classifications, and occurrences are persisted. It centralizes null-marker semantics into DetectionQuerySet.valid()/null_markers(), Detection.build_null_marker(), and null_detections_q(), updates all filtering call-sites, adds a cleanup_null_only_occurrences management command for existing phantom data, and adds regression tests.

Changes

Null-marker lifecycle and pipeline ordering

| Layer / File(s) | Summary |
| --- | --- |
| Detection / Occurrence query abstractions and model API (ami/main/models.py) | Introduces null_detections_q(), redefines NULL_DETECTIONS_FILTER to SQL-null only, adds DetectionQuerySet.valid()/null_markers(), Detection.NULL_BBOX/is_null_marker/build_null_marker(), reworks OccurrenceQuerySet.valid() to require existing valid detections, and updates all detection-count call-sites (Deployment, Event, SourceImage, update_detection_counts, collection queryset, sample_detections_only) to use .valid() or the helper predicate. |
| API view queries switched to .valid() (ami/main/api/views.py) | Removes NULL_DETECTIONS_FILTER import and replaces all usages in SourceImageViewSet.filter_by_has_detections, SourceImageViewSet.prefetch_detections, and DetectionViewSet.queryset with Detection.objects.valid(). |
| Pipeline: defer null-marker detection creation to post-persistence (ami/ml/models/pipeline.py) | Removes early null DetectionResponse appending before create_detections, updates get_or_create_detection to dedup via .null_markers().first(), and adds a final block in save_results that creates and persists null-marker detections only after real detection/classification/occurrence writes complete. |
| Management command: cleanup_null_only_occurrences (ami/main/management/commands/cleanup_null_only_occurrences.py) | New command that reports (dry-run) and optionally deletes phantom Occurrence rows and dangling null-marker Detection rows for a specified project, performing deletions atomically with --commit. |
| Tests: null-marker behavior, valid querysets, and cleanup command (ami/main/tests.py, ami/ml/tests.py) | Adds TestDetectionNullMarker, TestOccurrenceValidQuerySet, and TestCleanupNullOnlyOccurrencesCommand; adds three TestPipeline regression tests verifying null markers don't create phantom Occurrence rows and are not persisted when downstream persistence or broker dispatch fails. |

Sequence Diagram(s)

sequenceDiagram
  participant Pipeline as save_results
  participant CD as create_detections
  participant CC as create_classifications
  participant CO as create_occurrences
  participant CDN as create_detections (null markers)

  Note over Pipeline: OLD: null markers added to results.detections BEFORE real processing
  Pipeline->>CD: create_detections(results.detections including nulls)
  Pipeline->>CC: create_classifications(...)
  Pipeline->>CO: create_occurrences(...) ← phantom Occurrences created here

  Note over Pipeline: NEW: null markers created AFTER all real processing
  Pipeline->>CD: create_detections(real detections only)
  Pipeline->>CC: create_classifications(...)
  Pipeline->>CO: create_occurrences(...) ← no phantom Occurrences
  Pipeline->>CDN: create_detections(null_detection_responses for undetected images)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • RolnickLab/antenna#1322: Rewrites filter_processed_images to classify real vs null detections via bbox_is_null, directly overlapping with the NULL_DETECTIONS_FILTER/bbox_is_null behavior changed in this PR.

Suggested reviewers

  • annavik

Poem

🐇 A null bbox once snuck in too soon,
Creating ghost moths by the light of the moon.
Now sentinels wait till the real work is done,
No phantom occurrences under the sun.
The cleanup command sweeps the old mess away—
This rabbit hops proudly through fix-it day! 🌟

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 51.11% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly summarizes the main fix: null detections are no longer created prematurely, and images aren't marked as processed until all steps complete. |
| Description check | ✅ Passed | The description comprehensively covers Summary, List of Changes, Related Issues (#1310), Detailed Description with side effects, How to Test, and Deployment Notes with manual e2e validation results. |
| Linked Issues check | ✅ Passed | The PR fully addresses #1310 objectives: null markers deferred to final step, phantom occurrences prevented via tightened OccurrenceQuerySet.valid(), and filter_processed_images re-yields images on failure. |
| Out of Scope Changes check | ✅ Passed | All changes are directly scoped to #1310: null-marker abstraction, DetectionQuerySet/OccurrenceQuerySet refactoring, call-site updates, cleanup command, and regression tests. |

Copilot AI left a comment

Contributor


Pull request overview

This PR fixes a pipeline persistence ordering bug where null (bbox=None) “processed, nothing found” detection markers were created too early, causing images to be skipped on retry after downstream failures and occasionally creating phantom Occurrence rows tied only to null detections.

Changes:

  • Reorders save_results to persist real detections/classifications/occurrences first, then creates null detection markers in a second pass.
  • Ensures null detections never enter the classification/occurrence creation paths.
  • Adds regression tests for “no phantom occurrence on null” and “no processed marker after failure”.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| ami/ml/models/pipeline.py | Moves null-marker creation to after real detection/classification/occurrence persistence and saves nulls via a separate create_detections call. |
| ami/ml/tests.py | Adds tests covering the phantom-occurrence regression and retry behavior after a simulated downstream failure. |


Comment thread ami/ml/models/pipeline.py Outdated
mihow added a commit that referenced this pull request May 20, 2026
Plan for the takeaway-review follow-up work on PR #1312:
move-null-to-end + null-detection abstraction (DetectionQuerySet.valid /
.null_markers, Detection.is_null_marker, Detection.build_null_marker) +
sweep call sites + tighten OccurrenceQuerySet.valid + cleanup command for
project 171. Captures rationale for splitting transaction.atomic into a
separate follow-up PR (PR-1261 scar).

Co-Authored-By: Claude <noreply@anthropic.com>
@mihow

mihow commented May 20, 2026

Collaborator Author

Claude says: Pushed takeaway-review follow-up commits on top of the original fix. Summary of what's new since the original push:

Commits added (e5f004a is the new HEAD):

  1. test(ml) — RED test for broker-outage leaving null marker (create_detection_images.delay raises)
  2. fix(ml) — move null persistence to the absolute final step in save_results, after calc-field updates and celery dispatch. Closes the silent-bug window @copilot caught.
  3. refactor(main) — add Detection.NULL_BBOX, Detection.is_null_marker, Detection.build_null_marker, DetectionQuerySet.valid() (consumer default), DetectionQuerySet.null_markers() (narrow processed-check). .valid() is named to absorb future predicates (soft-delete, missing algo).
  4. refactor(main) — sweep all inline NULL_DETECTIONS_FILTER call sites to the new manager methods. Fixes the drifted ~Q(...bbox__isnull=True) & ~Q(...bbox=[]) aggregate filter via a new null_detections_q(prefix) helper.
  5. fix(main) — tighten OccurrenceQuerySet.valid() to require ≥1 valid Detection AND non-null determination. Closes the phantom-Occurrence API leak from the field-bug data.
  6. feat(main) — cleanup_null_only_occurrences management command (dry-run default, idempotent) for cleaning up the existing field-bug rows.

What's NOT in this PR (deferred to follow-up):

  • transaction.atomic() wrap around the persistence block. The remaining failure surface is "mid-block raise leaves partial state committed." A narrow atomic() with on_commit for celery dispatch is the structural fix, but transaction changes carry the concurrency scar from PR #1261 ("fix(jobs): fixes for concurrent ML processing jobs"; select_for_update + ATOMIC_REQUESTS contention) and need their own multi-worker e2e plan. This PR closes ~95% of the failure window by moving null persistence to the very end without taking on that concurrency risk.

Local test status:

  • 176/176 in ami.ml.tests + ami.main.TestDetectionNullMarker + ami.jobs.tests
  • 385/389 in the broader sweep. The 4 TestRolePermissions failures reproduce on the prior PR head 4e33f96 — they're pre-existing, unrelated to this PR.

Manual e2e on a dev deployment is queued next.

@mihow

mihow commented May 20, 2026

Collaborator Author

Claude says: E2E validation on a dev deployment complete. All three paths pass.

Path 1 — Happy path (async_api job)

test_ml_job_e2e --project 9 --collection 38 --pipeline quebec_vermont_moths_2023 --dispatch-mode async_api

Job completed in 28.21s. 10 images, 68 detections, 121 classifications, 0 failed. NATS path through the new ordering works end-to-end.

Path 2 — Broker outage simulation

Live save_results call inside transaction.atomic() with create_detection_images.delay patched to raise RuntimeError. Pipeline: quebec_vermont_moths_2023. Two source images: one with a real detection in the response, one without.

=== broker outage ===
BEFORE:        null_dets=0  phantoms=0
DURING-TX:     null_dets=0  phantoms=0    ← image without real-detection NEVER gets a null marker
AFTER-ROLLBACK:null_dets=0  phantoms=0
DELTA-NULL inside tx: 0

Pre-fix behavior would have shown DURING-TX: null_dets=1 because the null marker was persisted before create_detection_images.delay ran. With null persistence moved to the absolute final step, the marker is never written.
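
The patching approach used for this path can be reproduced in isolation with unittest.mock. The sketch below assumes a stand-in tasks object and a two-line save_results; neither is the project's code, and only the shape of the test (patch the dispatch call to raise, then check that no marker was written) reflects what is described above.

```python
import types
from unittest import mock

# Stand-in for the celery task; only the .delay attribute matters here.
tasks = types.SimpleNamespace(delay=lambda image_id: None)
persisted_markers = []


def save_results(image_id):
    tasks.delay(image_id)               # downstream dispatch runs first
    persisted_markers.append(image_id)  # null marker is the final step


# Simulate the broker outage by making the dispatch call raise.
with mock.patch.object(tasks, "delay", side_effect=RuntimeError("broker down")):
    try:
        save_results("img-1")
    except RuntimeError:
        pass
markers_after_failure = list(persisted_markers)

# Once the patch is lifted, a retry completes and writes the marker.
save_results("img-1")
markers_after_retry = list(persisted_markers)
```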

Path 3 — Calc-field DB error

Same shape, with update_calculated_fields_for_events patched to raise instead.

=== calc-field DB error ===
BEFORE:        null_dets=0  phantoms=0
DURING-TX:     null_dets=0  phantoms=0
AFTER-ROLLBACK:null_dets=0  phantoms=0
DELTA-NULL inside tx: 0

Same result — null marker never persisted because the failure runs before null creation in the new order.

Cleanup command dry-run

Ran on dev DB to validate idempotent dry-run behavior (the real project-171 cleanup will run after merge against a prod DB):

$ python manage.py cleanup_null_only_occurrences --project 9
Project #9 (Workshop Test - Michael):
  Phantom occurrences (no valid detection or null determination): 112
  Orphan null-marker detections on images with no real detections: 112
Dry run — pass --commit to delete.

The 112 phantoms match the delta between Project.occurrences.count() and Project.occurrences.valid().count() after this PR's OccurrenceQuerySet.valid() tightening — confirms the new .valid() predicate identifies exactly the rows the cleanup command targets.
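
The dry-run-by-default and idempotency properties can be sketched over an in-memory list. This is an illustration only: is_phantom and cleanup are hypothetical helpers, the rows are made up, and the real command works on Occurrence and Detection querysets for one project.

```python
def is_phantom(occ: dict) -> bool:
    # Phantom: no valid detection OR no determination.
    return occ["valid_detections"] == 0 or occ["determination"] is None


def cleanup(occurrences: list, commit: bool = False) -> int:
    """Report the phantom count; delete in place only when commit=True."""
    phantom_count = sum(1 for o in occurrences if is_phantom(o))
    if commit:
        occurrences[:] = [o for o in occurrences if not is_phantom(o)]
    return phantom_count


rows = [
    {"id": 1, "determination": "Noctuidae", "valid_detections": 2},
    {"id": 2, "determination": None, "valid_detections": 0},
    {"id": 3, "determination": None, "valid_detections": 1},
]

dry_run_count = cleanup(rows)              # reports, deletes nothing
rows_after_dry_run = len(rows)
commit_count = cleanup(rows, commit=True)  # deletes what the dry run reported
second_run_count = cleanup(rows)           # nothing left to report
```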

Branch is in good shape

Test counts:

  • 14/14 TestDetectionNullMarker / TestOccurrenceValidQuerySet / TestCleanupNullOnlyOccurrencesCommand
  • 176/176 ami.ml.tests + ami.main.TestDetectionNullMarker + ami.jobs.tests
  • 385/389 broader sweep (4 pre-existing TestRolePermissions failures unrelated to this PR)
  • 3 live e2e paths validated on the dev box

Ready for review.

@coderabbitai coderabbitai Bot left a comment

Contributor


Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
ami/main/api/views.py (1)

613-628: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

has_detections still disagrees with with_detections.

This prefetch correctly drops null markers, but filter_by_has_detections() still checks for any Detection row. /captures/?has_detections=true&with_detections=true can therefore return captures whose filtered_detections is empty.

Suggested fix
     def filter_by_has_detections(self, queryset: QuerySet) -> QuerySet:
         has_detections = self.request.query_params.get("has_detections")
         if has_detections is not None:
             has_detections = BooleanField(required=False).clean(has_detections)
             queryset = queryset.annotate(
-                has_detections=models.Exists(Detection.objects.filter(source_image=models.OuterRef("pk"))),
+                has_detections=models.Exists(
+                    Detection.objects.valid().filter(source_image=models.OuterRef("pk"))
+                ),
             ).filter(has_detections=has_detections)
         return queryset
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ami/main/api/views.py` around lines 613 - 628, The has_detections check
currently looks for any Detection row while filtered_detections drops
null/non-qualifying rows, causing mismatches; update filter_by_has_detections to
only count detections that meet the same criteria used for the prefetch (i.e.
use the annotated Detection queryset created from Detection.objects.valid() that
includes occurrence_meets_criteria or an Exists/Subquery against
qualifying_occurrence_ids) so the filter requires existence of at least one
Detection with occurrence_meets_criteria=True (or occurrence_id in
qualifying_occurrence_ids and score >= score) rather than any Detection row;
adjust the logic referencing filter_by_has_detections,
Detection.objects.valid(), occurrence_meets_criteria, qualifying_occurrence_ids
and filtered_detections accordingly.
ami/ml/models/pipeline.py (1)

442-459: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Dedup the legacy bbox=[] null marker here too.

This lookup only matches canonical bbox IS NULL rows. If the image already has a legacy empty-list sentinel for the same algorithm, reprocessing will create a second null marker instead of reusing the existing one.

Suggested fix
-        existing_detection = Detection.objects.filter(
-            source_image=source_image,
-            bbox__isnull=True,
-            detection_algorithm=detection_algo,
-        ).first()
+        existing_detection = (
+            Detection.objects.filter(
+                source_image=source_image,
+                detection_algorithm=detection_algo,
+            )
+            .null_markers()
+            .first()
+        )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ami/ml/models/pipeline.py` around lines 442 - 459, The dedupe query only
checks for bbox__isnull=True and misses legacy empty-list sentinel rows, causing
duplicate null markers; update the existing_detection lookup in the pipeline to
include both canonical NULL and the legacy sentinel by OR-ing bbox__isnull=True
with bbox=[] (Detection.NULL_BBOX is the canonical None, so the legacy form is
the empty list), e.g. replace the single filter(...) call that sets
existing_detection with Q(bbox__isnull=True) | Q(bbox=[]) or chain
.null_markers(), while keeping the same source_image and detection_algorithm
(refer to Detection, existing_detection, detection_algo, and
detection_resp.algorithm.key).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@ami/main/management/commands/cleanup_null_only_occurrences.py`:
- Around line 76-82: The success message currently uses the delete() return
values (null_deleted, phantom_deleted) which include cascaded rows; replace
those with the pre-calculated counters (null_count and phantom_count) when
writing the final message—inside the same transaction.atomic() block after
orphan_null_markers.delete() and phantom_occs.delete(), call
self.stdout.write(self.style.SUCCESS(f"Deleted {phantom_count} phantom
occurrences and {null_count} orphan null markers.")) so the log reports only the
intended occurrence counts.

In `@ami/main/models.py`:
- Around line 4165-4169: The sampling for detections_only is out of sync: the
annotate uses source_images_with_detections_count with
filter=~null_detections_q("images__detections__") but
SourceImageCollection.sample_detections_only() still uses the simpler
detections__isnull=False and can include null-marker-only images; update
sample_detections_only to apply the same filter logic (i.e. use
null_detections_q("images__detections__") negated or reuse the same annotated
queryset/condition) so the sampled collection matches the
source_images_with_detections_count semantics and only includes images that the
new counter considers as having detections.
- Around line 2837-2844: The null-marker builder currently sets
timestamp=timezone.now() in Detection.build_null_marker which forces the marker
to use processing time; remove that explicit timestamp (or set timestamp=None)
so the Detection instance is created without a timestamp and Detection.save()
can backfill the correct capture timestamp; update the build_null_marker
constructor call that uses cls(NULL_BBOX, source_image, detection_algorithm,
...) accordingly and keep other fields (source_image, bbox=cls.NULL_BBOX,
detection_algorithm) unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 277e1e60-2244-4c0e-9ce4-f2f61f800478

📥 Commits

Reviewing files that changed from the base of the PR and between 4e33f96 and e5f004a.

📒 Files selected for processing (8)
  • ami/main/api/views.py
  • ami/main/management/commands/cleanup_null_only_occurrences.py
  • ami/main/models.py
  • ami/main/tests.py
  • ami/ml/models/pipeline.py
  • ami/ml/tests.py
  • docs/claude/planning/pr-1312-null-marker-followup.md
  • docs/claude/sessions/2026-05-19-pr-1312-premptive-processed-marker.md
✅ Files skipped from review due to trivial changes (2)
  • docs/claude/sessions/2026-05-19-pr-1312-premptive-processed-marker.md
  • docs/claude/planning/pr-1312-null-marker-followup.md

Comment thread ami/main/management/commands/cleanup_null_only_occurrences.py Outdated
Comment thread ami/main/models.py
Comment thread ami/main/models.py
mihow added a commit that referenced this pull request May 20, 2026
Five drift / quick-win fixes from PR #1312 review:

1. ami/main/api/views.py — filter_by_has_detections now uses
   Detection.objects.valid() so /captures/?has_detections=true agrees
   with the with_detections prefetch (which already drops null markers).
   Without this, has_detections=true could return captures whose
   filtered_detections array was empty.

2. ami/main/models.py — SourceImageCollection.sample_detections_only
   now samples by Detection.objects.valid() instead of
   detections__isnull=False, matching the tightened
   source_images_with_detections_count annotation. A detections_only
   collection no longer admits images that have only null markers.

3. ami/main/models.py — Detection.build_null_marker drops
   timestamp=timezone.now(). Detection.save() backfills timestamp from
   the source image's capture time, so the marker sorts/filters by
   capture time rather than processing time. Test asserts
   timestamp is None on the builder output.

4. ami/ml/models/pipeline.py — get_or_create_detection null-marker
   dedup now goes through .null_markers() so legacy bbox=[] sentinels
   from older runs are re-used. The lookup is still
   detection_algorithm-scoped, so the wider .null_markers() predicate
   stays narrow at the call site (no false matches across algorithms).

5. ami/main/management/commands/cleanup_null_only_occurrences.py —
   success message now reports the pre-calculated phantom_count /
   null_count instead of the .delete() return tuple, which includes
   cascade-deleted rows and would mislead the operator about what the
   command targeted.

68/68 in ami.main null-marker tests + ami.ml.tests pass.

Co-Authored-By: Claude <noreply@anthropic.com>
@mihow

mihow commented May 20, 2026

Collaborator Author

Claude says: Pushed CodeRabbit feedback round in dda794f8. Five fixes applied:

Inline-thread items (3) — resolved on each thread:

  • models.py:2843 build_null_marker — dropped timestamp=timezone.now() so Detection.save() backfills from capture time
  • models.py:4168 sample_detections_only — now samples via Detection.objects.valid().values("source_image_id") to align with the tightened count semantics
  • cleanup_null_only_occurrences.py:82 — success message uses pre-calculated counts, with a comment explaining .delete() cascade-count inflation

Outside-diff items (2) — applied in same commit:

  • views.py:613-628 filter_by_has_detections — now uses Detection.objects.valid() so /captures/?has_detections=true agrees with the with_detections prefetch that already excludes null markers. Without this fix, a capture with only null markers would surface as "has detections" but its filtered_detections array would be empty.
  • pipeline.py:442-459 null-marker dedup in get_or_create_detection — switched to .null_markers() so legacy bbox=[] sentinels from older runs are reused. The lookup is still detection_algorithm-scoped, so widening bbox to OR [] doesn't cross algorithm boundaries.

68/68 in ami.main null-marker tests + ami.ml.tests pass post-fix. CI green pre-fix; re-running now.

@mihow

mihow commented May 21, 2026

Collaborator Author

Claude says: Cleanup command validated on a dev deployment (project 9) — 112 phantom occurrences + 112 orphan null markers deleted, post-cleanup counts match valid() queryset, second dry-run reports 0/0 (idempotent). Real-data confirmation now in the PR body's e2e section. reviewDecision is still REVIEW_REQUIRED — pinging for human eyes.

@mihow mihow changed the title from "fix(ml): create null detection markers only after real saves succeed" to "Don't mark images as processed too soon" May 27, 2026
@mihow

mihow commented Jun 12, 2026

Collaborator Author

Claude says: (posted on Michael's behalf — part of a structural review pass over the four open PSv2-readiness PRs: #1276, #1279, #1312, #1324, toward the umbrella #910.)

This one reads as the strongest of the four. The fix is structural rather than a delay: the null-marker write moves to the absolute end of save_results, so any exception leaves the image unmarked and filter_processed_images re-yields it, and retries stay idempotent through the dedupe in get_or_create_detection. Centralizing the sentinel behind Detection.NULL_BBOX / DetectionQuerySet.valid() and sweeping the ~12 call sites is exactly the missing abstraction this area needed, and the RED→GREEN tests at the broker-outage and classification-failure boundaries are convincing. Nice work.

Two things seem worth a look before merge / before running the cleanup command at scale:

  1. The cleanup command's delete predicate looks broader than the bug it targets. phantom_occs = all_occs.exclude(pk__in=valid_occs) also matches occurrences that have a real, classified detection but a null determination (e.g. partial-failure debris where classifications saved but the determination update didn't land). Deleting those SET_NULLs the detection's occurrence FK, and as far as we can tell no code path ever re-creates occurrences for orphaned detections — while the image now has valid detections with classifications, so filter_processed_images will skip it forever. The test fixtures only cover null-detection phantoms, so this case isn't pinned. Suggestion: restrict the delete to occurrences with no valid detections, or re-assert determination for the others. Worth patching before a --commit run on the larger affected projects.

  2. The OccurrenceQuerySet.valid() tightening deserves an explicit sign-off, since it silently changes three consumer surfaces. Determination-null occurrences with real detections disappear from the list API, project stats, and exports — which is the intended cleanup on dirty data, but it's also the failure visibility for any future partial-write debris (and a detection-only pipeline's occurrences would be hidden entirely). The PR body discloses the stats decrease honestly; flagging it here so a human reviewer confirms that trade deliberately rather than via the diff.

Tiny cleanup, low priority: NULL_DETECTIONS_FILTER and null_detections_q() are two definitions of the same predicate — the constant could be null_detections_q("") so they can't drift.

Test suite state and the e2e runs across the three failure paths look good. With item 1 patched and a human eyeball on item 2, this seems ready — and given #1312 fixes the "lost work under async" class, it's arguably the first of the four PSv2-readiness PRs that should land.

@mihow mihow added the PSv2 label ("Async & distributed ML backend (PSv2): job state, NATS dispatch, result handling. Umbrella #515.") Jun 16, 2026
Issue #1310: null detections (empty-bbox sentinels marking "image processed,
nothing found") were created before create_detections / create_classifications /
create_and_update_occurrences_for_detections ran. Two consequences:

1. If any of those downstream steps failed, the image was already flagged as
   processed via the null marker — filter_processed_images would skip it on the
   next run, leaving the image permanently in a "processed but no detections"
   state. Observed on project 171 (400 captures with only null detections).
2. create_and_update_occurrences_for_detections iterated every detection
   including nulls, so each null marker spawned a phantom Occurrence with
   determination=NULL.

Fix in ami/ml/models/pipeline.py save_results:
- Run create_detections / create_classifications / create_and_update_occurrences
  on the real DetectionResponses only.
- After those succeed, build null DetectionResponses for images that ended up
  without any detections and persist them via a second create_detections call.
- Null responses never enter the classification / occurrence loops, so no
  phantom Occurrence is created even in the happy path.

Tests in ami/ml/tests.py TestPipeline:
- test_null_detection_does_not_create_phantom_occurrence: asserts the happy
  path "pipeline found nothing" creates the null marker but no Occurrence.
- test_captures_not_marked_processed_after_failure: asserts that when a
  downstream step (create_classifications) raises, the image without a real
  detection is left unmarked and filter_processed_images re-yields it.

Co-Authored-By: Claude <noreply@anthropic.com>
mihow added a commit that referenced this pull request Jun 22, 2026
Plan for the takeaway-review follow-up work on PR #1312:
move-null-to-end + null-detection abstraction (DetectionQuerySet.valid /
.null_markers, Detection.is_null_marker, Detection.build_null_marker) +
sweep call sites + tighten OccurrenceQuerySet.valid + cleanup command for
project 171. Captures rationale for splitting transaction.atomic into a
separate follow-up PR (PR-1261 scar).

Co-Authored-By: Claude <noreply@anthropic.com>
mihow added a commit that referenced this pull request Jun 22, 2026
Five drift / quick-win fixes from PR #1312 review.

Co-Authored-By: Claude <noreply@anthropic.com>
@mihow mihow force-pushed the fix/premptive-processed-marker branch from dda794f to 2136405 Compare June 22, 2026 19:26
mihow and others added 2 commits June 22, 2026 16:38
Adds test_null_marker_not_persisted_when_broker_dispatch_fails to
TestPipeline. Patches create_detection_images.delay to raise, asserts
the unmatched image has no null marker persisted and that
filter_processed_images yields it for re-processing.

Verified RED against current ordering — null persistence still runs
before delay, so the assertion fails. Next commit moves null
persistence to the absolute final step.

Co-Authored-By: Claude <noreply@anthropic.com>
Closes the failure window the previous fix left open: null markers were
persisted after real-detection / classification / occurrence saves but
BEFORE source_image.save, create_detection_images.delay,
update_calculated_fields_for_events, and Deployment.update_calculated_fields.
A raise in any of those four steps (broker outage, DB error) still left
the image flagged as processed.

Null markers now run as the last write in save_results so they only
persist when every prior step succeeds. Remaining failure window is the
return statement.

Makes RED test from prior commit pass.

Co-Authored-By: Claude <noreply@anthropic.com>
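
The ordering described in the commit above can be illustrated with a minimal in-memory sketch. Everything here is hypothetical: `FakeStore`, the simplified `save_results` signature, and `unprocessed` are stand-ins written for this illustration, not the project's code, and the real `filter_processed_images` applies more checks than this. The point is only that a marker written as the last step cannot outlive a failure in an earlier step.

```python
from dataclasses import dataclass, field


@dataclass
class FakeStore:
    """In-memory stand-in for the database (hypothetical, illustration only)."""
    real_detections: dict = field(default_factory=dict)  # image_id -> list of bboxes
    null_markers: set = field(default_factory=set)       # image_ids marked "nothing found"


def save_results(store, image_ids, detections, side_effects):
    """Persist real detections, run follow-up steps, then write null markers last."""
    for image_id, bbox in detections:
        store.real_detections.setdefault(image_id, []).append(bbox)

    # Stand-ins for classification saves, occurrence updates, crop dispatch,
    # and calculated-field updates. Any of them may raise.
    for step in side_effects:
        step()

    # Last write: only reached when every prior step succeeded.
    for image_id in image_ids:
        if image_id not in store.real_detections:
            store.null_markers.add(image_id)


def unprocessed(store, image_ids):
    """Simplified retry filter: images with no real detection and no null marker."""
    return [
        i for i in image_ids
        if i not in store.real_detections and i not in store.null_markers
    ]
```

If a side-effect step raises, the function exits before the final loop, so the image with no detections stays eligible for the next run. A later successful call writes the marker.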
mihow and others added 5 commits June 22, 2026 16:38
Introduces a single source of truth for "this detection row is a sentinel
that records that an algorithm ran against an image and found nothing":

- Detection.NULL_BBOX = None (canonical bbox value for new null markers)
- Detection.is_null_marker (recognises both bbox=None and legacy bbox=[])
- Detection.build_null_marker(source_image, detection_algorithm) classmethod
- DetectionQuerySet.valid() — consumer default, excludes null markers
- DetectionQuerySet.null_markers() — narrow, for "has this image been
  processed?" checks (renamed from .null_detections())

valid() is named to grow: future predicates to fold in include soft-delete
tombstones, detections missing an algorithm reference, and detections
missing classifications. Consumers asking "give me detections" should
default to .valid().

Adds TestDetectionNullMarker covering: is_null_marker for bbox=None /
bbox=[] / real bbox, build_null_marker field setup, and disjointness of
.valid() / .null_markers() over a fixture with all three row types.

Next commit sweeps existing inline NULL_DETECTIONS_FILTER usage to the
new API.

Co-Authored-By: Claude <noreply@anthropic.com>
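
As a rough model of the abstraction this commit introduces, the sketch below uses plain lists instead of Django querysets. The function names mirror the commit message, but the bodies are assumptions written for illustration. It reflects the state at this commit, where both `bbox=None` and the legacy `bbox=[]` count as null markers (a later commit in this PR drops the legacy form).

```python
from typing import List, Optional, Sequence

NULL_BBOX = None  # canonical sentinel for new null markers, per the commit message


def is_null_marker(bbox: Optional[Sequence[float]]) -> bool:
    """True for the canonical None sentinel and for the legacy empty-list form."""
    return bbox is None or len(bbox) == 0


def valid(bboxes: List[Optional[Sequence[float]]]) -> list:
    """Consumer default: real detections only."""
    return [b for b in bboxes if not is_null_marker(b)]


def null_markers(bboxes: List[Optional[Sequence[float]]]) -> list:
    """Narrow view used for 'has this image been processed?' checks."""
    return [b for b in bboxes if is_null_marker(b)]
```

Because both helpers are defined in terms of the same predicate, the two result sets are disjoint and together cover every row, which is the property the commit's `TestDetectionNullMarker` is described as asserting.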
…null_markers()

Migrates 7 inline NULL_DETECTIONS_FILTER usages to the new manager methods:
- Detection.objects.exclude(NULL_DETECTIONS_FILTER) → .valid()
- self.detections.exclude(NULL_DETECTIONS_FILTER) → self.detections.all().valid()
- subquery .exclude(NULL_DETECTIONS_FILTER) → .valid()
- aggregate filter at SourceImageCollectionQuerySet.with_source_images_with_detections_count
  was the drifted inline ~Q(...bbox__isnull=True) & ~Q(...bbox=[]); now uses a
  new null_detections_q(prefix) helper for relation-prefixed Q expressions.

Touched:
- ami/main/models.py: Deployment.get_detections_count, Event.get_detections_count,
  SourceImage.create_occurrences_from_detections, _annotate_detections_count_subquery,
  SourceImageCollectionQuerySet.with_source_images_with_detections_count
- ami/main/api/views.py: OccurrenceViewSet prefetch_queryset, DetectionViewSet.queryset
- ami/ml/models/pipeline.py: filter_processed_images null-only and unclassified checks

NULL_DETECTIONS_FILTER constant is retained at module level. Direct
get_or_create_detection lookup keeps bbox__isnull=True (algorithm-scoped, narrower
than .null_markers() which also includes legacy bbox=[] from other pipelines);
added a comment pointing readers to Detection.NULL_BBOX for the canonical sentinel.

176/176 tests in ami.ml, ami.main.TestDetectionNullMarker, ami.jobs pass.

Co-Authored-By: Claude <noreply@anthropic.com>
OccurrenceQuerySet.valid() previously only excluded occurrences with no
detections at all. Field bug from Issue #1310 created two new phantom
shapes that still leaked to the API:

1. Occurrences whose only detections are null-marker sentinels (no real
   bounding box backing them).
2. Occurrences with determination__isnull=True.

valid() now requires at least one .valid() Detection (real, non-null)
AND a non-null determination. Built on top of the new
Detection.objects.valid() helper so both layers stay in sync as the
predicate grows (soft-delete, missing algo, etc.).

Downstream callers updated automatically: ami/exports/format_types.py
and OccurrenceViewSet.get_queryset both invoke OccurrenceQuerySet.valid
and will now filter out the project-171 phantoms once existing rows are
cleaned up (next commit).

Adds TestOccurrenceValidQuerySet covering all three exclusion shapes.

Co-Authored-By: Claude <noreply@anthropic.com>
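
The tightened rule can be stated as a plain predicate. This is a sketch under assumptions: `Occ` and `is_valid` are hypothetical names, the taxon string is made up, and the real implementation is a Django queryset method rather than a Python loop.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Occ:
    """Hypothetical stand-in for an Occurrence row."""
    name: str
    bboxes: List[Optional[list]]    # one entry per detection; None models a null marker
    determination: Optional[str]


def is_valid(occ: Occ) -> bool:
    """At least one real detection AND a non-null determination."""
    has_real_detection = any(bbox is not None for bbox in occ.bboxes)
    return has_real_detection and occ.determination is not None
```

Under this rule an occurrence with no detections, one backed only by null markers, and one with a real detection but no determination are all excluded.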
One-shot per-project cleanup for the Issue #1310 field bug. Deletes:
- Phantom occurrences (no valid detections OR null determination)
- Orphan null-marker Detection rows on source images with no real detections

After running, the affected source images become eligible for
re-processing by filter_processed_images on the next ML run.

Dry-run by default; pass --commit to delete.

  python manage.py cleanup_null_only_occurrences --project 171         # dry-run
  python manage.py cleanup_null_only_occurrences --project 171 --commit

Idempotent — re-running on a cleaned project reports zero candidates
and exits without touching the database.

Adds TestCleanupNullOnlyOccurrencesCommand covering dry-run, commit,
and idempotency. Valid occurrences and null markers on images with at
least one real detection are explicitly preserved.

Co-Authored-By: Claude <noreply@anthropic.com>
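
The dry-run-by-default pattern described above can be sketched with `argparse` alone. This is an illustration, not the management command itself: the real command is a Django management command, and `build_parser`, `run`, and the list standing in for the database are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="cleanup_sketch")
    parser.add_argument("--project", type=int, required=True)
    parser.add_argument(
        "--commit",
        action="store_true",
        help="actually delete; without this flag the command only reports",
    )
    return parser


def run(argv, rows):
    """rows is a mutable list of candidate ids standing in for the database."""
    args = build_parser().parse_args(argv)
    candidates = list(rows)  # count is taken before any deletion
    if not candidates:
        return "nothing to clean"
    if not args.commit:
        return f"dry-run: would delete {len(candidates)}"
    rows.clear()
    return f"deleted {len(candidates)}"
```

Reporting the count computed before the delete keeps the message about what the command targeted, and a second run over the emptied list returns early, which is what makes the operation idempotent.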
Five drift / quick-win fixes from PR #1312 review.

Co-Authored-By: Claude <noreply@anthropic.com>
@mihow mihow force-pushed the fix/premptive-processed-marker branch from 2136405 to 8e4bbf3 Compare June 22, 2026 23:38
@mihow mihow changed the title from "Don't mark images as processed too soon" to "Fix null detections in exports & API. Don't mark images as processed too soon" Jun 22, 2026
…-determination phantom

Rename the 'orphan' terminology for left-behind null-marker detections to
'dangling' (the standard term for a reference with no live target, already
used for storage blobs in this module) across the cleanup command's output,
the valid() docstring, and the tests.

Also add a fixture case to the cleanup command test: an occurrence with a
real detection but no determination. This is the other arm of the tightened
valid() definition, and the test now asserts the command treats it as a
phantom and deletes it.
Comment thread ami/ml/models/pipeline.py Outdated
Comment thread ami/ml/models/pipeline.py
…ntinel

The cleanup_null_only_occurrences command deleted any occurrence excluded by
Occurrence.valid(), which also matches occurrences that have a real, classified
detection but a missing determination. Deleting those SET_NULLs the detection's
occurrence FK (Detection.occurrence is on_delete=SET_NULL) and strands a
classified detection on an image that filter_processed_images then skips forever.
Narrow the command's phantom predicate to "has no valid detection backing it" so
those partial-write occurrences are preserved; the test now asserts they survive
with their detection's FK intact.

Also drop the legacy bbox=[] null-marker form. It has zero rows in prod, demo,
and staging, no code writes it (the field defaults to SQL NULL and new markers
use Detection.NULL_BBOX = None), so the dual-form predicate was dead weight. The
sole sentinel is now bbox IS NULL, and NULL_DETECTIONS_FILTER is defined via
null_detections_q() so the constant and helper cannot drift.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
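
To make the difference between the two delete predicates concrete, here is a small sketch over hypothetical in-memory rows. The dictionary layout, the row names, and the taxon string are illustrative assumptions. The real command works on querysets.

```python
# Hypothetical rows; field names are illustrative, not the real schema.
occurrences = {
    "null_only":     {"bboxes": [None],             "determination": None},
    "no_detections": {"bboxes": [],                 "determination": None},
    "partial_write": {"bboxes": [[10, 10, 20, 20]], "determination": None},
    "healthy":       {"bboxes": [[30, 30, 50, 50]], "determination": "Noctuidae"},
}


def has_valid_detection(occ) -> bool:
    return any(bbox is not None for bbox in occ["bboxes"])


def broad_phantoms(rows):
    """Earlier predicate: everything excluded by the tightened valid()."""
    return sorted(
        name for name, occ in rows.items()
        if not (has_valid_detection(occ) and occ["determination"] is not None)
    )


def narrow_phantoms(rows):
    """Narrowed predicate: only occurrences with no valid detection behind them."""
    return sorted(name for name, occ in rows.items() if not has_valid_detection(occ))
```

The `partial_write` row is the case the commit is protecting: the broad predicate selects it for deletion, the narrow one leaves it in place along with its detection's link to the occurrence.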
@mihow

mihow commented Jun 23, 2026

Collaborator Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Jun 23, 2026

Contributor
✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai Bot left a comment

Contributor


Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
ami/ml/models/pipeline.py (1)

1064-1082: 🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

Split incoming null detections out before the early persistence path.

The new final block only delays app-generated null markers. If the processing service sends a DetectionResponse with bbox=None, line 1064 still persists it before classifications, occurrence updates, crop dispatch, and calculated-field updates—and line 1079 can still create a phantom occurrence for it. Filter results.detections to real detections for the early path, then persist both incoming and generated null markers in the final block.

Proposed fix
+    real_detection_responses = [detection for detection in results.detections if detection.bbox is not None]
+    incoming_null_detection_responses = [detection for detection in results.detections if detection.bbox is None]
+
     detections = create_detections(
-        detections=results.detections,
+        detections=real_detection_responses,
         algorithms_known=algorithms_known,
         logger=job_logger,
     )
 
     classifications = create_classifications(
         detections=detections,
-        detection_responses=results.detections,
+        detection_responses=real_detection_responses,
         algorithms_known=algorithms_known,
         logger=job_logger,
     )
@@
-    null_detection_responses = create_null_detections_for_undetected_images(
+    generated_null_detection_responses = create_null_detections_for_undetected_images(
         results=results,
         detection_algorithm=detection_algorithm,
         logger=job_logger,
     )
+    null_detection_responses = incoming_null_detection_responses + generated_null_detection_responses
     create_detections(
         detections=null_detection_responses,
         algorithms_known=algorithms_known,

Also applies to: 1102-1116

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ami/ml/models/pipeline.py` around lines 1064 - 1082, Filter out null
detections (those with bbox=None) from results.detections before the early
processing path starting at create_detections(). Create a filtered list of only
real detections and pass this filtered list to create_detections(),
create_classifications(), and create_and_update_occurrences_for_detections()
instead of the raw results.detections. Preserve the original results.detections
(including null markers) for the final persistence block that handles both
incoming and generated null markers, ensuring null detections bypass the early
classification, occurrence creation, and crop dispatch steps.
🧹 Nitpick comments (2)
ami/main/tests.py (1)

6041-6070: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Use explicit pixel-coordinate bboxes in new null-marker tests.

These tests currently use bbox=[0.0, 0.0, 1.0, 1.0], which reads like normalized coordinates. Switching to clear pixel literals (e.g. [10, 10, 20, 20]) would better match repo conventions and reduce ambiguity in future refactors/validation changes.

♻️ Suggested test-literal cleanup
-            bbox=[0.0, 0.0, 1.0, 1.0],
+            bbox=[10, 10, 20, 20],

Based on learnings, Detection.bbox in this repository uses absolute pixel coordinate space (not normalized [0–1] floats).

Also applies to: 6115-6150, 6209-6257

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ami/main/tests.py` around lines 6041 - 6070, In the test methods
test_is_null_marker_false_for_real_detection,
test_build_null_marker_sets_canonical_fields, and
test_valid_and_null_markers_are_disjoint_and_complete (and the other test ranges
noted), replace the bbox values of [0.0, 0.0, 1.0, 1.0] with explicit
pixel-coordinate values like [10, 10, 20, 20] to accurately reflect that
Detection.bbox uses absolute pixel coordinate space rather than normalized
coordinates, which will improve clarity and consistency with repository
conventions.

Source: Learnings

ami/ml/tests.py (1)

1261-1262: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Use the canonical null-marker queryset in these assertions.

Line 1261, Line 1307, and Line 1347 hardcode bbox__isnull=True. That bypasses the new null-marker abstraction and can miss legacy bbox=[] sentinels covered by .null_markers(). Align these checks with the canonical API.

♻️ Proposed refactor
-        null_dets = image.detections.filter(bbox__isnull=True)
+        null_dets = Detection.objects.null_markers().filter(source_image=image)
@@
-        null_dets = image_without_real.detections.filter(bbox__isnull=True)
+        null_dets = Detection.objects.null_markers().filter(source_image=image_without_real)
@@
-        null_dets = image_without_real.detections.filter(bbox__isnull=True)
+        null_dets = Detection.objects.null_markers().filter(source_image=image_without_real)

Also applies to: 1307-1310, 1347-1350

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ami/ml/tests.py` around lines 1261 - 1262, Replace the hardcoded
bbox__isnull=True filter in the three assertion locations (around lines 1261,
1307, and 1347) with the canonical .null_markers() method. Instead of using
image.detections.filter(bbox__isnull=True), change these to use
image.detections.null_markers() which properly handles both the new null-marker
abstraction and legacy bbox=[] sentinels. This ensures the assertions correctly
validate null markers through the proper API abstraction.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@ami/main/api/views.py`:
- Line 1848: The `use_aggregation` variable assignment currently only checks for
the presence of the "collection" key in request.query_params without validating
whether it has a non-empty value. This causes unnecessary aggregation mode
activation when collection is passed as a blank query parameter (e.g.,
?collection=). Modify the condition to not only check if "collection" is present
in request.query_params but also validate that it contains a non-empty/valid
value before setting use_aggregation to true.

In `@ami/main/models.py`:
- Line 2445: The fetch_image_content() function lacks URL validation, creating
an SSRF vulnerability. Add URL validation logic to fetch_image_content() that
blocks unsafe URL schemes (file://, ftp://, etc.), rejects private and
link-local IP ranges (127.0.0.0/8, 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16,
169.254.0.0/16), and disables HTTP redirects or validates redirect targets
against the same restrictions. Additionally, add validation in the
S3StorageSourceSerializer to validate the public_base_url field at the source to
prevent malicious values from being set in the first place.

In `@ami/ml/models/pipeline.py`:
- Around line 121-128: The `iter(images)` call at line 121 forces Django
QuerySets into memory when processing large collections. Create a helper
function that detects if `images` is a Django QuerySet and uses
`iterator(chunk_size=batch_size)` to stream results instead of materializing
them all at once. If `images` is not a QuerySet, fall back to using
`iter(images)`. Update the `image_iter` assignment to use this helper function
so that QuerySets remain lazy-evaluated throughout the iteration in the while
loop.
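
One possible shape for the helper suggested in the `pipeline.py` comment is sketched below. It is an assumption, not the project's code: it duck-types on an `iterator` method instead of importing Django's `QuerySet`, so it runs without Django, and the names `iter_lazily` and `batches` are made up for this illustration. Django's `QuerySet.iterator()` does accept a `chunk_size` argument.

```python
from itertools import islice


def iter_lazily(images, batch_size):
    """Stream queryset-like objects in chunks; fall back to iter() otherwise."""
    iterator_method = getattr(images, "iterator", None)
    if callable(iterator_method):
        # For a Django QuerySet this avoids loading every row at once.
        return iterator_method(chunk_size=batch_size)
    return iter(images)


def batches(images, batch_size):
    """Yield lists of at most batch_size items without materializing the input."""
    image_iter = iter_lazily(images, batch_size)
    while True:
        batch = list(islice(image_iter, batch_size))
        if not batch:
            return
        yield batch
```

An `isinstance(images, QuerySet)` check would be stricter than the attribute test used here. Which one fits depends on what other iterables the caller passes in.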

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 0422a8e7-8613-4eda-b223-3416334f77a5

📥 Commits

Reviewing files that changed from the base of the PR and between e5f004a and 8c019e4.

📒 Files selected for processing (6)
  • ami/main/api/views.py
  • ami/main/management/commands/cleanup_null_only_occurrences.py
  • ami/main/models.py
  • ami/main/tests.py
  • ami/ml/models/pipeline.py
  • ami/ml/tests.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • ami/main/management/commands/cleanup_null_only_occurrences.py

@coderabbitai coderabbitai Bot left a comment

Contributor


Caution

Inline review comments failed to post. This is likely due to GitHub's internal server error or limits when posting large numbers of comments. If you are seeing this consistently it is likely a permissions issue. Please check "Moderation" -> "Code review limits" under your organization settings.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
ami/ml/models/pipeline.py (1)

1064-1082: 🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

Split incoming null detections out before the early persistence path.

The new final block only delays app-generated null markers. If the processing service sends a DetectionResponse with bbox=None, line 1064 still persists it before classifications, occurrence updates, crop dispatch, and calculated-field updates—and line 1079 can still create a phantom occurrence for it. Filter results.detections to real detections for the early path, then persist both incoming and generated null markers in the final block.

Proposed fix
+    real_detection_responses = [detection for detection in results.detections if detection.bbox is not None]
+    incoming_null_detection_responses = [detection for detection in results.detections if detection.bbox is None]
+
     detections = create_detections(
-        detections=results.detections,
+        detections=real_detection_responses,
         algorithms_known=algorithms_known,
         logger=job_logger,
     )
 
     classifications = create_classifications(
         detections=detections,
-        detection_responses=results.detections,
+        detection_responses=real_detection_responses,
         algorithms_known=algorithms_known,
         logger=job_logger,
     )
@@
-    null_detection_responses = create_null_detections_for_undetected_images(
+    generated_null_detection_responses = create_null_detections_for_undetected_images(
         results=results,
         detection_algorithm=detection_algorithm,
         logger=job_logger,
     )
+    null_detection_responses = incoming_null_detection_responses + generated_null_detection_responses
     create_detections(
         detections=null_detection_responses,
         algorithms_known=algorithms_known,

Also applies to: 1102-1116
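
The split proposed above can be sketched in isolation. This is a minimal illustration, not the repository's code: `DetectionResponse` here is a hypothetical stand-in dataclass with only the fields the example needs, and `split_null_detections` is a name introduced for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class DetectionResponse:
    """Hypothetical stand-in for the pipeline's response schema."""

    source_image_id: int
    bbox: Optional[Sequence[float]] = None


def split_null_detections(detections):
    """Return (real, null) lists, preserving input order within each list."""
    real = [d for d in detections if d.bbox is not None]
    null = [d for d in detections if d.bbox is None]
    return real, null


responses = [
    DetectionResponse(1, [10, 10, 20, 20]),
    DetectionResponse(2, None),  # incoming null marker from the processing service
    DetectionResponse(1, [30, 30, 45, 50]),
]
real, null = split_null_detections(responses)
```

Under this assumption, only `real` would flow through the early persistence path, while `null` would be merged with the generated null markers and saved in the final block.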

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ami/ml/models/pipeline.py` around lines 1064 - 1082, Filter out null
detections (those with bbox=None) from results.detections before the early
processing path starting at create_detections(). Create a filtered list of only
real detections and pass this filtered list to create_detections(),
create_classifications(), and create_and_update_occurrences_for_detections()
instead of the raw results.detections. Preserve the original results.detections
(including null markers) for the final persistence block that handles both
incoming and generated null markers, ensuring null detections bypass the early
classification, occurrence creation, and crop dispatch steps.
🧹 Nitpick comments (2)
ami/main/tests.py (1)

6041-6070: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Use explicit pixel-coordinate bboxes in new null-marker tests.

These tests currently use bbox=[0.0, 0.0, 1.0, 1.0], which reads like normalized coordinates. Switching to clear pixel literals (e.g. [10, 10, 20, 20]) would better match repo conventions and reduce ambiguity in future refactors/validation changes.

♻️ Suggested test-literal cleanup
-            bbox=[0.0, 0.0, 1.0, 1.0],
+            bbox=[10, 10, 20, 20],

Based on learnings, Detection.bbox in this repository uses absolute pixel coordinate space (not normalized [0–1] floats).

Also applies to: 6115-6150, 6209-6257
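
To illustrate why `[0.0, 0.0, 1.0, 1.0]` is ambiguous, here is a small sketch. Both helpers are hypothetical and not part of the repository, and the `[x1, y1, x2, y2]` ordering is an assumption made for the example.

```python
def looks_normalized(bbox):
    """Heuristic only: True when every coordinate lies within [0, 1]."""
    return all(0.0 <= float(v) <= 1.0 for v in bbox)


def to_pixels(bbox, width, height):
    """Scale a normalized [x1, y1, x2, y2] box (assumed order) to pixels."""
    x1, y1, x2, y2 = bbox
    return [
        round(x1 * width),
        round(y1 * height),
        round(x2 * width),
        round(y2 * height),
    ]


ambiguous = [0.0, 0.0, 1.0, 1.0]  # normalized full frame, or a 1x1 pixel box?
explicit = [10, 10, 20, 20]       # unambiguously pixel coordinates
```

The heuristic cannot tell a normalized full-frame box from a one-pixel box at the origin, which is the ambiguity the suggested literals avoid.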

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ami/main/tests.py` around lines 6041 - 6070, In the test methods
test_is_null_marker_false_for_real_detection,
test_build_null_marker_sets_canonical_fields, and
test_valid_and_null_markers_are_disjoint_and_complete (and the other test ranges
noted), replace the bbox values of [0.0, 0.0, 1.0, 1.0] with explicit
pixel-coordinate values like [10, 10, 20, 20] to accurately reflect that
Detection.bbox uses absolute pixel coordinate space rather than normalized
coordinates, which will improve clarity and consistency with repository
conventions.

Source: Learnings

ami/ml/tests.py (1)

1261-1262: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Use the canonical null-marker queryset in these assertions.

Line 1261, Line 1307, and Line 1347 hardcode bbox__isnull=True. That bypasses the new null-marker abstraction and can miss legacy bbox=[] sentinels covered by .null_markers(). Align these checks with the canonical API.

♻️ Proposed refactor
-        null_dets = image.detections.filter(bbox__isnull=True)
+        null_dets = Detection.objects.null_markers().filter(source_image=image)
@@
-        null_dets = image_without_real.detections.filter(bbox__isnull=True)
+        null_dets = Detection.objects.null_markers().filter(source_image=image_without_real)
@@
-        null_dets = image_without_real.detections.filter(bbox__isnull=True)
+        null_dets = Detection.objects.null_markers().filter(source_image=image_without_real)

Also applies to: 1307-1310, 1347-1350
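
The gap between the two checks can be shown with plain Python values. This is a sketch under the assumption, taken from the comment above, that a null marker is either `bbox=None` or the legacy `bbox=[]`; the repository's actual `null_markers()` implementation is not shown here and may differ.

```python
def is_null_marker(bbox):
    """True for the None sentinel and for the legacy empty-list sentinel."""
    return bbox is None or len(bbox) == 0


bboxes = [None, [], [10, 10, 20, 20]]

# Equivalent of filter(bbox__isnull=True): only catches the None sentinel.
isnull_only = [b for b in bboxes if b is None]

# Equivalent of the canonical null-marker check: catches both sentinels.
canonical = [b for b in bboxes if is_null_marker(b)]
```

Here `isnull_only` has one entry and `canonical` has two, so an assertion built on the hardcoded filter would pass even if a legacy sentinel were handled incorrectly.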

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ami/ml/tests.py` around lines 1261 - 1262, Replace the hardcoded
bbox__isnull=True filter in the three assertion locations (around lines 1261,
1307, and 1347) with the canonical .null_markers() method. Instead of using
image.detections.filter(bbox__isnull=True), change these to use
image.detections.null_markers() which properly handles both the new null-marker
abstraction and legacy bbox=[] sentinels. This ensures the assertions correctly
validate null markers through the proper API abstraction.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@ami/main/api/views.py`:
- Line 1848: The `use_aggregation` variable assignment currently only checks for
the presence of the "collection" key in request.query_params without validating
whether it has a non-empty value. This causes unnecessary aggregation mode
activation when collection is passed as a blank query parameter (e.g.,
?collection=). Modify the condition to not only check if "collection" is present
in request.query_params but also validate that it contains a non-empty/valid
value before setting use_aggregation to true.

In `@ami/main/models.py`:
- Line 2445: The fetch_image_content() function lacks URL validation, creating
an SSRF vulnerability. Add URL validation logic to fetch_image_content() that
blocks unsafe URL schemes (file://, ftp://, etc.), rejects private and
link-local IP ranges (127.0.0.0/8, 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16,
169.254.0.0/16), and disables HTTP redirects or validates redirect targets
against the same restrictions. Additionally, add validation in the
S3StorageSourceSerializer to validate the public_base_url field at the source to
prevent malicious values from being set in the first place.

In `@ami/ml/models/pipeline.py`:
- Around line 121-128: The `iter(images)` call at line 121 forces Django
QuerySets into memory when processing large collections. Create a helper
function that detects if `images` is a Django QuerySet and uses
`iterator(chunk_size=batch_size)` to stream results instead of materializing
them all at once. If `images` is not a QuerySet, fall back to using
`iter(images)`. Update the `image_iter` assignment to use this helper function
so that QuerySets remain lazy-evaluated throughout the iteration in the while
loop.

---

Outside diff comments:
In `@ami/ml/models/pipeline.py`:
- Around line 1064-1082: Filter out null detections (those with bbox=None) from
results.detections before the early processing path starting at
create_detections(). Create a filtered list of only real detections and pass
this filtered list to create_detections(), create_classifications(), and
create_and_update_occurrences_for_detections() instead of the raw
results.detections. Preserve the original results.detections (including null
markers) for the final persistence block that handles both incoming and
generated null markers, ensuring null detections bypass the early
classification, occurrence creation, and crop dispatch steps.

---

Nitpick comments:
In `@ami/main/tests.py`:
- Around line 6041-6070: In the test methods
test_is_null_marker_false_for_real_detection,
test_build_null_marker_sets_canonical_fields, and
test_valid_and_null_markers_are_disjoint_and_complete (and the other test ranges
noted), replace the bbox values of [0.0, 0.0, 1.0, 1.0] with explicit
pixel-coordinate values like [10, 10, 20, 20] to accurately reflect that
Detection.bbox uses absolute pixel coordinate space rather than normalized
coordinates, which will improve clarity and consistency with repository
conventions.

In `@ami/ml/tests.py`:
- Around line 1261-1262: Replace the hardcoded bbox__isnull=True filter in the
three assertion locations (around lines 1261, 1307, and 1347) with the canonical
.null_markers() method. Instead of using
image.detections.filter(bbox__isnull=True), change these to use
image.detections.null_markers() which properly handles both the new null-marker
abstraction and legacy bbox=[] sentinels. This ensures the assertions correctly
validate null markers through the proper API abstraction.
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 0422a8e7-8613-4eda-b223-3416334f77a5

📥 Commits

Reviewing files that changed from the base of the PR and between e5f004a and 8c019e4.

📒 Files selected for processing (6)
  • ami/main/api/views.py
  • ami/main/management/commands/cleanup_null_only_occurrences.py
  • ami/main/models.py
  • ami/main/tests.py
  • ami/ml/models/pipeline.py
  • ami/ml/tests.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • ami/main/management/commands/cleanup_null_only_occurrences.py
🛑 Comments failed to post (3)
ami/main/api/views.py (1)

1848-1848: 🚀 Performance & Scalability | 🟡 Minor | ⚡ Quick win

Avoid switching to aggregation mode for blank collection query values.

use_aggregation = "collection" in request.query_params treats ?collection= as present and forces the heavier aggregation path even when no collection filter is actually applied. Gate on a validated/non-empty value instead.

💡 Suggested fix
-        use_aggregation = "collection" in request.query_params
+        collection_raw = request.query_params.get("collection")
+        use_aggregation = bool(collection_raw)
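
The difference between the two conditions can be reproduced without Django. The sketch below uses `urllib.parse.parse_qsl` with `keep_blank_values=True` as a stand-in for `request.query_params`, on the assumption that blank values are retained as empty strings; the function names are illustrative only.

```python
from urllib.parse import parse_qsl


def parse_params(query_string):
    """Parse a query string, keeping blank values such as "collection="."""
    return dict(parse_qsl(query_string, keep_blank_values=True))


def use_aggregation_by_presence(params):
    """Current behaviour: the key being present is enough."""
    return "collection" in params


def use_aggregation_by_value(params):
    """Suggested behaviour: require a non-empty value."""
    return bool(params.get("collection"))


blank = parse_params("collection=")
filled = parse_params("collection=5")
missing = parse_params("")
```

Only the blank case differs: the presence check enables aggregation for `?collection=`, while the value check does not.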
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ami/main/api/views.py` at line 1848, The `use_aggregation` variable
assignment currently only checks for the presence of the "collection" key in
request.query_params without validating whether it has a non-empty value. This
causes unnecessary aggregation mode activation when collection is passed as a
blank query parameter (e.g., ?collection=). Modify the condition to not only
check if "collection" is present in request.query_params but also validate that
it contains a non-empty/valid value before setting use_aggregation to true.
ami/main/models.py (1)

2445-2445: 🔒 Security & Privacy | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Inspect the fetch helper and public_base_url exposure without executing repo code.
fd -i '^media.py$' -x rg -n -C20 'def fetch_image_content' {}
rg -n -C3 '\bpublic_base_url\b|find_or_generate_thumbnail_for_label|thumbnails_enabled' --type=py

Repository: RolnickLab/antenna

Length of output: 1314


🏁 Script executed:

# Find the find_or_generate_thumbnail_for_label function and public_base_url definition
fd -type f -name "*.py" | head -20

Repository: RolnickLab/antenna

Length of output: 327


🏁 Script executed:

# Search for find_or_generate_thumbnail_for_label function definition and usage
rg -n "def find_or_generate_thumbnail_for_label|public_base_url" --type=py -A5 -B2

Repository: RolnickLab/antenna

Length of output: 156


🏁 Script executed:

# Check for any URL validation or security checks related to fetch_image_content
rg -n "fetch_image_content|SSRF|url.*validat|safe.*url" --type=py -B3 -A3

Repository: RolnickLab/antenna

Length of output: 156


🏁 Script executed:

# Search for the function and public_base_url more broadly
rg -n "def find_or_generate_thumbnail_for_label" 
rg -n "public_base_url" -A 2 -B 2
rg -n "class SourceImage" -A 50

Repository: RolnickLab/antenna

Length of output: 156


🏁 Script executed:

# Look for URL validation patterns in the codebase
rg -n "validate.*url|check.*url|scheme|urlparse|private.*ip|link.*local|redirect" -i

Repository: RolnickLab/antenna

Length of output: 156


🏁 Script executed:

# Check what file contains the line in question (ami/main/models.py line 2445)
fd models.py -type f -exec wc -l {} \;

Repository: RolnickLab/antenna

Length of output: 327


🏁 Script executed:

# Find models.py file
find . -name "models.py" -path "*/ami/main/*"

# Search for the functions without specifying type
rg "def find_or_generate_thumbnail_for_label"
rg "public_base_url"
rg "class SourceImage"

Repository: RolnickLab/antenna

Length of output: 177


🏁 Script executed:

# Check the actual line 2445 in the models.py file
cat ami/main/models.py | sed -n '2440,2450p'

Repository: RolnickLab/antenna

Length of output: 911


🏁 Script executed:

# Look for any URL validation mechanisms
rg "requests\." -A 3 -B 3 | head -100

Repository: RolnickLab/antenna

Length of output: 156


🏁 Script executed:

# Get context around line 2445 - find the function name and broader context
sed -n '2400,2480p' ami/main/models.py

Repository: RolnickLab/antenna

Length of output: 4178


🏁 Script executed:

# Search for find_or_generate_thumbnail_for_label and SourceImage definitions
grep -n "def find_or_generate_thumbnail_for_label\|class SourceImage\|public_base_url" ami/main/models.py | head -20

Repository: RolnickLab/antenna

Length of output: 1714


🏁 Script executed:

# Look for public_url method definition
grep -n "def public_url" ami/main/models.py -A 10

Repository: RolnickLab/antenna

Length of output: 1283


🏁 Script executed:

# Get the full public_url method implementation
sed -n '2197,2240p' ami/main/models.py

Repository: RolnickLab/antenna

Length of output: 1971


🏁 Script executed:

# Check if there's any URL validation in the build_public_url method
grep -n "def build_public_url\|build_public_url" ami/main/models.py -A 10

Repository: RolnickLab/antenna

Length of output: 1264


🏁 Script executed:

# Check how public_base_url is populated and if it's API-writable
grep -n "public_base_url" ami/main/models.py -B 2 -A 2 | head -40

Repository: RolnickLab/antenna

Length of output: 2075


🏁 Script executed:

# Check for any SSRF or URL validation utilities in the codebase
find . -name "*.py" -type f -exec grep -l "urlparse\|scheme\|private.*ip\|ssrf" {} \;

Repository: RolnickLab/antenna

Length of output: 426


🏁 Script executed:

# Check the files that mention URL validation
grep -n "urlparse\|scheme\|private" ami/exports/utils.py ami/main/models.py ami/tests/test_storage.py 2>/dev/null

Repository: RolnickLab/antenna

Length of output: 1416


🏁 Script executed:

# Check if there's any serializer validation on public_base_url
find . -name "*serializer*" -type f -exec grep -l "public_base_url\|SourceImage\|S3StorageSource" {} \;

Repository: RolnickLab/antenna

Length of output: 245


🏁 Script executed:

# Check the imports in models.py to see what URL utilities are available
head -50 ami/main/models.py

Repository: RolnickLab/antenna

Length of output: 1919


🏁 Script executed:

# Look for any URL validation or sanitization middleware/utilities
grep -rn "def.*url\|http\|scheme" ami/utils/ --include="*.py" 2>/dev/null | head -30

Repository: RolnickLab/antenna

Length of output: 1196


🏁 Script executed:

# Check serializer validation for public_base_url and SourceImage
grep -n "public_base_url" ami/main/api/serializers.py -B 3 -A 3

Repository: RolnickLab/antenna

Length of output: 838


🏁 Script executed:

# Check if there's any validation in S3StorageSource or SourceImage serializers
grep -n "class.*Serializer\|public_base_url" ami/main/api/serializers.py | head -40

Repository: RolnickLab/antenna

Length of output: 2429


🏁 Script executed:

# Look for any URL validation utilities in ami/utils/
ls -la ami/utils/*.py

Repository: RolnickLab/antenna

Length of output: 855


🏁 Script executed:

# Check the complete fetch_image_content and related functions
sed -n '1,50p' ami/utils/media.py

Repository: RolnickLab/antenna

Length of output: 1709


Add URL validation to fetch_image_content() to prevent SSRF attacks.

The vulnerability is confirmed: public_base_url is API-writable (exposed in S3StorageSourceSerializer) and used to construct URLs fetched server-side in find_or_generate_thumbnail_for_label(). The fetch_image_content() function performs no validation—it accepts any scheme, private/link-local hosts, and follows redirects. An attacker could inject file://, http://localhost:*, or http://169.254.169.254/* to exfiltrate internal resources.

Required mitigations:

  • Block unsafe schemes (file://, ftp://, etc.)
  • Reject private/link-local IP ranges (127.0.0.0/8, 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, 169.254.0.0/16)
  • Disable HTTP redirects or validate redirect targets against the same restrictions
  • Consider adding public_base_url validation in the serializer as a secondary control

The existing timeout (5.0s connect, 30.0s read) is insufficient without URL validation.
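
A rough sketch of the first two mitigations, using only the standard library. `is_url_allowed` is a hypothetical helper, not an existing function in the repository. It only inspects the URL string: a production check would also need to resolve hostnames and validate every resolved address, and to handle redirects, neither of which is covered here.

```python
import ipaddress
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}


def is_url_allowed(url):
    """Reject non-HTTP(S) schemes and literal private/loopback/link-local hosts."""
    parts = urlsplit(url)
    if parts.scheme not in ALLOWED_SCHEMES:
        return False
    host = parts.hostname
    if not host or host == "localhost":
        return False
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal. This sketch stops here; a real check must
        # resolve the name and test each resolved address as well.
        return True
    return not (addr.is_private or addr.is_loopback or addr.is_link_local)
```

For example, `is_url_allowed("http://169.254.169.254/latest/meta-data/")` returns `False`, while a public HTTPS hostname passes the string-level check.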

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ami/main/models.py` at line 2445, The fetch_image_content() function lacks
URL validation, creating an SSRF vulnerability. Add URL validation logic to
fetch_image_content() that blocks unsafe URL schemes (file://, ftp://, etc.),
rejects private and link-local IP ranges (127.0.0.0/8, 10.0.0.0/8,
172.16.0.0/12, 192.168.0.0/16, 169.254.0.0/16), and disables HTTP redirects or
validates redirect targets against the same restrictions. Additionally, add
validation in the S3StorageSourceSerializer to validate the public_base_url
field at the source to prevent malicious values from being set in the first
place.
ami/ml/models/pipeline.py (1)

121-128: 🚀 Performance & Scalability | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, locate and inspect the file
find . -name "pipeline.py" -path "*/ml/models/*" | head -5

Repository: RolnickLab/antenna

Length of output: 184


🏁 Script executed:

# Check file size to determine reading strategy
wc -l ami/ml/models/pipeline.py

Repository: RolnickLab/antenna

Length of output: 187


🏁 Script executed:

# Read the specific lines mentioned in the review (121-128 and around 258-273)
sed -n '115,135p' ami/ml/models/pipeline.py

Repository: RolnickLab/antenna

Length of output: 1101


🏁 Script executed:

# Read the second location mentioned (around 258-273)
sed -n '250,280p' ami/ml/models/pipeline.py

Repository: RolnickLab/antenna

Length of output: 1427


🏁 Script executed:

# Check for existing imports to understand context
head -50 ami/ml/models/pipeline.py

Repository: RolnickLab/antenna

Length of output: 1402


🌐 Web query:

Django QuerySet len() vs count() eager evaluation behavior

💡 Result:

In Django, len() and count() on a QuerySet differ significantly in database interaction and evaluation:

len(): Calling len() on a QuerySet forces evaluation of the entire QuerySet [1][2]. It fetches all records from the database and loads them into memory as model instances [3], then returns the length of the result list held in Python [1][2]. Because it retrieves all objects, this method is memory-intensive and slow for large datasets [3].

count(): Calling count() performs a SELECT COUNT(*) query at the database level [1][2][4]. It returns an integer representing the number of records without loading the actual objects into memory [1][4]. This is significantly more efficient than len() when you only need to know the number of records [1][2][3].

Caching behavior: Both methods respect the QuerySet's internal result cache. If the QuerySet has already been fully evaluated and cached (e.g., by iterating over it), calling len() returns the length of the cached results without hitting the database [4][3]. Similarly, if the QuerySet is already fully cached, count() returns the length of that cached result set directly, avoiding an additional database query [4].

Summary recommendation: Use count() if you only need the number of records, as it avoids loading data into memory [1][3]. Use len() only if you have already retrieved the QuerySet data and want to avoid an additional, redundant database hit for the count, or if you are already planning to process the full collection of objects in memory [3].

Citations:


🌐 Web query:

Django QuerySet iterator() chunk_size lazy evaluation performance

💡 Result:

In Django, the QuerySet iterator() method is a performance optimization tool used to reduce memory consumption when processing large result sets [1][2][3].

Core performance and behavioral characteristics:

- Memory management: By default, Django QuerySets cache results in memory after the first iteration to avoid subsequent database hits [1][2]. The iterator() method disables this caching mechanism, fetching rows from the database as needed [1][4][3]. This allows you to process large numbers of objects without loading the entire result set into RAM [1][2][3].
- Lazy evaluation: Like standard QuerySets, calling iterator() does not execute the database query immediately [1]. The query is executed only when you begin iterating over the result set [5][1].
- Chunking: The chunk_size parameter determines how many rows are fetched from the database at once into memory [6][7]. When iterator() is used, Django uses server-side cursors (where supported by the database backend) to manage the retrieval of records in the specified chunk sizes [6][4].
- Prefetching support: Since Django 4.1, iterator() supports prefetch_related provided that a chunk_size is explicitly defined [5][4][3]. In this scenario, Django performs the prefetch operations within the defined chunks [6][7]. Without a specified chunk_size, prefetch_related calls are ignored when using iterator() [5][8].

When to use iterator():

- Use it when you have a very large result set that would cause your application to exceed available memory if loaded entirely [1][2].
- Use it primarily for one-time processing tasks, such as management commands, bulk data exports, or migrations [2].

When to avoid iterator():

- Avoid it if you need to reuse the QuerySet results, as the absence of caching means that any subsequent iteration will trigger a completely new database query [2].
- Avoid it for standard web request views where results are typically paginated; using standard QuerySet evaluation is generally more efficient for small to medium-sized sets because it leverages the built-in result cache [2].

Always profile your specific use case to determine if the memory savings provided by iterator() outweigh the potential overhead of disabling result caching and the nuances of chunked prefetching [5][4].

Citations:


🏁 Script executed:

# Verify QuerySet import is available for type checking
rg "from django.db" ami/ml/models/pipeline.py | head -10

Repository: RolnickLab/antenna

Length of output: 185


🏁 Script executed:

# Check if QuerySet is used or imported elsewhere in the file
rg "QuerySet" ami/ml/models/pipeline.py

Repository: RolnickLab/antenna

Length of output: 681


🏁 Script executed:

# Get more context around line 259 to see the full collect_images function
sed -n '240,290p' ami/ml/models/pipeline.py

Repository: RolnickLab/antenna

Length of output: 2102


🏁 Script executed:

# Get more context around line 121 to see the full filter_processed_images function signature
sed -n '100,140p' ami/ml/models/pipeline.py

Repository: RolnickLab/antenna

Length of output: 2140


🏁 Script executed:

# Check the function signatures to understand parameters
sed -n '105,115p' ami/ml/models/pipeline.py

Repository: RolnickLab/antenna

Length of output: 938


🏁 Script executed:

# Verify if images parameter can actually be a QuerySet in filter_processed_images
sed -n '70,130p' ami/ml/models/pipeline.py

Repository: RolnickLab/antenna

Length of output: 3161


🏁 Script executed:

# Check the actual return type hint of collect_images to understand how filter_processed_images receives images
sed -n '220,240p' ami/ml/models/pipeline.py

Repository: RolnickLab/antenna

Length of output: 783


🏁 Script executed:

# Verify what happens after filter_processed_images - does it list() the result?
sed -n '260,275p' ami/ml/models/pipeline.py

Repository: RolnickLab/antenna

Length of output: 791


Optimize QuerySet evaluation to prevent memory bloat on large collections.

len(images) at line 259 materializes Django QuerySets into memory before processing. Since images can be a QuerySet from collection.images or deployment-filtered queries, large collections load every capture up front. Use count() for totals and iterator(chunk_size=batch_size) for iteration to keep results lazy and streamed.

The same issue affects filter_processed_images() when it receives a QuerySet: calling iter(images) at line 121 may force evaluation. Add a helper to detect QuerySets and stream them via iterator() instead.

Proposed fix
+from django.db.models import QuerySet
+
@@
-    if not pipeline_classifier_ids:
+    def iter_input_images():
+        if isinstance(images, QuerySet):
+            return images.iterator(chunk_size=batch_size)
+        return iter(images)
+
+    if not pipeline_classifier_ids:
         task_logger.warning(f"Pipeline {pipeline} has no classification algorithms saved. Will reprocess all images.")
@@
-        yield from images
+        yield from iter_input_images()
         return
 
-    image_iter = iter(images)
+    image_iter = iter_input_images()
@@
-    total_images = len(images)
+    total_images = images.count() if isinstance(images, QuerySet) else len(images)
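
The streaming idea can be tried without a database. The sketch below is an illustration under stated assumptions: it uses duck typing (`hasattr`-style lookup of `iterator`) instead of the `isinstance(images, QuerySet)` check in the proposed fix so that it runs without Django, and `FakeQuerySet` and `batches` are hypothetical names introduced for the example.

```python
from itertools import islice


class FakeQuerySet:
    """Hypothetical stand-in exposing a QuerySet-like iterator() method."""

    def __init__(self, rows):
        self._rows = rows
        self.chunk_sizes = []  # records each chunk_size that was requested

    def iterator(self, chunk_size=None):
        self.chunk_sizes.append(chunk_size)
        return iter(self._rows)


def iter_input_images(images, batch_size):
    """Stream QuerySet-like inputs; fall back to iter() for plain iterables."""
    iterator = getattr(images, "iterator", None)
    if callable(iterator):
        return iterator(chunk_size=batch_size)
    return iter(images)


def batches(images, batch_size):
    """Yield lists of at most batch_size items without materializing the input."""
    image_iter = iter_input_images(images, batch_size)
    while True:
        batch = list(islice(image_iter, batch_size))
        if not batch:
            return
        yield batch


qs = FakeQuerySet(list(range(5)))
queryset_batches = list(batches(qs, 2))
list_batches = list(batches(["a", "b", "c"], 2))
```

Both input kinds go through the same batching loop; only the way the iterator is obtained differs.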
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ami/ml/models/pipeline.py` around lines 121 - 128, The `iter(images)` call at
line 121 forces Django QuerySets into memory when processing large collections.
Create a helper function that detects if `images` is a Django QuerySet and uses
`iterator(chunk_size=batch_size)` to stream results instead of materializing
them all at once. If `images` is not a QuerySet, fall back to using
`iter(images)`. Update the `image_iter` assignment to use this helper function
so that QuerySets remain lazy-evaluated throughout the iteration in the while
loop.

…_detection branch [skip ci]

Comment-only. Clarify that existing_detection is the single pre-existing row matching the
incoming response's identity (the null-marker sentinel in the null branch, the same-bbox
detection in the real branch), not "all detections for the image" — and why the null branch
needs .null_markers() while the real branch is disambiguated by the bbox match alone.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@mihow mihow merged commit d1051e6 into main Jun 23, 2026
5 checks passed
@mihow mihow deleted the fix/premptive-processed-marker branch June 23, 2026 02:33
Labels

PSv2 Async & distributed ML backend (PSv2): job state, NATS dispatch, result handling. Umbrella #515.

Development

Successfully merging this pull request may close these issues.

Fix captures are marked as processed with zero detections when they shouldn't be

2 participants