
perf(watson): prefetch relations + force async indexing #14881

Open
valentijnscholten wants to merge 6 commits into DefectDojo:dev from valentijnscholten:perf/watson-index-prefetch

Member

valentijnscholten commented May 15, 2026

Summary

Watson index updates were already done in the background (in OS), but they were still slow and inefficient because relations were not prefetched, so Watson would happily execute N x M x X x ... queries. They also accumulated all models touched by a long-running request in memory.

  • Watson indexer N+1 fix. Watson's SearchAdapter._resolve_field walks __-separated relation paths via per-instance getattr, triggering one query per FK hop per object during async indexing. For Finding (test__engagement__product__name + jira_issue__jira_key) and Vulnerability_Id (finding__test__engagement__product__name) on a 1000-row batch this adds thousands of extra queries per task.
  • New helper dojo/utils_watson_prefetch.py auto-derives select_related / prefetch_related paths from each adapter's fields/store by walking model._meta, then applies them in update_watson_search_index_for_model. On any error we log loudly and fall back to the plain queryset so indexing still completes.
  • Toggle: DD_WATSON_INDEX_PREFETCH_ENABLED env / WATSON_INDEX_PREFETCH_ENABLED setting, default True.
  • force_async=True dispatch flag. dojo_dispatch_task / we_want_async now accept force_async=True, which keeps a task in the background even when the user has block_execution=True or sync=True is also passed. Wired into the async watson indexer middleware — index updates are slow and never need to be synchronous from the user's perspective.
  • Intermediate flush: the per-request search context is flushed once it reaches the threshold, instead of waiting for end-of-request.
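The path derivation described in the bullets above can be sketched roughly as follows. This is a minimal illustration, not the code in dojo/utils_watson_prefetch.py: `SCHEMA` is a toy dictionary standing in for `model._meta`, and `derive_prefetch_paths` is a hypothetical name. It assumes that chains of single-valued relations go to `select_related`, that chains containing a multi-valued relation go to `prefetch_related`, and that unknown paths are dropped.

```python
# Hypothetical stand-in for model._meta: model name -> {field: (kind, target model)}.
# kind is "fk" (single-valued relation), "m2m" (multi-valued relation) or "column".
# This toy schema is an assumption and does not mirror the real DefectDojo models.
SCHEMA = {
    "Finding": {
        "title": ("column", None),
        "test": ("fk", "Test"),
        "jira_issue": ("fk", "JiraIssue"),
        "tags": ("m2m", "Tag"),
    },
    "Test": {"engagement": ("fk", "Engagement")},
    "Engagement": {"product": ("fk", "Product")},
    "Product": {"name": ("column", None)},
    "JiraIssue": {"jira_key": ("column", None)},
    "Tag": {"name": ("column", None)},
}


def derive_prefetch_paths(model, field_paths):
    """Split adapter field paths into (select_related, prefetch_related) lists."""
    select, prefetch = set(), set()
    for path in field_paths:
        current, hops, multi = model, [], False
        for part in path.split("__"):
            kind, target = SCHEMA.get(current, {}).get(part, (None, None))
            if kind is None:
                hops = []  # unknown segment: drop the whole path
                break
            if kind == "column":
                break  # reached a plain column, so the relation chain ends here
            hops.append(part)
            multi = multi or kind == "m2m"
            current = target
        if hops:
            (prefetch if multi else select).add("__".join(hops))
    return sorted(select), sorted(prefetch)


select_paths, prefetch_paths = derive_prefetch_paths(
    "Finding",
    ["title", "test__engagement__product__name", "jira_issue__jira_key",
     "tags__name", "bogus__x"],
)
```

With the toy schema, the two FK chains land in `select_paths`, the tag relation lands in `prefetch_paths`, and the plain column and the unknown path contribute nothing.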

Why

  • Bulk imports / reimports trigger async watson indexing via AsyncSearchContextMiddleware. Even on the background path the indexing was slow because Watson re-fetched every relation per object. This change cuts indexing queries from O(objects × FK depth) to O(1) per batch on the FK chain.
  • force_async decouples "this task should run in the background" from "this user wants foreground execution" — the watson indexer is a clear case where the caller knows better than the user preference.
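The precedence that force_async introduces can be illustrated with a small decision function. This is a sketch under assumptions: `wants_async` is a hypothetical name, its keyword arguments are simplified, and the real `we_want_async` in DefectDojo reads the user preference differently and may consider further inputs.

```python
def wants_async(*, sync=False, force_async=False, block_execution=False):
    """Return True when the task should be sent to the background.

    Hypothetical simplification: block_execution is passed in directly
    here, whereas the real code derives it from the current user.
    """
    if force_async:
        # Caller override: wins over sync=True and over the user preference.
        return True
    if sync or block_execution:
        # Either the caller or the user asked for foreground execution.
        return False
    return True
```

The only new branch is the first one: without `force_async` the function behaves as a plain "sync or user preference means foreground" check.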

Notes

  • A separate naming follow-up was discussed (sync=True → force_sync=True for symmetry with force_async), but deferred to a dedicated PR since it touches ~84 sites.

Watson's SearchAdapter resolves __-separated relation paths via per-instance
getattr, triggering an N+1 query storm during async indexing. For Finding
(test__engagement__product__name + jira_issue__jira_key) and Vulnerability_Id
(finding__test__engagement__product__name) on a 1000-row batch this adds
thousands of extra queries per task.

dojo/utils_watson_prefetch.py auto-derives select_related / prefetch_related
paths from each adapter's fields/store by walking model._meta, then applies
them in update_watson_search_index_for_model. Toggle:
DD_WATSON_INDEX_PREFETCH_ENABLED (default True). On any error we log loudly
and fall back to the plain queryset so indexing still completes.

Also adds force_async=True to dojo_dispatch_task / we_want_async — keeps
the watson indexer in the background even when the caller is a
block_execution=True user, since index updates are slow and never need
to be synchronous from the user's perspective.

Tests:
- unittests/test_watson_index_prefetch.py (10 tests) — path classification
  for Product/Finding/Vulnerability_Id/Endpoint, unknown-path drop, setting
  toggle, derivation-raise fallback with log assertion.
- unittests/test_celery_dispatch_force_async.py (4 tests) — force_async
  precedence over sync=True and block_execution.
github-actions Bot added the settings_changes and unittests labels May 15, 2026
valentijnscholten added this to the 2.59.0 milestone May 15, 2026
- test_tag_inheritance_perf: update V2/V3 import baselines (-52 each)
  to reflect adapter-derived select_related/prefetch_related in the
  async watson indexer running inline under CELERY_TASK_ALWAYS_EAGER.
- test_watson_async_search_index: add CELERY_TASK_ALWAYS_EAGER=True to
  the threshold=0 case. force_async=True now always dispatches via
  apply_async; without eager mode the task never runs and the index
  stays empty.
Wrap watson.search_context_manager.add_to_context with a size-based hook
that drains the per-request context to async celery tasks as soon as it
reaches WATSON_ASYNC_INDEX_UPDATE_BATCH_SIZE, instead of waiting for
end-of-request. Bounds in-memory growth on long-running imports and lets
celery workers start indexing batches earlier (parallel fanout).

Hook installed once in dojo.apps.ready(). BATCH_SIZE doubles as
threshold; set to 0/negative to disable the intermediate flush.
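The size-based hook can be pictured with a stand-in context object. This is only a sketch: `FakeSearchContext`, `install_flush_hook` and `record_and_clear` are hypothetical names, and the real hook wraps watson's `search_context_manager.add_to_context`, whose internals are not reproduced here.

```python
class FakeSearchContext:
    """Hypothetical stand-in for watson's per-request search context."""

    def __init__(self):
        self.objects = set()

    def add_to_context(self, obj):
        self.objects.add(obj)


def install_flush_hook(ctx, batch_size, drain):
    """Wrap ctx.add_to_context so the context is drained at batch_size entries."""
    if batch_size <= 0:
        return  # 0 or negative disables the intermediate flush
    original = ctx.add_to_context

    def add_and_maybe_flush(obj):
        original(obj)
        if len(ctx.objects) >= batch_size:
            drain(ctx.objects)

    ctx.add_to_context = add_and_maybe_flush


# Usage: record each drained batch instead of dispatching a celery task.
dispatched = []


def record_and_clear(objects):
    dispatched.append(sorted(objects))
    objects.clear()  # empty the shared set in place


ctx = FakeSearchContext()
install_flush_hook(ctx, batch_size=3, drain=record_and_clear)
for pk in range(7):
    ctx.add_to_context(pk)
```

After seven additions with a batch size of 3, two batches have been drained and one object is still pending, which is the bounded in-memory growth the commit message describes.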

Drop WATSON_ASYNC_INDEX_UPDATE_THRESHOLD: index dispatch is now
unconditionally async. Removes the sub-threshold sync branch (which
blocked the request on _bulk_save_search_entries) and the
disable-async path.

Consolidate _extract_tasks_for_async + _trigger_async_index_update +
_dispatch_async_index_batches + _flush_search_context_intermediate into
one helper `_drain_search_context_to_async` that groups, dispatches,
and discards entries from the set in place. With the set drained,
watson's end() bulk-saves an empty iterator — no explicit invalidate()
needed.
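The "group, dispatch, discard in place" behaviour of the consolidated helper might look roughly like this. It is a sketch, not `_drain_search_context_to_async` itself: `drain_to_async` is a hypothetical name, entries are simplified to `(model_name, pk)` tuples, and `dispatch` stands in for sending a celery task.

```python
from collections import defaultdict


def drain_to_async(pending, dispatch, batch_size):
    """Group (model_name, pk) entries by model, dispatch pk batches, empty the set.

    Assumes batch_size is a positive integer.
    """
    by_model = defaultdict(list)
    for model_name, pk in pending:
        by_model[model_name].append(pk)
    for model_name, pks in by_model.items():
        pks.sort()
        for start in range(0, len(pks), batch_size):
            dispatch(model_name, pks[start:start + batch_size])
    # Mutate in place so every holder of the set sees it empty afterwards.
    pending.clear()


calls = []
pending = {("Finding", 1), ("Finding", 2), ("Finding", 3), ("Product", 7)}
drain_to_async(pending, lambda model, pks: calls.append((model, pks)), batch_size=2)
```

Clearing the set in place, rather than rebinding a new one, is what lets a later bulk save iterate over nothing without a separate invalidation step.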

Tests:
- test_watson_intermediate_flush: new — drain dispatches + clears,
  threshold-triggered hook, threshold=0 disables, invalid context skips.
- test_watson_async_search_index: collapse three threshold-variant
  tests into one, class-level CELERY_TASK_ALWAYS_EAGER=True.
- test_tag_inheritance_perf: reimport no-change baselines V2 69→74,
  V3 87→92 (always-async path adds 5 queries vs prior sub-threshold
  sync branch).
valentijnscholten added the affects_pro label (PRs that affect Pro and need a coordinated release/merge moment) May 16, 2026
github-actions Bot added the docs label May 16, 2026
valentijnscholten removed the affects_pro label May 16, 2026
Lock in the N+1 elimination claim directly with CaptureQueriesContext —
previously only observed indirectly via the ZAP import perf test.
CI runs the V3_FEATURE_LOCATIONS=True matrix where BaseModel.save calls
full_clean — Product.description is blank=False, so the bare fixture
ValidationErrors out. Local default (V3 off) skips validation, masking
this in the prior run.
valentijnscholten force-pushed the perf/watson-index-prefetch branch from 6431064 to 2683553 May 16, 2026 11:07

Labels

docs, settings_changes, unittests
