feat(api): forward loader/chunker/extractor/resolver from apply_changes to ingest/update#250
Conversation
…es to ingest/update ``apply_changes`` is the documented single entrypoint for CI-driven incremental ingestion, but in v1.1.0 it silently ignored per-call strategy overrides — the inner ``ingest()`` and ``update()`` dispatches ran with SDK defaults regardless. Callers who wanted custom chunker boundaries, GLiNER thresholds, LLM-verified resolution, or a non-auto-selected loader had no way to get them through ``apply_changes``; their only escape was to bypass the convenience wrapper and loop over ``ingest``/``update``/``delete_document`` themselves. Add ``loader=``, ``chunker=``, ``extractor=``, ``resolver=`` to ``apply_changes`` (and ``apply_changes_sync``) and forward them to the inner ``ingest()`` and ``update()`` calls. ``delete_document`` doesn't take strategies and is unaffected. Defaults remain ``None`` — backwards compatible. Tests: - ``test_strategy_overrides_forward_to_ingest_and_update``: caller-passed strategies reach both inner dispatch points. - ``test_strategy_overrides_default_to_none``: omitted strategies stay ``None`` (i.e. SDK defaults are preserved). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
📝 WalkthroughWalkthroughThis PR extends the ChangesStrategy override parameters in apply_changes flow
Sequence DiagramsequenceDiagram
participant User
participant ApplyChanges as apply_changes()
participant Update
participant Ingest
User->>ApplyChanges: apply_changes(modified, added, loader, chunker, ...)
ApplyChanges->>Update: update(if_missing="ingest", loader, chunker, extractor, resolver)
Note over Update: Process modified paths
ApplyChanges->>Ingest: ingest(loader, chunker, extractor, resolver)
Note over Ingest: Process added paths
Ingest-->>ApplyChanges: completed
Update-->>ApplyChanges: completed
ApplyChanges-->>User: changes applied
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@graphrag_sdk/tests/test_facade.py`:
- Around line 1762-1768: The test fakes construct IngestionResult and
UpdateResult with non-canonical kwarg names (e.g., document_id, chunks,
entities, relations, action); update the fake return objects in the functions
that build these results (the IngestionResult call around the earlier fake
ingest and the fake_update function returning UpdateResult, and the similar
fakes at the other occurrences around lines 1795–1801) to use the true/canonical
dataclass field names used by the real API (inspect the IngestionResult and
UpdateResult definitions to find the correct attribute names such as the
document reference, chunk/entity/relation counts, and status/action field) and
set equivalent values so the tests reflect the real object shape.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: ca205cde-8386-4c41-bbc3-40c3758faeba
📒 Files selected for processing (2)
graphrag_sdk/src/graphrag_sdk/api/main.pygraphrag_sdk/tests/test_facade.py
| return [IngestionResult(document_id="a.md", chunks=0, entities=0, relations=0)] | ||
|
|
||
| async def fake_update(source, **kwargs): | ||
| captured["update"] = dict(kwargs, source=source) | ||
| return UpdateResult( | ||
| document_id="m.md", action="updated", chunks=0, entities=0, relations=0, | ||
| ) |
There was a problem hiding this comment.
Use canonical IngestionResult/UpdateResult fields in test fakes (Line 1762, Line 1767, Line 1795, Line 1800).
These fake return objects are built with non-canonical kwargs (document_id, chunks, entities, relations, action), which can break or make the tests misleading versus the real API shape.
Proposed fix
- from graphrag_sdk.core.models import IngestionResult, UpdateResult
+ from graphrag_sdk.core.models import DocumentInfo, IngestionResult, UpdateResult
@@
async def fake_ingest(source, **kwargs):
captured["ingest"] = dict(kwargs, source=source)
- return [IngestionResult(document_id="a.md", chunks=0, entities=0, relations=0)]
+ return [
+ IngestionResult(
+ document_info=DocumentInfo(uid="a.md", path="a.md"),
+ nodes_created=0,
+ relationships_created=0,
+ chunks_indexed=0,
+ metadata={},
+ )
+ ]
@@
async def fake_update(source, **kwargs):
captured["update"] = dict(kwargs, source=source)
return UpdateResult(
- document_id="m.md", action="updated", chunks=0, entities=0, relations=0,
+ document_info=DocumentInfo(uid="m.md", path="m.md"),
+ nodes_created=0,
+ relationships_created=0,
+ chunks_indexed=0,
+ metadata={},
+ replaced_existing=True,
+ no_op=False,
)
@@
- from graphrag_sdk.core.models import IngestionResult, UpdateResult
+ from graphrag_sdk.core.models import DocumentInfo, IngestionResult, UpdateResult
@@
async def fake_ingest(source, **kwargs):
captured["ingest"] = dict(kwargs)
- return [IngestionResult(document_id="a.md", chunks=0, entities=0, relations=0)]
+ return [
+ IngestionResult(
+ document_info=DocumentInfo(uid="a.md", path="a.md"),
+ nodes_created=0,
+ relationships_created=0,
+ chunks_indexed=0,
+ metadata={},
+ )
+ ]
@@
async def fake_update(source, **kwargs):
captured["update"] = dict(kwargs)
return UpdateResult(
- document_id="m.md", action="updated", chunks=0, entities=0, relations=0,
+ document_info=DocumentInfo(uid="m.md", path="m.md"),
+ nodes_created=0,
+ relationships_created=0,
+ chunks_indexed=0,
+ metadata={},
+ replaced_existing=True,
+ no_op=False,
)Also applies to: 1795-1801
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@graphrag_sdk/tests/test_facade.py` around lines 1762 - 1768, The test fakes
construct IngestionResult and UpdateResult with non-canonical kwarg names (e.g.,
document_id, chunks, entities, relations, action); update the fake return
objects in the functions that build these results (the IngestionResult call
around the earlier fake ingest and the fake_update function returning
UpdateResult, and the similar fakes at the other occurrences around lines
1795–1801) to use the true/canonical dataclass field names used by the real API
(inspect the IngestionResult and UpdateResult definitions to find the correct
attribute names such as the document reference, chunk/entity/relation counts,
and status/action field) and set equivalent values so the tests reflect the real
object shape.
Cuts a release that includes the apply_changes strategy-forwarding fix merged in FalkorDB#250. v1.1.0 silently dropped per-call ``loader``/ ``chunker``/``extractor``/``resolver`` overrides at the apply_changes boundary; v1.1.1 forwards them through to the inner ingest/update calls. Default behaviour is unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
apply_changesis the documented single entrypoint for CI-driven incremental ingestion (graph.apply_changes(**parse_git_diff(...))in the v1.1.0 release notes), but the public signature accepts onlyadded/modified/deletedplus concurrency knobs — there is no way to pass per-call strategy overrides. Internally it dispatches toingest()andupdate()with SDK defaults, so any caller who wanted a custom chunker/extractor/resolver had to bypassapply_changesentirely and loop over the primitives themselves.This PR adds
loader=,chunker=,extractor=,resolver=kwargs toapply_changes(and the sync wrapper) and forwards them to the inneringest()andupdate()dispatches.delete_documentdoes not take strategies and is unaffected. Defaults are allNone— fully backwards compatible.Motivation
The downstream GraphRAG-UI docs-CI orchestrator builds a per-graph
IngestionConfig(GLiNER threshold 0.75,SentenceTokenCapChunking(256, 2),LLMVerifiedResolution) and then callsapply_changes. Without this forwarding, our strategies are silently dropped at the SDK boundary — the orchestrator either has to call the primitives directly (losingapply_changes's batch-level concurrency control and uniformBatchEntryerror surface) or accept SDK-default ingestion (silent retrieval quality drift).Changes
apply_changes(*, ..., loader=None, chunker=None, extractor=None, resolver=None, ...)— four new kwargsself.update(path, loader=, chunker=, extractor=, resolver=, ...)in_update_oneself.ingest(added, loader=, chunker=, extractor=, resolver=, ...)in theaddedbranchapply_changes_syncmirrors the new signature and forwardsArgs:to document the new kwargsTests
test_strategy_overrides_forward_to_ingest_and_update: caller-passed strategies reach both inner dispatch pointstest_strategy_overrides_default_to_none: omitted strategies stayNone(SDK defaults preserved)TestApplyChangestests still pass; 114 tests pass acrosstest_facade.pyTest plan
pytest tests/test_facade.py— 114 passed locallypytest tests/test_facade.py::TestApplyChanges— 12 passed (10 existing + 2 new)🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
apply_changes()andapply_changes_sync()methods to accept optional customization parameters for greater control over data processing operations.Tests