19 changes: 19 additions & 0 deletions .github/dependabot.yml
@@ -0,0 +1,19 @@
version: 2
updates:
  - package-ecosystem: "github-actions"
    directory: "/"
    schedule:
      interval: "weekly"
    groups:
      github-actions:
        patterns:
          - "*"

Review comment (Collaborator):

🟡 M6 — Verify dependabot pip ecosystem works with PEP-621 pyproject.toml.

- package-ecosystem: "pip"
  directory: "/graphrag_sdk"

Dependabot's pip parser handles [project.dependencies] but has known gaps with [project.optional-dependencies] groups: it tends to update only [project.dependencies] and miss the dev and other extras. A dry run is worth doing before relying on it, or switch to the uv ecosystem if your uv.lock is the source of truth.

  - package-ecosystem: "pip"
    directory: "/graphrag_sdk"
    schedule:
      interval: "weekly"
    groups:
      python-dependencies:
        patterns:
          - "*"
8 changes: 3 additions & 5 deletions .github/workflows/ci.yml
@@ -85,10 +85,8 @@ jobs:
done
echo "FalkorDB did not become reachable within 20 attempts" >&2
exit 1
- name: Run incremental-update integration tests
# Only the v1.1.0 invariant suite needs the live database; the
# rest of test_integration.py is import-smoke that already runs
# in the unit job above.
- name: Run real-FalkorDB integration tests
run: >
python -m pytest -v
tests/test_integration.py::TestIncrementalUpdateInvariants
-m integration
tests/test_integration.py
2 changes: 2 additions & 0 deletions .github/workflows/docs.yml
@@ -1,11 +1,13 @@
name: Deploy Docs

on:
  workflow_dispatch:
  push:
    branches: [main, staging]
    paths:
      - "docs/**"
      - "mkdocs.yml"
      - "README.md"

permissions:
  contents: read
7 changes: 7 additions & 0 deletions .github/workflows/pypi-publish.yaml
@@ -32,8 +32,15 @@ jobs:
fi

- run: pip install build
- run: pip install twine

- run: python -m build
- run: twine check dist/*

- uses: actions/upload-artifact@v4
with:
name: python-package-distributions
path: graphrag_sdk/dist/

- uses: pypa/gh-action-pypi-publish@release/v1
with:
12 changes: 10 additions & 2 deletions CONTRIBUTING.md
@@ -24,10 +24,10 @@ pip install -e "graphrag_sdk[dev]"
You will need a running FalkorDB instance for integration work. The easiest way is via Docker:

```bash
docker run -p 6379:6379 falkordb/falkordb
docker compose up -d falkordb
```

This exposes FalkorDB on the default Redis port (6379). No additional configuration is required for local development.
This exposes FalkorDB on the default Redis port (6379) and the browser UI on port 3000.

---

@@ -41,6 +41,14 @@ python -m pytest graphrag_sdk/tests/ -q

There are 558 tests covering the ingestion pipeline, the GraphRAG facade, extraction strategies, resolution strategies, retrieval strategies, storage layers, and utilities. All tests use mock providers, so no live LLM or database connection is needed to run them.

Run real-FalkorDB integration tests with:

```bash
RUN_INTEGRATION=1 python -m pytest graphrag_sdk/tests/test_integration.py -m integration -q
```

These tests use scripted local providers, not live LLM APIs.

---

## 3. Code Style
2 changes: 2 additions & 0 deletions docker-compose.yml
@@ -3,13 +3,15 @@ services:
    image: falkordb/falkordb:v4.18.0
    ports:
      - "6379:6379"
      - "3000:3000"
Review comment (Collaborator):

🟡 M7 — Browser UI binds to all interfaces by default.

ports:
  - "3000:3000"

Fine on a laptop, but surprising on a shared dev host or CI runner: this publishes the FalkorDB Browser UI on 0.0.0.0:3000. A safer default:

ports:
  - "127.0.0.1:3000:3000"

Or at minimum a one-line note in CONTRIBUTING.md so contributors aren't surprised.
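
If a guard against this creeping back in is wanted, a small check over the compose port strings would do. A rough sketch only: `publishes_on_all_interfaces` is a hypothetical helper, it covers just the short `CONTAINER`, `HOST:CONTAINER` and `IP:HOST:CONTAINER` forms, and it ignores bracketed IPv6 hosts:

```python
def publishes_on_all_interfaces(port_spec: str) -> bool:
    """Return True when a short-syntax port mapping does not pin a host IP.

    Simplified on purpose: bracketed IPv6 hosts are out of scope here.
    """
    parts = port_spec.split(":")
    if len(parts) < 3:
        # No host IP given, so Docker's default is to publish on every interface.
        return True
    host_ip = parts[0]
    return host_ip in ("0.0.0.0", "")


for spec in ("6379:6379", "3000:3000", "127.0.0.1:3000:3000"):
    print(spec, publishes_on_all_interfaces(spec))
```

Note that the same check flags the existing 6379 mapping as well, so the loopback question applies to both ports.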

    volumes:
      - falkordb_data:/data
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 3s
      retries: 5
      start_period: 5s

volumes:
  falkordb_data:
3 changes: 3 additions & 0 deletions graphrag_sdk/pyproject.toml
@@ -97,3 +97,6 @@ plugins = ["pydantic.mypy"]
[tool.pytest.ini_options]
asyncio_mode = "auto"
testpaths = ["tests"]
markers = [
"integration: tests that require a live FalkorDB instance",
]
7 changes: 6 additions & 1 deletion graphrag_sdk/src/graphrag_sdk/__init__.py
@@ -19,7 +19,11 @@
# ── Core Contracts ───────────────────────────────────────────────
from graphrag_sdk.core.connection import ConnectionConfig, FalkorDBConnection
from graphrag_sdk.core.context import Context
from graphrag_sdk.core.exceptions import DocumentNotFoundError, GraphRAGError
from graphrag_sdk.core.exceptions import (
DocumentNotFoundError,
GraphRAGError,
LatencyBudgetExceededError,
)
from graphrag_sdk.core.models import (
ApplyChangesResult,
BatchEntry,
@@ -134,6 +138,7 @@
"GraphRelationship",
"GraphSchema",
"IngestionResult",
"LatencyBudgetExceededError",
"LLMBatchItem",
"LLMInterface",
"LiteLLM",
39 changes: 32 additions & 7 deletions graphrag_sdk/src/graphrag_sdk/api/main.py
@@ -15,7 +15,12 @@
from graphrag_sdk import __version__
from graphrag_sdk.core.connection import ConnectionConfig, FalkorDBConnection
from graphrag_sdk.core.context import Context
from graphrag_sdk.core.exceptions import ConfigError, DatabaseError, DocumentNotFoundError
from graphrag_sdk.core.exceptions import (
ConfigError,
DatabaseError,
DocumentNotFoundError,
LatencyBudgetExceededError,
)
from graphrag_sdk.core.models import (
ApplyChangesResult,
BatchEntry,
@@ -1446,13 +1451,16 @@ async def retrieve(
ctx = Context()

ctx.log(f"Retrieve: {question[:80]}...")
ctx.ensure_budget("graph config validation")

await self._validate_graph_config()
await self._validate_graph_config(ctx=ctx)

retrieval = strategy or self._retrieval_strategy
ctx.ensure_budget("retrieval strategy search")
retriever_result = await retrieval.search(question, ctx)

if reranker is not None:
ctx.ensure_budget("retrieval reranking")
retriever_result = await reranker.rerank(question, retriever_result, ctx)

ctx.log(f"Retrieved {len(retriever_result.items)} context items")
@@ -1480,13 +1488,16 @@ def _validate_history(
f"history[{i}]: each message must have 'role' and "
f"'content' keys, got {sorted(msg.keys())}"
)
try:
validated.append(ChatMessage(role=msg["role"], content=msg["content"]))
except Exception:
role = msg["role"]
content = msg["content"]
if role not in {"system", "user", "assistant"}:
raise ValueError(
f"history[{i}]: invalid role '{msg['role']}'. "
f"history[{i}]: invalid role '{role}'. "
f"Must be one of: 'system', 'user', 'assistant'"
)
if not isinstance(content, str):
raise ValueError(f"history[{i}]: content must be a string")
validated.append(ChatMessage(role=role, content=content))
else:
raise TypeError(
f"history[{i}]: expected ChatMessage or dict, got {type(msg).__name__}"
@@ -1515,8 +1526,11 @@ async def _rewrite_question_with_history(
question=question,
)
try:
ctx.ensure_budget("question rewrite LLM call")
resp = await self.llm.ainvoke(prompt)
rewritten = (resp.content or "").strip().splitlines()[0].strip() if resp.content else ""
except LatencyBudgetExceededError:
raise
except Exception as e:
# Broad catch is intentional (see docstring) — but log at WARNING
# with full traceback so programming bugs surface in operator
@@ -1585,6 +1599,7 @@ async def completion(
# Step 1: Optionally rewrite the question for retrieval.
retrieval_query = question
if validated_history and rewrite_question_with_history:
ctx.ensure_budget("question rewrite")
retrieval_query = await self._rewrite_question_with_history(
question,
validated_history,
@@ -1594,6 +1609,7 @@ async def completion(
ctx.log(f"Rewrote for retrieval: {retrieval_query[:80]}")

# Step 2: Retrieve + rerank (using possibly-rewritten query).
ctx.ensure_budget("completion retrieval")
retriever_result = await self.retrieve(
retrieval_query,
strategy=strategy,
@@ -1635,6 +1651,7 @@ async def completion(
ChatMessage(role="user", content=final_user_content),
]

ctx.ensure_budget("completion LLM call")
llm_response = await self.llm.ainvoke_messages(messages)

result = RagResult(
@@ -1675,7 +1692,7 @@ async def _write_graph_config(self) -> None:
except Exception:
logger.debug("Failed to write graph config node", exc_info=True)

async def _validate_graph_config(self) -> None:
async def _validate_graph_config(self, ctx: Context | None = None) -> None:
Review comment (Collaborator):

🟡 M3 — Positional ctx adds subclass-break risk.

async def _validate_graph_config(self, ctx: Context | None = None) -> None:

This is a private method by name but it's on a public class. Any subclass overriding _validate_graph_config(self) will break silently if a future caller passes ctx positionally. Cheap fix:

async def _validate_graph_config(self, *, ctx: Context | None = None) -> None:

Keyword-only matches the rest of the codebase's *, ctx=... convention.
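
One concrete way a positional parameter can bite is a subclass override that already declares a different second parameter. A minimal sketch of that case, using hypothetical `Base` and `LegacySubclass` classes and a sync `_validate` for brevity (the real method is async):

```python
class Base:
    # Keyword-only ctx, as suggested above.
    def _validate(self, *, ctx=None):
        return f"base ctx={ctx!r}"


class LegacySubclass(Base):
    # Hypothetical override written before ctx existed, with its own
    # optional second parameter.
    def _validate(self, force=False):
        return f"legacy force={force!r}"


obj = LegacySubclass()

# A positional call binds the context to `force` without any error.
silent_result = obj._validate("ctx-object")

# A keyword call cannot be misbound; the stale override fails loudly.
try:
    obj._validate(ctx="ctx-object")
    loud_error = None
except TypeError as exc:
    loud_error = str(exc)

print(silent_result)
print(loud_error)
```

Declaring `*, ctx` on the base does not repair stale overrides, but it pushes every caller onto the keyword form, where the mismatch surfaces as a `TypeError` instead of a misbound argument.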

"""Check that the current embedder matches the graph's stored config.

Two checks, both cached after first run:
@@ -1697,6 +1714,8 @@ async def _validate_graph_config(self) -> None:
return

try:
if ctx is not None:
ctx.ensure_budget("graph config query")
result = await self._graph_store.query_raw(
"MATCH (c:__GraphRAGConfig__ {id: 'default'}) "
"RETURN c.embedding_model, c.embedding_dimension"
@@ -1721,6 +1740,8 @@ async def _validate_graph_config(self) -> None:
)
except ConfigError:
raise
except LatencyBudgetExceededError:
raise
except Exception:
# Don't mark as validated on transient failures — retry next call.
logger.debug("Failed to validate graph config", exc_info=True)
@@ -1729,8 +1750,12 @@ async def _validate_graph_config(self) -> None:
# Probe the embedder once: confirm it produces vectors of the
# configured dimension. Catches user error like
# ``embedding_dimension=256`` paired with a 1536-dim model.
if ctx is not None:
ctx.ensure_budget("graph config embedder probe")
try:
probe = await self.embedder.aembed_query("dim_check")
except LatencyBudgetExceededError:
raise
except Exception:
# Probe failure is non-fatal — but don't cache a "validated"
# state, otherwise a transient outage permanently disables
23 changes: 21 additions & 2 deletions graphrag_sdk/src/graphrag_sdk/core/connection.py
@@ -11,6 +11,8 @@
from typing import Any
from urllib.parse import urlparse

from graphrag_sdk.core.exceptions import DatabaseError

logger = logging.getLogger(__name__)


@@ -180,7 +182,13 @@ async def query(
last_exc = exc
# Don't retry non-transient errors (e.g. schema/index conflicts)
if self._is_non_transient(exc):
raise
logger.error(
"Non-transient FalkorDB query failure: %s: %s",
type(exc).__name__,
exc,
)
logger.debug("Non-transient FalkorDB query failure details", exc_info=True)
raise DatabaseError(f"FalkorDB query failed: {exc}") from exc
Review comment (Collaborator):

🔴 C1 — Breaking change to a public class, undocumented.

Previously the original exception (falkordb.ResponseError, redis.ConnectionError, etc.) propagated untouched. Now every non-transient error is wrapped in DatabaseError. Downstream code like:

try:
    await conn.query(...)
except falkordb.ResponseError as e:    # ← silently stops matching
    if 'index' in str(e): ...

…breaks. FalkorDBConnection is re-exported from graphrag_sdk/__init__.py:21, so this is a public-class contract change.

Either (a) preserve the original type (raise it untouched and emit the structured ERROR log alongside), or (b) advertise the change as breaking, with a major version bump and a CHANGELOG migration note. As written it's an opaque API change buried in a 1300-line PR.
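
To make the contract change concrete, here is a self-contained sketch. `ResponseError` and `DatabaseError` are local stand-ins for the driver and SDK exceptions, and `query_wrapped`, `query_reraised` and `handle` are hypothetical names, not code from this PR:

```python
import logging

logger = logging.getLogger("sketch")


class ResponseError(Exception):
    """Stand-in for a driver error such as falkordb.ResponseError."""


class DatabaseError(Exception):
    """Stand-in for the SDK's DatabaseError."""


def query_wrapped():
    # Behaviour in this PR: the driver error is wrapped.
    try:
        raise ResponseError("index already exists")
    except ResponseError as exc:
        raise DatabaseError(f"FalkorDB query failed: {exc}") from exc


def query_reraised():
    # Option (a): log the failure, then re-raise the original unchanged.
    try:
        raise ResponseError("index already exists")
    except ResponseError as exc:
        logger.error("Non-transient query failure: %s: %s", type(exc).__name__, exc)
        raise


def handle(call):
    """Downstream handler that was written against the driver exception."""
    try:
        call()
    except ResponseError:
        return "handled by driver-specific except"
    except DatabaseError as exc:
        # The original survives only as __cause__.
        return f"wrapped; cause={type(exc.__cause__).__name__}"


print(handle(query_wrapped))
print(handle(query_reraised))
```

If option (b) is chosen instead, the migration note can point callers at `__cause__`, since `raise ... from exc` keeps the driver exception reachable there.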

await self._breaker.record_failure()
logger.warning(
"Query attempt %d/%d failed: %s",
@@ -193,7 +201,18 @@ async def query(
break
base_delay = self.config.retry_delay * (2**attempt)
await asyncio.sleep(base_delay * (0.5 + random.random()))
raise last_exc # type: ignore[misc]
logger.error(
"FalkorDB query failed after %d attempts: %s: %s",
self.config.retry_count,
type(last_exc).__name__ if last_exc is not None else "UnknownError",
last_exc,
)
if last_exc is not None:
logger.debug(
"FalkorDB query failure details",
exc_info=(type(last_exc), last_exc, last_exc.__traceback__),
)
raise DatabaseError(f"FalkorDB query failed: {last_exc}") from last_exc
Review comment (Collaborator):

🔴 C1 (same issue) — retries-exhausted path.

raise last_exc → raise DatabaseError(...) from last_exc is the same contract change in the transient-retry-exhausted path. Same fix applies: re-raise the original.


# Substrings that indicate a non-transient (permanent) error —
# retrying will never succeed.
12 changes: 12 additions & 0 deletions graphrag_sdk/src/graphrag_sdk/core/context.py
@@ -10,6 +10,8 @@
from typing import Any
from uuid import uuid4

from graphrag_sdk.core.exceptions import LatencyBudgetExceededError

logger = logging.getLogger(__name__)


@@ -52,6 +54,16 @@ def budget_exceeded(self) -> bool:
        remaining = self.remaining_budget_ms
        return remaining is not None and remaining <= 0

    def ensure_budget(self, operation: str) -> None:
        """Raise if the latency budget is already exhausted before *operation* starts."""
        if not self.budget_exceeded:
            return
        budget = self.latency_budget_ms if self.latency_budget_ms is not None else 0.0
        raise LatencyBudgetExceededError(
            f"Latency budget exceeded before {operation} "
            f"(elapsed={self.elapsed_ms:.1f}ms, budget={budget:.1f}ms)"
        )

    def child(self, **overrides: Any) -> Context:
        """Create a child context inheriting tenant/trace but with optional overrides.
18 changes: 18 additions & 0 deletions graphrag_sdk/src/graphrag_sdk/core/exceptions.py
@@ -10,6 +10,12 @@ class GraphRAGError(Exception):
    pass


class LatencyBudgetExceededError(GraphRAGError):
    """Raised when an operation cannot start within the remaining latency budget."""

    pass


# ── Provider Errors ──────────────────────────────────────────────


@@ -19,12 +25,24 @@ class LLMError(GraphRAGError):
    pass


class LLMTimeoutError(LLMError):
    """Raised when an LLM provider call exceeds its configured timeout."""

    pass


class EmbeddingError(GraphRAGError):
    """Raised when an embedding provider call fails."""

    pass


class EmbeddingTimeoutError(EmbeddingError):
    """Raised when an embedding provider call exceeds its configured timeout."""

    pass


# ── Ingestion Errors ─────────────────────────────────────────────

