Feat/sqlite local dev 405#566
Conversation
Auto-detects SQLite from the connection URI and degrades gracefully:
- New src/db_types.py with JSONBCompat, VectorType, pg_check, pg_only_index, upsert_insert cross-dialect helpers
- config.py: is_sqlite() helper + auto-null pool/schema for SQLite URIs
- db.py: dialect-aware connect_args, schema handling, create_all instead of Alembic for SQLite
- models.py: replace JSONB/Vector with compat types, suppress PG-only constraints/indexes
- search.py: LIKE fallback for FTS, disable pgvector on SQLite
- crud/document.py: force pgvector off on SQLite
- crud/message.py: asyncio.Lock fallback for pg_advisory_xact_lock
- crud/session.py + deriver/queue_manager.py: dialect-safe upsert_insert (a minimal sketch follows below)

Start locally with:

    DB_CONNECTION_URI="sqlite+aiosqlite:///./honcho_local.db" VECTOR_STORE_TYPE=lancedb uv run fastapi dev src/main.py

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
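For context on the last bullet, this is a minimal sketch of what a dialect-safe `upsert_insert` helper can look like. The name matches the description above, but the signature and body are assumptions rather than the PR's actual code:

```python
from sqlalchemy import Table
from sqlalchemy.dialects.postgresql import insert as pg_insert
from sqlalchemy.dialects.sqlite import insert as sqlite_insert


def upsert_insert(table: Table, dialect_name: str):
    """Return a dialect-specific insert() so on_conflict_do_* is available
    on both backends (assumed helper shape, not the PR's exact signature)."""
    # A caller could pass session.get_bind().dialect.name here.
    if dialect_name == "sqlite":
        return sqlite_insert(table)
    return pg_insert(table)
```

Callers can then chain `.values(...).on_conflict_do_update(...)` or `.on_conflict_do_nothing()` and the statement compiles on either backend.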
…ecorator

The original SQLite support used `.astext` and `.astext.cast(Boolean)`, which don't forward through the custom JSONBCompat TypeDecorator, generating wrong SQL on SQLite (LIKE instead of JSON extract).
- Add Comparator subclasses to JSONBCompat and VectorType in db_types.py so that __getitem__, as_string(), as_boolean(), contains(), and cosine_distance() are properly delegated to the underlying JSON/pgvector types
- Replace .astext with .as_string() in deriver.py and filter.py
- Replace .astext.cast(Boolean) with .as_boolean() in session.py
- Add focused pytest tests (tests/test_sqlite_json_ops.py) covering all three code paths against a real in-memory SQLite database

Fixes plastic-labs#405

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review.
Walkthrough

Adds SQLite runtime support and cross-dialect DB types; replaces Postgres-only JSONB/Vector usages with compatibility wrappers; routes pgvector/full-text and advisory-locking paths to SQLite-safe alternatives; adjusts DB init to create tables for SQLite and adds SQLite-focused integration tests.
Sequence Diagram(s)

```mermaid
sequenceDiagram
participant Client as Client
participant App as Application
participant LockMgr as LockManager
participant DB as Database
Note over App,DB: create_messages(workspace, session, messages...)
Client->>App: POST /messages
App->>LockMgr: request_lock(workspace, session)
alt SQLite
LockMgr-->>App: acquired asyncio.Lock (per-session)
App->>DB: BEGIN TRANSACTION
App->>DB: SELECT last_seq
App->>DB: INSERT messages (compute seq)
App->>DB: COMMIT
App->>LockMgr: release asyncio.Lock
else Postgres
App->>DB: EXECUTE advisory_lock SQL
App->>DB: BEGIN TRANSACTION
App->>DB: SELECT last_seq
App->>DB: INSERT messages (compute seq)
App->>DB: COMMIT (advisory lock released)
end
DB-->>App: OK
App-->>Client: 200/201
```

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 3 passed | ❌ 2 failed
❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (3 passed)
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/crud/message.py (1)
254-295: ⚠️ Potential issue | 🟠 Major

Release the SQLite session lock on every failure path.

Any exception between Line 254 and Line 291 leaks the acquired `asyncio.Lock`, which will block later writes to the same `(workspace_name, session_name)` until the process restarts. The `try`/`finally` needs to wrap the entire seq-allocation / insert section, not just `await db.commit()`.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/crud/message.py` around lines 254 - 295, Wrap the entire sequence allocation and insert section (from immediately after _session_lock = await _acquire_session_lock(...) through db.add_all(...) and await db.commit()) in a single try/finally so the acquired _session_lock is always released on any exception; in the finally block check _session_lock is not None (and optionally _session_lock.locked()) before calling _session_lock.release() to avoid leaking the asyncio.Lock used for SQLite; update the scope of the try to include the last_seq query, the for-loop creating message_objects, db.add_all(message_objects) and await db.commit() so any error in those steps still runs the release.
🧹 Nitpick comments (1)
tests/test_sqlite_json_ops.py (1)
270-279: Assert the queue payload, not just the status code.

Both calls succeed if the endpoint returns 200, even when the `observer_id` filter is ignored or the aggregated counts are wrong. Add at least one assertion on the returned rows/counts for the unfiltered and filtered responses so this test actually guards the `.as_string()` SELECT/WHERE paths.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_sqlite_json_ops.py` around lines 270 - 279, The test currently only checks HTTP 200 from sqlite_client.get for "/v3/workspaces/{ws}/queue/status" but doesn't validate the payload; update the assertions to parse r.json() for both unfiltered (r from first sqlite_client.get) and filtered (r from second call with params={"observer_id": peer}) responses and assert expected row structure and counts: for the unfiltered response assert total rows/count field is present and >0 (or equals the known expected count in the test fixture) and for the filtered response assert rows are a subset (every returned row's observer_id equals peer) and the filtered count is <= the unfiltered count. Use the existing variables r, ws, and peer and the same endpoint to locate where to add these JSON payload assertions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/test_sqlite_json_ops.py`:
- Around line 56-70: The teardown currently only restores Base.metadata.schema
but not each Table.schema you set to None; capture each table's original schema
before mutating (e.g., build a mapping from table key/name to table.schema using
Base.metadata.tables), set table.schema = None for tests, and in the finally
block iterate the same Base.metadata.tables and restore each table.schema from
that mapping so every Table.schema (not just Base.metadata.schema) is returned
to its original value.
---
Outside diff comments:
In `@src/crud/message.py`:
- Around line 254-295: Wrap the entire sequence allocation and insert section
(from immediately after _session_lock = await _acquire_session_lock(...) through
db.add_all(...) and await db.commit()) in a single try/finally so the acquired
_session_lock is always released on any exception; in the finally block check
_session_lock is not None (and optionally _session_lock.locked()) before calling
_session_lock.release() to avoid leaking the asyncio.Lock used for SQLite;
update the scope of the try to include the last_seq query, the for-loop creating
message_objects, db.add_all(message_objects) and await db.commit() so any error
in those steps still runs the release.
---
Nitpick comments:
In `@tests/test_sqlite_json_ops.py`:
- Around line 270-279: The test currently only checks HTTP 200 from
sqlite_client.get for "/v3/workspaces/{ws}/queue/status" but doesn't validate
the payload; update the assertions to parse r.json() for both unfiltered (r from
first sqlite_client.get) and filtered (r from second call with
params={"observer_id": peer}) responses and assert expected row structure and
counts: for the unfiltered response assert total rows/count field is present and
>0 (or equals the known expected count in the test fixture) and for the filtered
response assert rows are a subset (every returned row's observer_id equals peer)
and the filtered count is <= the unfiltered count. Use the existing variables r,
ws, and peer and the same endpoint to locate where to add these JSON payload
assertions.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 7ab741a7-c906-4f59-8dfc-5cf969ad6a42
⛔ Files ignored due to path filters (1)
`uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (13)
- pyproject.toml
- src/config.py
- src/crud/deriver.py
- src/crud/document.py
- src/crud/message.py
- src/crud/session.py
- src/db.py
- src/db_types.py
- src/deriver/queue_manager.py
- src/models.py
- src/utils/filter.py
- src/utils/search.py
- tests/test_sqlite_json_ops.py
Pull request overview
This PR restores SQLite compatibility for JSON field accessors/operators that regressed after the SQLite support work in #405, by ensuring JSON/Vector custom SQLAlchemy types delegate comparator/operator behavior correctly across dialects and updating affected query call sites.
Changes:
- Add cross-dialect SQLAlchemy helpers/types (`JSONBCompat`, `VectorType`, `upsert_insert`, plus Postgres-only DDL helpers) to make models and queries behave correctly on SQLite (a minimal sketch of the compat-type idea follows this list).
- Replace JSON `.astext` / `.astext.cast(Boolean)` usage with `.as_string()` / `.as_boolean()` in the three affected query paths.
- Add focused SQLite integration tests and introduce `aiosqlite` as a dependency for async SQLite.
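As a rough illustration of the compat-type idea in the first bullet, a cross-dialect JSON column type can be built on `TypeDecorator.load_dialect_impl`. This is a minimal sketch under assumed names, not the PR's actual `JSONBCompat`:

```python
from sqlalchemy import JSON
from sqlalchemy.dialects.postgresql import JSONB
from sqlalchemy.types import TypeDecorator


class JSONBCompat(TypeDecorator):
    """Store JSONB on PostgreSQL and plain JSON elsewhere (e.g. SQLite)."""

    impl = JSON
    cache_ok = True

    def load_dialect_impl(self, dialect):
        # Pick the dialect-native JSON type at compile/bind time.
        if dialect.name == "postgresql":
            return dialect.type_descriptor(JSONB())
        return dialect.type_descriptor(JSON())
```

The part this PR fixes is the comparator side: accessors such as `as_string()` and `as_boolean()` must also be routed to the dialect implementation, otherwise SQLAlchemy falls back to generic string operators on SQLite.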
Reviewed changes
Copilot reviewed 13 out of 14 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| uv.lock | Locks aiosqlite dependency for SQLite async driver support. |
| pyproject.toml | Adds aiosqlite runtime dependency. |
| src/config.py | Adds is_sqlite() and SQLite-specific DB defaults (pool/schema). |
| src/db.py | Adjusts connect args for SQLite and initializes SQLite via create_all instead of Alembic. |
| src/db_types.py | Introduces cross-dialect JSONBCompat/VectorType, Postgres-only DDL helpers, and dialect-aware insert helper. |
| src/models.py | Switches JSONB columns to JSONBCompat, vector columns to VectorType, and gates Postgres-only constraints/indexes. |
| src/utils/filter.py | Uses .as_string() for JSON field comparisons to compile correctly on SQLite. |
| src/utils/search.py | Avoids pgvector/fulltext-only paths on SQLite; uses fallback matching. |
| src/crud/document.py | Disables pgvector DB-only query path on SQLite. |
| src/crud/session.py | Uses upsert_insert and replaces observe_others JSON accessor with .as_boolean(). |
| src/crud/deriver.py | Uses .as_string() for queue payload JSON field accessors. |
| src/deriver/queue_manager.py | Uses upsert_insert to support SQLite upsert semantics. |
| src/crud/message.py | Adds SQLite fallback locking via per-session asyncio.Lock. |
| tests/test_sqlite_json_ops.py | Adds SQLite in-memory tests covering the three fixed JSON operator paths. |
```diff
         peer_names.keys()
     ),  # Exclude peers being updated
-    models.SessionPeer.configuration["observe_others"].astext.cast(
-        Boolean
-    ),  # Only observers
+    models.SessionPeer.configuration["observe_others"].as_boolean(),  # Only observers
 )
```
After switching to configuration["observe_others"].as_boolean(), the sqlalchemy.types.Boolean import in this module appears unused (Ruff F401). Remove the unused import (or any other now-unused imports from the previous cast(Boolean) approach) to keep lint/CI passing.
```python
# Tell is_sqlite() to return True so the app takes SQLite code paths
original_uri = settings.DB.CONNECTION_URI
settings.DB.CONNECTION_URI = SQLITE_URL

engine = create_async_engine(SQLITE_URL, echo=False)
async with engine.begin() as conn:
    # SQLite doesn't need schema prefixes
    original_schema = Base.metadata.schema
    Base.metadata.schema = None
    for table in Base.metadata.tables.values():
        table.schema = None
    await conn.run_sync(Base.metadata.create_all)
```
This fixture switches settings.DB.CONNECTION_URI to SQLite after importing src.models/src.db/src.main at module import time. But pg_check()/pg_only_index() are evaluated when models are defined, so if the process started with a Postgres URI, Postgres-only CHECK constraints (regex with ~) and indexes may already be attached to Base.metadata, and create_all on SQLite can fail. To make this test reliable, configure SQLite (env/settings) before importing models/app, or reload the relevant modules/metadata after switching settings.
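A sketch of the ordering the comment asks for; the environment variable comes from this PR's description, while the import paths and bootstrap shape are assumptions:

```python
# conftest-style bootstrap (hypothetical): choose the SQLite URI before importing
# any src.* module, so pg_check()/pg_only_index() never attach Postgres-only DDL.
import os

os.environ["DB_CONNECTION_URI"] = "sqlite+aiosqlite:///:memory:"

from src.db import Base  # noqa: E402  (import after env setup is intentional)
from src.main import app  # noqa: E402
```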
```python
_session_lock = await _acquire_session_lock(db, workspace_name, session_name)

# Get the last sequence number on a session - uses (workspace_name, session_name, seq_in_session) index
```
SQLite path acquires an asyncio.Lock, but it’s only released in the commit try/finally. Any exception between acquiring the lock and reaching the commit (e.g., during last_seq query or message object construction) will leave the lock permanently held for that (workspace, session), effectively deadlocking future writes in the same process. Wrap the entire post-acquire section in a broader try/finally (and consider rolling back before release) so the lock is always released on all exit paths.
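A generic sketch of the control flow being requested; the helper names are placeholders, not the PR's code:

```python
import asyncio


async def locked_write(lock: asyncio.Lock, do_write, rollback):
    # Acquire, then guarantee release on every exit path; roll back first on error
    # so the next waiter sees a clean transaction.
    await lock.acquire()
    try:
        await do_write()  # last_seq query, message construction, commit, ...
    except Exception:
        await rollback()
        raise
    finally:
        lock.release()
```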
```python
# Per-session asyncio locks used as a fallback for SQLite (single-process only).
# PostgreSQL uses pg_advisory_xact_lock which is cross-process safe.
_sqlite_session_locks: dict[tuple[str, str], asyncio.Lock] = {}
_sqlite_locks_mutex = asyncio.Lock()
```
The per-session SQLite lock table (_sqlite_session_locks) grows monotonically and never evicts entries. On a long-running local dev server that creates many sessions, this will leak memory. Consider using a WeakValueDictionary, adding an LRU/TTL eviction strategy, or deleting the entry when the lock is released and there are no waiters.
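One possible shape for the suggested eviction, keeping a reference count per key so the entry can be dropped when the last holder releases it (a sketch, not the PR's implementation):

```python
import asyncio

_locks: dict[tuple[str, str], tuple[asyncio.Lock, int]] = {}
_locks_mutex = asyncio.Lock()


async def acquire(key: tuple[str, str]) -> asyncio.Lock:
    async with _locks_mutex:
        lock, refs = _locks.get(key, (asyncio.Lock(), 0))
        _locks[key] = (lock, refs + 1)  # count holders and waiters
    await lock.acquire()  # acquire outside the mutex to avoid blocking other keys
    return lock


async def release(key: tuple[str, str], lock: asyncio.Lock) -> None:
    lock.release()
    async with _locks_mutex:
        stored, refs = _locks[key]
        if refs <= 1:
            del _locks[key]  # last user gone: evict so the table cannot grow unbounded
        else:
            _locks[key] = (stored, refs - 1)
```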
```python
from collections.abc import AsyncGenerator
from typing import Any
from unittest.mock import AsyncMock, MagicMock, patch
```
Unused imports (AsyncMock, MagicMock) will be flagged by Ruff (F401) and fail CI. Remove them or use them in the test.
```diff
-from unittest.mock import AsyncMock, MagicMock, patch
+from unittest.mock import patch
```
| "configuration", | ||
| JSONB, | ||
| JSONBCompat, | ||
| default=dict, | ||
| nullable=False, | ||
| server_default=text("'{}'::jsonb"), | ||
| server_default=text("'{}'"), | ||
| ), |
Model server_default for JSON/JSONB columns changed from "'{}'::jsonb" to "'{}'". Alembic migrations in this repo explicitly set "'{}'::jsonb" (e.g., migrations/versions/e9b705f9adf9_…); this mismatch will cause noisy autogenerate diffs and may attempt to alter defaults in Postgres. Consider using a dialect-aware default helper (Postgres: ::jsonb, SQLite: plain string) so models stay consistent with migrations.
- Remove unused Boolean import from session.py (Ruff F401)
- Remove unused AsyncMock/MagicMock imports from test file (Ruff F401)
- Restore per-table schema in sqlite_engine fixture teardown
- Expand try/finally in create_messages() to cover lock acquire through commit, ensuring SQLite asyncio.Lock is always released on any exception

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Actionable comments posted: 2
🧹 Nitpick comments (2)
tests/test_sqlite_json_ops.py (1)
81-81: Use snake_case for the local sessionmaker variable.

Rename `Session` to `session_maker` (or similar) to match project naming rules for variables. As per coding guidelines, "Use snake_case for variables and functions; PascalCase for classes".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_sqlite_json_ops.py` at line 81, The variable name "Session" violates snake_case; rename the local variable Session returned from async_sessionmaker(bind=sqlite_engine, expire_on_commit=False) to a snake_case name like session_maker (or session_factory) across this test so all references to Session use the new name; ensure you update any subsequent uses in tests to reference session_maker instead.

src/crud/session.py (1)
981-983: Update stale PostgreSQL-only comment wording.

The comment says this path uses PostgreSQL JSONB operators, but the new accessor path is now cross-dialect. Please reword to avoid misleading future maintenance.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/crud/session.py` around lines 981 - 983, The comment above existing_observers_stmt is misleadingly Postgres-specific; update it to reflect that the query uses the new cross-dialect JSON accessor rather than PostgreSQL-only JSONB operators. Replace the wording to say this uses the cross-dialect JSON accessor to check the observe_others field and performs a single efficient count of existing observers not being updated, referencing existing_observers_stmt (the select(func.count()) where(...) construction) so maintainers know what to look for.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/test_sqlite_json_ops.py`:
- Around line 50-75: The fixture mutates settings.DB.CONNECTION_URI,
Base.metadata.schema, and table.schema before the try: yield block so if
create_async_engine or conn.run_sync(Base.metadata.create_all) fails the
restoration code after yield never runs; wrap the setup and yield in a
try/finally (or perform the mutations, then use try: await
conn.run_sync(Base.metadata.create_all); yield engine finally: ...restore...) so
that the restoration of Base.metadata.schema, per-table table.schema and
settings.DB.CONNECTION_URI (the logic currently after yield) always executes,
and ensure engine.dispose() and Base.metadata.drop_all() are also called from
that finally to avoid state leakage.
- Around line 141-152: The test fixture is registering a new HonchoException
handler via `@app.exception_handler`(HonchoException) which mutates global app
state and isn't restored; modify the fixture to save the original handler
(app.router.exception_handlers.get(HonchoException) or the existing handler
reference) before registering the test handler, then after the TestClient
context (before yielding/after yielding as appropriate) restore the original
handler (or remove the test handler) so the global handler from src/main.py is
not permanently shadowed; ensure this change touches the same symbols used in
the fixture (the `@app.exception_handler`(HonchoException) registration,
override_get_db, app.dependency_overrides[get_db], and TestClient usage) so
cleanup happens even if the test raises.
---
Nitpick comments:
In `@src/crud/session.py`:
- Around line 981-983: The comment above existing_observers_stmt is misleadingly
Postgres-specific; update it to reflect that the query uses the new
cross-dialect JSON accessor rather than PostgreSQL-only JSONB operators. Replace
the wording to say this uses the cross-dialect JSON accessor to check the
observe_others field and performs a single efficient count of existing observers
not being updated, referencing existing_observers_stmt (the select(func.count())
where(...) construction) so maintainers know what to look for.
In `@tests/test_sqlite_json_ops.py`:
- Line 81: The variable name "Session" violates snake_case; rename the local
variable Session returned from async_sessionmaker(bind=sqlite_engine,
expire_on_commit=False) to a snake_case name like session_maker (or
session_factory) across this test so all references to Session use the new name;
ensure you update any subsequent uses in tests to reference session_maker
instead.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: d3e4001d-c3f2-4ea0-81df-e7d2be52235b
📒 Files selected for processing (3)
- src/crud/message.py
- src/crud/session.py
- tests/test_sqlite_json_ops.py
🚧 Files skipped from review as they are similar to previous changes (1)
- src/crud/message.py
```python
original_uri = settings.DB.CONNECTION_URI
settings.DB.CONNECTION_URI = SQLITE_URL

engine = create_async_engine(SQLITE_URL, echo=False)
async with engine.begin() as conn:
    # SQLite doesn't need schema prefixes
    original_schema = Base.metadata.schema
    original_table_schemas = {
        name: t.schema for name, t in Base.metadata.tables.items()
    }
    Base.metadata.schema = None
    for table in Base.metadata.tables.values():
        table.schema = None
    await conn.run_sync(Base.metadata.create_all)

try:
    yield engine
finally:
    async with engine.begin() as conn:
        await conn.run_sync(Base.metadata.drop_all)
    await engine.dispose()
    # Restore original schema + URI
    Base.metadata.schema = original_schema
    for name, table in Base.metadata.tables.items():
        table.schema = original_table_schemas[name]
    settings.DB.CONNECTION_URI = original_uri
```
Ensure fixture teardown runs even if setup fails.
settings/schema/patch mutations occur before the try: yield block. If create_all() or cache.setup() raises, restoration won’t run and state can leak into later tests.
Proposed hardening
```diff
 @pytest_asyncio.fixture(scope="module")
 async def sqlite_engine():
     """Create an async SQLite engine with all tables."""
-    # Tell is_sqlite() to return True so the app takes SQLite code paths
-    original_uri = settings.DB.CONNECTION_URI
-    settings.DB.CONNECTION_URI = SQLITE_URL
-
-    engine = create_async_engine(SQLITE_URL, echo=False)
-    async with engine.begin() as conn:
-        # SQLite doesn't need schema prefixes
-        original_schema = Base.metadata.schema
-        original_table_schemas = {
-            name: t.schema for name, t in Base.metadata.tables.items()
-        }
-        Base.metadata.schema = None
-        for table in Base.metadata.tables.values():
-            table.schema = None
-        await conn.run_sync(Base.metadata.create_all)
-
-    try:
+    original_uri = settings.DB.CONNECTION_URI
+    original_schema = Base.metadata.schema
+    original_table_schemas = {
+        name: t.schema for name, t in Base.metadata.tables.items()
+    }
+    settings.DB.CONNECTION_URI = SQLITE_URL
+    engine = create_async_engine(SQLITE_URL, echo=False)
+    try:
+        async with engine.begin() as conn:
+            Base.metadata.schema = None
+            for table in Base.metadata.tables.values():
+                table.schema = None
+            await conn.run_sync(Base.metadata.create_all)
         yield engine
     finally:
         async with engine.begin() as conn:
             await conn.run_sync(Base.metadata.drop_all)
         await engine.dispose()
-        # Restore original schema + URI
         Base.metadata.schema = original_schema
         for name, table in Base.metadata.tables.items():
             table.schema = original_table_schemas[name]
         settings.DB.CONNECTION_URI = original_uri
```

Also applies to: 99-130
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/test_sqlite_json_ops.py` around lines 50 - 75, The fixture mutates
settings.DB.CONNECTION_URI, Base.metadata.schema, and table.schema before the
try: yield block so if create_async_engine or
conn.run_sync(Base.metadata.create_all) fails the restoration code after yield
never runs; wrap the setup and yield in a try/finally (or perform the mutations,
then use try: await conn.run_sync(Base.metadata.create_all); yield engine
finally: ...restore...) so that the restoration of Base.metadata.schema,
per-table table.schema and settings.DB.CONNECTION_URI (the logic currently after
yield) always executes, and ensure engine.dispose() and Base.metadata.drop_all()
are also called from that finally to avoid state leakage.
…ate leaks

- Move sqlite_engine originals save before mutations so teardown always restores schema/URI even if create_all() raises
- Move sqlite_cache try/finally to start immediately after redis_patch.start() so the patch is always stopped on any failure
- Save and restore HonchoException handler in sqlite_client fixture to avoid leaking global app state across tests

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add empty_json_default() to db_types.py using SQLAlchemy's @compiles to emit '{}'::jsonb on PostgreSQL and '{}' on SQLite. Replace all server_default=text("'{}'") usages in models.py so Alembic autogenerate sees the same string as existing migrations on Postgres without needing separate model definitions per dialect. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
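A minimal sketch of that pattern, assuming the behavior the commit describes (emit `'{}'::jsonb` on PostgreSQL and `'{}'` elsewhere); the class and function shapes here are illustrative, not the PR's exact code:

```python
from sqlalchemy.ext.compiler import compiles
from sqlalchemy.sql.expression import ColumnElement


class EmptyJsonDefault(ColumnElement):
    """Server default that compiles per dialect (assumed shape)."""

    inherit_cache = True


@compiles(EmptyJsonDefault)
def _empty_json_generic(element, compiler, **kw):
    # Default rendering, used by SQLite and any other non-Postgres backend.
    return "'{}'"


@compiles(EmptyJsonDefault, "postgresql")
def _empty_json_postgresql(element, compiler, **kw):
    # Matches the string the existing Alembic migrations emit on Postgres.
    return "'{}'::jsonb"


def empty_json_default() -> EmptyJsonDefault:
    return EmptyJsonDefault()
```

A column could then use `server_default=empty_json_default()` and the rendered DDL matches whichever backend creates the table.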
… fixture Ensures Base.metadata.schema and per-table schema are always restored even if create_all() raises, matching CodeRabbit's proposed hardening. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/models.py (1)
283-283: Keep the vector attributes on `Mapped[...]`.

`MappedColumn[Any]` drops the Python-side type for `embedding`, which weakens type-checking right where this PR adds cross-dialect serialization. SQLAlchemy's annotated declarative docs use `Mapped[...]` on the left-hand side and let `mapped_column()` derive from that annotation. (docs.sqlalchemy.org)

🧭 Suggested fix

```diff
-    embedding: MappedColumn[Any] = mapped_column(VectorType(1536), nullable=True)
+    embedding: Mapped[list[float] | None] = mapped_column(
+        VectorType(1536), nullable=True
+    )
```

Apply the same change to both `MessageEmbedding.embedding` and `Document.embedding`. As per coding guidelines, `src/models.py`: Use explicit type hints with SQLAlchemy mapped_column annotations.

Also applies to: 391-391
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/models.py` at line 283, The embedding attributes are annotated as MappedColumn[Any], which loses Python-side typing; change both MessageEmbedding.embedding and Document.embedding to use SQLAlchemy's Mapped[...] annotation (e.g., Mapped[Optional[List[float]]] or another appropriate vector Python type) and keep the mapped_column(VectorType(1536), nullable=True) call unchanged so mapped_column derives its DB metadata from the Mapped[...] annotation.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/db_types.py`:
- Around line 35-40: The contains method is forcing the generic JSON comparator
via type_coerce(self.expr, SaJSON()) which bypasses dialect-specific comparators
(e.g., JSONB on Postgres); change contains to delegate to the underlying
dialect-aware comparator by calling self.expr.comparator.contains(other,
**kwargs) so load_dialect_impl() can supply the correct JSON/JSONB comparator
and operators.
---
Nitpick comments:
In `@src/models.py`:
- Line 283: The embedding attributes are annotated as MappedColumn[Any], which
loses Python-side typing; change both MessageEmbedding.embedding and
Document.embedding to use SQLAlchemy's Mapped[...] annotation (e.g.,
Mapped[Optional[List[float]]] or another appropriate vector Python type) and
keep the mapped_column(VectorType(1536), nullable=True) call unchanged so
mapped_column derives its DB metadata from the Mapped[...] annotation.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 70e684f1-7db9-4ac2-96fe-975038bde1be
📒 Files selected for processing (2)
- src/db_types.py
- src/models.py
```python
def contains(self, other: Any, **kwargs: Any) -> Any:
    # Use JSON's contains (→ @> on PostgreSQL, json_extract on SQLite)
    # instead of base ColumnOperators.contains which generates LIKE.
    from sqlalchemy import JSON as SaJSON, type_coerce

    return type_coerce(self.expr, SaJSON()).contains(other, **kwargs)
```
🧩 Analysis chain
🏁 Script executed:
# First, let's examine the db_types.py file around lines 35-40
cat -n src/db_types.py | head -60

Repository: plastic-labs/honcho
Length of output: 2777
🏁 Script executed:
# Search for usage of the contains() method in the codebase
rg "\.contains\(" --type py -B 2 -A 2

Repository: plastic-labs/honcho
Length of output: 1612
🌐 Web query:
SQLAlchemy JSON type comparator contains method vs JSONB
💡 Result:
In SQLAlchemy, whether you can use a “contains” comparator depends on the column’s SQL type (JSON vs PostgreSQL JSONB) and on the backend. 1) PostgreSQL JSONB: use JSONB.Comparator.contains (maps to @> containment) - If the mapped column is sqlalchemy.dialects.postgresql.JSONB, you can call .contains via JSONB’s comparator. - SQLAlchemy exposes JSONB.Comparator.contains for the PostgreSQL dialect. - This corresponds to PostgreSQL’s jsonb containment operator @> (e.g., params @> '[{"a": "333"}]'::jsonb). Example (SQLAlchemy ORM / Core style): - key = [{"a": "333"}] - select(User).where(User.params.contains(key)) 2) SQLAlchemy generic JSON (non-JSONB) on PostgreSQL/other backends: .contains may not exist or may not mean jsonb containment - The JSON and JSONB concepts overlap in naming, but the actual comparator support differs by type. - SQLAlchemy has discussed that “JSON has operators like contains which overlap with JSONB names, and these should not be callable” when they don’t do the same thing as JSONB. Practical consequence: - If your column is declared as plain JSON (or otherwise does not use the PostgreSQL JSONB type), calling .contains can be unsupported or can generate misleading SQL. - A common error pattern is that an operator like “contains” isn’t supported for that expression/type. 3) How this shows up in your specific question (“comparator contains method vs JSONB”) - “contains” that works for JSONB is the JSONB containment semantics (jsonb @>) exposed as JSONB.Comparator.contains. - If you instead use JSON (not JSONB), SQLAlchemy may not provide the same comparator method or may intentionally restrict/avoid it. If you tell me your exact SQLAlchemy column type (sqlalchemy.JSON vs sqlalchemy.dialects.postgresql.JSONB, and the SQLAlchemy version), I can provide the exact query expression that will compile for your dialect.
Citations:
- 1: https://docs.sqlalchemy.org/en/14/dialects/postgresql.html?highlight=jsonb
- 2: http://docs.sqlalchemy.org/en/latest/dialects/postgresql.html
- 3: https://www.postgresql.org/docs/16/datatype-json.html
- 4: https://stackoverflow.com/questions/75698358/how-to-filter-in-jsonb-column-in-sqlalchemy-orm
- 5: deprecate "wrong" JSON operators (at least contains()) that are misleading since they overlap with JSONB; generalize to all types sqlalchemy/sqlalchemy#12736
- 6: https://stackoverflow.com/questions/39470239/sqlalchemy-json-column-how-to-perform-a-contains-query/39470478
🌐 Web query:
SQLAlchemy JSON type contains method implementation __contains__ comparator
💡 Result:
In SQLAlchemy, the JSON/JSONB “contains” behavior is implemented as a type-level Comparator method, not via Python’s contains. The common operator call pattern is: <json_column>.comparator.contains(value) (for JSONB), which uses a SQL operator/function underneath. 1) Where “contains” lives - For PostgreSQL JSONB, SQLAlchemy provides JSONB.Comparator.contains, which generates the “@>” operator semantics (“Test if keys (or array) are a superset of/contained the keys of the argument …” and “equivalent of the @> operator”). - Source-level evidence: the PostgreSQL JSON type defines a Comparator class with a contains method that returns self.operate(CONTAINS, other, result_type=Boolean). 2) Why you can see NotImplementedError / wrong operator issues - JSON.Comparator.contains/contains-like operator behavior has historically been tricky across backends because different backends implement different semantics. In particular, a recent SQLAlchemy issue discusses “wrong JSON operators (at least contains) that are misleading since they overlap with JSONB” and proposes overriding these on JSON.Comparator (to warn/raise) and using a backend-correct recipe. This indicates that calling contains on generic JSON may be deprecated/unsupported or at least not equivalent to JSONB semantics. - Separately, for generic JSON on some backends (e.g., MySQL), using ColumnOperators.contains / Python “in” style membership may not be supported, leading to errors such as “Operator 'contains' is not supported on this expression”. A common workaround is to use the backend JSON function that implements array membership/containment. 3) Practical query patterns (what to use instead) A) PostgreSQL (JSONB) - Use JSONB.Comparator.contains (or just JSONB column’s .contains via comparator). - Example conceptually: MyModel.my_jsonb.comparator.contains([ ... ]) (the contained value must be a JSON fragment). B) MySQL (JSON) - Use MySQL’s JSON_CONTAINS(json_doc, val[, path]) via SQLAlchemy func.json_contains(...). - Example from a working answer: session.query(Story).filter(func.json_contains(Story.section_ids, X) == 1). C) Generic approach - If you’re using SQLAlchemy’s generic JSON type with a with_variant to JSONB on PostgreSQL, be careful: SQLAlchemy’s JSON operator support and coercions vary by backend; SQLAlchemy maintainers explicitly call out the need for backend-correct operator recipes and that “contains” semantics may not match across types/backends. Bottom line - SQLAlchemy’s JSON contains operator is implemented via JSON/JSONB Comparator methods (not via Python’s contains). - If you need “contains” semantics, use JSONB.Comparator.contains on PostgreSQL/JSONB; on MySQL/JSON, use func.json_contains (JSON_CONTAINS) rather than relying on generic .contains/membership operator compilation.
Citations:
- 1: https://www.sqlalchemy.org/trac/ticket/12736
- 2: https://github.com/zzzeek/sqlalchemy/blob/master/lib/sqlalchemy/dialects/postgresql/json.py
- 3: https://stackoverflow.com/questions/39470239/sqlalchemy-json-column-how-to-perform-a-contains-query/39470478
- 4: https://docs.sqlalchemy.org/en/14/orm/queryguide.html
- 5: deprecate "wrong" JSON operators (at least contains()) that are misleading since they overlap with JSONB; generalize to all types sqlalchemy/sqlalchemy#12736
🌐 Web query:
SQLAlchemy type_coerce JSON contains comparator PostgreSQL JSONB
💡 Result:
In SQLAlchemy with PostgreSQL JSONB columns, use the built-in contains comparator method on the JSONB.Comparator to perform the PostgreSQL @> containment operator. This checks if the JSONB column contains the specified JSON structure. Define the column as: from sqlalchemy.dialects.postgresql import JSONB from sqlalchemy.orm import Mapped, mapped_column class MyModel(Base): tablename = 'mytable' data: Mapped[dict] = mapped_column(JSONB) Example queries: 1. Contains a specific key-value pair: stmt = select(MyModel).where(MyModel.data.contains({"status": "active"})) 2. Contains an array element: stmt = select(MyModel).where(MyModel.data.contains([42])) 3. Contains a nested object within an array: key = [{"a": "333"}] stmt = select(MyModel).where(MyModel.data.contains(key)) The other argument to contains must be a valid JSON-serializable structure (dict, list, etc.), not a string. type_coerce is not required for the standard contains operator on JSONB columns, as SQLAlchemy handles the type coercion internally for this PostgreSQL-specific comparator. However, type_coerce may be used in custom scenarios, such as when indexing into JSONB (e.g., type_coerce(column['key'], JSONB).contains(value) if the indexed expression needs explicit typing) or for cross-dialect custom functions. This is directly from SQLAlchemy's PostgreSQL dialect source and documentation, current as of 2026.
Citations:
- 1: https://github.com/zzzeek/sqlalchemy/blob/master/lib/sqlalchemy/dialects/postgresql/json.py
- 2: https://stackoverflow.com/questions/39470239/sqlalchemy-json-column-how-to-perform-a-contains-query
- 3: http://docs.sqlalchemy.org/en/latest/dialects/postgresql.html
- 4: https://stackoverflow.com/questions/39470239/sqlalchemy-json-column-how-to-perform-a-contains-query/39470478
- 5: deprecate "wrong" JSON operators (at least contains()) that are misleading since they overlap with JSONB; generalize to all types sqlalchemy/sqlalchemy#12736
- 6: Querying a JSON column on a boolean field: type_coerce doesn't coerce True sqlalchemy/sqlalchemy#4969
- 7: https://stackoverflow.com/questions/75698358/how-to-filter-in-jsonb-column-in-sqlalchemy-orm
- 8: flip off the "hashable" flag for PG unhashable types: JSON, JSONB (already has it), ARRAY. also NULLTYPE, add/test type_coerce as well as cast() to indexed access sqlalchemy/sqlalchemy#3499
- 9: http://docs.sqlalchemy.org/en/latest/core/custom_types.html
contains() is delegating to the wrong comparator and bypassing dialect-specific logic.
type_coerce(..., JSON()).contains() forces all dialects to use the generic JSON comparator, which SQLAlchemy explicitly warns against—"wrong JSON operators (at least contains) that are misleading since they overlap with JSONB". On PostgreSQL, the underlying type is JSONB (from load_dialect_impl()), which has a different contains() semantics than the generic JSON type.
Instead, delegate to the dialect-specific comparator directly:
```python
def contains(self, other: Any, **kwargs: Any) -> Any:
    return self.expr.comparator.contains(other, **kwargs)
```

This allows load_dialect_impl() to control which type's comparator is used: JSONB on PostgreSQL (with the @> operator), JSON on SQLite (with appropriate backend handling).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/db_types.py` around lines 35 - 40, The contains method is forcing the
generic JSON comparator via type_coerce(self.expr, SaJSON()) which bypasses
dialect-specific comparators (e.g., JSONB on Postgres); change contains to
delegate to the underlying dialect-aware comparator by calling
self.expr.comparator.contains(other, **kwargs) so load_dialect_impl() can supply
the correct JSON/JSONB comparator and operators.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/test_sqlite_json_ops.py`:
- Around line 276-285: The test currently only checks HTTP 200; parse the JSON
response from sqlite_client.get("/v3/workspaces/{ws}/queue/status") (variable r)
and assert the unfiltered result contains the previously enqueued item (e.g.
check presence of expected id/payload fields), then call sqlite_client.get with
params={"observer_id": peer} and assert the filtered JSON only contains rows for
that observer (verify observer field equals peer and/or that non-matching rows
are excluded); use the existing variables r, ws, peer and the same endpoint to
locate where to add these assertions.
- Around line 94-124: The fixture sqlite_cache calls cache.setup(...) but does
not restore the process-wide cache client; in the finally block you must close
or reinitialize the global cache client so teardown matches the restored
settings—import src.cache.client (or reference client in your cache module) and
if src.cache.client.cache exposes a close/shutdown coroutine call it (await if
needed), otherwise set src.cache.client.cache = None and/or call
cache.setup(...) with the original settings to reinitialize; ensure this cleanup
runs after stopping the redis patch and before restoring ControlMixin._disable
and settings so subsequent tests use the real backend.
- Around line 241-246: Make the as_boolean observer-count path test actually
trigger rejection by setting settings.SESSION_OBSERVERS_LIMIT down to 1 before
calling sqlite_client.put (so the updated observer count exceeds the limit) and
then assert that the response is a rejection (e.g., assert r.status_code not in
(200, 204) or assert r.status_code == 403). Use the test harness (monkeypatch or
equivalent) to set settings.SESSION_OBSERVERS_LIMIT = 1 around the PUT to
/v3/workspaces/.../peers/{target}/config and restore afterwards; this ensures
the JSON boolean query branch in src/crud/session.py (observer-count logic) is
exercised and failures are detected.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: ce6e6372-22b4-4edd-acd9-fe9b53324c9e
📒 Files selected for processing (1)
tests/test_sqlite_json_ops.py
```python
@pytest_asyncio.fixture(scope="module")
async def sqlite_cache():
    original_enabled = settings.CACHE.ENABLED
    original_url = settings.CACHE.URL
    original_disable = ControlMixin._disable  # pyright: ignore[reportPrivateUsage]

    fake_redis = FakeAsyncRedis(decode_responses=True)

    def fake_from_url(*_a: Any, **_kw: Any):
        return fake_redis

    @property  # type: ignore
    def patched_disable(self):  # pyright: ignore
        try:
            return original_disable.fget(self)  # pyright: ignore[reportOptionalCall]
        except LookupError:
            return set()  # pyright: ignore

    redis_patch = patch("redis.asyncio.from_url", fake_from_url)
    redis_patch.start()
    try:
        ControlMixin._disable = patched_disable  # pyright: ignore
        settings.CACHE.ENABLED = True
        settings.CACHE.URL = "redis://fake:6379/0"
        cache.setup("redis://fake:6379/0", pickle_type=PicklerType.SQLALCHEMY, enable=True)
        yield fake_redis
    finally:
        redis_patch.stop()
        ControlMixin._disable = original_disable  # pyright: ignore
        settings.CACHE.ENABLED = original_enabled
        settings.CACHE.URL = original_url
```
Restore the global cache backend in teardown.
cache.setup(...) mutates process-wide state, but this fixture only restores settings and the Redis patch. After this module, later tests can still keep talking to the fake backend with the patch removed. Please close/reinitialize src.cache.client.cache in finally so the backend matches the restored settings again.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/test_sqlite_json_ops.py` around lines 94 - 124, The fixture
sqlite_cache calls cache.setup(...) but does not restore the process-wide cache
client; in the finally block you must close or reinitialize the global cache
client so teardown matches the restored settings—import src.cache.client (or
reference client in your cache module) and if src.cache.client.cache exposes a
close/shutdown coroutine call it (await if needed), otherwise set
src.cache.client.cache = None and/or call cache.setup(...) with the original
settings to reinitialize; ensure this cleanup runs after stopping the redis
patch and before restoring ControlMixin._disable and settings so subsequent
tests use the real backend.
```python
# Update target's config — this triggers the observer-count query (as_boolean path)
r = sqlite_client.put(
    f"/v3/workspaces/{ws}/sessions/{sess}/peers/{target}/config",
    json={"observe_others": True},
)
assert r.status_code in (200, 204), f"set_peer_config failed: {r.status_code} {r.text}"
```
Make the as_boolean() path observable.
In src/crud/session.py:1115-1160, the JSON boolean query only changes behavior when the observer count crosses settings.SESSION_OBSERVERS_LIMIT. With a 2xx-only assertion here, a broken count still passes as long as the limit stays above 1. Lower the limit for this test and assert the expected rejection so this actually catches the SQLite regression.
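A sketch of how the test could force that rejection path; the fixture names, import path, and exact failure status are assumptions:

```python
from src.config import settings  # assumed import path


def test_observe_others_limit_rejected(sqlite_client, monkeypatch, ws, sess, target):
    # Fixture names are placeholders; drop the limit so the as_boolean()
    # observer-count query must reject the update.
    monkeypatch.setattr(settings, "SESSION_OBSERVERS_LIMIT", 1)
    r = sqlite_client.put(
        f"/v3/workspaces/{ws}/sessions/{sess}/peers/{target}/config",
        json={"observe_others": True},
    )
    # The exact error code depends on the API's error handling; a 2xx must not pass.
    assert r.status_code not in (200, 204)
```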
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/test_sqlite_json_ops.py` around lines 241 - 246, Make the as_boolean
observer-count path test actually trigger rejection by setting
settings.SESSION_OBSERVERS_LIMIT down to 1 before calling sqlite_client.put (so
the updated observer count exceeds the limit) and then assert that the response
is a rejection (e.g., assert r.status_code not in (200, 204) or assert
r.status_code == 403). Use the test harness (monkeypatch or equivalent) to set
settings.SESSION_OBSERVERS_LIMIT = 1 around the PUT to
/v3/workspaces/.../peers/{target}/config and restore afterwards; this ensures
the JSON boolean query branch in src/crud/session.py (observer-count logic) is
exercised and failures are detected.
```python
# Plain queue status — exercises as_string() in SELECT expressions
r = sqlite_client.get(f"/v3/workspaces/{ws}/queue/status")
assert r.status_code == 200, f"queue/status failed: {r.status_code} {r.text}"

# Filtered queue status — exercises as_string() in WHERE clause too
r = sqlite_client.get(
    f"/v3/workspaces/{ws}/queue/status",
    params={"observer_id": peer},
)
assert r.status_code == 200, f"queue/status?observer_id failed: {r.status_code} {r.text}"
```
Assert the filtered queue result, not just the status code.
Both queue-status calls only check for HTTP 200, so this test still passes if payload["observer"].as_string() compiles but returns the wrong rows. Please assert that the unfiltered response contains the enqueued item and that the observer_id filter narrows the result set to the expected observer.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/test_sqlite_json_ops.py` around lines 276 - 285, The test currently
only checks HTTP 200; parse the JSON response from
sqlite_client.get("/v3/workspaces/{ws}/queue/status") (variable r) and assert
the unfiltered result contains the previously enqueued item (e.g. check presence
of expected id/payload fields), then call sqlite_client.get with
params={"observer_id": peer} and assert the filtered JSON only contains rows for
that observer (verify observer field equals peer and/or that non-matching rows
are excluded); use the existing variables r, ws, peer and the same endpoint to
locate where to add these assertions.
Moves settings.DB.CONNECTION_URI assignment inside the try so every mutation is covered by the finally restoration block, silencing the remaining CodeRabbit warning. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
What this fixes
The SQLite support added in #405 broke on queries that use JSON field accessors. The `JSONBCompat` custom `TypeDecorator` wasn't forwarding JSON-specific operators to the underlying `JSON`/`JSONB` type, so SQLAlchemy fell back to wrong SQL on SQLite (e.g. `LIKE` instead of `json_extract`).

Three code paths were affected:
| File | Before | After |
|---|---|---|
| src/crud/deriver.py | `.astext` on queue payload fields | `.as_string()` |
| src/crud/session.py | `.astext.cast(Boolean)` on `observe_others` config | `.as_boolean()` |
| src/utils/filter.py | `.astext` on metadata filter fields | `.as_string()` |

What changed
- `src/db_types.py` — Added `Comparator` subclasses to `JSONBCompat` and `VectorType` so that `__getitem__`, `as_string()`, `as_boolean()`, `contains()`, and `cosine_distance()` properly delegate to the underlying JSON/pgvector types instead of falling back to base `ColumnOperators`
- `src/crud/deriver.py` — `.astext` → `.as_string()` (queue status query)
- `src/crud/session.py` — `.astext.cast(Boolean)` → `.as_boolean()` (observer count queries)
- `src/utils/filter.py` — `.astext` → `.as_string()` (metadata filter queries)

Tests
Added `tests/test_sqlite_json_ops.py` with 3 focused tests that run against a real in-memory SQLite database (no PostgreSQL required):

- `test_metadata_filter_as_string` — list messages with a metadata equality filter
- `test_observe_others_as_boolean` — update peer config with `observe_others` inside a session
- `test_queue_status_as_string` — fetch workspace queue status, plain and filtered by observer

How to test locally
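A plausible invocation (an assumption, since the original instructions are cut off here) is `uv run pytest tests/test_sqlite_json_ops.py -v`; no PostgreSQL is required because the fixtures build an in-memory SQLite database.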