feat(inference): add native Anthropic (Claude) provider by johnford2002 · Pull Request #2889 · karakeep-app/karakeep

johnford2002 · 2026-06-15T04:04:26Z

Summary

Adds a first-class Anthropic (Claude) inference provider, selected by ANTHROPIC_API_KEY, using the official @anthropic-ai/sdk Messages API.

Today Claude can only be used by pointing OPENAI_BASE_URL at Anthropic's OpenAI-compatibility endpoint. That endpoint ignores strict/json_schema enforcement, so Karakeep's default structured tagging output isn't actually enforced, which leads to occasional malformed JSON and failed tagging jobs. This native provider uses Anthropic's Structured Outputs for guaranteed schema conformance.

What it does

- New AnthropicInferenceClient in packages/shared/inference.ts, selected in InferenceClientFactory.build() when ANTHROPIC_API_KEY is set (precedence: OpenAI → Anthropic → Ollama).
- Text + image (vision) inference for auto-tagging and summarization.
- Maps the existing structured / json / plain output modes onto Anthropic's output_config.format json_schema, reusing the same z.toJSONSchema the Ollama client already uses.
- Defaults to claude-haiku-4-5 when no Claude model is configured (override via INFERENCE_TEXT_MODEL / INFERENCE_IMAGE_MODEL).
- New env vars: ANTHROPIC_API_KEY, optional ANTHROPIC_BASE_URL. Docs updated under 03-configuration.
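The selection precedence and model default described above can be sketched roughly as follows. This is a minimal illustration, not the code in the PR: the `InferenceEnv` shape, the `OLLAMA_BASE_URL` field, the helper names, and the `startsWith("claude")` check for "is a Claude model configured" are all assumptions made for the example.

```typescript
// Hypothetical, trimmed-down view of the inference-related config.
interface InferenceEnv {
  OPENAI_API_KEY?: string;
  ANTHROPIC_API_KEY?: string;
  OLLAMA_BASE_URL?: string; // assumed signal that Ollama is configured
  INFERENCE_TEXT_MODEL?: string;
}

type Provider = "openai" | "anthropic" | "ollama" | null;

const DEFAULT_CLAUDE_MODEL = "claude-haiku-4-5";

// Precedence from the PR description: OpenAI, then Anthropic, then Ollama.
function selectProvider(env: InferenceEnv): Provider {
  if (env.OPENAI_API_KEY) return "openai";
  if (env.ANTHROPIC_API_KEY) return "anthropic";
  if (env.OLLAMA_BASE_URL) return "ollama";
  return null;
}

// Fall back to the default Claude model when the configured text model
// does not look like a Claude model (the prefix check is an assumption).
function resolveClaudeTextModel(env: InferenceEnv): string {
  const configured = env.INFERENCE_TEXT_MODEL;
  return configured !== undefined && configured.startsWith("claude")
    ? configured
    : DEFAULT_CLAUDE_MODEL;
}
```

Under this sketch, setting both OPENAI_API_KEY and ANTHROPIC_API_KEY keeps the existing OpenAI behavior, which matches the "purely additive" intent of the change.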

The change is purely additive — no existing OpenAI/Ollama logic is modified.

Limitations

Anthropic has no embeddings API (they recommend third-party providers), so generateEmbeddingFromText throws a clear, documented error. Semantic search still requires a separate embedding provider (OpenAI/Ollama).

Test plan

- New unit tests in packages/shared/inference.test.ts (8) cover text, image, structured-output mapping, model-default substitution, and the embeddings error.
- pnpm format, pnpm lint, pnpm typecheck, pnpm build all pass.
- @karakeep/shared test suite green (111 tests).

🤖 Generated with Claude Code

Spec for adding configurable PostgreSQL support alongside SQLite, enabling users with remote/NAS-hosted databases to avoid SQLite's poor network filesystem performance. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

16-task plan covering configuration, dual schemas, dialect factory, error abstraction, migrations, SQL helpers, documentation, and a SQLite-to-PostgreSQL data migration script. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Adds DATABASE_DIALECT env var (sqlite|postgresql, default sqlite) and PostgreSQL connection fields (DATABASE_URL or individual host/port/user/password/name) to the Zod config schema, with validation requiring one of the two forms when postgresql dialect is selected. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
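The validation rule in this commit (one of the two connection forms is required once the postgresql dialect is selected) might look like the following in plain TypeScript. The actual change expresses it in the Zod config schema; the individual field names below (`DATABASE_HOST`, `DATABASE_USER`, `DATABASE_NAME`) and the choice of which individual fields count as "required" are hypothetical.

```typescript
type Dialect = "sqlite" | "postgresql";

// Hypothetical env shape; only DATABASE_DIALECT and DATABASE_URL are named in the commit.
interface DbEnv {
  DATABASE_DIALECT?: string;
  DATABASE_URL?: string;
  DATABASE_HOST?: string;
  DATABASE_USER?: string;
  DATABASE_NAME?: string;
}

function validateDbEnv(env: DbEnv): Dialect {
  // Default to sqlite when the variable is unset.
  const dialect = env.DATABASE_DIALECT ?? "sqlite";
  if (dialect !== "sqlite" && dialect !== "postgresql") {
    throw new Error(`Unsupported DATABASE_DIALECT: ${dialect}`);
  }
  if (dialect === "postgresql") {
    const hasUrl = Boolean(env.DATABASE_URL);
    const hasParts = Boolean(
      env.DATABASE_HOST && env.DATABASE_USER && env.DATABASE_NAME,
    );
    // Either a full URL or the individual fields must be present.
    if (!hasUrl && !hasParts) {
      throw new Error(
        "postgresql dialect requires DATABASE_URL or individual connection fields",
      );
    }
  }
  return dialect;
}
```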


Move all 21 Drizzle relation definitions from schema.ts into a new schema.relations.ts file, re-exported via schema.ts. This enables future PostgreSQL schema files to share the same dialect-agnostic relation definitions. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Move table definitions from schema.ts to schema.sqlite.ts to make room for a future schema.pg.ts. Create a thin schema.ts entry point that re-exports from both schema.sqlite.ts and schema.relations.ts. Update schema.relations.ts to import directly from schema.sqlite.ts to avoid a circular dependency through schema.ts. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Creates packages/db/schema.pg.ts as the PostgreSQL equivalent of schema.sqlite.ts, mirroring all 32 tables, indexes, and constraints using drizzle-orm/pg-core types (pgTable, boolean, timestamp with timezone, doublePrecision, jsonb, AnyPgColumn). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add isUniqueConstraintError predicate to packages/db that handles both SQLite (SQLITE_CONSTRAINT_UNIQUE/PRIMARYKEY) and PostgreSQL (code 23505) unique constraint violations, replacing direct SqliteError usage in trpc models. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
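A duck-typed version of such a predicate could be sketched as below. It relies only on the `code` property that both better-sqlite3 and postgres.js errors carry, so no driver needs to be imported; the exact body in the PR may differ.

```typescript
// Returns true for unique/primary-key violations from either dialect.
// Checks the error's `code` field instead of using `instanceof`, so
// neither database driver has to be loaded.
function isUniqueConstraintError(err: unknown): boolean {
  if (typeof err !== "object" || err === null) return false;
  const code = (err as { code?: unknown }).code;
  return (
    code === "SQLITE_CONSTRAINT_UNIQUE" ||
    code === "SQLITE_CONSTRAINT_PRIMARYKEY" ||
    code === "23505" // PostgreSQL unique_violation
  );
}
```

A call site would then catch an insert error and branch on `isUniqueConstraintError(e)` rather than on `e instanceof SqliteError`.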

Replace direct SQLite database creation with a factory pattern that creates either a SQLite or PostgreSQL connection based on the configured DATABASE_DIALECT. External driver packages (better-sqlite3, postgres) are loaded via createRequire to ensure only the active dialect's driver is loaded at runtime. Exports new `dialect` and `KarakeepDBTransaction` (now dialect-agnostic) from the db package. Also updates migrate.ts to handle dialect-aware migration. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>


…esting
- Make schema.ts dialect-aware so Drizzle serializes timestamps correctly (SQLite uses epoch integers, PostgreSQL uses native timestamps)
- Add bigint→Number parser for PostgreSQL COUNT/SUM aggregates (OID 20)
- Fix migrate.ts to exit after PostgreSQL migration (postgres.js keeps event loop alive)
- Skip SQLite-specific transaction config (behavior: "immediate") on PG
- Add docker-compose.postgres.yml overlay for optional PostgreSQL with any compose file
- Fix docker-compose.dev.yml node_modules volume for cross-platform builds
- Update PostgreSQL docs with Docker overlay usage

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The migrations moved from drizzle/ to migrations/ (with sqlite/ and pg/ subdirectories). Update the Dockerfile to copy the new directory structure into the db_migrations bundle. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…ibility Both ncc (migration bundle) and webpack (Next.js) replace createRequire(import.meta.url) with undefined, causing runtime crashes in the production Docker image. Switch to await import() which both bundlers handle correctly. Also makes getInMemoryDB async and updates test callers accordingly. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Remove docs/superpowers/ planning documents (not needed in the PR)
- Simplify isUniqueConstraintError to use duck-typing instead of importing SqliteError, avoiding a hard dependency on better-sqlite3 when running PostgreSQL

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…elper PostgreSQL (postgres.js) returns .count on mutation results while SQLite (better-sqlite3) returns .changes. Rather than updating ~25 call sites, patch the postgres.js Result prototype at connection time so .changes aliases .count — zero consumer changes needed. Also extracts buildPgConnectionString() into @karakeep/shared/config to DRY the identical connection string construction in drizzle.ts, drizzle.config.pg.ts, and migrate-to-pg.ts. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

… script The TABLE_SPECS entry referenced the Drizzle field name 'addedAt' but the SQL column produced by createdAtField() is 'createdAt'. Since the migration reads raw SQL column names via SELECT *, the timestamp conversion was being skipped for this table.

Passwords containing URI-special characters (@, :, /, #) would produce a malformed connection string. Apply encodeURIComponent() to user and password fields.
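The fix can be illustrated with a small builder. The `PgConnectionParts` shape is an assumption made for this example; the real `buildPgConnectionString()` in @karakeep/shared/config may take different arguments.

```typescript
// Hypothetical input shape for the individual connection fields.
interface PgConnectionParts {
  host: string;
  port: number;
  user: string;
  password: string;
  name: string;
}

// Percent-encode user and password so characters such as @ : / #
// cannot be mistaken for URI delimiters.
function buildPgConnectionString(p: PgConnectionParts): string {
  const user = encodeURIComponent(p.user);
  const password = encodeURIComponent(p.password);
  return `postgresql://${user}:${password}@${p.host}:${p.port}/${p.name}`;
}

const example = buildPgConnectionString({
  host: "db.local",
  port: 5432,
  user: "karakeep",
  password: "p@ss:w/rd#1",
  name: "karakeep",
});
// example === "postgresql://karakeep:p%40ss%3Aw%2Frd%231@db.local:5432/karakeep"
```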

- Use documented types constructor option for bigint parsing instead of reaching into client.options.parsers (undocumented internal)
- Add fail-fast verification after .changes prototype patch so incompatible postgres.js versions fail at startup, not silently at runtime
- Export close() for graceful database connection shutdown

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Uses the new close() export from drizzle.ts to shut down the database connection cleanly, allowing the process to exit naturally instead of calling process.exit(0) which skips cleanup handlers.

The overlay now configures all services that need database access. Services not present in the base compose file (e.g., workers/prep in production compose) are silently ignored by docker compose.

Standard POSIX convention; prevents git diff noise on future edits.

DDL statements (CREATE TABLE) leave postgres.js .count as null, so the fail-fast guard was incorrectly triggering. Use DELETE (a DML statement) which returns a numeric count, and clean up the temp table. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

PostgreSQL requires all non-aggregated SELECT columns to appear in GROUP BY. The tags.get query selected bookmarkTags.id and name but only grouped by tagsOnBookmarks.attachedBy. SQLite is lenient about this; PostgreSQL is not. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Verifies that schema.sqlite.ts, schema.pg.ts, and the schema.ts re-export shim stay in sync — matching table names, SQL table names, and column names across both dialects. Catches drift when a table or column is added to one schema but not the other. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

chore(github): rebrand fork's workflows and templates for johnford2002

Resolves conflicts from upstream's API key scopes feature and rule-engine multi-list refactor, propagating both into the dialect-specific schema files (schema.sqlite.ts and schema.pg.ts) since our fork uses schema.ts as a dialect-routing entry-point shim.

Conflict resolutions:
- packages/db/schema.ts: kept entry-point shim (HEAD)
- packages/db/schema.sqlite.ts: added apiKeys.scopes column, removed ruleEngineRulesTable.listId and its FK
- packages/db/schema.pg.ts: same as sqlite, using jsonb for scopes
- packages/db/package.json: combined upstream's --ignore-path flags with our migrate-to-pg/test scripts
- packages/db/migrations/sqlite/: accepted upstream 0083 and 0084 migrations at the new path (renamed from drizzle/ in our fork)
- packages/shared/config.ts: kept both PostgreSQL validation and upstream's OTLP endpoint validation
- packages/trpc/models/lists.ts: kept isUniqueConstraintError abstraction, added KarakeepDBTransaction import
- docker/docker-compose.dev.yml: accepted upstream (superset of our node_modules volume change)
- pnpm-lock.yaml: regenerated via pnpm install

Adds PG equivalent of the SQLite migrations 0083 (apiKey scopes) and 0084 (rule engine multi-list refactor) brought in by the upstream merge.

The auto-generated migration was missing two things needed for parity with the SQLite version:
1. A SQL DEFAULT for apiKey.scopes to backfill existing rows (matches SQLite's `DEFAULT '["fullaccess"]'`).
2. Data migration for the event JSON in ruleEngineRules: delete rules with empty listId in addedToList/removedFromList events, then transform single listId to a listIds array.

Both data-migration steps mirror what upstream's SQLite migration does using SQLite's json_extract/json_set, translated to PG's jsonb operators.

Verified with `drizzle-kit check` (schema/migration consistent) and the schema-sync test (column lists match across dialects).

Bumps the PostgreSQL overlay from postgres:16-alpine to postgres:18-alpine to match the production deployment, and fixes the volume mount that PG18 rejects.

PG18+ stores data in a version-specific subdirectory under /var/lib/postgresql (e.g. /var/lib/postgresql/18/docker). The image refuses to start if the volume is mounted at the legacy /var/lib/postgresql/data path, even on a fresh install. Mounting one level up lets PG18 manage its own subdirectory layout.

Also documents the migration-smoke-test command pattern in the overlay header: pair the overlay with a base compose file and `up postgres -d` to bring up just postgres, then run `pnpm --filter=@karakeep/db migrate` from the host.

Verified end-to-end against postgres:18-alpine (PG 18.4).

Upgrading an existing PG16 fork deployment to this overlay is a breaking change: the volume layout is incompatible and requires pg_upgrade or a dump/restore.

Image tag convention is unprefixed semver (e.g. karakeep:1.0.0) per the docker ecosystem. Refactor the matrix to hold base image names and compute tags (and SERVER_VERSION) in bash, stripping the leading 'v' from github.event.release.name. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@claude

Rename our actively-maintained workflows with a fork- prefix so future merges of upstream don't collide, and move all upstream workflows into .github/workflows/upstream/ where GitHub Actions won't auto-discover them. This lets future upstream syncs land cleanly; new upstream workflows show up at .github/workflows/<name>.yml and can be moved into upstream/ during the merge.

Active (fork-*):
- fork-ci.yml (CI checks)
- fork-docker.yml (image builds to ghcr.io/johnford2002/*)
- fork-claude.yml (@claude integration, gated to johnford2002)

Shelved in upstream/:
- android, cli, extension, ios, mcp, sdk

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

chore(workflows): namespace fork workflows; shelve upstream

Resolves conflicts from upstream commits 1997bd3..e92f2f0:
- .github/workflows/docker.yml: upstream-modified version shelved to .github/workflows/upstream/docker.yml (kept fork-docker.yml active).
- .github/workflows/claude.yml: accepted upstream deletion; our renamed fork-claude.yml (johnford2002-gated) remains the active workflow.
- pnpm-lock.yaml: regenerated with pnpm 11.2.1 against upstream's Next.js 16 + pnpm 11 upgrades.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

chore: sync upstream/main (resolves PR #10 conflicts)

chore: Sync upstream/main

Sync ~30 upstream commits (karakeep-app/main @ 3afe675, "26k stars").

Conflict resolution (all in packages/db, from the fork's Postgres dialect split):
- schema.ts: kept the fork's thin dialect-router (re-exports schema.sqlite / schema.pg). Upstream's only schema change was a single new column, `embeddingStatus` (bookmark embedding support, karakeep-app#2857), so the router is preserved as-is and the column is ported into both dialect schemas.
- schema.sqlite.ts / schema.pg.ts: added `embeddingStatus` text column (enum pending|failure|success, default 'pending') to the bookmarks table, alongside taggingStatus/summarizationStatus.
- Migrations: discarded upstream's drizzle/0085 files (their snapshot is based on upstream's monolithic schema and is incompatible with the fork's per-dialect snapshot lineage). Regenerated fork-consistent migrations via drizzle-kit: sqlite 0085_add_embedding_status, pg 0002_add_embedding_status (both a single ADD COLUMN).

Verified: typecheck, lint+sherif, format, OpenAPI regen (no diff), and unit tests (db/trpc/shared/shared-server) all pass. Docker-dependent suites (workers/e2e) and the postgres:18 migration smoke test were not run locally (Docker daemon unavailable); CI covers them.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

chore: Sync upstream/main (2026-06-13)

The server is silent about what the database is doing: queries aren't logged at all (Drizzle is created without a logger), which makes it hard to understand activity or diagnose latency.

- DB_QUERY_LOGGING env (default false): when enabled, every SQL query is logged at debug via the app logger, including params. Gated by its own flag (not just LOG_LEVEL) because it is high-volume; when off, Drizzle never invokes the logger (zero cost). Wired into both the pg and sqlite builders.
- DB connection lifecycle logs (info): "connecting to PostgreSQL host:port/db" + "connection established" (and the SQLite open path), so connects and reconnects are visible.
- Documented DB_QUERY_LOGGING in the env-vars reference.

Motivation: diagnosing a ~25s stall on the first concurrent wave of /api/assets requests. With query logging on we can see whether the asset request's DB query fires immediately (DB-side cost) or only after the stall (app blocks before the query).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

feat(db): optional SQL query logging + DB connection lifecycle logs

postgres.js opens pool connections lazily, so the first burst of concurrent requests (e.g. a cold-cache bookmark page firing ~7 parallel /api/assets requests) must open several connections at once. When establishing a connection is slow, which it is in this deployment (~4–5s per connect, the signature of reverse-DNS-on-connect), that cost stacks; it produced a ~30s global DB stall on the first page load, after which the warm pool served everything in ~60ms.

- DATABASE_POOL_SIZE env (default 10) sets the postgres.js pool `max`.
- After the startup probe, hold `poolSize` connections open simultaneously (pg_sleep) to force the pool to fill at boot, where latency is harmless. Pre-warm failures are logged and non-fatal (the probe already validated connectivity).

This is a mitigation: it moves the connection cost to startup so users never hit it. The underlying slow connect (reverse DNS on the PG server) should be fixed separately.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
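The pre-warm step can be sketched independently of the driver. In this illustration the `hold` callback stands in for whatever runs a `pg_sleep` query on one pooled connection; the function name, its parameters, and the logging callback are assumptions, not the PR's code.

```typescript
// Runs one blocking query (e.g. pg_sleep) on a pooled connection.
type HoldConnection = (seconds: number) => Promise<unknown>;

// Start `poolSize` holds at once so the pool must open that many
// connections concurrently. Failures are counted and logged, never thrown.
async function prewarmPool(
  poolSize: number,
  hold: HoldConnection,
  log: (msg: string) => void = (msg) => console.warn(msg),
  holdSeconds = 1,
): Promise<number> {
  const outcomes = await Promise.all(
    Array.from({ length: poolSize }, () =>
      hold(holdSeconds).then(
        () => true,
        () => false, // swallow the error: pre-warm is best-effort
      ),
    ),
  );
  const warmed = outcomes.filter(Boolean).length;
  if (warmed < poolSize) {
    log(`pool pre-warm: ${poolSize - warmed}/${poolSize} connections failed (non-fatal)`);
  }
  return warmed;
}
```

With postgres.js, `hold` could plausibly be written as a tagged-template query such as `(s) => sql`select pg_sleep(${s})``. Starting all holds before awaiting any of them is what forces the pool to fill; awaiting them one at a time would reuse a single connection.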

feat(db): pre-warm the Postgres connection pool at startup


Schema: bookmarks.captureVideo (tri-state) + bookmarkLinks.videoDownloadStatus, with sqlite + pg migrations. Types and model hydration surface both fields.

Logic (TDD): resolveShouldCaptureVideo (per-bookmark override over server default) and isBookmarkStillDownloadingVideo, which keeps the bookmark polling alive while a video downloads.

Workers: crawler enqueues video using the resolved capture decision and marks the link pending; the video worker writes downloading/success/failure, and clears the status to null for non-video URLs so they don't show as failures.

tRPC: updateBookmark accepts captureVideo and, when toggled effectively-on for an already-crawled link with no video, enqueues a download. New admin-scoped admin.videoConfig exposes the read-only server video settings. OpenAPI spec regenerated.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
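The two logic helpers named above might reduce to something like the following. The function names come from the commit message, but the signatures and the exact status type are assumptions inferred from the statuses it lists.

```typescript
// Tri-state per-bookmark setting: true = on, false = off, null = use server default.
type CaptureVideoOverride = boolean | null | undefined;

// Statuses mentioned in the commit; null means "no video expected".
type VideoDownloadStatus = "pending" | "downloading" | "success" | "failure" | null;

// The per-bookmark override wins; otherwise fall back to the server default.
function resolveShouldCaptureVideo(
  override: CaptureVideoOverride,
  serverDefault: boolean,
): boolean {
  return override ?? serverDefault;
}

// Polling should continue only while a download is queued or in progress.
function isBookmarkStillDownloadingVideo(status: VideoDownloadStatus): boolean {
  return status === "pending" || status === "downloading";
}
```

Using `??` rather than `||` matters here: an explicit `false` override must disable capture even when the server default is `true`.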

- VideoCaptureBox in the bookmark preview sidebar: tri-state capture control (Default/On/Off) wired to updateBookmark, plus a status line (downloading / saved / failed + Retry) so progress shows next to the bookmark.
- LinkContentSection video tab: enabled while a download is pending/downloading or failed (not just when present); renders a spinner or failure message instead of a dead disabled tab.
- Admin overview: read-only VideoConfigCard (enabled default, max size, timeout, yt-dlp args) backed by admin.videoConfig.
- clientConfig exposes crawler.videoDownloadEnabled so the UI can show the effective server default behind the per-bookmark "Default" option.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

feat(video): first-class video downloading (per-bookmark toggle, progress, admin config)

Includes the ANTHROPIC_* config schema/fields so the new client's fromConfig() typechecks under the pre-commit gate.

johnford2002 · 2026-06-15T04:25:33Z

Superseded by #2890 — the original branch was accidentally based on a downstream fork's main (which carries ~55 unrelated commits), inflating the diff to 250 files. The replacement is based directly on upstream/main and contains only the 6-file Anthropic change.

johnford2002 and others added 30 commits April 11, 2026 20:22

docs: add PostgreSQL support design spec

815e844

Spec for adding configurable PostgreSQL support alongside SQLite, enabling users with remote/NAS-hosted databases to avoid SQLite's poor network filesystem performance. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

refactor(db): move SQLite migrations to migrations/sqlite/

9717fb3

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

refactor(db): rename instrumentDatabase to instrumentSqliteDatabase

7635c81

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

feat(db): add per-dialect Drizzle Kit configs and updated scripts

034fb9f

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat(db): add PostgreSQL baseline migration

2a8595d

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat(db): add domainFromUrl SQL helper for cross-dialect URL parsing

008eb8f

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

docs: add PostgreSQL configuration environment variables

8c644ac

docs: add PostgreSQL setup guide

cad6b69

feat(db): add SQLite-to-PostgreSQL data migration script

972ff82

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

fix(db): URL-encode credentials in PostgreSQL connection string builder

30fb179

Passwords containing URI-special characters (@, :, /, #) would produce a malformed connection string. Apply encodeURIComponent() to user and password fields.

fix(db): replace process.exit(0) with graceful close() in migrate.ts

cd64c52

Uses the new close() export from drizzle.ts to shut down the database connection cleanly, allowing the process to exit naturally instead of calling process.exit(0) which skips cleanup handlers.

fix(docker): add workers and prep to PostgreSQL compose overlay

da8ab58

The overlay now configures all services that need database access. Services not present in the base compose file (e.g., workers/prep in production compose) are silently ignored by docker compose.

fix: add trailing newlines to .env.sample files

2bf2bbc

Standard POSIX convention; prevents git diff noise on future edits.

johnford2002 and others added 27 commits May 18, 2026 18:07

Merge pull request #6 from johnford2002/chore/fork-rebrand-github-config

1c6d417

chore(github): rebrand fork's workflows and templates for johnford2002

Merge pull request #8 from johnford2002/chore/workflows-reorg

6ec4953

chore(workflows): namespace fork workflows; shelve upstream

chore: sync upstream/main #9

0c68d6a

Merge pull request #11 from johnford2002/chore/sync-upstream-main

3c9ec6c

chore: sync upstream/main (resolves PR #10 conflicts)

Merge pull request #12 from karakeep-app/main

2e579ab

chore: Sync upstream/main

Merge pull request #14 from johnford2002/chore/sync-upstream-2026-06-13

a429906

chore: Sync upstream/main (2026-06-13)

Merge pull request #15 from johnford2002/feat/db-query-logging

098688a

feat(db): optional SQL query logging + DB connection lifecycle logs

Merge pull request #16 from johnford2002/worktree-db-pool-warmup

aa46f20

feat(db): pre-warm the Postgres connection pool at startup

docs(video): spec for first-class video downloading

3d3c4da

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Merge pull request #17 from johnford2002/feat/video-first-class

be856e2

feat(video): first-class video downloading (per-bookmark toggle, progress, admin config)

build(shared): add @anthropic-ai/sdk dependency

bd74cf8

feat(inference): add AnthropicInferenceClient text inference

d3e69a1

Includes the ANTHROPIC_* config schema/fields so the new client's fromConfig() typechecks under the pre-commit gate.

test(inference): cover Anthropic image content block

23daa4d

test(inference): assert Anthropic embeddings are unsupported

e562e0e

feat(inference): select Anthropic provider via ANTHROPIC_API_KEY

e2e2d84

docs: document the native Anthropic inference provider

ad44ae8

johnford2002 mentioned this pull request Jun 15, 2026

feat(inference): add native Anthropic (Claude) provider johnford2002/karakeep#18

Merged

1 task

johnford2002 closed this Jun 15, 2026

johnford2002 wants to merge 61 commits into karakeep-app:main from johnford2002:feat/native-anthropic-provider
