
*: Add embedded DuckDB storage backend #6357

Open
thorfour wants to merge 1 commit into remove-frostdb-phase3 from duckdb-backend


thorfour commented May 9, 2026

Summary

Adds DuckDB as a second storage backend behind a new --storage-backend={clickhouse,duckdb} flag (default clickhouse). DuckDB is embedded — single file or in-memory, no separate server. Useful for single-node deployments where running a ClickHouse server is overkill.

The architecture refactor in the FrostDB removal made this a clean drop-in: profilestore.Ingester and query.Querier are the only interfaces a backend must satisfy.

Stacks on #6356.

What's in the package

pkg/duckdb/ mirrors pkg/clickhouse/ (~2k LoC, comparable shape):

  • client.go — opens a database/sql connection. An empty path means in-memory; a non-empty path is a file (created if missing). Pinned to one connection (DuckDB is single-writer).

  • schema.go — DDL using DuckDB-native composite types:

    • labels: MAP(VARCHAR, VARCHAR)
    • stacktrace: STRUCT(address, mapping_*, line/function info)[] (LIST)

    Differs from the ClickHouse layout (parallel arrays in a Nested type, JSON labels) but maps cleanly onto the same row shape the ingester produces.

  • ingester.go — Arrow record → Appender API. Decodes the encoded location blobs into struct-shaped maps for the LIST<STRUCT> column; labels populate as duckdb.Map.

  • filter.go — Prometheus label-matcher → DuckDB SQL. Uses element_at(labels, 'k')[1] for label access (NULL when key absent) and regexp_matches for regex matchers.

  • querier.go — full Querier interface. SQL translations from ClickHouse to DuckDB:

    ClickHouse               DuckDB
    arrayJoin(...)           UNNEST(...) + scalar subquery
    JSONAllPaths(map)        map_keys(...) + UNNEST
    intDiv(a, b)             (a / b)::BIGINT
    groupArray(tuple(...))   LIST({...} ORDER BY)
    SUM(BIGINT)              SUM(...)::BIGINT (downcast HUGEINT result)

    Stacktrace results scan into duckdb.Composite[[]stacktraceLoc] via field-name mapping — much cleaner than CH's parallel-array dance.

  • duckdb_test.go — round-trip smoke test: builds a synthetic Arrow record, ingests, then calls every Querier method (HasProfileData, ProfileTypes, Labels, Values, QueryRange, QuerySingle, QueryMerge, GetProfileMetadataMappings, GetProfileMetadataLabels) and asserts.

Wiring

pkg/parca/parca.go branches on flags.StorageBackend to wire either the ClickHouse or DuckDB ingester + querier. Shutdown collapses to a single closeBackend func. README regenerated for the new flags.
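A sketch of what that branch could look like. The `flags` struct, its `DuckDBPath` field (standing in for the `--duckdb-path` flag used in the test plan) and the constructors `newClickHouseBackend` and `newDuckDBBackend` are hypothetical placeholders for the real code in pkg/parca, pkg/clickhouse and pkg/duckdb. The point is that both arms hand back a single close function, which is what lets shutdown collapse to one closeBackend call.

```go
package main

import "fmt"

// flags is a hypothetical subset of Parca's flag struct.
type flags struct {
	StorageBackend string // "clickhouse" (default) or "duckdb"
	DuckDBPath     string // empty means in-memory
}

// backend bundles what parca.go needs from either engine. In the real code
// it would carry the Ingester and Querier; here it only has a name and a closer.
type backend struct {
	name  string
	close func() error
}

// Hypothetical constructors standing in for pkg/clickhouse and pkg/duckdb.
func newClickHouseBackend() (backend, error) {
	return backend{name: "clickhouse", close: func() error { return nil }}, nil
}

func newDuckDBBackend(path string) (backend, error) {
	mode := "file " + path
	if path == "" {
		mode = "in-memory"
	}
	return backend{name: "duckdb (" + mode + ")", close: func() error { return nil }}, nil
}

// openBackend picks an engine from the flag value and rejects anything else.
func openBackend(f flags) (backend, error) {
	switch f.StorageBackend {
	case "clickhouse":
		return newClickHouseBackend()
	case "duckdb":
		return newDuckDBBackend(f.DuckDBPath)
	default:
		return backend{}, fmt.Errorf("unknown --storage-backend %q (want clickhouse or duckdb)", f.StorageBackend)
	}
}

func main() {
	b, err := openBackend(flags{StorageBackend: "duckdb", DuckDBPath: "/tmp/parca.db"})
	if err != nil {
		panic(err)
	}
	closeBackend := b.close // one shutdown hook, whichever engine was chosen
	defer func() {
		if err := closeBackend(); err != nil {
			fmt.Println("close:", err)
		}
	}()
	fmt.Println("using", b.name)

	_, err = openBackend(flags{StorageBackend: "sqlite"})
	fmt.Println(err)
}
```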

go.mod

  • + github.com/marcboeker/go-duckdb/v2 v2.4.3. The project moved to github.com/duckdb/duckdb-go at v2.5.0+; will migrate when that release stabilises.
  • DuckDB requires CGo. Cross-compile needs CC=... CGO_ENABLED=1. Pre-built bindings are pulled in for darwin/linux × amd64/arm64 + windows/amd64 via github.com/duckdb/duckdb-go-bindings.

Test plan

  • go build ./...
  • go vet ./...
  • go test -short ./... — green, including the new pkg/duckdb round-trip test
  • CI green
  • Manual smoke: parca --storage-backend=duckdb --duckdb-path=/tmp/parca.db and ingest a profile end-to-end
  • Bench DuckDB ingest throughput against expected scrape rates

Follow-ups

  • Migrate import to github.com/duckdb/duckdb-go once v2.5+ is the canonical release
  • testcontainers-based integration tests (DuckDB doesn't need a container, but the same harness shape applies)
  • Goreleaser CGo cross-compile matrix verification

🤖 Generated with Claude Code

Adds DuckDB as a second storage backend behind a new
--storage-backend={clickhouse,duckdb} flag (default clickhouse). DuckDB
is embedded — single file or in-memory, no separate server. Useful for
single-node deployments where running ClickHouse is overkill.

The architecture refactor in the FrostDB removal made this a drop-in:
profilestore.Ingester and query.Querier are the only interfaces a
backend must satisfy.

New package pkg/duckdb mirrors pkg/clickhouse:

* client.go — opens a DuckDB connection via database/sql. Empty path
  means in-memory; non-empty means a file path that will be created if
  missing. Pinned to one connection (DuckDB is single-writer).
* schema.go — DDL using DuckDB-native composite types:
  - labels:     MAP(VARCHAR, VARCHAR)
  - stacktrace: STRUCT(address, mapping_*, line/function info)[]
  Differs from the ClickHouse layout (parallel arrays in a Nested
  type, JSON labels) but maps cleanly onto the same row shape the
  ingester produces.
* ingester.go — Arrow record → Appender API. Decodes the encoded
  location blobs into a slice of struct-shaped maps for the LIST<STRUCT>
  column; labels populate as duckdb.Map.
* filter.go — Prometheus label-matcher → DuckDB SQL. Uses
  element_at(labels, 'k')[1] for label access (NULL when key absent)
  and regexp_matches for regex matchers.
* querier.go — full Querier interface (Labels, Values, ProfileTypes,
  HasProfileData, QueryRange, QuerySingle, QueryMerge,
  GetProfileMetadataMappings, GetProfileMetadataLabels). SQL ports
  from ClickHouse to DuckDB:
    arrayJoin           → UNNEST + scalar subquery
    JSONAllPaths(map)   → map_keys + UNNEST
    intDiv(a, b)        → (a / b)::BIGINT
    groupArray(tuple()) → LIST({...} ORDER BY)
    SUM(BIGINT)         → SUM(...)::BIGINT (downcast HUGEINT result)
  Stacktrace results scan into duckdb.Composite[[]stacktraceLoc] via
  field-name mapping rather than CH's parallel-array dance.
* duckdb_test.go — round-trip smoke test: build a synthetic Arrow
  record, ingest, then call every Querier method and assert.

pkg/parca/parca.go: branches on flags.StorageBackend to wire either
the ClickHouse or DuckDB ingester+querier. Shutdown collapses to a
single closeBackend func.

go.mod: + github.com/marcboeker/go-duckdb/v2 v2.4.3 (note: project
moved to github.com/duckdb/duckdb-go for v2.5.0+; we'll migrate when
that release stabilises). DuckDB requires CGo, so cross-compile needs
CC=... CGO_ENABLED=1.

README.md regenerated for the new flags.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

alwaysmeticulous Bot commented May 9, 2026

✅ Meticulous spotted 0 visual differences across 288 screens tested.

Meticulous evaluated ~4 hours of user flows against your PR.

Last updated for commit 6310c95 (*: Add embedded DuckDB storage backend).
