*: Add embedded DuckDB storage backend #6357
Open
thorfour wants to merge 1 commit into
Adds DuckDB as a second storage backend behind a new
--storage-backend={clickhouse,duckdb} flag (default clickhouse). DuckDB
is embedded — single file or in-memory, no separate server. Useful for
single-node deployments where running ClickHouse is overkill.
The architecture refactor in the FrostDB removal made this a drop-in:
profilestore.Ingester and query.Querier are the only interfaces a
backend must satisfy.
New package pkg/duckdb mirrors pkg/clickhouse:
* client.go — opens a DuckDB connection via database/sql. Empty path
means in-memory; non-empty means a file path that will be created if
missing. Pinned to one connection (DuckDB is single-writer).
* schema.go — DDL using DuckDB-native composite types:
- labels: MAP(VARCHAR, VARCHAR)
- stacktrace: STRUCT(address, mapping_*, line/function info)[]
Differs from the ClickHouse layout (parallel arrays in a Nested
type, JSON labels) but maps cleanly onto the same row shape the
ingester produces.
* ingester.go — Arrow record → Appender API. Decodes the encoded
location blobs into a slice of struct-shaped maps for the LIST<STRUCT>
column; labels populate as duckdb.Map.
* filter.go — Prometheus label-matcher → DuckDB SQL. Uses
element_at(labels, 'k')[1] for label access (NULL when key absent)
and regexp_matches for regex matchers.
* querier.go — full Querier interface (Labels, Values, ProfileTypes,
HasProfileData, QueryRange, QuerySingle, QueryMerge,
GetProfileMetadataMappings, GetProfileMetadataLabels). SQL ports
from ClickHouse to DuckDB:
arrayJoin → UNNEST + scalar subquery
JSONAllPaths(map) → map_keys + UNNEST
intDiv(a, b) → (a / b)::BIGINT
groupArray(tuple()) → LIST({...} ORDER BY)
SUM(BIGINT) → SUM(...)::BIGINT (downcast HUGEINT result)
Stacktrace results scan into duckdb.Composite[[]stacktraceLoc] via
field-name mapping rather than CH's parallel-array dance.
* duckdb_test.go — round-trip smoke test: build a synthetic Arrow
record, ingest, then call every Querier method and assert.
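The label-matcher translation described for filter.go can be sketched as follows. This is a minimal illustration, not the code in the PR: the `Matcher` and `MatchType` types, the `?` placeholder style, and the `^(?:...)$` anchoring (added here to emulate Prometheus's fully anchored regexes) are all assumptions. Only `element_at(labels, 'k')[1]` and `regexp_matches` come from the description above.

```go
package main

import (
	"fmt"
	"strings"
)

// MatchType and Matcher are hypothetical stand-ins for Prometheus
// label matchers; the PR's real types are not shown in the description.
type MatchType int

const (
	MatchEqual MatchType = iota
	MatchNotEqual
	MatchRegexp
	MatchNotRegexp
)

type Matcher struct {
	Type  MatchType
	Name  string
	Value string
}

// labelExpr builds the DuckDB expression that reads one label from the
// MAP(VARCHAR, VARCHAR) column. It evaluates to NULL when the key is absent.
func labelExpr(name string) string {
	escaped := strings.ReplaceAll(name, "'", "''") // escape single quotes in the key
	return fmt.Sprintf("element_at(labels, '%s')[1]", escaped)
}

// matcherToSQL returns a predicate containing one "?" placeholder and the
// argument to bind to it.
func matcherToSQL(m Matcher) (string, string, error) {
	expr := labelExpr(m.Name)
	// Assumption: anchor the pattern so a partial match does not count.
	anchored := "^(?:" + m.Value + ")$"
	switch m.Type {
	case MatchEqual:
		return expr + " = ?", m.Value, nil
	case MatchNotEqual:
		return expr + " != ?", m.Value, nil
	case MatchRegexp:
		return "regexp_matches(" + expr + ", ?)", anchored, nil
	case MatchNotRegexp:
		return "NOT regexp_matches(" + expr + ", ?)", anchored, nil
	}
	return "", "", fmt.Errorf("unknown match type %d", m.Type)
}

func main() {
	sql, arg, err := matcherToSQL(Matcher{Type: MatchRegexp, Name: "job", Value: "api-.*"})
	if err != nil {
		panic(err)
	}
	fmt.Println(sql, arg)
}
```

Because an absent key yields NULL, the negative matchers in this sketch evaluate to NULL for rows missing the label and so exclude them; an implementation that wants Prometheus semantics (absent label equals empty string) would need to handle that case explicitly.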
pkg/parca/parca.go: branches on flags.StorageBackend to wire either
the ClickHouse or DuckDB ingester+querier. Shutdown collapses to a
single closeBackend func.
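A rough sketch of that wiring, under stated assumptions: the `Ingester` and `Querier` interfaces below are simplified placeholders for profilestore.Ingester and query.Querier, and `fakeBackend` and `newBackend` are hypothetical names invented for illustration. The point is only the shape: one switch on the backend name returns an ingester, a querier, and a single close function.

```go
package main

import (
	"errors"
	"fmt"
)

// Ingester and Querier are simplified, hypothetical stand-ins for
// profilestore.Ingester and query.Querier; the real interfaces are larger.
type Ingester interface {
	Ingest(sample string) error
}

type Querier interface {
	HasProfileData() (bool, error)
}

// fakeBackend is a placeholder for a ClickHouse or DuckDB client.
type fakeBackend struct {
	name    string
	samples []string
	closed  bool
}

func (b *fakeBackend) Ingest(sample string) error {
	if b.closed {
		return errors.New(b.name + ": backend is closed")
	}
	b.samples = append(b.samples, sample)
	return nil
}

func (b *fakeBackend) HasProfileData() (bool, error) {
	if b.closed {
		return false, errors.New(b.name + ": backend is closed")
	}
	return len(b.samples) > 0, nil
}

func (b *fakeBackend) Close() error {
	b.closed = true
	return nil
}

// newBackend selects a backend by name and returns the ingester, the
// querier, and one close function that releases whatever was opened.
func newBackend(kind string) (Ingester, Querier, func() error, error) {
	switch kind {
	case "clickhouse", "duckdb":
		b := &fakeBackend{name: kind}
		return b, b, b.Close, nil
	default:
		return nil, nil, nil, fmt.Errorf("unknown storage backend %q", kind)
	}
}

func main() {
	ingester, querier, closeBackend, err := newBackend("duckdb")
	if err != nil {
		panic(err)
	}
	defer closeBackend()

	if err := ingester.Ingest("sample"); err != nil {
		panic(err)
	}
	has, err := querier.HasProfileData()
	if err != nil {
		panic(err)
	}
	fmt.Println("has profile data:", has)
}
```

Returning the close function alongside the two interfaces keeps shutdown code unaware of which backend was chosen.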
go.mod: + github.com/marcboeker/go-duckdb/v2 v2.4.3 (note: project
moved to github.com/duckdb/duckdb-go for v2.5.0+; we'll migrate when
that release stabilises). DuckDB requires CGo, so cross-compile needs
CC=... CGO_ENABLED=1.
README.md regenerated for the new flags.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
✅ Meticulous spotted 0 visual differences across 288 screens tested. Meticulous evaluated ~4 hours of user flows against your PR.
Summary
Adds DuckDB as a second storage backend behind a new --storage-backend={clickhouse,duckdb} flag (default clickhouse). DuckDB is embedded: single file or in-memory, no separate server. Useful for single-node deployments where running a ClickHouse server is overkill.
The architecture refactor in the FrostDB removal made this a clean drop-in: profilestore.Ingester and query.Querier are the only interfaces a backend must satisfy.
Stacks on #6356.
What's in the package
pkg/duckdb/ mirrors pkg/clickhouse/ (~2k LoC, comparable shape):
* client.go: database/sql connection. Empty path → in-memory; non-empty path → file (created if missing). Pinned to one connection (DuckDB is single-writer per process).
* schema.go: DDL using DuckDB-native composite types:
- labels: MAP(VARCHAR, VARCHAR)
- stacktrace: STRUCT(address, mapping_*, line/function info)[] (LIST)
Differs from the ClickHouse layout (parallel arrays in Nested, JSON labels) but maps cleanly onto the same row shape the ingester produces.
* ingester.go: Arrow record → Appender API. Decodes the encoded location blobs into struct-shaped maps for the LIST<STRUCT> column; labels populate as duckdb.Map.
* filter.go: Prometheus label-matcher → DuckDB SQL. Uses element_at(labels, 'k')[1] for label access (NULL when key absent) and regexp_matches for regex matchers.
* querier.go: full Querier interface. SQL translations from ClickHouse to DuckDB:
arrayJoin(...) → UNNEST(...) + scalar subquery
JSONAllPaths(map) → map_keys(...) + UNNEST
intDiv(a, b) → (a / b)::BIGINT
groupArray(tuple(...)) → LIST({...} ORDER BY)
SUM(BIGINT) → SUM(...)::BIGINT (downcast HUGEINT)
Stacktrace results scan into duckdb.Composite[[]stacktraceLoc] via field-name mapping, which is much cleaner than CH's parallel-array dance.
* duckdb_test.go: round-trip smoke test: builds a synthetic Arrow record, ingests, then calls every Querier method (HasProfileData, ProfileTypes, Labels, Values, QueryRange, QuerySingle, QueryMerge, GetProfileMetadataMappings, GetProfileMetadataLabels) and asserts.
Wiring
pkg/parca/parca.go branches on flags.StorageBackend to wire either the ClickHouse or DuckDB ingester + querier. Shutdown collapses to a single closeBackend func. README regenerated for the new flags.
go.mod
* + github.com/marcboeker/go-duckdb/v2 v2.4.3. The project moved to github.com/duckdb/duckdb-go at v2.5.0+; will migrate when that release stabilises.
* DuckDB requires CGo, so cross-compiling needs CC=... CGO_ENABLED=1. Pre-built bindings are pulled in for darwin/linux × amd64/arm64 + windows/amd64 via github.com/duckdb/duckdb-go-bindings.
Test plan
* go build ./...
* go vet ./...
* go test -short ./... (green, including the new pkg/duckdb round-trip test)
* Run parca --storage-backend=duckdb --duckdb-path=/tmp/parca.db and ingest a profile end-to-end
Follow-ups
* Migrate to github.com/duckdb/duckdb-go once v2.5+ is the canonical
🤖 Generated with Claude Code