materialize-bigtable: new connector #4357
Merged
Conversation
Materializes Flow collections to Cloud Bigtable as standard updates. Primary keys pack into FDB tuples and are used directly as the Bigtable row key, which keeps rows ordered by key. Fields and the root document are written as cells in a single column family with MaxVersions(2) — the extra version supports dirty-read detection on transaction retry.
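As a sketch of that dirty-read detection: the connector's own code isn't shown here, so the following is an illustrative in-memory model of reading two cell versions newest-first and skipping an in-flight (uncommitted) write.

```go
package main

import "fmt"

// cell is a hypothetical stand-in for a Bigtable cell version, mirroring
// what a read with the two most recent versions (MaxVersions(2)) returns.
type cell struct {
	timestamp int64 // connector-assigned MVCC counter, not wall-clock time
	value     string
}

// lastCommitted walks cells newest-first and skips any cell written at the
// in-flight transaction's timestamp, so a partially-landed write from a
// crashed attempt is ignored in favor of the prior committed value.
func lastCommitted(cells []cell, inFlight int64) (string, bool) {
	for _, c := range cells {
		if c.timestamp == inFlight {
			continue // dirty write from the interrupted transaction
		}
		return c.value, true
	}
	return "", false
}

func main() {
	// Version 7 partially landed before a crash; version 6 is committed.
	cells := []cell{{7, "dirty"}, {6, "committed"}}
	v, ok := lastCommitted(cells, 7)
	fmt.Println(v, ok)
}
```

With only one version retained, the dirty cell would shadow the committed value; retaining two is what makes the retry-time recovery possible.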
docker-compose runs the gcloud Bigtable emulator on a known port so the boilerplate materialize suite can exercise the full lifecycle (apply, materialize, snapshot) without needing real GCP credentials.
Adds the connector to the build matrix and to the gate of materializations whose integration tests are run on every PR.
* Add an opt-in path for running integration tests against a real Cloud Bigtable instance, since the emulator can't exercise auth, PingAndWarm, or real network behavior.
* Disable the data client's built-in metrics exporter so the connector doesn't depend on metric-publishing permissions or emit telemetry the user hasn't opted into.
* Capture an initial benchmark baseline so future runs have something to compare against. This was run on a single-node, base "trial" instance, so it may not reflect achievable performance on production workloads, but it appears at least on par with other materializations.
With hard deletes enabled the connector issues `DeleteRow`; otherwise deletes leave tombstone rows that downstream readers must filter on `_meta/op`. `DeleteRow` is unconditional rather than timestamp-bounded, which is fine for exactly-once: a replayed transaction safely re-issues the delete on an already-empty row.
jacobmarble approved these changes on May 5, 2026
Comment on lines +10 to +19:
```yaml
# See README.md for instructions on running against a real Cloud Bigtable
# instance.

# acmeCo/tests/materialize-bigtable-gcp:
#   endpoint:
#     local:
#       command: ["go", "run", "."]
#       protobuf: true
#       config: config.gcp.yaml
#   bindings: []
```
Contributor
Would you mind following this pattern?
`connectors/materialize-iceberg/driver_test.go`, lines 159 to 168 in 2f4876c
FWIW I'm using the pattern in the EventBridge PR as well, much simpler though: #4359
Member
Author
Cool, that is simpler! Updated.
* Real-instance tests previously required hand-editing YAML to uncomment the GCP block; switch to a `BIGTABLE_TEST_ALL` env var / `-bigtable.test-all` flag matching the pattern used by materialize-iceberg and materialize-eventbridge, so the full matrix is invokable from `go test` without source edits.
* Drop the `_hd` suffix from hard-delete bindings, since the boilerplate framework already disambiguates per-task tables via UUID.
* Type the `CommittedTimestamp` state field as int64 to match the underlying type of `bigtable.Timestamp`.
Description:
New `materialize-bigtable` connector. Materializes Flow collections to Google Cloud Bigtable as standard updates. A few interesting design choices to be aware of:
Exactly-once via MVCC
The connector stamps every cell write with a monotonically increasing counter as the cell timestamp. The column family is created with
`MaxVersions(2)`, so a crashed transaction's "dirty" cell is preserved alongside the prior committed value. Loads request the latest two versions and walk newest-first, skipping any cell at the in-flight timestamp; this surfaces the last committed value even if a previous Store partially landed.

Row keys are FDB-packed tuples
These are used directly as the Bigtable row key, which preserves lexicographic ordering. This enables efficient range scans with appropriate composite key structures.
Value Encoding
Everything is stored as bytes to match Bigtable's native model:
* Booleans as a single `0x00`/`0x01` byte.
* Integers as fixed-width big-endian bytes, the layout Bigtable's `ReadModifyWrite` increments expect (unused here).
* Integers beyond that range (e.g. `"99999999999999999999"`) as their decimal string representation.

No delta-updates mode

There is no meaningful delta-updates semantic for a key/value store; same as `materialize-dynamodb`.

Closes estuary/flow#2918
Workflow steps:
(How does one use this feature, and how has it changed)
Documentation links affected:
See estuary/flow#2919
Notes for reviewers:
(anything that might help someone review this PR)