Target naming migration #2864
Open
jshearer wants to merge 2 commits into
…ource configs

Previously, `generate_missing_materialization_configs` delegated resource config generation to the generic `stub_config` path, which always derived x-schema-name from the 2nd-to-last collection name component regardless of the materialization's configured strategy.

Now resource stubs are created via `update_materialization_resource_spec`, which populates x-schema-name and x-collection-name according to the materialization's `target_naming` and `source` settings. This means `flowctl generate` produces resource configs that match what the runtime and auto-discover would produce for the same materialization.
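To illustrate the legacy behavior this commit replaces, here is a minimal sketch of deriving a schema from the 2nd-to-last collection name component. The function name and the example collection names are hypothetical, not taken from flowctl; the real `stub_config` path may handle edge cases differently.

```rust
/// Legacy rule as described in the commit message: the schema is always the
/// second-to-last `/`-separated component of the collection name, regardless
/// of the materialization's configured strategy.
/// NOTE: hypothetical helper for illustration, not flowctl's actual function.
fn legacy_schema_from_collection(collection: &str) -> Option<&str> {
    let mut parts = collection.rsplit('/');
    // The last component is the table-like name (e.g. "orders").
    let _table = parts.next()?;
    // The component before it is what the legacy path used as the schema.
    parts.next().filter(|s| !s.is_empty())
}
```

With a name such as `acmeCo/sales/orders` this yields `sales`, even when the materialization is configured to place every table in a single schema, which is the mismatch the commit fixes.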
Adds `flowctl raw migrate-target-naming` to analyze all materializations and determine the appropriate `TargetNamingStrategy` for each, based on the legacy `source.targetNaming` field and endpoint configuration.

For each materialization, the tool:

* Looks up x-schema-name support from `connector_tags.resource_spec_schema`
* Maps the legacy `TargetNaming` enum to the new `TargetNamingStrategy` (`MatchSourceStructure`, `SingleSchema`, `PrefixTableNames`)
* Detects the endpoint schema from connector config, falling back to the common schema across existing resource paths
* Analyzes each binding to determine whether filling in x-schema-name would change the resource path (requiring manual intervention) or target a different database schema
* Falls back from `MatchSourceStructure` to `SingleSchema` when collection names don't match existing resource path schemas
* Handles Snowflake's backwards-compat behavior where 1-element paths are preserved when the schema matches the endpoint default

The report classifies each materialization as MIGRATE (safe to auto-migrate), MANUAL (needs human intervention due to resource path changes or ambiguous schema), or various SKIP reasons. Disabled tasks with synthetic binding-N resource paths are classified as MIGRATE since they'll backfill on re-enable. Disabled materializations without a built spec are skipped entirely.
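The enum mapping in the list above can be sketched roughly as follows, using the variant shapes named in the PR description. This is an illustrative sketch only: the type definitions, the `map_strategy` function, and its `bindings_conflict` parameter are assumptions, and the sketch takes an already-resolved schema rather than showing how the tool resolves one.

```rust
/// Legacy `source.targetNaming` values, as named in the PR description.
#[derive(Debug, Clone, Copy, PartialEq)]
enum TargetNaming {
    WithSchema,
    PrefixSchema,
    PrefixNonDefaultSchema,
    NoSchema,
}

/// New strategy. Variant names come from the PR; the exact field types are assumed.
#[derive(Debug, Clone, PartialEq)]
enum TargetNamingStrategy {
    MatchSourceStructure,
    SingleSchema { schema: String },
    PrefixTableNames { schema: String, skip_common_defaults: bool },
}

/// Hypothetical mapping function. `schema` is the already-resolved schema and
/// `bindings_conflict` says whether any existing binding's collection-derived
/// schema disagrees with its resource path schema.
fn map_strategy(
    legacy: Option<TargetNaming>,
    schema: &str,
    bindings_conflict: bool,
) -> TargetNamingStrategy {
    match legacy {
        // An explicitly configured strategy is preserved as-is.
        Some(TargetNaming::WithSchema) => TargetNamingStrategy::MatchSourceStructure,
        Some(TargetNaming::PrefixSchema) => TargetNamingStrategy::PrefixTableNames {
            schema: schema.to_string(),
            skip_common_defaults: false,
        },
        Some(TargetNaming::PrefixNonDefaultSchema) => TargetNamingStrategy::PrefixTableNames {
            schema: schema.to_string(),
            skip_common_defaults: true,
        },
        Some(TargetNaming::NoSchema) => TargetNamingStrategy::SingleSchema {
            schema: schema.to_string(),
        },
        // Nothing configured: prefer MatchSourceStructure, unless existing
        // bindings conflict with it, in which case fall back to SingleSchema.
        None if bindings_conflict => TargetNamingStrategy::SingleSchema {
            schema: schema.to_string(),
        },
        None => TargetNamingStrategy::MatchSourceStructure,
    }
}
```

Note that the fallback to `SingleSchema` only applies when no legacy value was set; an explicit `WithSchema` stays `MatchSourceStructure` even if existing bindings conflict.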
This is the migration for target naming that I talk about in #2780. I decided to frame it as a new, temporary `flowctl raw` subcommand that analyzes existing materializations and determines the correct `TargetNamingStrategy` for each, based on its current `source.targetNaming`, endpoint config, and built resource paths. By default it prints a dry-run report. With `--execute`, it publishes the changes one materialization at a time.

Migration classification
Every materialization is classified into one of:
* `MIGRATE`: `target_naming` and per-binding `x-schema-name` can be set automatically without causing unintended backfilling.
* `MANUAL`: filling in `x-schema-name` would change a resource path from 1-element (`[table]`) to 2-element (`[schema, table]`), or we can't determine the correct schema automatically (no endpoint config schema, no consistent schema in resource paths).
* `SKIP`: the connector doesn't support `x-schema-name` (no schema pointer in its resource spec).

Strategy selection rules
`TargetNamingStrategy` is derived from `source.targetNaming`:

| `source.targetNaming` | Strategy |
| --- | --- |
| `WithSchema` | `MatchSourceStructure` |
| `PrefixSchema` | `PrefixTableNames { schema, skip_common_defaults: false }` |
| `PrefixNonDefaultSchema` | `PrefixTableNames { schema, skip_common_defaults: true }` |
| `NoSchema` | `SingleSchema { schema }` |
| (not set) | `MatchSourceStructure`, falling back to `SingleSchema` if existing bindings conflict |

For strategies that require a schema value, the tool resolves it from (in order):
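* The schema in the endpoint config.
* The common schema across existing resource paths.
* The connector's default schema (for Snowflake, `PUBLIC`).

Filling x-schema-name on existing bindings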
`targetNaming` controls how future bindings get their schema and table names. Existing bindings need `x-schema-name` filled in separately to match where their data actually lives. Once the control plane can guarantee that `x-schema-name` is populated on every binding, connectors/#3977 can delete the fallback logic, and the endpoint-config `schema` shrinks to its real job (system tables like `flow_checkpoints_v1`). The migration is the prerequisite that lets that cleanup happen without changing any existing binding's resource path.

For many existing bindings, the `TargetNamingStrategy`-generated schema matches the actual schema in the resource path, and `x-schema-name` can simply be set to that value. But for some bindings, the two diverge: a binding created before `x-schema-name` existed, or via a code path that didn't populate it, would have been placed in whatever schema the endpoint config specified, which may not match what the strategy would derive from the collection name.

When the customer explicitly set `source.targetNaming`, the tool preserves their strategy for future bindings but fills in the existing schema (from the built resource path) on existing bindings where the strategy-derived value would conflict. The report flags these as `(actual; strategy would produce "..." for new bindings)`.

When no `source.targetNaming` was set and a binding's collection-derived schema doesn't match its resource path schema, the tool falls back to `SingleSchema` with the resolved endpoint schema. If the endpoint schema also doesn't match the resource path schemas, the task is marked as `MANUAL`.

Snowflake compatibility mode handling

`materialize-snowflake` uniquely produces 1-element resource paths (`[table]`) when the binding's schema matches the endpoint-config's schema, and 2-element paths (`[schema, table]`) otherwise. The migration tool mirrors the connector's logic to determine whether setting `x-schema-name` would preserve or change the resource path. When the endpoint config has no explicit schema, the tool assumes Snowflake's default of `PUBLIC`.

Disabled materializations
Disabled materializations with a built spec are analyzed normally. Disabled materializations without a built spec are skipped entirely, as they're old enough that re-enabling them at this point would almost certainly require a backfill anyway.
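To make the Snowflake path rule from the previous section concrete, here is a minimal sketch of how a path-shape check could look. The function names are hypothetical, and the exact string comparison is an assumption: the real connector may normalize identifier case or quoting before comparing.

```rust
/// Snowflake's default schema, which the PR says the tool assumes when the
/// endpoint config has no explicit schema.
const SNOWFLAKE_DEFAULT_SCHEMA: &str = "PUBLIC";

/// Hypothetical mirror of the connector's rule: a binding whose schema equals
/// the endpoint schema gets a 1-element path, anything else a 2-element path.
/// ASSUMPTION: schemas are compared as exact strings.
fn snowflake_resource_path(
    endpoint_schema: Option<&str>,
    binding_schema: &str,
    table: &str,
) -> Vec<String> {
    let default_schema = endpoint_schema.unwrap_or(SNOWFLAKE_DEFAULT_SCHEMA);
    if binding_schema == default_schema {
        vec![table.to_string()]
    } else {
        vec![binding_schema.to_string(), table.to_string()]
    }
}

/// Returns true when setting x-schema-name to `new_schema` would yield a
/// resource path different from the binding's existing built path.
fn path_would_change(
    existing: &[String],
    endpoint_schema: Option<&str>,
    new_schema: &str,
    table: &str,
) -> bool {
    snowflake_resource_path(endpoint_schema, new_schema, table).as_slice() != existing
}
```

A binding whose existing path is `[orders]` stays unchanged if it is given the endpoint schema, but would move to `[sales, orders]` if it were given `sales`, which is the kind of change the report flags for manual handling.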
Execute mode
With `--execute`, the tool publishes each `MIGRATE` materialization individually:

* Checks that `last_pub_id` hasn't changed
* Sets `targetNaming` on the materialization
* Sets `x-schema-name` on bindings that are missing it
* […] `draft_specs`
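As a rough illustration of the per-materialization update in execute mode, the sketch below applies a migration to an in-memory spec. Everything here is an assumption made for illustration: the struct shapes, the `u64` type for `last_pub_id`, the string-typed strategy, and the `apply_migration` function are not flowctl's real types or API, and the actual publication step is not shown.

```rust
/// Simplified stand-in for a binding: only the x-schema-name value is modeled.
#[derive(Debug, Clone, PartialEq)]
struct Binding {
    schema_name: Option<String>,
}

/// Simplified stand-in for a live materialization spec (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
struct Materialization {
    last_pub_id: u64,
    target_naming: Option<String>,
    bindings: Vec<Binding>,
}

/// Builds the updated spec for one materialization, or refuses if the live
/// spec was published again after the analysis ran.
/// `schemas` holds the analyzed schema for each binding, in binding order.
fn apply_migration(
    live: &Materialization,
    analyzed_pub_id: u64,
    strategy: &str,
    schemas: &[String],
) -> Result<Materialization, String> {
    // Optimistic concurrency check: bail out if the spec changed underneath us.
    if live.last_pub_id != analyzed_pub_id {
        return Err(format!(
            "spec changed since analysis: expected {}, found {}",
            analyzed_pub_id, live.last_pub_id
        ));
    }
    let mut updated = live.clone();
    updated.target_naming = Some(strategy.to_string());
    // Only fill bindings that have no x-schema-name; existing values are kept.
    for (binding, schema) in updated.bindings.iter_mut().zip(schemas) {
        if binding.schema_name.is_none() {
            binding.schema_name = Some(schema.clone());
        }
    }
    Ok(updated)
}
```

Returning an error on a `last_pub_id` mismatch, rather than overwriting, means a materialization that was edited between the dry run and `--execute` is left alone and can be re-analyzed.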