
Target naming migration #2864

Open
jshearer wants to merge 2 commits into master from jshearer/target_naming_migration

Contributor

@jshearer jshearer commented Apr 15, 2026

This is the target naming migration discussed in #2780. I decided to frame it as a new, temporary flowctl raw subcommand that analyzes existing materializations and determines the correct TargetNamingStrategy for each one, based on its current source.targetNaming, endpoint config, and built resource paths. By default it prints a dry-run report. With --execute, it publishes the changes one materialization at a time.

Migration classification

Every materialization is classified into one of:

  • MIGRATE: target_naming and per-binding x-schema-name can be set automatically without causing unintended backfilling.
  • MANUAL: needs human intervention. Either the correct settings can be determined but applying them would change resource paths from 1-element ([table]) to 2-element ([schema, table]), or the correct schema can't be determined automatically (no schema in the endpoint config and no consistent schema in the resource paths).
  • SKIP_NO_SCHEMA: connector doesn't support x-schema-name (no schema pointer in its resource spec).
  • SKIP_NOT_CONNECTOR: not a connector-based materialization (Dekaf).
  • SKIP_ALREADY_SET / SKIP_DISABLED_NO_BUILT_SPEC: already migrated, or disabled with no built spec to analyze.
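
As a rough illustration of how these categories relate, here is a minimal Python sketch. It is not the tool's code: the `Facts` fields and the order of the checks are assumptions made for the example, and only the category names come from the list above.

```python
from dataclasses import dataclass
from enum import Enum


class Classification(Enum):
    MIGRATE = "MIGRATE"
    MANUAL = "MANUAL"
    SKIP_NO_SCHEMA = "SKIP_NO_SCHEMA"
    SKIP_NOT_CONNECTOR = "SKIP_NOT_CONNECTOR"
    SKIP_ALREADY_SET = "SKIP_ALREADY_SET"
    SKIP_DISABLED_NO_BUILT_SPEC = "SKIP_DISABLED_NO_BUILT_SPEC"


@dataclass
class Facts:
    """Hypothetical per-materialization summary; all field names are assumptions."""
    is_connector: bool
    already_has_target_naming: bool
    disabled: bool
    has_built_spec: bool
    supports_x_schema_name: bool
    schema_resolved: bool
    path_shape_would_change: bool


def classify(f: Facts) -> Classification:
    # The ordering of these checks is an assumption, not taken from the PR.
    if not f.is_connector:
        return Classification.SKIP_NOT_CONNECTOR
    if f.already_has_target_naming:
        return Classification.SKIP_ALREADY_SET
    if f.disabled and not f.has_built_spec:
        return Classification.SKIP_DISABLED_NO_BUILT_SPEC
    if not f.supports_x_schema_name:
        return Classification.SKIP_NO_SCHEMA
    # MANUAL covers both an unresolvable schema and a 1-element to
    # 2-element resource path change.
    if not f.schema_resolved or f.path_shape_would_change:
        return Classification.MANUAL
    return Classification.MIGRATE
```

Only materializations that pass every check end up as MIGRATE, which matches the intent that automatic changes never trigger an unintended backfill.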

Strategy selection rules

TargetNamingStrategy is derived from source.targetNaming:

| source.targetNaming | Proposed strategy |
| --- | --- |
| WithSchema | MatchSourceStructure |
| PrefixSchema | PrefixTableNames { schema, skip_common_defaults: false } |
| PrefixNonDefaultSchema | PrefixTableNames { schema, skip_common_defaults: true } |
| NoSchema | SingleSchema { schema } |
| No source capture | MatchSourceStructure, falling back to SingleSchema if existing bindings conflict |
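
The mapping above can be sketched as a small lookup. This is an illustrative Python model, not the tool's implementation: the dataclasses stand in for the TargetNamingStrategy variants, and `propose_strategy`, its parameters, and the use of `None` for "no source capture" are assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class MatchSourceStructure:
    pass


@dataclass(frozen=True)
class SingleSchema:
    schema: str


@dataclass(frozen=True)
class PrefixTableNames:
    schema: str
    skip_common_defaults: bool


Strategy = Union[MatchSourceStructure, SingleSchema, PrefixTableNames]


def propose_strategy(
    legacy: Optional[str], schema: str, bindings_conflict: bool = False
) -> Strategy:
    """Map a legacy source.targetNaming value to a proposed strategy.

    `legacy` is None when there is no legacy value to map (assumed encoding
    of the "No source capture" row). `bindings_conflict` is a hypothetical
    flag meaning existing bindings don't fit MatchSourceStructure.
    """
    if legacy is None:
        return SingleSchema(schema) if bindings_conflict else MatchSourceStructure()
    if legacy == "WithSchema":
        return MatchSourceStructure()
    if legacy == "PrefixSchema":
        return PrefixTableNames(schema, skip_common_defaults=False)
    if legacy == "PrefixNonDefaultSchema":
        return PrefixTableNames(schema, skip_common_defaults=True)
    if legacy == "NoSchema":
        return SingleSchema(schema)
    raise ValueError(f"unknown source.targetNaming value: {legacy!r}")
```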

For strategies that require a schema value, the tool resolves it from (in order):

  • the endpoint config's schema/dataset/namespace field
  • a unanimous schema detected from existing built resource paths
  • the connector's well-known default (Snowflake: PUBLIC).
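
The resolution order above can be sketched as follows. This is a hedged Python illustration, not the tool's code: it assumes the endpoint config exposes the field as a top-level key, that the schema is the second-to-last element of a resource path, and that "unanimous" means every path carries a schema and all of them agree.

```python
from typing import List, Optional


def resolve_schema(
    endpoint_config: dict,
    resource_paths: List[List[str]],
    connector_default: Optional[str] = None,
) -> Optional[str]:
    # 1. Endpoint config field (top-level key lookup is an assumption).
    for key in ("schema", "dataset", "namespace"):
        value = endpoint_config.get(key)
        if isinstance(value, str) and value:
            return value

    # 2. A unanimous schema across existing built resource paths.
    if resource_paths and all(len(path) >= 2 for path in resource_paths):
        schemas = {path[-2] for path in resource_paths}
        if len(schemas) == 1:
            return schemas.pop()

    # 3. The connector's well-known default, if it has one.
    return connector_default
```

Returning `None` when all three sources fail corresponds to the "can't determine the correct schema automatically" case that leads to MANUAL.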

Filling x-schema-name on existing bindings

targetNaming controls how future bindings get their schema and table names. Existing bindings need x-schema-name filled in separately to match where their data actually lives. Once the control plane can guarantee that x-schema-name is populated on every binding, connectors/#3977 can delete the fallback logic, and the endpoint-config schema shrinks to its real job (system tables like flow_checkpoints_v1). The migration is the prerequisite that lets that cleanup happen without changing any existing binding's resource path.

For many existing bindings, the TargetNamingStrategy-generated schema matches the actual schema in the resource path, and x-schema-name can be simply set to that value. But for some bindings, the two diverge: a binding created before x-schema-name existed, or via a code path that didn't populate it, would have been placed in whatever schema the endpoint config specified, which may not match what the strategy would derive from the collection name.

When the customer explicitly set source.targetNaming, the tool preserves their strategy for future bindings but fills in the existing schema (from the built resource path) on existing bindings where the strategy-derived value would conflict. The report flags these as (actual; strategy would produce "..." for new bindings).

When no source.targetNaming was set and a binding's collection-derived schema doesn't match its resource path schema, the tool falls back to SingleSchema with the resolved endpoint schema. If the endpoint schema also doesn't match the resource path schemas, the task is marked as MANUAL.
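
The two cases above can be sketched in Python. This is an illustration under stated assumptions, not the tool's logic: the function names, the `(collection_derived_schema, actual_schema)` tuple shape, and the choice to put the strategy-derived value in place of the "..." in the report note are all hypothetical.

```python
from typing import List, Optional, Tuple


def fill_existing_binding(
    actual_schema: str, strategy_schema: str
) -> Tuple[str, Optional[str]]:
    """Explicit source.targetNaming: keep the binding where its data lives.

    Returns the x-schema-name to fill in and an optional report note.
    """
    if actual_schema == strategy_schema:
        return actual_schema, None
    note = f'(actual; strategy would produce "{strategy_schema}" for new bindings)'
    return actual_schema, note


def plan_without_target_naming(
    bindings: List[Tuple[str, str]], endpoint_schema: Optional[str]
) -> str:
    """No source.targetNaming: pick a strategy, or give up with MANUAL.

    Each binding is (collection_derived_schema, actual_schema_in_resource_path).
    """
    if all(derived == actual for derived, actual in bindings):
        return "MatchSourceStructure"
    if endpoint_schema is not None and all(
        actual == endpoint_schema for _, actual in bindings
    ):
        return "SingleSchema"
    return "MANUAL"
```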

Snowflake compatibility mode handling

materialize-snowflake uniquely produces 1-element resource paths ([table]) when the binding's schema matches the endpoint-config's schema, and 2-element paths ([schema, table]) otherwise. The migration tool mirrors the connector's logic to determine whether setting x-schema-name would preserve or change the resource path. When the endpoint config has no explicit schema, the tool assumes Snowflake's default of PUBLIC.
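
A minimal Python sketch of the path rule described above, useful for seeing when a proposed x-schema-name preserves the built resource path. It is an approximation: the function names are hypothetical, and it compares schema names exactly, ignoring any identifier normalization the real connector may apply.

```python
from typing import List, Optional

SNOWFLAKE_DEFAULT_SCHEMA = "PUBLIC"


def snowflake_resource_path(
    binding_schema: str, table: str, endpoint_schema: Optional[str] = None
) -> List[str]:
    # With no explicit endpoint schema, assume Snowflake's default of PUBLIC.
    effective = endpoint_schema or SNOWFLAKE_DEFAULT_SCHEMA
    if binding_schema == effective:
        return [table]  # compatibility mode: 1-element path
    return [binding_schema, table]


def path_preserved(
    built_path: List[str], proposed_schema: str, endpoint_schema: Optional[str] = None
) -> bool:
    """True if setting x-schema-name to `proposed_schema` keeps the built path."""
    table = built_path[-1]
    return snowflake_resource_path(proposed_schema, table, endpoint_schema) == built_path
```

For example, a built path of `["orders"]` is preserved when the proposed schema equals the endpoint schema, and would change to a 2-element path otherwise, which is the case the report marks as MANUAL.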

Disabled materializations

Disabled materializations with a built spec are analyzed normally. Disabled materializations without a built spec are skipped entirely, as they're old enough that re-enabling them at this point would almost certainly require a backfill anyway.

Execute mode

With --execute, the tool publishes each MIGRATE materialization individually:

  • Re-fetches the spec and verifies last_pub_id hasn't changed
  • Sets targetNaming on the materialization
  • Fills in x-schema-name on bindings that are missing it
  • Publishes via draft_specs
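
The steps above amount to an optimistic concurrency check followed by a targeted edit. Here is a hedged Python sketch: `fetch` and `publish` are hypothetical callables standing in for the control-plane calls, the spec is modeled as a plain dict with `bindings`, `source`, and `resource` keys, and `schema_field` is an assumed name for the resource field that x-schema-name points at.

```python
import copy
from typing import Callable, Dict, Tuple


def migrate_one(
    name: str,
    analyzed_pub_id: str,
    target_naming: dict,
    schema_by_collection: Dict[str, str],
    fetch: Callable[[str], Tuple[dict, str]],
    publish: Callable[[str, dict], None],
    schema_field: str = "schema",
) -> bool:
    # Re-fetch and verify nothing was published since the analysis ran.
    spec, last_pub_id = fetch(name)
    if last_pub_id != analyzed_pub_id:
        return False  # spec changed; skip rather than overwrite

    updated = copy.deepcopy(spec)
    updated["targetNaming"] = target_naming

    # Fill the schema only on bindings that are missing it.
    for binding in updated.get("bindings", []):
        resource = binding.setdefault("resource", {})
        collection = binding.get("source")
        if schema_field not in resource and collection in schema_by_collection:
            resource[schema_field] = schema_by_collection[collection]

    publish(name, updated)
    return True
```

Working on a deep copy keeps the fetched spec untouched, so a skipped or failed publish leaves nothing half-modified in memory.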

@jshearer jshearer force-pushed the jshearer/target_naming_migration branch 10 times, most recently from d8ae42e to 48e7f93 Compare April 16, 2026 00:53
@jshearer jshearer self-assigned this Apr 16, 2026
…ource configs

Previously, `generate_missing_materialization_configs` delegated resource config generation to the generic `stub_config` path, which always derived x-schema-name from the 2nd-to-last collection name component regardless of the materialization's configured strategy.

Now resource stubs are created via `update_materialization_resource_spec`, which populates x-schema-name and x-collection-name according to the materialization's `target_naming` and `source` settings. This means `flowctl generate` produces resource configs that match what the runtime and auto-discover would produce for the same materialization.
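
For reference, the previous stub behavior described above can be sketched in a few lines of Python. The function name and the example collection name are hypothetical; only the "2nd-to-last collection name component" rule comes from the commit message.

```python
from typing import Optional


def legacy_stub_schema(collection: str) -> Optional[str]:
    """Old generic behavior: take the second-to-last name component,
    regardless of the materialization's configured strategy."""
    parts = collection.split("/")
    return parts[-2] if len(parts) >= 2 else None
```

Under this rule a collection named `acmeCo/sales/orders` always yields `sales`, even for a materialization whose strategy would place every table in a single schema.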
@jshearer jshearer force-pushed the jshearer/target_naming_migration branch from 48e7f93 to 3f88ad3 Compare April 17, 2026 04:18
Adds `flowctl raw migrate-target-naming` to analyze all materializations and determine the appropriate `TargetNamingStrategy` for each, based on the legacy `source.targetNaming` field and endpoint configuration.

For each materialization, the tool:
* Looks up x-schema-name support from `connector_tags.resource_spec_schema`
* Maps the legacy `TargetNaming` enum to the new `TargetNamingStrategy` (`MatchSourceStructure`, `SingleSchema`, `PrefixTableNames`)
* Detects the endpoint schema from connector config, falling back to the common schema across existing resource paths
* Analyzes each binding to determine whether filling in x-schema-name would change the resource path (requiring manual intervention) or target a different database schema
* Falls back from `MatchSourceStructure` to `SingleSchema` when collection names don't match existing resource path schemas
* Handles Snowflake's backwards-compat behavior where 1-element paths are preserved when the schema matches the endpoint default

The report classifies each materialization as MIGRATE (safe to auto-migrate), MANUAL (needs human intervention due to resource path changes or ambiguous schema), or various SKIP reasons. Disabled tasks with synthetic binding-N resource paths are classified as MIGRATE since they'll backfill on re-enable. Disabled materializations without a built spec are skipped entirely.
@jshearer jshearer force-pushed the jshearer/target_naming_migration branch from 3f88ad3 to 74d5565 Compare April 17, 2026 13:52
@jshearer jshearer marked this pull request as ready for review April 17, 2026 17:18
@jshearer jshearer added the waiting label (This change is waiting on something else) May 4, 2026