-
Notifications
You must be signed in to change notification settings - Fork 34
source-sqlserver: optionally populate _meta/source/ts_ms (#3553) #4328
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from 1 commit
8210e6a
c73e19d
30546d0
f33ef4e
d1a0511
c041b07
c502f63
dec2b65
3f39f7d
d076fab
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -25,6 +25,24 @@ This is mostly an issue for small-scale testing, as these overheads are more or | |
| less fixed and unrelated to the actual volume of changes. But this makes our | ||
| automated test suite runs take 10-20x longer than they do on other databases. | ||
|
|
||
| ### Transaction commit timestamps (`_meta/source/ts_ms`) | ||
|
|
||
| By default the connector does not populate `_meta/source/ts_ms` on CDC events, | ||
| because SQL Server CDC change tables do not contain commit timestamps inline. | ||
| Setting the advanced option `populate_source_ts_ms` causes the connector to | ||
| range-prefetch commit times from `cdc.lsn_time_mapping` over each polling | ||
| cycle's LSN range and apply them as `_meta/source/ts_ms` on emitted change | ||
| events. The lookup is one indexed range query per polling cycle, paginated | ||
| when the range is wide enough to cross the prefetch page size. Backfill rows | ||
| do not have a meaningful commit time and will not have `ts_ms` set, matching | ||
| the behavior of the Postgres and MySQL connectors. | ||
|
|
||
| The lookup is best-effort: if it fails for any reason, the rest of the | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is there a concrete failure scenario in mind here or is this just typical LLM "let me add fallbacks everywhere just in case" behavior?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. No concrete scenario; the fallback is gone. A |
||
| polling cycle is emitted with `ts_ms` left unset and a warning is logged. The | ||
| standard `GRANT SELECT ON SCHEMA :: cdc TO flow_capture` already grants the | ||
| necessary permissions on `cdc.lsn_time_mapping`, so no additional setup is | ||
| required. | ||
|
|
||
| ### Developing | ||
|
|
||
| Some useful commands for working with a test instance of SQL Server: | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -97,6 +97,7 @@ type advancedConfig struct { | |
| Filegroup string `json:"filegroup,omitempty" jsonschema:"title=CDC Instance Filegroup,description=When set the connector will create new CDC instances with the specified 'filegroup_name' argument. Has no effect if CDC instances are managed manually."` | ||
| RoleName string `json:"role_name,omitempty" jsonschema:"title=CDC Instance Access Role,description=When set the connector will create new CDC instances with the specified 'role_name' argument as the gating role. When unset the capture user name is used as the 'role_name' instead. Has no effect if CDC instances are managed manually."` | ||
| SourceTag string `json:"source_tag,omitempty" jsonschema:"title=Source Tag,description=When set the capture will add this value as the property 'tag' in the source metadata of each document."` | ||
| PopulateSourceTsMs bool `json:"populate_source_ts_ms,omitempty" jsonschema:"title=Populate Source ts_ms,default=false,description=When set the connector will populate the '_meta/source/ts_ms' property of CDC events with the transaction commit time. The commit time is looked up from cdc.lsn_time_mapping after each polling cycle. This adds one extra query per change batch and is therefore opt-in. Backfill rows do not have a meaningful commit time so ts_ms will not be set on them."` | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I would change the description to "Populate Source Timestamps", it feels weird to put the
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Renamed to Populate Source Timestamps; the description no longer references the |
||
| FeatureFlags string `json:"feature_flags,omitempty" jsonschema:"title=Feature Flags,description=This property is intended for Estuary internal use. You should only modify this field as directed by Estuary support."` | ||
| WatermarksTable string `json:"watermarksTable,omitempty" jsonschema:"default=dbo.flow_watermarks,description=This property is deprecated for new captures as they will no longer use watermark writes by default. The name of the table used for watermark writes during backfills. Must be fully-qualified in '<schema>.<table>' form."` | ||
| } | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test does not belong here. The tests in this file / package are shared between
source-sqlserverandsource-sqlserver-ct, but this test cannot possibly work againstsource-sqlserver-ct.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved to
source-sqlserver/capture_test.goasTestPopulateSourceTimestamps. Snapshot atsource-sqlserver/.snapshots/TestPopulateSourceTimestamps.