source-sqlserver: optionally populate _meta/source/ts_ms (#3553) #4328

Open
jameswinegar wants to merge 10 commits into estuary:main from jameswinegar:feat/source-sql-server_populate-source-ts-ms

Conversation

jameswinegar (Contributor) commented Apr 28, 2026

Closes #3553.

Adds an advanced capture option populate_source_ts_ms (default false) which causes the SQL Server connector to populate _meta/source/ts_ms on CDC events with the transaction commit time, prefetched once per polling cycle from cdc.lsn_time_mapping over the cycle's LSN range. This brings parity with source-postgres and source-mysql, which already populate ts_ms for CDC events.

Design

When the flag is enabled, each pollTable invocation runs an indexed range query against cdc.lsn_time_mapping over the polling cycle's (FromLSN, ToLSN] window:

SELECT TOP (@pageSize) start_lsn, tran_end_time
  FROM cdc.lsn_time_mapping
 WHERE start_lsn > @fromLSN AND start_lsn <= @toLSN
 ORDER BY start_lsn

The result is a map[string]time.Time consulted in O(1) per change row to set _meta/source/ts_ms as events stream through the existing scan-and-emit loop.
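
As a rough sketch of that prefetch (assumed, not the connector's verbatim code: go-mssqldb through database/sql, hex-encoded LSN strings as map keys, and a lookupTranEndTimesPage signature inferred from this description):

import (
    "context"
    "database/sql"
    "encoding/hex"
    "time"
)

// Hypothetical sketch: builds the start_lsn -> tran_end_time map for one page of
// the (fromLSN, toLSN] range and reports where the page ended.
func lookupTranEndTimesPage(ctx context.Context, db *sql.DB, fromLSN, toLSN []byte, pageSize int) (map[string]time.Time, []byte, error) {
    const query = `SELECT TOP (@pageSize) start_lsn, tran_end_time
                     FROM cdc.lsn_time_mapping
                    WHERE start_lsn > @fromLSN AND start_lsn <= @toLSN
                    ORDER BY start_lsn`
    rows, err := db.QueryContext(ctx, query,
        sql.Named("pageSize", pageSize), sql.Named("fromLSN", fromLSN), sql.Named("toLSN", toLSN))
    if err != nil {
        return nil, nil, err
    }
    defer rows.Close()

    var times = make(map[string]time.Time)
    var rowsRead int
    var lastLSN []byte
    for rows.Next() {
        var lsn []byte
        var commit sql.NullTime
        if err := rows.Scan(&lsn, &commit); err != nil {
            return nil, nil, err
        }
        if commit.Valid { // NULL tran_end_time (control rows): leave that LSN out of the map
            times[hex.EncodeToString(lsn)] = commit.Time
        }
        rowsRead++
        lastLSN = lsn
    }
    pageEnd := toLSN // a non-full page covers the whole remaining range
    if rowsRead == pageSize {
        pageEnd = lastLSN // page hit its row cap: the caller continues from here
    }
    return times, pageEnd, rows.Err()
}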

  • One query per polling cycle (in the common case). At extreme scale or after extended downtime, the prefetch is automatically paginated by tranEndTimesPageSize (default 100,000 rows) — each page does its own bounded prefetch and TVF, so memory stays bounded regardless of catch-up size.
  • Streaming-emit semantics preserved. With the flag on, change events still stream through rs.events one at a time as they're scanned. No buffering, no flush, no clear() dance.
  • Capture-level advanced option, default false. JSON-schema annotation drives the UI. Existing captures are unaffected — the flag-off code path is byte-identical to before.
  • Best effort. Lookup failures emit the rest of the polling cycle with ts_ms unset and log a warning. Missing LSNs (possible at retention boundaries) and NULL tran_end_time (possible for control-row entries) leave individual events with ts_ms unset rather than failing the cycle; see the per-row sketch after this list.
  • Backfill rows leave ts_ms unset (Millis: 0, // Not known.), matching the convention in source-postgres and source-mysql.
  • No permission changes. GRANT SELECT ON SCHEMA :: cdc TO flow_capture (already required) covers cdc.lsn_time_mapping.
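
A minimal per-row sketch of the consult (hypothetical helper name; the connector's actual field plumbing differs): a known commit time becomes ts_ms, while lookup misses, NULL mappings, and backfill rows all fall back to the "not known" zero value.

// sourceTsMs is a hypothetical helper: given the prefetched map and a change row's
// __$start_lsn (hex-encoded here, matching the sketch above), return the value for
// _meta/source/ts_ms, or 0 when no commit time is known.
func sourceTsMs(tranEndTimes map[string]time.Time, startLSN []byte) int64 {
    if commit, ok := tranEndTimes[hex.EncodeToString(startLSN)]; ok {
        return commit.UnixMilli()
    }
    return 0 // leave ts_ms unset, matching the backfill convention (Millis: 0, // Not known.)
}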

Why range pre-fetch over per-batch IN-list

An earlier iteration of this PR used a per-batch WHERE start_lsn IN (?, ?, ...) lookup with chunked-at-1000 buffering. That design works fine for moderate-scale captures but doesn't scale to the kind of high-throughput SQL Server workloads we want to migrate from Debezium: the per-LSN cost stayed constant (~12.5 μs/LSN), so an enterprise workload with 50k LSNs per polling cycle paid ~625 ms of synchronous lookup time per cycle.

The range-prefetch design pulls all commit times for the cycle in a single indexed range scan that amortizes its query overhead across however many LSNs are in the range. Per-LSN cost drops from ~12.5 μs to ~0.66 μs at high scale (≈18-20× faster) and the implementation is simpler — no event buffering, no flush threshold, no dedup map, no chunking loop.

Performance

Microbenchmark — feature off vs feature on

Measured locally against Azure SQL Edge (linux/arm64-native, Apple M4 Max). The FeatureOff sub-bench iterates the same buffered events without a lookup; the delta is the per-pollTable cost the feature adds.

BenchmarkLookupTranEndTimes/FeatureOff/N=100-16     100178271     24.18 ns/op       0 B/op       0 allocs/op
BenchmarkLookupTranEndTimes/FeatureOn/N=100-16           4000    517697 ns/op    64108 B/op    1139 allocs/op
BenchmarkLookupTranEndTimes/FeatureOff/N=1000-16      4577856     515.9 ns/op       0 B/op       0 allocs/op
BenchmarkLookupTranEndTimes/FeatureOn/N=1000-16          2092   1099840 ns/op   648020 B/op   10157 allocs/op
BenchmarkLookupTranEndTimes/FeatureOff/N=5000-16       598290      4020 ns/op       0 B/op       0 allocs/op
BenchmarkLookupTranEndTimes/FeatureOn/N=5000-16           585   3599261 ns/op  3031942 B/op   50209 allocs/op
BenchmarkLookupTranEndTimes/FeatureOff/N=25000-16       91695     25666 ns/op       0 B/op       0 allocs/op
BenchmarkLookupTranEndTimes/FeatureOn/N=25000-16          142  17703543 ns/op 14358458 B/op  250428 allocs/op
BenchmarkLookupTranEndTimes/FeatureOff/N=50000-16       53343     46660 ns/op       0 B/op       0 allocs/op
BenchmarkLookupTranEndTimes/FeatureOn/N=50000-16           75  33175786 ns/op 28711142 B/op  500694 allocs/op
Distinct LSNs   Feature off   Feature on   Δ (per-cycle feature cost)   Per-LSN
100             24 ns         0.52 ms      +0.52 ms                     5.2 μs
1,000           516 ns        1.10 ms      +1.10 ms                     1.10 μs
5,000           4.0 μs        3.60 ms      +3.60 ms                     0.72 μs
25,000          25.7 μs       17.7 ms      +17.7 ms                     0.71 μs
50,000          46.7 μs       33.2 ms      +33.2 ms                     0.66 μs

What this shows:

  • Feature-off cost is zero. Nanoseconds-per-N and zero allocations — the existing streaming-emit path is unchanged for captures that don't enable the option.
  • Per-LSN cost falls with scale. Fixed query overhead amortizes — at 50k LSNs the per-LSN cost is 8× lower than at 100 LSNs. This is the inverse of an IN-list approach, where per-LSN cost is roughly constant.
  • Single page covers the common case. tranEndTimesPageSize = 100,000; even a high-throughput 50k-LSN cycle fits in one page, i.e. one query.
  • Memory is bounded. ~30 MiB worst case per active page per worker, ~120 MiB across the 4 concurrent pollTable workers. Tunable via tranEndTimesPageSize.

Comparison with the earlier IN-list iteration

Distinct LSNs   IN-list (earlier rev)   Range prefetch         Speedup
100             1.12 ms                 0.52 ms                2.2×
1,000           12.5 ms                 1.10 ms                11×
4,096           55.2 ms                 ~3 ms (interpolated)   ~18×

Realistic workload sizing

Cost scales with distinct LSNs, not row count. Multi-row transactions (INSERT ... VALUES (...), (...), (...), bulk operations) share one start_lsn, so high-write-amplification workloads dedup heavily and see proportionally less overhead.

Order-of-magnitude estimates (workloads vary; not from production data):

Workload                      Txns/sec         Distinct LSNs / 500 ms cycle   Per-cycle cost   Aggregate overhead
Light OLTP / SaaS app         10–100           5–50                           ~0.5 ms          negligible
Typical mid-size app          100–1,000        50–500                         0.5–1 ms         ~0.1–0.2% of a core
Busy e-commerce / financial   1,000–10,000     500–5,000                      1–4 ms           ~0.2–0.8% of a core
Peak / enterprise             10,000–100,000   5,000–50,000                   4–33 ms          ~0.8–6.6% of a core

(Costs assume the per-LSN curve from the benchmark above. Real Linux x86_64 SQL Server is likely 2–5× faster than the SQL Edge ARM64 numbers shown.)

polling_interval (advanced option, default 500 ms) is tunable but does not reduce this feature's CPU cost; only the dedup ratio does. At 100k tps, a 5 s polling interval would mean roughly 500k distinct LSNs per cycle, so the prefetch would paginate through several full 100k-row pages every cycle.

Catch-up scenarios

A multi-hour catch-up means a wide LSN range that exceeds one page. The connector iterates lookupTranEndTimesPage: whenever a page hits its row cap, the call returns pageEnd set to the last row's start_lsn, and the TVF is then queried over that same sub-range before the next page begins. Each iteration issues one prefetch query and one TVF query, with memory bounded per page (see the sketch below).
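
A sketch of that loop, reusing the assumed lookupTranEndTimesPage signature from the Design section above; pollTableRange stands in for the existing scan-and-emit path, tranEndTimesPageSize for the page-size constant, and all names are illustrative rather than the connector's verbatim code (requires the standard bytes package for LSN comparison):

// Hypothetical pagination driver for one polling cycle's (fromLSN, toLSN] range.
// Fixed-width binary LSNs are assumed to order correctly under bytewise comparison.
func pollTablePaginated(ctx context.Context, db *sql.DB, fromLSN, toLSN []byte) error {
    for bytes.Compare(fromLSN, toLSN) < 0 {
        // One bounded prefetch of commit times for the next sub-range.
        tranEndTimes, pageEnd, err := lookupTranEndTimesPage(ctx, db, fromLSN, toLSN, tranEndTimesPageSize)
        if err != nil {
            return err
        }
        // One TVF scan-and-emit over the same sub-range, consulting the map per row.
        if err := pollTableRange(ctx, db, fromLSN, pageEnd, tranEndTimes); err != nil {
            return err
        }
        fromLSN = pageEnd // pageEnd == toLSN once the final, non-full page has been handled
    }
    return nil
}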

A hypothetical 10M-LSN backlog dispatches ~100 prefetch+TVF iterations at ~33 ms prefetch each = ~3.3 s of total added time across the catch-up. The change-row processing dominates by orders of magnitude.

Testing — what was actually run

This is a starting point for validation, not exhaustive. Estuary CI should exercise the full capture pipeline against a SQL Server with the SQL Agent running.

go build ./... and go vet ./... are clean for both source-sqlserver and go/capture/sqlserver/tests.

Verbatim Go test output (against Azure SQL Edge, linux/arm64-native)

=== RUN   TestSpec
    main_test.go:206: opening control connection: addr="127.0.0.1:1433", user="sa"
--- PASS: TestSpec (1.70s)
=== RUN   TestLookupTranEndTimesPageIntegration
    replication_test.go:124: validated 4 rows across 3 transactions: id1/2 share ts_ms=1777419608800, id3 ts_ms=1777419609053, id4 ts_ms=1777419609310
--- PASS: TestLookupTranEndTimesPageIntegration (0.80s)
=== RUN   TestLookupTranEndTimesPageEmpty
--- PASS: TestLookupTranEndTimesPageEmpty (0.01s)
=== RUN   TestLookupTranEndTimesPagePagination
    replication_test.go:256: walked 1102 distinct LSNs across 3 pages of size 500
--- PASS: TestLookupTranEndTimesPagePagination (0.44s)
PASS
ok    github.com/estuary/connectors/source-sqlserver  3.606s

What each test proves:

  • TestSpec — JSON config schema is generated correctly with the new populate_source_ts_ms advanced field.
  • TestLookupTranEndTimesPageIntegration — runs three transactions (one with two rows, two singletons), captures the FromLSN/ToLSN bounds via sys.fn_cdc_get_max_lsn(), forces sys.sp_cdc_scan, calls lookupTranEndTimesPage, verifies the entire range fits in one page (pageEnd == toLSN), multi-row transaction equality, distinct/non-decreasing across transactions, and byte-for-byte agreement with a direct SELECT tran_end_time FROM cdc.lsn_time_mapping WHERE start_lsn = ? query.
  • TestLookupTranEndTimesPageEmpty — zero-width range returns a non-nil empty map and the requested toLSN as pageEnd.
  • TestLookupTranEndTimesPagePagination — inserts 1,100 single-row transactions in a server-side loop, walks the LSN range with pageSize=500 (forces ≥ 2 pages), asserts every distinct LSN is consumed exactly once with no gaps or duplicates and that non-final pages are exactly full.

Independent SQL probe against the same database

Issued the connector's exact range query directly through sqlcmd to confirm the underlying assumptions about cdc.lsn_time_mapping:

-- multi-row INSERT (1,'a'),(2,'b'); singletons for ids 3 and 4:
lsn_hex                op  id  val
---------------------- --- --- ---
0x0000002B00002580001B   2   1   a   <- id 1 and id 2 share start_lsn
0x0000002B00002580001B   2   2   b   <- (single transaction)
0x0000002B000025880003   2   3   c
0x0000002B000025900003   2   4   d

-- the connector's range query: SELECT start_lsn, tran_end_time
-- FROM cdc.lsn_time_mapping WHERE start_lsn > @fromLSN AND start_lsn <= @toLSN:
start_lsn              tran_end_time
---------------------- -----------------------
0x0000002B00002580001B 2026-04-28 23:08:20.090
0x0000002B000025880003 2026-04-28 23:08:20.353
0x0000002B000025900003 2026-04-28 23:08:20.617

This independently confirms: a multi-row INSERT produces a single __$start_lsn shared by both rows; three separate commits produce three distinct start_lsn values with monotonically non-decreasing tran_end_time; the connector's range query syntax/semantics match the table schema.

Not yet validated locally

The full snapshot test testPopulateSourceTsMs (in go/capture/sqlserver/tests/capture.go), which routes through flowctl preview → connector binary → SQL Server, has not been run locally. Azure SQL Edge has no SQL Server Agent, so change tables aren't auto-populated, and the GODEBUG=x509negativeserial=1 flag needed for SQL Edge's older cert doesn't propagate to the connector binary spawned by flowctl. This test should run cleanly on Estuary CI, which uses real SQL Server, and its snapshot file (source-sqlserver/.snapshots/TestCapture-PopulateSourceTsMs) will be auto-generated by the standard UPDATE_SNAPSHOTS=true test invocation.

Files changed

  • source-sqlserver/main.go — PopulateSourceTsMs advanced option
  • source-sqlserver/replication.go — pollTable drives pagination, pollTableRange is the scan-and-emit loop, lookupTranEndTimesPage is the bounded indexed range query
  • source-sqlserver/backfill.go — explicit Millis: 0 // Not known. matching Postgres/MySQL convention
  • source-sqlserver/.snapshots/TestSpec — updated config schema snapshot
  • source-sqlserver/README.md — new section explaining the option
  • source-sqlserver/replication_test.go (new) — TestLookupTranEndTimesPageIntegration, TestLookupTranEndTimesPageEmpty, TestLookupTranEndTimesPagePagination, BenchmarkLookupTranEndTimes
  • go/capture/sqlserver/tests/capture.go — new testPopulateSourceTsMs snapshot + programmatic test for the full pipeline

Test plan

  • Estuary CI runs the full go test ./source-sqlserver/... against the docker-compose SQL Server (this exercises testPopulateSourceTsMs end-to-end and auto-generates the missing snapshot)
  • Re-run BenchmarkLookupTranEndTimes on Estuary CI hardware against real SQL Server for an apples-to-apples baseline
  • Manual smoke against a staging environment with a high-throughput captured database: enable the option, confirm _meta/source/ts_ms appears on CDC events and is absent on backfill events, observe per-cycle lookup latency and memory usage
  • Watch for "failed to prefetch transaction commit times" warnings in production logs after rollout

jameswinegar force-pushed the feat/source-sql-server_populate-source-ts-ms branch 4 times, most recently from b0f80a0 to 9450be8 on April 28, 2026 at 23:42
Add an advanced capture option `populate_source_ts_ms` (default false) which
causes CDC events to carry the transaction commit time as
`_meta/source/ts_ms`. The commit time is looked up from
`cdc.lsn_time_mapping` after each polling cycle, deduplicated by LSN, and
chunked at 1000 LSNs per query to stay under SQL Server's 2100-parameter
limit. The lookup is best-effort: failures emit the batch with ts_ms unset
and log a warning. Buffered events are flushed in 4096-event sub-batches
to bound memory during catch-up scans.

Backfill rows leave ts_ms unset (`Millis: 0, // Not known.`), matching the
behavior of source-postgres and source-mysql.

Closes estuary#3553.
jameswinegar force-pushed the feat/source-sql-server_populate-source-ts-ms branch from 9450be8 to 8210e6a on April 28, 2026 at 23:55

jameswinegar commented Apr 29, 2026

Follow-up to consider: TVF + LEFT JOIN as a single query

I investigated whether replacing the prefetch + TVF pair with a single JOINed query would be a worthwhile optimization. Capturing the analysis here so it's not lost.

SELECT t.[__$start_lsn], m.tran_end_time
  FROM cdc.fn_cdc_get_all_changes_X(@p1, @p2, N'all') AS t
  LEFT JOIN cdc.lsn_time_mapping AS m ON t.[__$start_lsn] = m.start_lsn

I verified the syntax works correctly via raw sqlcmd against Azure SQL Edge — multi-row transactions correctly share their tran_end_time, distinct transactions get distinct timestamps. Memory is bounded the same way (streaming rows + 8 bytes per row for the extra column).

Potential wins

  • Single query per polling cycle (vs. prefetch + TVF), one fewer network round-trip per page
  • No client-side LSN map (~30 MB at 100k rows in the current design → 0)
  • Eliminates the pagination loop entirely — no lookupTranEndTimesPage, no tranEndTimesPageSize, no page bookkeeping. Smaller surface area than this PR's current state.

Risks worth measuring before adopting

  • The TVF is a multi-statement table-valued function. SQL Server's optimizer treats those conservatively and may pick a hash join that materializes one input, regressing the existing capture hot path. The original spec for this PR explicitly warned against this for the IN-list iteration; the warning is less applicable now that we're already doing range queries, but the optimizer concern remains.
  • Failure mode is worse: with prefetch, a cdc.lsn_time_mapping permission issue, transient error, or NULL handling problem degrades cleanly to "emit batch without ts_ms" (the existing best-effort path). With a JOIN, a query failure interrupts the whole change-row read, which is much harder to recover from in the polling loop.

Couldn't benchmark locally
I tried to A/B the two strategies but hit a go-mssqldb driver issue against Azure SQL Edge: after ~1000+ Exec calls on a connection, subsequent 2-arg QueryContext against the TVF fails with "insufficient number of arguments" even though the SQL is identical to one that works at low volume. Couldn't isolate whether that's SQL-Edge-specific, qemu-specific, or a real driver bug. The current PR code path doesn't trigger it (verified with a 100-row smoke test), but it blocked me from getting a clean A/B comparison on this machine.

Comment thread source-sqlserver/main.go Outdated
Filegroup string `json:"filegroup,omitempty" jsonschema:"title=CDC Instance Filegroup,description=When set the connector will create new CDC instances with the specified 'filegroup_name' argument. Has no effect if CDC instances are managed manually."`
RoleName string `json:"role_name,omitempty" jsonschema:"title=CDC Instance Access Role,description=When set the connector will create new CDC instances with the specified 'role_name' argument as the gating role. When unset the capture user name is used as the 'role_name' instead. Has no effect if CDC instances are managed manually."`
SourceTag string `json:"source_tag,omitempty" jsonschema:"title=Source Tag,description=When set the capture will add this value as the property 'tag' in the source metadata of each document."`
PopulateSourceTsMs bool `json:"populate_source_ts_ms,omitempty" jsonschema:"title=Populate Source ts_ms,default=false,description=When set the connector will populate the '_meta/source/ts_ms' property of CDC events with the transaction commit time. The commit time is looked up from cdc.lsn_time_mapping after each polling cycle. This adds one extra query per change batch and is therefore opt-in. Backfill rows do not have a meaningful commit time so ts_ms will not be set on them."`
Member:

I would change the description to "Populate Source Timestamps", it feels weird to put the ts_ms field name into the human-readable field description which will render in the UI.

Contributor Author:

Renamed to Populate Source Timestamps; the description no longer references the ts_ms field name.

Comment thread go/capture/sqlserver/tests/capture.go Outdated
// testPopulateSourceTsMs verifies the populate_source_ts_ms advanced option:
// CDC events carry transaction commit time as _meta/source/ts_ms; backfill
// rows and flag-off captures do not.
func testPopulateSourceTsMs(t *testing.T, setup testSetupFunc) {
Member:

This test does not belong here. The tests in this file / package are shared between source-sqlserver and source-sqlserver-ct, but this test cannot possibly work against source-sqlserver-ct.

Contributor Author:

Moved to source-sqlserver/capture_test.go as TestPopulateSourceTimestamps. Snapshot at source-sqlserver/.snapshots/TestPopulateSourceTimestamps.

Comment thread source-sqlserver/replication_test.go Outdated
// forces sys.sp_cdc_scan so the test works without the SQL Server Agent,
// and verifies that lookupTranEndTimesPage returns the same
// start_lsn -> tran_end_time mapping that's in cdc.lsn_time_mapping.
func TestLookupTranEndTimesPageIntegration(t *testing.T) {
Member:

This test appears to be mostly validating SQL Server CDC behavior, and to the extent that it's actually exercising connector functionality it appears to be checking a trivial property that amounts to "is the data we read from this table the same as the data we read from this table?"

Contributor Author:

Deleted.

Comment thread source-sqlserver/replication_test.go Outdated
}

var ctx = context.Background()
var controlURI = (&Config{
Member:

This whole chunk of connection-opening code is boilerplate which should be factored out into a helper function. I would probably either use the existing blackboxTestSetup() helper (which does some other setup work which isn't strictly needed here but which is benign), or factor out just the "open control connection" logic from that into another named helper function and use it here. Either way, half of the line count of these tests should not be devoted to boilerplate connection opening code.

Contributor Author:

The whole file is gone. TestPopulateSourceTimestamps (capture_test.go) uses blackboxTestSetup().

Comment thread source-sqlserver/replication_test.go Outdated
require.NoErrorf(t, err, "executing %s", query)
}

runSQL(fmt.Sprintf(`IF EXISTS (SELECT 1 FROM cdc.change_tables WHERE capture_instance = N'%s') EXEC sys.sp_cdc_disable_table @source_schema = 'dbo', @source_name = 'tsms_page_%s', @capture_instance = N'%s'`, captureInstance, probeID, captureInstance))
Member:

This table creation/cleanup code is duplicating test helpers we already have.

Contributor Author:

The whole file is gone. Coverage is TestPopulateSourceTimestamps which uses db.CreateTable().

Comment thread source-sqlserver/replication_test.go Outdated

runSQL(fmt.Sprintf(`DECLARE @i INT = 0; WHILE @i < %d BEGIN INSERT INTO %s (id) VALUES (@i); SET @i = @i + 1; END`, numTransactions, tableName))

if _, err := conn.ExecContext(ctx, "EXEC sys.sp_cdc_scan @maxtrans = 5000, @maxscans = 10"); err != nil {
Member:

It's not entirely clear to me why extra effort was put into running this against SQL Server Edge when our normal test DB setup defined in source-sqlserver/docker-compose.yaml runs a perfectly normal SQL Server instance which doesn't need these accommodations.

Contributor Author:

The standard docker-compose.yaml is what runs in TestPopulateSourceTimestamps; no Edge-specific code remains. The original docker-compose failure on my end was that I had Docker Desktop set to "Docker VMM" instead of Apple Virtualization Framework — switching to VZ + Rosetta resolved it and SQL Server 2022 starts cleanly.

Comment thread source-sqlserver/replication_test.go Outdated
require.NoError(t, rows.Close())
require.Equalf(t, numTransactions, len(expectedLSNs), "expected %d distinct LSNs in change table; got %d", numTransactions, len(expectedLSNs))

// Walk the range in pages exactly the way pollTable does.
Member:

Writing a bunch of extra logic to "do something the same way as the real implementation" is a sign that something is being done wrong here. If it's worth having a test for, it's worth finding a way to factor out the real logic so it can be invoked from the test.

Contributor Author:

The whole file is gone. Coverage is TestPopulateSourceTimestamps. Pagination boundaries are not directly exercised by the integration test workload — if that's a concern, the right shape is making tranEndTimesPageSize a test-tunable var (like cdcCleanupInterval) and lowering it inside the integration test. Happy to add that if you'd like.

Comment thread source-sqlserver/replication_test.go Outdated
//
// GODEBUG=x509negativeserial=1 TEST_DATABASE=yes \
// go test -run='^$' -bench=BenchmarkLookupTranEndTimes -benchmem ./source-sqlserver/...
func BenchmarkLookupTranEndTimes(b *testing.B) {
Member:

While performance impact is a concern with this feature, the performance of doing the map lookups was never the issue and that's basically all this is measuring. The actual concerns are things like "how will an extra database RTT impact overall polling time in production deployments where the connector and database aren't running on the same machine" and "is it worth trying to cache the LSN time mapping information across all tables within a single polling cycle" and this benchmark says nothing useful about any of that. Recommend deleting.

Contributor Author:

Deleted.

Comment thread source-sqlserver/README.md Outdated
do not have a meaningful commit time and will not have `ts_ms` set, matching
the behavior of the Postgres and MySQL connectors.

The lookup is best-effort: if it fails for any reason, the rest of the
Member:

Is there a concrete failure scenario in mind here or is this just typical LLM "let me add fallbacks everywhere just in case" behavior?

Contributor Author:

No concrete scenario; the fallback is gone. A cdc.lsn_time_mapping query failure now propagates from pollTable like any other transient SQL error.

Per review feedback: don't put the field name 'ts_ms' into the
human-readable UI label and description.

…package

Per review feedback: the test in go/capture/sqlserver/tests/capture.go is
shared between source-sqlserver and source-sqlserver-ct, but
populate_source_ts_ms is a source-sqlserver-only config field. Move
TestPopulateSourceTimestamps and the extractTsMsValues helper into the
source-sqlserver package where they belong.

Per review feedback: TestLookupTranEndTimesPageIntegration was mostly
exercising SQL Server CDC behavior (multi-row transactions share LSNs,
distinct transactions get distinct commit times) rather than connector
logic, and where it did touch connector code it tautologically asserted
that the data we read from cdc.lsn_time_mapping matches the data we read
from cdc.lsn_time_mapping. Deleted.

Per review feedback: BenchmarkLookupTranEndTimes only measured the
in-process map lookup performance, which was never the actual concern.
The real questions for this feature are network round-trip impact in
production deployments and whether per-polling-cycle caching across
tables would be worthwhile, none of which this benchmark could speak to.

Per review feedback:
- Use blackboxTestSetup() for connection setup instead of duplicating the
  control-connection boilerplate.
- Use db.CreateTable() / db.QuietExec() for table create/cleanup instead
  of duplicating sp_cdc_enable_table / sp_cdc_disable_table by hand.
- Drop sys.sp_cdc_scan and other Azure-SQL-Edge-specific accommodations;
  the standard test docker-compose runs a real SQL Server instance with
  the Agent enabled.
- Replace the multi-page pagination walk (which mirrored pollTable's
  loop) with a single-call boundary test that asserts the function's
  pagination contract directly: a full page returns exactly pageSize
  entries with pageEnd advanced to the last entry's start_lsn; a
  non-full page returns all entries with pageEnd == toLSN.

Per review feedback: there was no concrete failure scenario the fallback
was defending against. A cdc.lsn_time_mapping query failure is no
different from any other transient SQL error in pollTable - propagate it
and let the connector's outer retry loop handle it. Removes the swallow-
the-error log path and the conditional re-call into pollTableRange.

Followup to review feedback: deleted replication_test.go. The empty-range
test exercised a trivial branch and the pagination boundary test, while
narrowly useful, was stylistically inconsistent with the rest of the
connector test suite (everything else is blackbox integration via
flowctl). Feature coverage lives in TestPopulateSourceTimestamps in
capture_test.go.

Generated by running the test against the standard docker-compose SQL
Server 2022 with the SQL Server Agent enabled. Also sanitize ts_ms to a
zero placeholder rather than the literal string 'REDACTED' so the
emitted JSON stays valid and the snapshot pretty-prints consistently
(numeric field stays numeric).

Restores the file to match origin/main; the trailing blank line was
introduced incidentally when removing testPopulateSourceTsMs.