source-sqlserver: optionally populate _meta/source/ts_ms (#3553) #4328
jameswinegar wants to merge 10 commits
Conversation
Add an advanced capture option `populate_source_ts_ms` (default false) which causes CDC events to carry the transaction commit time as `_meta/source/ts_ms`. The commit time is looked up from `cdc.lsn_time_mapping` after each polling cycle, deduplicated by LSN, and chunked at 1000 LSNs per query to stay under SQL Server's 2100-parameter limit. The lookup is best-effort: failures emit the batch with ts_ms unset and log a warning. Buffered events are flushed in 4096-event sub-batches to bound memory during catch-up scans. Backfill rows leave ts_ms unset (`Millis: 0, // Not known.`), matching the behavior of source-postgres and source-mysql. Closes estuary#3553.
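For readers skimming the diff, here is a minimal sketch of the chunked IN-list lookup this description refers to. The function and constant names are illustrative rather than the PR's actual identifiers, and the query text is an assumption based on the description above:

```go
// Sketch (not the PR's actual code) of the chunked commit-time lookup:
// dedupe LSNs, query cdc.lsn_time_mapping in chunks of at most 1000
// parameters, and return a start_lsn -> tran_end_time map.
package main

import (
	"context"
	"database/sql"
	"fmt"
	"strings"
	"time"
)

const lsnChunkSize = 1000 // stays well under SQL Server's 2100-parameter limit

func lookupTranEndTimes(ctx context.Context, db *sql.DB, lsns [][]byte) (map[string]time.Time, error) {
	// Deduplicate first: every row of a multi-row transaction shares one __$start_lsn.
	seen := make(map[string]struct{}, len(lsns))
	var distinct [][]byte
	for _, lsn := range lsns {
		if _, ok := seen[string(lsn)]; !ok {
			seen[string(lsn)] = struct{}{}
			distinct = append(distinct, lsn)
		}
	}

	out := make(map[string]time.Time, len(distinct))
	for start := 0; start < len(distinct); start += lsnChunkSize {
		end := min(start+lsnChunkSize, len(distinct))
		chunk := distinct[start:end]

		// Build "IN (@p1, @p2, ...)" with one placeholder per LSN in this chunk.
		placeholders := make([]string, len(chunk))
		args := make([]any, len(chunk))
		for i, lsn := range chunk {
			placeholders[i] = fmt.Sprintf("@p%d", i+1)
			args[i] = lsn
		}
		query := `SELECT start_lsn, tran_end_time FROM cdc.lsn_time_mapping WHERE start_lsn IN (` +
			strings.Join(placeholders, ", ") + `)`

		rows, err := db.QueryContext(ctx, query, args...)
		if err != nil {
			return nil, fmt.Errorf("querying lsn_time_mapping: %w", err)
		}
		for rows.Next() {
			var lsn []byte
			var tranEndTime time.Time
			if err := rows.Scan(&lsn, &tranEndTime); err != nil {
				rows.Close()
				return nil, err
			}
			out[string(lsn)] = tranEndTime
		}
		if err := rows.Err(); err != nil {
			rows.Close()
			return nil, err
		}
		rows.Close()
	}
	return out, nil
}
```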
Follow-up to consider: TVF + LEFT JOIN as a single query

I investigated whether replacing the prefetch + TVF pair with a single JOINed query would be a worthwhile optimization. Capturing the analysis here so it's not lost.

```sql
SELECT t.[__$start_lsn], m.tran_end_time
FROM cdc.fn_cdc_get_all_changes_X(@p1, @p2, N'all') AS t
LEFT JOIN cdc.lsn_time_mapping AS m ON t.[__$start_lsn] = m.start_lsn
```

I verified the syntax works correctly via a raw query.

Potential wins

Risks worth measuring before adopting

Couldn't benchmark locally
```go
Filegroup string `json:"filegroup,omitempty" jsonschema:"title=CDC Instance Filegroup,description=When set the connector will create new CDC instances with the specified 'filegroup_name' argument. Has no effect if CDC instances are managed manually."`
RoleName string `json:"role_name,omitempty" jsonschema:"title=CDC Instance Access Role,description=When set the connector will create new CDC instances with the specified 'role_name' argument as the gating role. When unset the capture user name is used as the 'role_name' instead. Has no effect if CDC instances are managed manually."`
SourceTag string `json:"source_tag,omitempty" jsonschema:"title=Source Tag,description=When set the capture will add this value as the property 'tag' in the source metadata of each document."`
PopulateSourceTsMs bool `json:"populate_source_ts_ms,omitempty" jsonschema:"title=Populate Source ts_ms,default=false,description=When set the connector will populate the '_meta/source/ts_ms' property of CDC events with the transaction commit time. The commit time is looked up from cdc.lsn_time_mapping after each polling cycle. This adds one extra query per change batch and is therefore opt-in. Backfill rows do not have a meaningful commit time so ts_ms will not be set on them."`
```
I would change the description to "Populate Source Timestamps", it feels weird to put the ts_ms field name into the human-readable field description which will render in the UI.
Renamed to Populate Source Timestamps; the description no longer references the ts_ms field name.
```go
// testPopulateSourceTsMs verifies the populate_source_ts_ms advanced option:
// CDC events carry transaction commit time as _meta/source/ts_ms; backfill
// rows and flag-off captures do not.
func testPopulateSourceTsMs(t *testing.T, setup testSetupFunc) {
```
This test does not belong here. The tests in this file / package are shared between source-sqlserver and source-sqlserver-ct, but this test cannot possibly work against source-sqlserver-ct.
Moved to source-sqlserver/capture_test.go as TestPopulateSourceTimestamps. Snapshot at source-sqlserver/.snapshots/TestPopulateSourceTimestamps.
```go
// forces sys.sp_cdc_scan so the test works without the SQL Server Agent,
// and verifies that lookupTranEndTimesPage returns the same
// start_lsn -> tran_end_time mapping that's in cdc.lsn_time_mapping.
func TestLookupTranEndTimesPageIntegration(t *testing.T) {
```
This test appears to be mostly validating SQL Server CDC behavior, and to the extent that it's actually exercising connector functionality it appears to be checking a trivial property that amounts to "is the data we read from this table the same as the data we read from this table?"
```go
}

var ctx = context.Background()
var controlURI = (&Config{
```
This whole chunk of connection-opening code is boilerplate which should be factored out into a helper function. I would probably either use the existing blackboxTestSetup() helper (which does some other setup work which isn't strictly needed here but which is benign), or factor out just the "open control connection" logic from that into another named helper function and use it here. Either way, half of the line count of these tests should not be devoted to boilerplate connection opening code.
The whole file is gone. TestPopulateSourceTimestamps (capture_test.go) uses blackboxTestSetup().
```go
require.NoErrorf(t, err, "executing %s", query)
}

runSQL(fmt.Sprintf(`IF EXISTS (SELECT 1 FROM cdc.change_tables WHERE capture_instance = N'%s') EXEC sys.sp_cdc_disable_table @source_schema = 'dbo', @source_name = 'tsms_page_%s', @capture_instance = N'%s'`, captureInstance, probeID, captureInstance))
```
This table creation/cleanup code is duplicating test helpers we already have.
The whole file is gone. Coverage is TestPopulateSourceTimestamps which uses db.CreateTable().
```go
runSQL(fmt.Sprintf(`DECLARE @i INT = 0; WHILE @i < %d BEGIN INSERT INTO %s (id) VALUES (@i); SET @i = @i + 1; END`, numTransactions, tableName))

if _, err := conn.ExecContext(ctx, "EXEC sys.sp_cdc_scan @maxtrans = 5000, @maxscans = 10"); err != nil {
```
It's not entirely clear to me why extra effort was put into running this against SQL Server Edge when our normal test DB setup defined in source-sqlserver/docker-compose.yaml runs a perfectly normal SQL Server instance which doesn't need these accommodations.
The standard docker-compose.yaml is what runs in TestPopulateSourceTimestamps; no Edge-specific code remains. The original docker-compose failure on my end was that I had Docker Desktop set to "Docker VMM" instead of Apple Virtualization Framework — switching to VZ + Rosetta resolved it and SQL Server 2022 starts cleanly.
```go
require.NoError(t, rows.Close())
require.Equalf(t, numTransactions, len(expectedLSNs), "expected %d distinct LSNs in change table; got %d", numTransactions, len(expectedLSNs))

// Walk the range in pages exactly the way pollTable does.
```
Writing a bunch of extra logic to "do something the same way as the real implementation" is a sign that something is being done wrong here. If it's worth having a test for, it's worth finding a way to factor out the real logic so it can be invoked from the test.
The whole file is gone. Coverage is TestPopulateSourceTimestamps. Pagination boundaries are not directly exercised by the integration test workload — if that's a concern, the right shape is making tranEndTimesPageSize a test-tunable var (like cdcCleanupInterval) and lowering it inside the integration test. Happy to add that if you'd like.
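For concreteness, a sketch of that shape, assuming `tranEndTimesPageSize` becomes a package-level var (mirroring the `cdcCleanupInterval` pattern mentioned above); the test name and body here are hypothetical:

```go
package sqlserver

import "testing"

// replication.go (sketch): page size as a package-level var so tests can
// lower it.
var tranEndTimesPageSize = 100_000

// replication_test.go (sketch): a hypothetical test shrinks the page so a
// small workload still forces multi-page pagination.
func TestPaginationBoundary(t *testing.T) {
	prev := tranEndTimesPageSize
	tranEndTimesPageSize = 5
	t.Cleanup(func() { tranEndTimesPageSize = prev })

	// ...insert more than five single-row transactions and run the capture;
	// the commit-time lookup now has to walk multiple pages to cover the range.
}
```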
```go
//
// GODEBUG=x509negativeserial=1 TEST_DATABASE=yes \
// go test -run='^$' -bench=BenchmarkLookupTranEndTimes -benchmem ./source-sqlserver/...
func BenchmarkLookupTranEndTimes(b *testing.B) {
```
While performance impact is a concern with this feature, the performance of doing the map lookups was never the issue and that's basically all this is measuring. The actual concerns are things like "how will an extra database RTT impact overall polling time in production deployments where the connector and database aren't running on the same machine" and "is it worth trying to cache the LSN time mapping information across all tables within a single polling cycle" and this benchmark says nothing useful about any of that. Recommend deleting.
do not have a meaningful commit time and will not have `ts_ms` set, matching
the behavior of the Postgres and MySQL connectors.

The lookup is best-effort: if it fails for any reason, the rest of the
Is there a concrete failure scenario in mind here or is this just typical LLM "let me add fallbacks everywhere just in case" behavior?
No concrete scenario; the fallback is gone. A cdc.lsn_time_mapping query failure now propagates from pollTable like any other transient SQL error.
Per review feedback: don't put the field name 'ts_ms' into the human-readable UI label and description.
…package
Per review feedback: the test in go/capture/sqlserver/tests/capture.go is shared between source-sqlserver and source-sqlserver-ct, but populate_source_ts_ms is a source-sqlserver-only config field. Move TestPopulateSourceTimestamps and the extractTsMsValues helper into the source-sqlserver package where they belong.
Per review feedback: TestLookupTranEndTimesPageIntegration was mostly exercising SQL Server CDC behavior (multi-row transactions share LSNs, distinct transactions get distinct commit times) rather than connector logic, and where it did touch connector code it tautologically asserted that the data we read from cdc.lsn_time_mapping matches the data we read from cdc.lsn_time_mapping. Deleted.
Per review feedback: BenchmarkLookupTranEndTimes only measured the in-process map lookup performance, which was never the actual concern. The real questions for this feature are network round-trip impact in production deployments and whether per-polling-cycle caching across tables would be worthwhile, none of which this benchmark could speak to.
Per review feedback:
- Use blackboxTestSetup() for connection setup instead of duplicating the control-connection boilerplate.
- Use db.CreateTable() / db.QuietExec() for table create/cleanup instead of duplicating sp_cdc_enable_table / sp_cdc_disable_table by hand.
- Drop sys.sp_cdc_scan and other Azure-SQL-Edge-specific accommodations; the standard test docker-compose runs a real SQL Server instance with the Agent enabled.
- Replace the multi-page pagination walk (which mirrored pollTable's loop) with a single-call boundary test that asserts the function's pagination contract directly: a full page returns exactly pageSize entries with pageEnd advanced to the last entry's start_lsn; a non-full page returns all entries with pageEnd == toLSN.
Per review feedback: there was no concrete failure scenario the fallback was defending against. A cdc.lsn_time_mapping query failure is no different from any other transient SQL error in pollTable - propagate it and let the connector's outer retry loop handle it. Removes the swallow-the-error log path and the conditional re-call into pollTableRange.
Followup to review feedback: deleted replication_test.go. The empty-range test exercised a trivial branch and the pagination boundary test, while narrowly useful, was stylistically inconsistent with the rest of the connector test suite (everything else is blackbox integration via flowctl). Feature coverage lives in TestPopulateSourceTimestamps in capture_test.go.
Generated by running the test against the standard docker-compose SQL Server 2022 with the SQL Server Agent enabled. Also sanitize ts_ms to a zero placeholder rather than the literal string 'REDACTED' so the emitted JSON stays valid and the snapshot pretty-prints consistently (numeric field stays numeric).
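A sketch of what that sanitization could look like; the regexp and helper name are illustrative, not necessarily the repo's actual snapshot-sanitizer hook:

```go
package main

import "regexp"

// Hypothetical snapshot sanitizer: replace the numeric ts_ms value with a
// zero placeholder so the emitted JSON stays valid and the numeric field
// stays numeric in the pretty-printed snapshot.
var tsMsSanitizer = regexp.MustCompile(`"ts_ms":\s*\d+`)

func sanitizeTsMs(snapshot string) string {
	return tsMsSanitizer.ReplaceAllString(snapshot, `"ts_ms":0`)
}
```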
Restores the file to match origin/main; the trailing blank line was introduced incidentally when removing testPopulateSourceTsMs.
Closes #3553.
Adds an advanced capture option `populate_source_ts_ms` (default false) which causes the SQL Server connector to populate `_meta/source/ts_ms` on CDC events with the transaction commit time, prefetched once per polling cycle from `cdc.lsn_time_mapping` over the cycle's LSN range. This brings parity with `source-postgres` and `source-mysql`, which already populate `ts_ms` for CDC events.

Design
When the flag is enabled, each `pollTable` invocation runs an indexed range query against `cdc.lsn_time_mapping` over the polling cycle's `(FromLSN, ToLSN]` window. The result is a `map[string]time.Time` consulted in O(1) per change row to set `_meta/source/ts_ms` as events stream through the existing scan-and-emit loop.

- Pagination: pages are capped at `tranEndTimesPageSize` (default 100,000 rows) — each page does its own bounded prefetch and TVF, so memory stays bounded regardless of catch-up size.
- Events go to `rs.events` one at a time as they're scanned. No buffering, no flush, no `clear()` dance.
- The lookup is best-effort: on failure, events are emitted with `ts_ms` unset and a warning is logged. Missing LSNs (possible at retention boundaries) and NULL `tran_end_time` (possible for control-row entries) leave individual events with `ts_ms` unset rather than failing the cycle.
- Backfill rows leave `ts_ms` unset (`Millis: 0, // Not known.`), matching the convention in source-postgres and source-mysql.
- Permissions: `GRANT SELECT ON SCHEMA :: cdc TO flow_capture` (already required) covers `cdc.lsn_time_mapping`.
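A sketch of the per-page range lookup under the design above. The name `lookupTranEndTimesPage` comes from this PR, but the signature and query text here are assumptions, not the merged code:

```go
package sqlserver

import (
	"context"
	"database/sql"
	"time"
)

// lookupTranEndTimesPage fetches up to pageSize commit times in the
// (fromLSN, toLSN] window and reports how far the page got (pageEnd).
func lookupTranEndTimesPage(ctx context.Context, db *sql.DB, fromLSN, toLSN []byte, pageSize int) (map[string]time.Time, []byte, error) {
	// Indexed range scan over (fromLSN, toLSN], capped at pageSize rows.
	const query = `SELECT TOP (@p1) start_lsn, tran_end_time
	    FROM cdc.lsn_time_mapping
	    WHERE start_lsn > @p2 AND start_lsn <= @p3
	    ORDER BY start_lsn`
	rows, err := db.QueryContext(ctx, query, pageSize, fromLSN, toLSN)
	if err != nil {
		return nil, nil, err
	}
	defer rows.Close()

	out := make(map[string]time.Time)
	pageEnd := toLSN
	n := 0
	for rows.Next() {
		var lsn []byte
		var tranEnd sql.NullTime // NULL for control-row entries
		if err := rows.Scan(&lsn, &tranEnd); err != nil {
			return nil, nil, err
		}
		if tranEnd.Valid {
			out[string(lsn)] = tranEnd.Time
		}
		pageEnd = lsn // full pages advance to the last row's start_lsn
		n++
	}
	if err := rows.Err(); err != nil {
		return nil, nil, err
	}
	if n < pageSize {
		// Non-full page: the whole remaining window was covered.
		pageEnd = toLSN
	}
	return out, pageEnd, nil
}
```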
Why range pre-fetch over per-batch IN-list

An earlier iteration of this PR used a per-batch `WHERE start_lsn IN (?, ?, ...)` lookup with chunked-at-1000 buffering. That design works fine for moderate-scale captures but doesn't scale to the kind of high-throughput SQL Server workloads we want to migrate from Debezium: the per-LSN cost stayed constant (~12.5 μs/LSN), so an enterprise workload with 50k LSNs per polling cycle paid ~625 ms of synchronous lookup time per cycle.

The range-prefetch design pulls all commit times for the cycle in a single indexed range scan that amortizes its query overhead across however many LSNs are in the range. Per-LSN cost drops from ~12.5 μs to ~0.66 μs at high scale (≈18-20× faster) and the implementation is simpler — no event buffering, no flush threshold, no dedup map, no chunking loop.
Performance
Microbenchmark — feature off vs feature on
Measured locally against Azure SQL Edge (linux/arm64-native, Apple M4 Max). The `FeatureOff` sub-bench iterates the same buffered events without a lookup; the delta is the per-`pollTable` cost the feature adds.

What this shows:
- `tranEndTimesPageSize = 100,000`; 50k LSNs (a high-throughput cycle) fits in one page = one query.
- Memory is bounded to one page of entries per concurrent `pollTable` worker. Tunable via `tranEndTimesPageSize`.

Comparison with the earlier IN-list iteration
Realistic workload sizing
Cost scales with distinct LSNs, not row count. Multi-row transactions (`INSERT ... VALUES (...), (...), (...)`, bulk operations) share one `start_lsn`, so high-write-amplification workloads dedup heavily and see proportionally less overhead.

Order-of-magnitude estimates (workloads vary; not from production data):
(Costs assume the per-LSN curve from the benchmark above. Real Linux x86_64 SQL Server is likely 2–5× faster than the SQL Edge ARM64 numbers shown.)

`polling_interval` (advanced option, default 500 ms) is tunable but does not reduce this feature's CPU cost — only the dedup ratio does. At 100k tps with a 5 s polling interval, a single page (100k rows) would be needed every cycle.

Catch-up scenarios
A multi-hour catch-up means a wide LSN range that exceeds one page. The connector iterates `lookupTranEndTimesPage`, which returns `pageEnd` set to the last row's `start_lsn` whenever the page hits its row cap, then issues a TVF over the same sub-range. Each iteration: one prefetch query, one TVF query, bounded memory.

A hypothetical 10M-LSN backlog dispatches ~100 prefetch+TVF iterations at ~33 ms prefetch each = ~3.3 s of total added time across the catch-up. The change-row processing dominates by orders of magnitude.
Testing — what was actually run
This is a starting point for validation, not exhaustive. Estuary CI should exercise the full capture pipeline against a SQL Server with the SQL Agent running.
`go build ./...` and `go vet ./...` clean for both `source-sqlserver` and `go/capture/sqlserver/tests`.

Verbatim Go test output (against Azure SQL Edge, linux/arm64-native)
What each test proves:
- `TestSpec` — JSON config schema is generated correctly with the new `populate_source_ts_ms` advanced field.
- `TestLookupTranEndTimesPageIntegration` — runs three transactions (one with two rows, two singletons), captures the FromLSN/ToLSN bounds via `sys.fn_cdc_get_max_lsn()`, forces `sys.sp_cdc_scan`, calls `lookupTranEndTimesPage`, verifies the entire range fits in one page (`pageEnd == toLSN`), multi-row transaction equality, distinct/non-decreasing across transactions, and byte-for-byte agreement with a direct `SELECT tran_end_time FROM cdc.lsn_time_mapping WHERE start_lsn = ?` query.
- `TestLookupTranEndTimesPageEmpty` — zero-width range returns a non-nil empty map and the requested toLSN as pageEnd.
- `TestLookupTranEndTimesPagePagination` — inserts 1,100 single-row transactions in a server-side loop, walks the LSN range with `pageSize=500` (forces ≥ 2 pages), asserts every distinct LSN is consumed exactly once with no gaps or duplicates and that non-final pages are exactly full.

Independent SQL probe against the same database
Issued the connector's exact range query directly through `sqlcmd` to confirm the underlying assumptions about `cdc.lsn_time_mapping`:

This independently confirms: a multi-row `INSERT` produces a single `__$start_lsn` shared by both rows; three separate commits produce three distinct `start_lsn` values with monotonically non-decreasing `tran_end_time`; the connector's range query syntax/semantics match the table schema.

Not yet validated locally
The full snapshot test `testPopulateSourceTsMs` (in `go/capture/sqlserver/tests/capture.go`) which routes through `flowctl preview` → connector binary → SQL Server. Azure SQL Edge has no SQL Server Agent so change tables aren't auto-populated; the `GODEBUG=x509negativeserial=1` flag needed for SQL Edge's older cert doesn't propagate to the connector binary spawned by `flowctl`. This test should run cleanly on Estuary CI which uses real SQL Server, and its snapshot file (`source-sqlserver/.snapshots/TestCapture-PopulateSourceTsMs`) will be auto-generated by the standard `UPDATE_SNAPSHOTS=true` test invocation.

Files changed
- `source-sqlserver/main.go` — `PopulateSourceTsMs` advanced option
- `source-sqlserver/replication.go` — `pollTable` drives pagination, `pollTableRange` is the scan-and-emit loop, `lookupTranEndTimesPage` is the bounded indexed range query
- `source-sqlserver/backfill.go` — explicit `Millis: 0 // Not known.` matching Postgres/MySQL convention
- `source-sqlserver/.snapshots/TestSpec` — updated config schema snapshot
- `source-sqlserver/README.md` — new section explaining the option
- `source-sqlserver/replication_test.go` (new) — `TestLookupTranEndTimesPageIntegration`, `TestLookupTranEndTimesPageEmpty`, `TestLookupTranEndTimesPagePagination`, `BenchmarkLookupTranEndTimes`
- `go/capture/sqlserver/tests/capture.go` — new `testPopulateSourceTsMs` snapshot + programmatic test for the full pipeline

Test plan
- `go test ./source-sqlserver/...` against the docker-compose SQL Server (this exercises `testPopulateSourceTsMs` end-to-end and auto-generates the missing snapshot)
- `BenchmarkLookupTranEndTimes` on Estuary CI hardware against real SQL Server for an apples-to-apples baseline
- Verify `_meta/source/ts_ms` appears on CDC events and is absent on backfill events, observe per-cycle lookup latency and memory usage
- Watch for `failed to prefetch transaction commit times` warnings in production logs after rollout