Skip to content

test(cmd/gc): cut managed-dolt + stale-key delay overhead in city runtime tests#2069

Open
hexsprite wants to merge 5 commits into
gastownhall:mainfrom
hexsprite:bugfix/test-leaks-managed-dolt
Open

test(cmd/gc): cut managed-dolt + stale-key delay overhead in city runtime tests#2069
hexsprite wants to merge 5 commits into
gastownhall:mainfrom
hexsprite:bugfix/test-leaks-managed-dolt

Conversation

@hexsprite
Copy link
Copy Markdown
Contributor

@hexsprite hexsprite commented May 13, 2026

Summary

Cuts managed-dolt spawn + stale-key wait overhead across several cmd/gc tests.

Three commits, all test-only changes (one tiny var-instead-of-const swap in production code so tests can rebind):

1. test: cleanup managed-dolt in city runtime tests (7e09c8f)

Two tests in cmd/gc/city_runtime_test.go built CityRuntime directly instead of going through newTestCityRuntime, skipping the t.Cleanup that tears down the managed dolt sql-server. Even with stub ManagedDoltHealth/Owned/Port callbacks, ensureManagedDoltPublishedForRuntime ran unconditionally and spawned a real dolt sql-server rooted at the test's temp city. Every run left an orphan.

Fix: register cleanupManagedDoltTestCity(t, cityPath) for both, matching the pattern in TestCityRuntimeRunStartupPreflightsManagedDoltBeforeSessionSnapshot.

2. test(cmd/gc): stub managed-dolt store openers to skip dolt spawn (ec04dd6)

Three more city-runtime tests still spawn real managed dolt sql-servers via the controllerState bead store + the order-tracking sweep store opened during newCityRuntime. Each spawn costs ~12s and leaks a child until the test's cleanup runs.

Extract two package-level vars (newControllerStateOpenCityStore, newCityRuntimeOpenSweepStore) so tests can swap in in-memory stores. Add stubManagedDoltStoreOpeners helper and apply to:

  • TestCityRuntimeEnsureManagedDoltPublishedForTickLogsOwnershipError
  • TestCityRuntimeRunStartupPreflightsManagedDoltBeforeSessionSnapshot
  • TestCityRuntimeControlDispatcherPreflightsManagedDoltBeforeSessionSnapshot

Production behavior unchanged; only test code rebinds the vars. The managed-dolt preflight ordering invariant is still verified via the existing fake managedDoltHealth/Owned/Port hooks installed by each test.

3. test(reconciler): shorten stale-key detect delay in circuit-breaker tests (2132a72)

Five circuit-breaker reconciler tests loop through the start path many times. Each iteration paid 2s on the cmd/gc-layer staleKeyDetectDelay plus another 2s on the matching internal/session constant, which dominated wall-time despite the Fake runtime being synchronous (the post-start IsRunning check always succeeds, so no wait is actually needed).

Promote both constants to package-level vars and add session.SetStaleKeyDetectDelayForTest, then introduce shortenStaleKeyDetectDelayForTest in cmd/gc that zeroes both delays for the duration of t. Applied to five TestReconciler_Circuit* tests.

Repro

pgrep -fl 'dolt sql-server'

Before: 10+ orphans with config paths like /var/folders/.../TestCityRuntime* accumulate after the suite.

After: only the legit managed dolt for the active city remains.

Test plan

  • Targeted tests pass: go test -run 'TestCityRuntimeEnsureManagedDoltPublished|TestCityRuntimeRunStartupPreflightsManagedDoltBeforeSessionSnapshot|TestCityRuntimeControlDispatcherPreflightsManagedDoltBeforeSessionSnapshot|TestReconciler_Circuit' ./cmd/gc/...
  • pgrep -fl 'dolt sql-server' after run — only legit dolt survives
  • make test — green

Refs hq-bgb3v

@hexsprite hexsprite requested a review from csells as a code owner May 13, 2026 15:24
@github-actions github-actions Bot added the status/needs-triage Inbox — we haven't looked at it yet label May 13, 2026
@randy-release-manager randy-release-manager Bot added kind/chore Internal improvement (refactor, tests, CI, tooling) priority/p2 Medium — real problem, workaround exists and removed status/needs-triage Inbox — we haven't looked at it yet labels May 13, 2026
@hexsprite hexsprite force-pushed the bugfix/test-leaks-managed-dolt branch from 41fee5b to 6e149bb Compare May 13, 2026 15:45
hexsprite added 2 commits May 13, 2026 10:00
`gc prompt` and `gc prompt synth` were added to the CLI without
regenerating docs/reference/cli.md from cmd/genschema. The pre-commit
hook auto-regenerates this file when staged Go changes touch CLI
definitions, so the drift surfaced on an unrelated PR's commit.

Run: `go run ./cmd/genschema`
Adds TestCLIDocsFreshness, which walks the live cobra tree and asserts
every non-hidden command has a heading in docs/reference/cli.md. The
prior pre-commit hook silently regenerated cli.md and stapled it to
whichever commit happened to touch Go files, so CLI drift leaked onto
unrelated PRs. This catches the drift in CI/pre-commit and points at
the right fix: go run ./cmd/genschema.

Structural rather than byte-equal — cobra registers `completion`/`help`
only on Execute, which the in-test render path skips.
Two tests build CityRuntime directly, skipping newTestCityRuntime.
ensureManagedDoltPublishedForRuntime spawns real dolt sql-server at
construction/tick even with stub callbacks, leaving orphaned children.

Add cleanupManagedDoltTestCity(t, cityPath) for both tests.

Refs hq-bgb3v
hexsprite added 2 commits May 13, 2026 19:05
Three city-runtime tests still spawn real managed dolt sql-servers via the
controllerState bead store + the order-tracking sweep store opened during
newCityRuntime. Each spawn costs ~12s and leaks a child until the test's
cleanup runs.

Extract two package-level vars (newControllerStateOpenCityStore,
newCityRuntimeOpenSweepStore) so tests can swap in in-memory stores.
Add stubManagedDoltStoreOpeners helper and apply to:

- TestCityRuntimeEnsureManagedDoltPublishedForTickLogsOwnershipError
- TestCityRuntimeRunStartupPreflightsManagedDoltBeforeSessionSnapshot
- TestCityRuntimeControlDispatcherPreflightsManagedDoltBeforeSessionSnapshot

Production behavior unchanged; only test code rebinds the vars. The
managed-dolt preflight ordering invariant is still verified via the existing
fake managedDoltHealth/Owned/Port hooks installed by each test.
…ests

Five circuit-breaker reconciler tests loop through the start path many
times. Each iteration paid 2s on the cmd/gc-layer staleKeyDetectDelay plus
another 2s on the matching internal/session constant, which dominated
wall-time despite the Fake runtime being synchronous (the post-start
IsRunning check always succeeds, so no wait is actually needed).

Promote both constants to package-level vars and add
session.SetStaleKeyDetectDelayForTest, then introduce
shortenStaleKeyDetectDelayForTest in cmd/gc that zeroes both delays for
the duration of t. Apply to:

- TestReconciler_CircuitTracksRestartsForSessionsCreatedThroughNamedSessions
- TestReconciler_CircuitDisabledByDefaultAllowsRepeatedWakeAttempts
- TestReconciler_CircuitUsesConfiguredDaemonThresholds
- TestReconciler_CircuitDoesNotRecordRestartForWakeBudgetDeferredNamedSession
- TestReconciler_CircuitTripsThroughRepeatedWakeAttempts

Production behavior unchanged; the helper only flips the vars during t.
@hexsprite hexsprite changed the title test: cleanup managed-dolt in city runtime tests test(cmd/gc): cut managed-dolt + stale-key delay overhead in city runtime tests May 14, 2026
Copy link
Copy Markdown
Contributor

@csells csells left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

docs lgtm

@csells csells requested a review from julianknutsen May 15, 2026 14:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

kind/chore Internal improvement (refactor, tests, CI, tooling) priority/p2 Medium — real problem, workaround exists

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants