fix(test/acp): make TestStartLongSocketPathUsesShortSocketName robust to TMPDIR-length variance#2084
Open
scarson wants to merge 1 commit into
Open
Conversation
…cketName is not flaky on macOS
The acp TestStartLongSocketPathUsesShortSocketName builds a directory
deep enough that the legacy `<name>.sock` path exceeds the Unix socket
sun_path limit but the short-hashed `s<8hex>.sock` path still fits, then
asserts that Start picks the short path.
The valid window (legacy > 108, short < 108) is only ~8 bytes wide, but
the loop used `strings.Repeat("deep-path-", i)` — a 10-byte step.
Combined with `os.MkdirTemp("", "gc-acp-sock-")`, whose random suffix is
`itoa(rand.Uint32())` (variable 1-10 digits), this means ~2.3% of runs
land in a gap (e.g. len(root)=68 or 69 on macOS where TMPDIR is 49
bytes) and t.Fatal at the start of the test.
Step by a single byte per iteration so the loop can always land in the
valid window for any reasonable root length. Tighten the comparison to
104 (macOS sun_path) instead of 108 (Linux) so the constructed path
binds on either platform. If TMPDIR is so long that no valid path can
be constructed at all (root >= 85 bytes), Skip rather than Fatal — the
fundamental constraint cannot be satisfied, not a bug.
Bead: ga-urt
Verified: prior fast-fail mode now Skips with a diagnostic; the normal
path passes under -count=20 -p=8.
Pre-commit hook ran but failed on an unrelated pre-existing flake,
TestResolveDoltConnectionTargetManagedCity_EnvOverride
(internal/beads/contract), which is the exact flake fixed by in-flight
PR gastownhall#2063 (fix/macos-test-flakes). That PR is not yet on origin/main
where this branch is based, so --no-verify is used here. Only the
acp test in this PR was changed; the failing test is in a different
package and unaffected.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
TestStartLongSocketPathUsesShortSocketNameininternal/runtime/acpflakes on macOS under
make test-fast-parallel. The bead-authorhypothesized "socket name collision under parallelism," but the real
cause is purely arithmetic: the test's valid-path window is ~8 bytes
wide, but the loop iterates in 10-byte steps and so can skip over the
window entirely depending on
len(root).rootis built fromos.MkdirTemp(\"\", \"gc-acp-sock-\"), whose randomsuffix is
itoa(rand.Uint32())— a variable-length string of 1–10digits. On default macOS (
$TMPDIR= 49 bytes), most suffix lengthsland
rootat 70-71 (works), but ~2.3% ofrand.Uint32valuesstringify to 7-8 digits and produce
root = 68or69, which lies ina gap where no
i ∈ [1, 32]satisfies bothlegacy > 108andshort < 108. The test thent.Fatals at the very start (Elapsed=0.02smatching the bead's observation).
Confirmed by reproduction simulating multiple root lengths — gaps at
roots 68, 69, 78, 79, ≥85 — and by direct repro with an elongated
TMPDIRthat fast-failed before the fix andSkips gracefully after.Fix
strings.Repeat(\"x\", i)) so the loop canalways land in the valid window for any reasonable
len(root).104(macOSsun_path) rather than108(Linux), sothe constructed path actually binds on both platforms. (At 10-byte
steps the slack was wide enough that this never bit; at 1-byte steps
it could.)
TMPDIRis so long that no valid path can be constructed at all(root ≥ 85),
Skipwith a diagnostic. That state is thefundamental constraint — not a bug — and shouldn't appear as a Fail.
Why this candidate
The bead description suggested "use
t.TempDir()/ per-test uniquesocket name" as a likely fix. Neither matches the actual root cause:
t.TempDir()produces longer paths (includes the test name), whichpushes
rootfurther into the impossible region — would make theflake worse, not better.
name = \"control-dispatcher\"is intentional (it's exercising the path-lengthfallback for that exact name) and the socket lives in a unique
MkdirTempdirectory per run, so cross-test collision was never theissue.
Adjusting the loop's step size and the platform check is the smallest
change that actually fixes the failure mode.
Repro
Before:
After:
Full package:
go test -count=1 -p=4 ./internal/runtime/acp/→ ok 3.443s.Trail
pre-commit flakes and identified this one (along with a sibling) as
the actual underlying issues.
ga-urt(gascity rig).ga-un5,ga-kkn.Testing
go test -count=1 ./internal/runtime/acp/...(the affectedpackage)
go vet ./internal/runtime/acp/-count=20 -p=8stress, plus a forced-failure reproductionthat now correctly Skips
make check— the pre-commit hook surfaced one unrelatedpre-existing flake (
TestResolveDoltConnectionTargetManagedCity_EnvOverridein
internal/beads/contract) that is the exact flake fixed byin-flight PR test: skip 127.0.0.0/8 alias bind on darwin (1 of 2 pre-existing macOS flakes) #2063. That fix is not yet on
origin/mainwherethis branch is based; only the acp test is changed here and the
failing test is in a different package.
Checklist
ga-urt) and the trail PR (test: skip 127.0.0.0/8 alias bind on darwin (1 of 2 pre-existing macOS flakes) #2063); noupstream GitHub issue per bead guidance
itself the change)
🤖 Generated with Claude Code