Fix singleton session demand identity #2053
quad341
left a comment
Thanks @julianknutsen — really sorry to bounce this back after taking a swing at the merge on our side. The structural fix (poolDesiredRequestIdentity / claimDesiredPoolSlot) is genuinely good, the unit-test surfaces you picked are exactly the right pins, and the integration test reconciled cleanly. But the rebase surfaced something that's a call only you can make, so I'm flagging it rather than guessing.
What changed underneath you: main commit ff7fff79 (the Kimi merge) reworked config.Agent.SupportsInstanceExpansion() in internal/config/config.go:2360-2381 to give max_active_sessions=1 two distinct flavors:
- MinActiveSessions set OR ScaleCheck set → "pool flavor" → still expands to name-1
- Neither set → "named-session flavor" → canonical name
Your new test TestBuildDesiredState_MaxOneAgentDemandUsesCanonicalIdentity uses fixture {MaxActiveSessions: 1, ScaleCheck: "printf 1"} and asserts canonical identity. Under the new SupportsInstanceExpansion(), that fixture is now explicitly pool-flavor, so your own poolDesiredRequestIdentity correctly routes it to the slot path and emits cashmaster/refinery-1 — failing the assertion. The two intents are direct opposites and there's no fixture I can substitute on your behalf that would still capture what you meant: any demand source (Min or ScaleCheck) trips pool-flavor on main, and without a demand source there's no session to inspect.
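To make the conflict concrete, here is a minimal sketch of the two max_active_sessions=1 flavors as described above. The Agent struct, its field types, and both helper functions are hypothetical stand-ins, not the real config.Agent or poolDesiredRequestIdentity; in particular, treating 0 and "" as "unset" and treating every agent with max != 1 as expanding are simplifying assumptions.

```go
package main

import "fmt"

// Agent is a hypothetical, trimmed-down stand-in for config.Agent.
// Field types are assumptions; the real struct lives in internal/config/config.go.
type Agent struct {
	Name              string
	MaxActiveSessions int
	MinActiveSessions int    // 0 is treated as "unset" in this sketch
	ScaleCheck        string // "" is treated as "unset" in this sketch
}

// supportsInstanceExpansion approximates the two max=1 flavors:
// a demand source (Min or ScaleCheck) means pool flavor, neither means named-session flavor.
func supportsInstanceExpansion(a Agent) bool {
	if a.MaxActiveSessions != 1 {
		// Simplifying assumption: anything other than max=1 expands.
		return true
	}
	return a.MinActiveSessions > 0 || a.ScaleCheck != ""
}

// desiredIdentity returns a slot identity (name-N) for expanding agents
// and the canonical name otherwise.
func desiredIdentity(a Agent, slot int) string {
	if supportsInstanceExpansion(a) {
		return fmt.Sprintf("%s-%d", a.Name, slot)
	}
	return a.Name
}

func main() {
	pool := Agent{Name: "cashmaster/refinery", MaxActiveSessions: 1, ScaleCheck: "printf 1"}
	named := Agent{Name: "cashmaster/refinery", MaxActiveSessions: 1}
	fmt.Println(desiredIdentity(pool, 1))  // slot path
	fmt.Println(desiredIdentity(named, 1)) // canonical path
}
```

Under this model the test fixture {MaxActiveSessions: 1, ScaleCheck: "printf 1"} lands on the slot path, which is why an assertion of canonical identity cannot hold without changing either the fixture or the predicate.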
What we'd like from you — pick one:
1. Narrow the scope to true singletons (the path that's still buggy on main): keep the helper extraction, change the test fixture to max=1 without ScaleCheck and without MinActiveSessions, and drive the assertion through a different demand source (e.g. an assigned work bead or a [[named_session]] entry). Your claimDesiredPoolSlot short-circuit to 0 is real value here: main currently still stamps a phantom pool_slot="1" on true singletons via claimPoolSlotWithConfig, and your factoring removes that cleanly.
2. Make the case for canonical identity under max=1 + ScaleCheck, i.e. argue that the Kimi merge's pool-flavor carve-out is wrong and SupportsInstanceExpansion() should be changed too. That's a real conversation worth having if you think the new semantics are off, but it expands the scope of this PR substantially and would want its own discussion thread first.
Whichever direction you go, the helper extraction itself is keepable as-is and the integration coverage already passes — so option (1) is a fairly small follow-up. Truly appreciate the precision of the original fix and the regression-pin shape you chose; I want to ship the value here, just don't want to guess at the design call for you.
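The slot short-circuit mentioned in option (1) can be sketched roughly as follows. This is an illustration only: claimSlot and sessionMetadata are hypothetical names, and the signatures, the taken-slot map, and the metadata shape are assumptions rather than the actual claimDesiredPoolSlot or claimPoolSlotWithConfig code.

```go
package main

import (
	"fmt"
	"strconv"
)

// claimSlot is a hypothetical stand-in for the slot-claiming helper.
// For a true singleton (expands == false) it short-circuits to slot 0,
// meaning "no slot". For expanding agents it claims the lowest free slot
// in 1..max and reports ok == false when the pool is full.
func claimSlot(expands bool, taken map[int]bool, max int) (slot int, ok bool) {
	if !expands {
		return 0, true // singleton: canonical identity, no slot claimed
	}
	for s := 1; s <= max; s++ {
		if !taken[s] {
			taken[s] = true
			return s, true
		}
	}
	return 0, false
}

// sessionMetadata only stamps pool_slot when a real slot was claimed,
// so singletons never carry a phantom pool_slot="1".
func sessionMetadata(slot int) map[string]string {
	md := map[string]string{}
	if slot > 0 {
		md["pool_slot"] = strconv.Itoa(slot)
	}
	return md
}

func main() {
	slot, _ := claimSlot(false, map[int]bool{}, 1)
	fmt.Println(slot, sessionMetadata(slot))

	taken := map[int]bool{1: true}
	slot, ok := claimSlot(true, taken, 3)
	fmt.Println(slot, ok, sessionMetadata(slot))
}
```

Returning an explicit ok flag keeps "singleton, no slot needed" distinct from "pool is full", which a bare 0 would conflate.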
6c095ff to fe042ee
Prevent max_active_sessions=1 agents from materializing generic demand as synthetic -1 pool instances. True expanding agents still use slot identities and pool_slot metadata.
Keep canonical singleton demand and sync metadata slot-free, adopt stale suffixed singleton sessions under the canonical agent identity, and route lifecycle/status enumeration through the canonical runtime identity for max-one scale-check pools. Add focused regressions for pool sessions, death handlers, idle and max-age trackers, rig status, city status, adoption, sync, and deferred alias recovery.
fe042ee to aad66d5
quad341
left a comment
Hey @julianknutsen — quick update. The CI red on TestGCLiveContract_BeadsAndEvents (rest-full-2-of-16) turned out to be a pre-existing dolt transaction flake on POST /session/{id}/suspend — not anything caused by this PR. Same flake reproduced on main earlier that morning, and the parallel rerun on your exact SHA came back clean. Your pinned tests all pass locally. The PR code is in good shape.
We're going to re-run the flaky shard and merge when it's green. Nothing for you to do on this one. The two-tick production-order reclaim and the snapshot-label safety case in particular were a really nice touch — those will keep us honest going forward.
Thanks for the rigor, and for taking the harder option-2 path. This is a real correctness win for singleton demand on refinery/cashmaster.
Maintainer Adoption Review
Thanks for the contribution, @julianknutsen! This PR fixes singleton session demand identity for max-one non-namepool pool agents, which matters because the controller, dependency-floor reuse, lifecycle cleanup, and status surfaces need to converge on the same canonical session name instead of creating or chasing phantom
This PR was reviewed and adopted with maintainer fixes pushed directly to the PR branch.
Original PR Review
Decision: approve.
Specific gaps fixed:
Review findings addressed:
Remaining non-gating notes:
Maintainer Changes
Maintainer-side adoption preserved contributor authorship and pushed the reviewed fix set directly to the PR branch:
Short diff stat: 33 files changed, 3424 insertions(+), 256 deletions(-).
Final Review Status
Ready for the merge queue: final head CI: https://github.com/gastownhall/gascity/actions/runs/25963287187
Review Iterations
23 review passes were performed. The review/fix loop resolved the singleton identity contract, dependency-floor creation and reuse, stale suffix recovery, lifecycle/status routing, deterministic singleton candidate selection, and final base-refresh/CI readiness before approval.
Adopted via |
quad341
left a comment
@julianknutsen — thank you for taking the harder path on this one. Generalizing the fix into UsesCanonicalSingletonPoolIdentity and SupportsExpandedSessionIdentities as a single predicate that api/agentutil/dispatcher/lifecycle trackers all key off turned what could have been a narrow cashmaster/refinery patch into a real correctness story across the codebase.
The deferred-alias recovery is what makes this safe in production — the two-tick ordering where the reconciler closes the unselected canonical duplicate before sync reclaims the alias, with TestProductionOrderDeferredSingletonAliasReclaimsOnSecondTick pinning the exact sequence, gives us confidence the migration won't strand stale singleton beads. The snapshot-non-mutation pin and the add-only append guard are both subtle invariants future readers will appreciate having spelled out.
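A toy model of the two-tick ordering described above may help future readers. Everything here is hypothetical (the bead and world types, the method names, and the alias string); it also assumes, which the thread does not state explicitly, that within one tick sync runs before the reconciler. It is a sketch of the sequencing idea, not the controller's actual code.

```go
package main

import "fmt"

// bead is a hypothetical, minimal session bead.
type bead struct {
	id       string
	alias    string // "" means the bead holds no alias
	open     bool
	selected bool // true for the session chosen as the singleton
}

type world struct{ beads []*bead }

// aliasHolder returns the open bead currently holding alias, if any.
func (w *world) aliasHolder(alias string) *bead {
	for _, b := range w.beads {
		if b.open && b.alias == alias {
			return b
		}
	}
	return nil
}

// syncAlias hands the alias to the selected bead when it is free.
// If another open bead still holds it, the reclaim is deferred (returns false).
func (w *world) syncAlias(alias string) bool {
	if h := w.aliasHolder(alias); h != nil {
		return h.selected
	}
	for _, b := range w.beads {
		if b.open && b.selected {
			b.alias = alias
			return true
		}
	}
	return false
}

// reconcile closes every open bead that was not selected.
func (w *world) reconcile() {
	for _, b := range w.beads {
		if b.open && !b.selected {
			b.open = false
		}
	}
}

// tick runs sync first, then the reconciler (assumed production order).
func (w *world) tick(alias string) bool {
	ok := w.syncAlias(alias)
	w.reconcile()
	return ok
}

func main() {
	dup := &bead{id: "dup", alias: "refinery", open: true}
	sel := &bead{id: "sel", open: true, selected: true}
	w := &world{beads: []*bead{dup, sel}}
	fmt.Println(w.tick("refinery"), sel.alias) // tick 1: alias still held by the duplicate
	fmt.Println(w.tick("refinery"), sel.alias) // tick 2: duplicate closed, alias reclaimed
}
```

In this model the first tick can only defer, because the duplicate is closed after sync has already run; the second tick finds the alias free and reclaims it.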
CI is green on every completed shard with a couple of integration shards still in flight and nothing failed. Merging this as soon as the rollup settles — nothing for you to do. Beautiful work.
Reviewers: Qwen — failed · Claude (claude-opus-4-7[1m]) — ok · Codex (gpt-5.5) — ok
Synthesis: claude-opus-4-7[1m]
Review coverage:
- completed: claude, codex
- unavailable: qwen
Fixes #1439.
Summary
- max_active_sessions = 1 demand on the configured canonical agent identity instead of synthesizing a -1 pool instance.
- pool_slot metadata for agents that actually support instance expansion.
Tests
- go test ./cmd/gc -run 'TestBuildDesiredState_(MaxOneAgentDemandUsesCanonicalIdentity|NewPoolSessionBeadCreatedWithConcreteIdentity)$' -count=1
- go test ./cmd/gc -run 'Test(BuildDesiredState_(MaxOneAgentDemandUsesCanonicalIdentity|NewPoolSessionBeadCreatedWithConcreteIdentity|NewPoolSessionBeadDefersAliasWhenConcreteAliasTaken|DoesNotCreateDuplicatePoolBeadForDiscoveredSession|PoolBeadIdentityAgreesAcrossRealizeAndCanonicalHelper)|ComputePoolDesiredStates_MaxOneTemplatesStillParticipateInDemand|CanonicalSessionIdentity)$' -count=1
- make test
- lint-changed, generated docs/spec checks, go vet ./..., make test