Fix agent-backed service readiness reconciliation #572

Merged: bussyjd merged 1 commit into main from fix/agent-backed-readiness on May 30, 2026

@bussyjd (Collaborator) commented on May 29, 2026

Summary

What changed:

  • Requeue Agent reconciliation after WaitingForDeployment so agent-backed offers advance once Hermes reports ready replicas.
  • Make obol sell demo quant wait for ServiceOffer readiness before printing the live try-it block.
  • Scope Agent teardown deletes to controller-owned children and tighten child-resource RBAC by resource name where Kubernetes allows it.
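The second change above (waiting for ServiceOffer readiness before printing the try-it block) amounts to a bounded polling loop. The following is a minimal sketch of that idea, not the code in this PR: `waitForReady`, `readyFunc`, `simulate`, the 5 ms interval, and the 200 ms timeout are all hypothetical, and the fake status source stands in for whatever cluster lookup the CLI really performs.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// readyFunc reports whether the offer is Ready. Hypothetical stand-in for a
// real status lookup against the cluster.
type readyFunc func(ctx context.Context) (bool, error)

// waitForReady calls check immediately and then once per interval until it
// returns true, returns an error, or ctx expires.
func waitForReady(ctx context.Context, interval time.Duration, check readyFunc) error {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		ok, err := check(ctx)
		if err != nil {
			return err
		}
		if ok {
			return nil
		}
		select {
		case <-ctx.Done():
			return fmt.Errorf("offer not ready: %w", ctx.Err())
		case <-ticker.C:
		}
	}
}

// simulate runs waitForReady against a fake status source that turns Ready
// on poll number readyAt, and returns how many polls were made.
func simulate(readyAt int) (int, error) {
	polls := 0
	check := func(context.Context) (bool, error) {
		polls++
		return polls >= readyAt, nil
	}
	ctx, cancel := context.WithTimeout(context.Background(), 200*time.Millisecond)
	defer cancel()
	err := waitForReady(ctx, 5*time.Millisecond, check)
	return polls, err
}

func main() {
	polls, err := simulate(3)
	if err != nil {
		fmt.Println("not printing try-it block:", err)
		return
	}
	fmt.Printf("ready after %d polls; safe to print the try-it block\n", polls)
}
```

The point of the pattern is that the success message is only reachable after the readiness check has returned true, so a timeout surfaces as an error instead of a misleading "live" block.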

Why it matters:

  • Without the requeue, agent-backed services can get stuck in WaitingForDeployment until an external metadata change wakes the Agent reconciler.
  • Teardown should not delete fixed-name resources in a namespace unless they are owned by the Agent controller.
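The ownership rule in the second point can be illustrated with a small filter over candidate children. This is a sketch under stated assumptions: `ownerRef` is a simplified local mirror of a Kubernetes owner reference (the real `metav1.OwnerReference` uses a `*bool` for `Controller`), and `child`, `controlledBy`, `toDelete`, the resource names, and the UIDs are hypothetical.

```go
package main

import "fmt"

// ownerRef is a simplified mirror of the owner-reference fields that matter
// for the guard. Assumption: the real type carries more fields.
type ownerRef struct {
	Kind       string
	UID        string
	Controller bool
}

// child is a stand-in for a fixed-name resource in the agent namespace.
type child struct {
	Name   string
	Owners []ownerRef
}

// controlledBy reports whether the Agent with agentUID is the controlling
// owner of c.
func controlledBy(c child, agentUID string) bool {
	for _, o := range c.Owners {
		if o.Controller && o.Kind == "Agent" && o.UID == agentUID {
			return true
		}
	}
	return false
}

// toDelete returns only the names of children the Agent controls, so that
// same-named resources created by someone else are left alone.
func toDelete(children []child, agentUID string) []string {
	var names []string
	for _, c := range children {
		if controlledBy(c, agentUID) {
			names = append(names, c.Name)
		}
	}
	return names
}

func main() {
	children := []child{
		{Name: "hermes", Owners: []ownerRef{{Kind: "Agent", UID: "uid-1", Controller: true}}},
		{Name: "hermes-config"}, // same namespace, but no owner reference
		{Name: "hermes-svc", Owners: []ownerRef{{Kind: "Agent", UID: "uid-2", Controller: true}}},
	}
	fmt.Println(toDelete(children, "uid-1"))
}
```

With real API objects, the apimachinery helper `metav1.IsControlledBy` performs a comparable controller-UID comparison; whether this PR uses it is not stated in the description.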

Risk level: medium

Commit under test: fa1def9

Base branch: main

Scope

  • Code
  • Charts / manifests
  • Flows / QA scripts
  • Docs / skills
  • Images / dependencies
  • Other:

Validation

CI checks:

Check Status Link
Not run yet Pending CI

Unit tests:

go test ./internal/serviceoffercontroller ./internal/embed ./internal/x402 ./cmd/obol -count=1
PASS on fa1def91

Integration tests:

kubectl apply --dry-run=server -f internal/embed/infrastructure/base/templates/x402.yaml
PASS on fa1def91

Flow tests:

| Flow | Network | QA machine label | Worktree | Result | Artifacts |
| --- | --- | --- | --- | --- | --- |
| Fresh agent-backed quant demo | local k3d | local | local worktree | Agent + ServiceOffer reached Ready without annotation | kubectl status + CLI output |

Release smoke:

Not run.

Live Chain Evidence

Not applicable. This PR changes controller reconciliation and RBAC, not settlement logic.

Runtime Evidence

Kubernetes / stack:

| Item | Value |
| --- | --- |
| Pod readiness | Fresh quant Agent reached Ready after delayed reconcile |
| Cleanup result | Temporary agent namespace deleted after RBAC and ownership-guard fix |

Demo readiness:

| Item | Status | Notes |
| --- | --- | --- |
| Seller visible / registered | Ready locally | ServiceOffer catalog returned to 3 demo offers after cleanup |
| Buyer discovery works | Not tested | No live chain run in this PR |
| Paid route works | Not tested | Controller route readiness only |
| Settlement visible on-chain | Not tested | No live chain run in this PR |

Review Notes

Known gaps:

  • This uses a client-go delayed requeue rather than a Deployment informer. That is deliberate for a minimal fix: a Deployment informer would require list/watch permissions on apps/deployments and a mapping from each child Deployment back to its Agent.
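As a rough illustration of the delayed-requeue approach described above, the reconcile step can return a delay whenever the Deployment has no ready replicas, and the caller re-enqueues the Agent key after that delay. This is a sketch, not the controller's code: `reconcile`, `result`, the phase constants, and the 5-second `requeueDelay` are assumptions made for the example.

```go
package main

import (
	"fmt"
	"time"
)

type phase string

const (
	waitingForDeployment phase = "WaitingForDeployment"
	ready                phase = "Ready"
)

// result tells the caller which phase was reached and whether to re-enqueue
// the key. A zero RequeueAfter means "do not requeue".
type result struct {
	Phase        phase
	RequeueAfter time.Duration
}

// requeueDelay is an assumed value; the PR description does not state one.
const requeueDelay = 5 * time.Second

// reconcile inspects the Deployment's ready replica count. While it is zero
// the Agent stays in WaitingForDeployment and asks to be retried later,
// instead of waiting for an unrelated event to wake the reconciler.
func reconcile(readyReplicas int32) result {
	if readyReplicas < 1 {
		return result{Phase: waitingForDeployment, RequeueAfter: requeueDelay}
	}
	return result{Phase: ready}
}

func main() {
	// Simulated observations across three reconcile passes.
	for _, replicas := range []int32{0, 0, 1} {
		r := reconcile(replicas)
		fmt.Printf("readyReplicas=%d phase=%s requeueAfter=%s\n",
			replicas, r.Phase, r.RequeueAfter)
	}
}
```

In a client-go controller, the delay would typically be handed to the work queue's `AddAfter` method. The trade-off is the one noted in the bullet: readiness is noticed up to one delay late, but no Deployment watch or extra list/watch permissions are needed.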

Follow-ups:

  • Consider a Deployment watcher if agent-backed services become high-volume and readiness latency matters.

Reviewer focus:

  • WaitingForDeployment requeue behavior.
  • Agent teardown ownership guard.
  • RBAC resource-name scoping for per-agent child resources.
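For the RBAC focus item, it may help to see why name scoping only applies "where Kubernetes allows it": RBAC cannot restrict `create` or `deletecollection` by resource name, and `list`/`watch` are only name-restricted when the request selects a single object by name. The sketch below splits a verb list into a name-scoped rule and an unscoped remainder. `policyRule` is a local mirror of the `rbacv1.PolicyRule` fields used here, and `scopedRules`, the verb classification, and the `hermes` resource name are hypothetical; the sketch conservatively treats `list` and `watch` as unscoped.

```go
package main

import "fmt"

// policyRule mirrors the rbacv1.PolicyRule fields used in this sketch.
type policyRule struct {
	APIGroups     []string
	Resources     []string
	ResourceNames []string
	Verbs         []string
}

// nameScopable lists verbs that always target one already-named object and
// can therefore be restricted with ResourceNames.
var nameScopable = map[string]bool{
	"get":    true,
	"update": true,
	"patch":  true,
	"delete": true,
}

// scopedRules returns up to two rules: one limited to the given resource
// name for verbs that support it, and one unscoped rule for the rest.
func scopedRules(group, resource, name string, verbs []string) []policyRule {
	var scoped, unscoped []string
	for _, v := range verbs {
		if nameScopable[v] {
			scoped = append(scoped, v)
		} else {
			unscoped = append(unscoped, v)
		}
	}
	var rules []policyRule
	if len(scoped) > 0 {
		rules = append(rules, policyRule{
			APIGroups:     []string{group},
			Resources:     []string{resource},
			ResourceNames: []string{name},
			Verbs:         scoped,
		})
	}
	if len(unscoped) > 0 {
		rules = append(rules, policyRule{
			APIGroups: []string{group},
			Resources: []string{resource},
			Verbs:     unscoped,
		})
	}
	return rules
}

func main() {
	rules := scopedRules("apps", "deployments", "hermes",
		[]string{"get", "create", "patch", "delete"})
	for _, r := range rules {
		fmt.Printf("%+v\n", r)
	}
}
```

A reviewer can use the same split as a checklist: any rule that grants `create` without a resource name is expected, while an unscoped `delete` or `patch` on a fixed-name child deserves a second look.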

@bussyjd merged commit afb9c7f into main on May 30, 2026
9 checks passed