fix(stack): refresh x402 runtime image pins #557

Open
bussyjd wants to merge 1 commit into main from fix/x402-rc6-runtime-image-pins

@bussyjd (Collaborator) commented May 27, 2026

Summary

Fixes the rc6 runtime-image mismatch behind two reported upgrade findings:

  • /skill.md could fail under the new restricted Pod Security admission because the released serviceoffer-controller image was still the rc0-era b13254e build, which did not render restricted security contexts for controller-owned httpd workloads.
  • Per-agent Hermes pods could render without the later fsGroup / profile-seed changes for the same reason: the embedded manifest pointed at an old controller image while the rc6 source and tests already contained the fix.

This PR pins the controller and buyer sidecar to branch-built multi-arch images and adds tests that prevent these refs from drifting back to stale runtime builds.

Root Cause

The rc6 source at 98fa406 contains the intended controller-render fixes, but the embedded production manifest still deployed a stale controller image:

flowchart TD
    A[v0.10.0-rc6 source tag<br/>98fa406] --> B[Restricted PSS render code present]
    A --> C[Agent pod fsGroupChangePolicy present]
    D[embedded x402.yaml] --> E[serviceoffer-controller:b13254e]
    E --> F[Controller runs old renderer]
    F --> G[obol-skill-md missing restricted fields]
    F --> H[per-agent Hermes misses later pod spec fixes]

The corrected deployment path is:

flowchart TD
    A[Fix branch source<br/>f5d94fc] --> B[Published multi-arch images<br/>f5d94fc@sha256]
    B --> C[embedded templates]
    C --> D[serviceoffer-controller renders restricted httpd workloads]
    C --> E[serviceoffer-controller renders per-agent fsGroup fields]
    D --> F[PodSecurity restricted admission succeeds]
    E --> G[per-agent Hermes PVC ownership path matches fixed source]
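
For reviewers less familiar with what "restricted" admission checks on a container, the following is a minimal Go sketch of the main container-level requirements of the Kubernetes restricted Pod Security Standard. It is not the controller's renderer: `SecurityContext` and `restrictedViolations` are hypothetical local stand-ins, and the real profile also covers volume types, host namespaces, and pod-level settings that are omitted here.

```go
package main

import "fmt"

// SecurityContext is a simplified, hypothetical stand-in for the container
// securityContext fields that the restricted profile inspects.
type SecurityContext struct {
	RunAsNonRoot             *bool
	AllowPrivilegeEscalation *bool
	DropCapabilities         []string
	SeccompProfileType       string
}

// restrictedViolations lists the restricted-profile rules that sc does not
// satisfy. An empty result means none of the checked rules is violated.
func restrictedViolations(sc SecurityContext) []string {
	var out []string
	if sc.RunAsNonRoot == nil || !*sc.RunAsNonRoot {
		out = append(out, "runAsNonRoot must be true")
	}
	if sc.AllowPrivilegeEscalation == nil || *sc.AllowPrivilegeEscalation {
		out = append(out, "allowPrivilegeEscalation must be false")
	}
	dropsAll := false
	for _, c := range sc.DropCapabilities {
		if c == "ALL" {
			dropsAll = true
		}
	}
	if !dropsAll {
		out = append(out, "capabilities.drop must include ALL")
	}
	switch sc.SeccompProfileType {
	case "RuntimeDefault", "Localhost":
		// Accepted seccomp profile types.
	default:
		out = append(out, "seccompProfile.type must be RuntimeDefault or Localhost")
	}
	return out
}

func main() {
	yes, no := true, false
	hardened := SecurityContext{
		RunAsNonRoot:             &yes,
		AllowPrivilegeEscalation: &no,
		DropCapabilities:         []string{"ALL"},
		SeccompProfileType:       "RuntimeDefault",
	}
	fmt.Println("hardened:", restrictedViolations(hardened))
	fmt.Println("unset:   ", restrictedViolations(SecurityContext{}))
}
```

A workload rendered without these fields, as the stale controller image did for the httpd workloads, would trip every check in this sketch.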

Changes

  • Pin serviceoffer-controller to f5d94fc@sha256:c6aa..., the branch-built image containing the controller-side hardening.
  • Pin x402-buyer to f5d94fc@sha256:0c431... so the buyer sidecar stays on the same x402 branch build.
  • Leave the verifier image to PR #556 (fix(x402): inject agent upstream auth), which owns the agent upstream-auth verifier change and pins its own branch-built verifier image. This avoids merge-order regressions where one PR would overwrite the other component's runtime pin.
  • Add image-pin coverage that fails if the controller or buyer refs drift back to stale runtime images.
  • Add /data/.hermes/logs to both default Hermes and controller-rendered per-agent Hermes init paths, with test assertions. This is a narrow hardening for the exact path reported in the crash loop.
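
As a rough illustration of the image-pin coverage described above, the sketch below checks that a reference is pinned to a sha256 digest and does not carry a known-stale tag. It is an assumption about the shape of such a guard, not the tests in `./internal/embed`; `checkImagePin` and the `stale` list are hypothetical names.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// digestPinned matches references that end in @sha256:<64 hex characters>.
var digestPinned = regexp.MustCompile(`@sha256:[a-f0-9]{64}$`)

// checkImagePin returns an error when ref is not digest-pinned or when its
// tag is one of staleTags. Hypothetical helper for illustration only.
func checkImagePin(ref string, staleTags []string) error {
	if !digestPinned.MatchString(ref) {
		return fmt.Errorf("image %q is not pinned to a sha256 digest", ref)
	}
	// Everything before the "@" is the repository plus optional tag.
	name := strings.SplitN(ref, "@", 2)[0]
	for _, tag := range staleTags {
		if strings.HasSuffix(name, ":"+tag) {
			return fmt.Errorf("image %q still uses stale tag %q", ref, tag)
		}
	}
	return nil
}

func main() {
	stale := []string{"b13254e"} // the rc0-era build named in the summary
	refs := []string{
		"ghcr.io/obolnetwork/serviceoffer-controller:f5d94fc@sha256:c6aa6259e3a6bc61a5f4f7203d8c68cfdd861a8d365f9629d234d13b949bf48e",
		"ghcr.io/obolnetwork/serviceoffer-controller:b13254e",
	}
	for _, ref := range refs {
		if err := checkImagePin(ref, stale); err != nil {
			fmt.Println("FAIL:", err)
			continue
		}
		fmt.Println("ok:  ", ref)
	}
}
```

Checking the tag as well as the digest matters because a stale build can itself be digest-pinned, so a digest-only check would not catch a drift back to the old controller image.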

Validation

go test ./internal/embed -run 'TestEmbeddedImages_X402ControllerAndBuyerUseFixPins|TestEmbeddedImages_NamedImagesAreDigestPinned' -count=1
go test ./internal/serviceoffercontroller -run 'TestAgentManifests_DeploymentUsesFSGroup|TestAgentManifests_ProfileSeedInitContainer|TestBuildSkillCatalogDeployment_RestrictedPSS|TestBuildAgentIdentityRegistrationDeployment_RestrictedPSS' -count=1
go test ./internal/embed ./internal/serviceoffercontroller ./internal/hermes -count=1

Image publication:

gh workflow run docker-publish-x402.yml --ref fix/x402-rc6-runtime-image-pins
docker buildx imagetools inspect ghcr.io/obolnetwork/serviceoffer-controller:f5d94fc --format '{{ .Manifest.Digest }}'
# sha256:c6aa6259e3a6bc61a5f4f7203d8c68cfdd861a8d365f9629d234d13b949bf48e
docker buildx imagetools inspect ghcr.io/obolnetwork/x402-buyer:f5d94fc --format '{{ .Manifest.Digest }}'
# sha256:0c431eda44e9e2fe5dd50c82cf4885f9be5037e592478781c51e9c510171265c
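
The digests printed above are what end up in the `tag@sha256:...` pins. A minimal sketch of composing such a reference, assuming the `repo:tag@digest` form and a hypothetical helper named `pinnedRef`:

```go
package main

import (
	"fmt"
	"regexp"
)

// sha256Digest matches a full manifest digest as printed by imagetools inspect.
var sha256Digest = regexp.MustCompile(`^sha256:[a-f0-9]{64}$`)

// pinnedRef joins repo, tag, and digest into repo:tag@digest and rejects
// digests that are truncated or otherwise malformed. Hypothetical helper.
func pinnedRef(repo, tag, digest string) (string, error) {
	if !sha256Digest.MatchString(digest) {
		return "", fmt.Errorf("unexpected digest format: %q", digest)
	}
	return fmt.Sprintf("%s:%s@%s", repo, tag, digest), nil
}

func main() {
	ref, err := pinnedRef(
		"ghcr.io/obolnetwork/serviceoffer-controller",
		"f5d94fc",
		"sha256:c6aa6259e3a6bc61a5f4f7203d8c68cfdd861a8d365f9629d234d13b949bf48e",
	)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(ref)
}
```

Keeping the tag alongside the digest is redundant for the container runtime, which resolves by digest, but it keeps the source commit visible in the manifest.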

Live sanity check against the existing smoke cluster, without teardown:

sequenceDiagram
    participant CLI as obol agent new --create-wallet
    participant Controller as serviceoffer-controller latest local image
    participant K8s as k3d cluster
    participant Hermes as per-agent Hermes

    CLI->>K8s: Apply Agent rc6-pvc-probe
    Controller->>K8s: Render namespace, PVC, Deployment, remote-signer
    K8s->>Hermes: Start pod with fsGroupChangePolicy
    Hermes-->>K8s: Running 1/1

Observed: per-agent Hermes and remote-signer reached Running with no manual PVC chown on the live smoke cluster.

Remaining smoke gate

The full-flow smoke test is being run from a combined branch with forced local dev images, but it is currently blocked on the local sudo prompt. The LLM-gated dual-stack flows are also blocked: the host behind silvermesh.v1337.lan:8081 is reachable on the LAN, but the port is refusing TCP connections.

@bussyjd force-pushed the fix/x402-rc6-runtime-image-pins branch from f5d94fc to 7af6ca6 on May 27, 2026 11:04