Skip to content

Add policy_stopped workload status and enforce policy gate on restart#4857

Merged
reyortiz3 merged 9 commits intomainfrom
reyortiz3/policy-stopped-status
Apr 20, 2026
Merged

Add policy_stopped workload status and enforce policy gate on restart#4857
reyortiz3 merged 9 commits intomainfrom
reyortiz3/policy-stopped-status

Conversation

@reyortiz3
Copy link
Copy Markdown
Contributor

@reyortiz3 reyortiz3 commented Apr 15, 2026

Summary

  • Policy enforcement currently only gates workload creation — running workloads are unaffected when a policy activates, and restart paths have no policy check at all. This PR adds the upstream groundwork to enforce the non_registry_servers policy on existing workloads.
  • Adds WorkloadStatusPolicyStopped as a first-class status (following the precedent of WorkloadStatusUnauthenticated) so the policy layer can set it when stopping a non-compliant workload, giving the UI and CLI a programmatic signal rather than freeform text.
  • Calls EagerCheckCreateServer in restartSingleWorkload so the policy gate blocks any restart attempt for workloads that violate the active policy.

Changes

File Change
pkg/container/runtime/types.go Add WorkloadStatusPolicyStopped constant
pkg/core/workload.go Add policy_stopped to enums tag for swagger
pkg/workloads/manager.go Map policy_stoppedBackendUnhealthy; call EagerCheckCreateServer in restart path
cmd/thv/app/list.go, status.go Add 🚫 indicator for policy_stopped in CLI output
docs/server/swagger.*, docs.go Regenerated to include new enum value

Type of change

  • New feature (non-breaking change which adds functionality)

Test plan

  • task build passes
  • task test passes (pre-existing failures in pkg/client and pkg/telemetry are unrelated to this change and reproduce on main)
  • Swagger enum updated and regenerated via task docs

Does this introduce a user-facing change?

CLI list and status commands will show 🚫 policy_stopped for workloads stopped by policy, making the reason immediately visible rather than showing a plain stopped or error.

Generated with Claude Code

Introduces WorkloadStatusPolicyStopped as a first-class workload status
to support enterprise policy enforcement. When non-registry servers are
blocked by policy, workloads can be set to this status so the UI and CLI
can surface the reason clearly, rather than showing an opaque "error".

- Add WorkloadStatusPolicyStopped constant to pkg/container/runtime/types.go
- Update enums tag in pkg/core/workload.go to include policy_stopped
- Map policy_stopped to BackendUnhealthy in mapWorkloadStatusToVMCPHealth
- Call EagerCheckCreateServer in restartSingleWorkload so the policy gate
  blocks restart of workloads that violate the active policy
- Add 🚫 indicator for policy_stopped in CLI list and status output,
  following the existing ⚠️ pattern for unauthenticated workloads
- Regenerate swagger docs to include the new status enum value

Part of stacklok/stacklok-enterprise-platform#406

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions github-actions Bot added the size/XS Extra small PR: < 100 lines changed label Apr 15, 2026
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 15, 2026

Codecov Report

❌ Patch coverage is 0% with 10 lines in your changes missing coverage. Please review.
✅ Project coverage is 69.43%. Comparing base (8504427) to head (670265f).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
pkg/workloads/manager.go 0.00% 9 Missing and 1 partial ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #4857   +/-   ##
=======================================
  Coverage   69.43%   69.43%           
=======================================
  Files         539      539           
  Lines       55750    55760   +10     
=======================================
+ Hits        38708    38716    +8     
- Misses      14067    14069    +2     
  Partials     2975     2975           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Comment thread docs/server/docs.go
Comment thread pkg/workloads/manager.go
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions github-actions Bot added size/XS Extra small PR: < 100 lines changed and removed size/XS Extra small PR: < 100 lines changed labels Apr 15, 2026
Add all WorkloadStatus cases to the switches in list and status
commands to satisfy the exhaustive linter.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions github-actions Bot added size/XS Extra small PR: < 100 lines changed and removed size/XS Extra small PR: < 100 lines changed labels Apr 15, 2026
@github-actions github-actions Bot added size/XS Extra small PR: < 100 lines changed and removed size/XS Extra small PR: < 100 lines changed labels Apr 15, 2026
@github-actions github-actions Bot added size/XS Extra small PR: < 100 lines changed and removed size/XS Extra small PR: < 100 lines changed labels Apr 15, 2026
@reyortiz3
Copy link
Copy Markdown
Contributor Author

@claude can you verify if the comments were addressed?

@github-actions github-actions Bot added size/XS Extra small PR: < 100 lines changed and removed size/XS Extra small PR: < 100 lines changed labels Apr 15, 2026
- Add policy_stopped to workloadStatusResponse enums tag so the swagger
  spec for the status endpoint includes the new value
- Add EagerCheckCreateServer check in maybeSetupContainerWorkload after
  loadRunnerFromState to cover the path where the outer LoadState call
  fails on a partial name but succeeds after label-based name resolution
- Regenerate swagger docs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions github-actions Bot added size/XS Extra small PR: < 100 lines changed and removed size/XS Extra small PR: < 100 lines changed labels Apr 15, 2026
@stacklok stacklok deleted a comment from claude Bot Apr 15, 2026
@reyortiz3
Copy link
Copy Markdown
Contributor Author

@claude can you review if the comments have been addressed?

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented Apr 15, 2026

Claude finished @reyortiz3's task in 56s —— View job


Review comment verification

  • Read pkg/api/v1/workload_types.go — check enums tag on workloadStatusResponse
  • Read pkg/workloads/manager.go — check placement of EagerCheckCreateServer call
  • Report findings

Both previous review comments have been addressed. ✅

Comment 1 — Missing policy_stopped in workloadStatusResponse enums tag

Status: Fixed

pkg/api/v1/workload_types.go:35 now includes policy_stopped at the end of the enums tag:

Status runtime.WorkloadStatus `json:"status" enums:"running,stopped,error,starting,stopping,unhealthy,removing,unknown,unauthenticated,policy_stopped"`

The swagger-generated files should now consistently emit all 10 enum values for the workloadStatusResponse schema, matching the other blocks.


Comment 2 — EagerCheckCreateServer bypass when LoadState fails

Status: Fixed

A second call to EagerCheckCreateServer was added inside maybeSetupContainerWorkload (pkg/workloads/manager.go:1306-1311), which is invoked by restartContainerWorkload. This covers the path where the outer runner.LoadState fails in restartSingleWorkload and execution falls through to restartContainerWorkload:

// Check policy gates before restarting. This covers the case where the caller
// could not load state via the original name but we resolved the canonical name
// from container labels above, so the check must happen here.
if err := runner.EagerCheckCreateServer(ctx, mcpRunner.Config); err != nil {
    return "", nil, fmt.Errorf("server restart blocked by policy: %w", err)
}

The policy gate is now enforced regardless of whether LoadState succeeds or fails in the outer function, closing the bypass described in the review.


Both raised issues are resolved. The PR looks good from a review-comment-resolution perspective.

rdimitrov
rdimitrov previously approved these changes Apr 15, 2026
Comment thread pkg/workloads/manager.go
Comment thread pkg/workloads/manager.go
Comment thread cmd/thv/app/list.go Outdated
…ndicator

- Call SetWorkloadStatus with WorkloadStatusPolicyStopped in both policy
  gate failure paths in restartSingleWorkload and maybeSetupContainerWorkload,
  so a blocked restart transitions the workload into the new status instead
  of leaving it as stopped
- Extract the status indicator switch (⚠️ / 🚫) from list.go and status.go
  into a single workloadStatusIndicator helper in common.go

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions github-actions Bot added size/XS Extra small PR: < 100 lines changed and removed size/XS Extra small PR: < 100 lines changed labels Apr 15, 2026
@github-actions github-actions Bot added size/XS Extra small PR: < 100 lines changed and removed size/XS Extra small PR: < 100 lines changed labels Apr 15, 2026
@reyortiz3
Copy link
Copy Markdown
Contributor Author

@claude can you review what comments are pending address?

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented Apr 15, 2026

Claude finished @reyortiz3's task in 1m 18s —— View job


Pending comment review

  • Gather context from review comments
  • Check if SetWorkloadStatus(policy_stopped) is called anywhere (jhrozek comment)
  • Check if switch block deduplication was done in common.go (jhrozek nit)
  • Check jhrozek's BackendUnhealthy mapping comment status
  • Report findings

All three of @jhrozek's review comments have been addressed. Here's the breakdown:

Comment 1 — SetWorkloadStatus never transitions to policy_stopped (manager.go:1097)

Status: Addressed ✅

Both policy gate sites now call SetWorkloadStatus with WorkloadStatusPolicyStopped:

  • pkg/workloads/manager.go:1094 — in restartSingleWorkload when the outer LoadState path blocks restart
  • pkg/workloads/manager.go:1313 — in maybeSetupContainerWorkload when the inner path blocks restart
if statusErr := d.statuses.SetWorkloadStatus(ctx, name, rt.WorkloadStatusPolicyStopped, err.Error()); statusErr != nil {
    slog.Warn("Failed to set workload status to policy_stopped", ...)
}

The status is now reachable and will be set when a restart is blocked by policy.


Comment 2 — PolicyStopped maps to BackendUnhealthy instead of a dedicated variant (manager.go:248)

Status: Author provided justification, no code change

@reyortiz3 replied noting this follows the same pattern as Stopped, Error, Stopping, and Removing — all of which map to BackendUnhealthy. The comment did not request a change, it asked whether the generic mapping was intentional. The author's response clarifies intent. This is still open as an unresolved conversation thread; @jhrozek has not yet replied to confirm whether the justification is acceptable.

If you want this to be considered resolved, a follow-up reply from @jhrozek (or dismissing the thread) would close it out.


Comment 3 (Nit) — Duplicated switch block in list.go and status.go

Status: Addressed ✅

The switch was extracted to workloadStatusIndicator() in cmd/thv/app/common.go:162–174. Both list.go:168 and status.go:102 now call the helper:

status := workloadStatusIndicator(c.Status)   // list.go
status := workloadStatusIndicator(workload.Status)  // status.go

Summary

Comment Commenter Status
SetWorkloadStatus never called for policy_stopped jhrozek ✅ Fixed
BackendUnhealthy vs dedicated BackendPolicyStopped jhrozek 💬 Author justified; awaiting reviewer acknowledgement
Duplicated switch block in list.go/status.go jhrozek ✅ Fixed (extracted to common.go)

The only item that is technically still "open" is the BackendUnhealthy mapping question — but it's a design discussion, not a required change. The thread needs @jhrozek to respond or be dismissed.

Copy link
Copy Markdown
Contributor

@jhrozek jhrozek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good. The status-setting and deduplication fixes address the review comments well.

@github-actions github-actions Bot added size/XS Extra small PR: < 100 lines changed and removed size/XS Extra small PR: < 100 lines changed labels Apr 20, 2026
@reyortiz3 reyortiz3 merged commit ee823d8 into main Apr 20, 2026
42 checks passed
@reyortiz3 reyortiz3 deleted the reyortiz3/policy-stopped-status branch April 20, 2026 19:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/XS Extra small PR: < 100 lines changed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants