Skip to content

Add ServerMaintenance validating webhook with tests and manifests#851

Open
Pooja-Sangle wants to merge 1 commit into
ironcore-dev:mainfrom
Pooja-Sangle:servermaintenance-webhook
Open

Add ServerMaintenance validating webhook with tests and manifests#851
Pooja-Sangle wants to merge 1 commit into
ironcore-dev:mainfrom
Pooja-Sangle:servermaintenance-webhook

Conversation

@Pooja-Sangle
Copy link
Copy Markdown

@Pooja-Sangle Pooja-Sangle commented May 4, 2026

Proposed Changes

  • Introduced a validating admission webhook for ServerMaintenance
  • Enforced validation to ensure spec.serverRef.name is provided
  • Added check to verify that the referenced Server resource exists
  • Integrated webhook setup with controller manager
  • Updated webhook manifests and Helm chart configuration
  • Added unit tests for success and failure cases

Fixes #629

Summary by CodeRabbit

Release Notes

  • New Features

    • Added validation webhook for ServerMaintenance resources that enforces server reference validation during creation and updates.
    • Webhook behavior can be controlled via the ENABLE_WEBHOOKS environment variable.
  • Tests

    • Added comprehensive test suite for ServerMaintenance webhook validation.

@Pooja-Sangle Pooja-Sangle requested a review from a team as a code owner May 4, 2026 13:44
@github-actions github-actions Bot added the size/L label May 4, 2026
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 4, 2026

📝 Walkthrough

Walkthrough

This PR adds a validating webhook for ServerMaintenance resources that ensures referenced Server objects exist. The changes include the webhook manifest configuration, validator implementation with client-based Server lookup, conditional runtime registration, and comprehensive test coverage.

Changes

ServerMaintenance Webhook Validation

Layer / File(s) Summary
Webhook Manifest Definition
config/webhook/manifests.yaml
New ValidatingWebhookConfiguration vservermaintenance-v1alpha1.kb.io routes CREATE/UPDATE requests on servermaintenances to /validate-metal-ironcore-dev-v1alpha1-servermaintenance with failurePolicy: Fail and sideEffects: None.
Validator Implementation
internal/webhook/v1alpha1/servermaintenance_webhook.go
ServerMaintenanceValidator enforces spec.serverRef.name is set on create/update, then fetches the referenced Server via client; allows deletion without validation.
Manager Setup
internal/webhook/v1alpha1/servermaintenance_webhook.go
SetupServerMaintenanceWebhookWithManager registers the webhook with the controller manager and wires the ServerMaintenanceValidator with the client.
Runtime Integration
cmd/main.go
Conditionally calls SetupServerMaintenanceWebhookWithManager when ENABLE_WEBHOOKS != "false", exiting on failure.
Tests
internal/webhook/v1alpha1/servermaintenance_webhook_test.go
Test suite validates that creation is allowed when referenced Server exists, rejected when server missing, ServerRef empty, or ServerRef nil.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly summarizes the main change: adding a ServerMaintenance validating webhook with tests and manifests.
Description check ✅ Passed The description follows the template structure and provides detailed points covering webhook implementation, validation logic, integration, manifests, tests, and references the linked issue #629.
Linked Issues check ✅ Passed The PR fully addresses issue #629 by implementing a validating webhook that prevents ServerMaintenance creation with non-existent Server references and validates serverRef at creation time.
Out of Scope Changes check ✅ Passed All changes are directly scoped to webhook implementation: webhook registration in main, webhook manifest configuration, webhook logic with validation, and comprehensive tests for the webhook.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
Review rate limit: 0/1 reviews remaining, refill in 60 minutes.

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
internal/webhook/v1alpha1/servermaintenance_webhook_test.go (1)

63-92: ⚡ Quick win

Add ValidateUpdate coverage for the registered UPDATE webhook path.

Current tests only exercise ValidateCreate; adding at least one ValidateUpdate failure case would guard against regressions in update validation behavior.

🧪 Suggested test addition
 Context("ValidateCreate", func() {
@@
 })
+
+Context("ValidateUpdate", func() {
+	It("should reject update when ServerRef is cleared", func() {
+		oldMaint := maint.DeepCopy()
+		newMaint := maint.DeepCopy()
+		newMaint.Spec.ServerRef = nil
+
+		_, err := validator.ValidateUpdate(ctx, oldMaint, newMaint)
+		Expect(err).To(HaveOccurred())
+	})
+})
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/webhook/v1alpha1/servermaintenance_webhook_test.go` around lines 63
- 92, Add tests covering the UPDATE webhook by creating a new
Context("ValidateUpdate") block and at least one failure case that calls
validator.ValidateUpdate: make an "old" copy of the maintenance object (e.g.,
old := maint.DeepCopy()), mutate maint.Spec (for example set
maint.Spec.ServerRef.Name = "non-existent-server" or set maint.Spec.ServerRef =
nil/empty) and then call validator.ValidateUpdate(ctx, maint, old) and assert an
error is returned; reference the existing validator.ValidateCreate tests as a
template and use the same validator.ValidateUpdate, maint, and ctx symbols so
the new test guards update-path regressions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/webhook/v1alpha1/servermaintenance_webhook.go`:
- Around line 12-13: The import for logf
("sigs.k8s.io/controller-runtime/pkg/log") is unused and causing a build error;
remove the unused import entry (the "logf" import) from the import block in
servermaintenance_webhook.go (or replace with a used symbol if you intended to
use it), ensuring no other references to logf remain in the file so the package
compiles.

---

Nitpick comments:
In `@internal/webhook/v1alpha1/servermaintenance_webhook_test.go`:
- Around line 63-92: Add tests covering the UPDATE webhook by creating a new
Context("ValidateUpdate") block and at least one failure case that calls
validator.ValidateUpdate: make an "old" copy of the maintenance object (e.g.,
old := maint.DeepCopy()), mutate maint.Spec (for example set
maint.Spec.ServerRef.Name = "non-existent-server" or set maint.Spec.ServerRef =
nil/empty) and then call validator.ValidateUpdate(ctx, maint, old) and assert an
error is returned; reference the existing validator.ValidateCreate tests as a
template and use the same validator.ValidateUpdate, maint, and ctx symbols so
the new test guards update-path regressions.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 9af3a029-b4ec-4509-9945-4b831eaa02c5

📥 Commits

Reviewing files that changed from the base of the PR and between 998e8f4 and c9ba76c.

⛔ Files ignored due to path filters (1)
  • dist/chart/templates/webhook/webhooks.yaml is excluded by !**/dist/**
📒 Files selected for processing (4)
  • cmd/main.go
  • config/webhook/manifests.yaml
  • internal/webhook/v1alpha1/servermaintenance_webhook.go
  • internal/webhook/v1alpha1/servermaintenance_webhook_test.go

Comment on lines +12 to +13
logf "sigs.k8s.io/controller-runtime/pkg/log"
"sigs.k8s.io/controller-runtime/pkg/webhook/admission"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify `logf` appears only in import and has no usage in the file body.
rg -n '\blogf\b' internal/webhook/v1alpha1/servermaintenance_webhook.go

Repository: ironcore-dev/metal-operator

Length of output: 121


Remove the unused logf import (build blocker).

Line 12 imports logf, but there is no symbol usage in this file, which causes Go compilation to fail.

Minimal fix
 import (
 	"context"
 	"fmt"

 	ctrl "sigs.k8s.io/controller-runtime"
 	"sigs.k8s.io/controller-runtime/pkg/client"
-	logf "sigs.k8s.io/controller-runtime/pkg/log"
 	"sigs.k8s.io/controller-runtime/pkg/webhook/admission"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
logf "sigs.k8s.io/controller-runtime/pkg/log"
"sigs.k8s.io/controller-runtime/pkg/webhook/admission"
"sigs.k8s.io/controller-runtime/pkg/webhook/admission"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/webhook/v1alpha1/servermaintenance_webhook.go` around lines 12 - 13,
The import for logf ("sigs.k8s.io/controller-runtime/pkg/log") is unused and
causing a build error; remove the unused import entry (the "logf" import) from
the import block in servermaintenance_webhook.go (or replace with a used symbol
if you intended to use it), ensuring no other references to logf remain in the
file so the package compiles.

Comment on lines +57 to +67
err := v.Client.Get(ctx, client.ObjectKey{
Name: obj.Spec.ServerRef.Name,
}, server)


if err != nil {
if apierrors.IsNotFound(err) {
return nil, fmt.Errorf("referenced Server %q does not exist", obj.Spec.ServerRef.Name)
}
return nil, err
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if there is a transient error or cache not updated on the newly added server?

retry will solve the issue, but its not a pleasant experience.

@coderabbitai what is a generic approach k8 uses for handling checking for referenced object before creating the current object patterns?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nagadeesh-nagaraja, great question! This is a well-known concern with admission webhooks that perform cross-resource lookups. Here's a breakdown of the problem and the idiomatic Kubernetes/controller-runtime approach:


The Core Problem: Cache-Backed Client in Webhooks

Controller-runtime controllers use a cache to subscribe to events from Kubernetes objects and to read those objects more efficiently by avoiding calls to the API. The client exposed via mgr.GetClient() is a cache-backed (delegating) client — meaning:

  • Objects read from the cache can always be slightly out-of-date (i.e., stale) because the client has to first observe changes to API objects via watch events, which can intermittently lag behind by a second or even more.
  • Reading from a controller-runtime cache or a cached controller-runtime client implicitly starts a watch for the given object kind under the hood, and reading a given object kind from the cache for the first time can take up to a few seconds depending on size and amount of objects as well as API server latency.

This is exactly the problem you've identified: if a Server is newly created and the ServerMaintenance is applied immediately after, the cache may not yet reflect the new Server, causing a false NotFound rejection even though the Server exists.


The Idiomatic Fix: Use mgr.GetAPIReader()

The canonical pattern in controller-runtime for admission webhooks that need strong consistency when looking up referenced objects is to use the API reader (direct API server read) instead of the cache-backed client.

The manager exposes mgr.GetAPIReader(), which is a client.Reader that bypasses the cache entirely and goes directly to the API server:

type ServerMaintenanceValidator struct {
    Client    client.Client  // keep for non-critical reads if needed
    APIReader client.Reader  // direct API server reader for cross-ref validation
}

func SetupServerMaintenanceWebhookWithManager(mgr ctrl.Manager) error {
    return ctrl.NewWebhookManagedBy(mgr, &metalv1alpha1.ServerMaintenance{}).
        WithValidator(&ServerMaintenanceValidator{
            Client:    mgr.GetClient(),
            APIReader: mgr.GetAPIReader(),
        }).
        Complete()
}

Then in ValidateCreate, use APIReader instead of Client for the Server existence check:

func (v *ServerMaintenanceValidator) ValidateCreate(
    ctx context.Context,
    obj *metalv1alpha1.ServerMaintenance,
) (admission.Warnings, error) {
    if obj.Spec.ServerRef == nil || obj.Spec.ServerRef.Name == "" {
        return nil, fmt.Errorf("serverRef.name must be set")
    }

    server := &metalv1alpha1.Server{}
    err := v.APIReader.Get(ctx, client.ObjectKey{Name: obj.Spec.ServerRef.Name}, server)
    if err != nil {
        if apierrors.IsNotFound(err) {
            return nil, fmt.Errorf("referenced Server %q does not exist", obj.Spec.ServerRef.Name)
        }
        // Transient/unexpected errors: surface them clearly
        return nil, fmt.Errorf("failed to look up referenced Server %q: %w", obj.Spec.ServerRef.Name, err)
    }
    return nil, nil
}

Why This Is the Right Pattern

In order to allow more granular control over which object kinds should be cached and which calls should bypass the cache, controller-runtime offers mechanisms to further tweak the client/cache behavior — when creating a DelegatingClient, certain object kinds can be configured to always be read directly from the API instead of from the cache.

Using GetAPIReader() in webhooks is the standard approach because:

  1. No cache staleness — it reads directly from etcd via the API server, so a newly created Server will always be visible.
  2. Transient errors are still surfaced — non-NotFound errors are returned as admission errors. With failurePolicy: Fail, the API server will reject the admission request, and the client (e.g., kubectl apply) will get a clear error message. The operator can retry the ServerMaintenance creation.
  3. Consistent with how built-in K8s admission controllers work — for example, the NamespaceExists admission controller checks all requests on namespaced resources and if the namespace referenced from a request doesn't exist, the request is rejected — using direct API server reads, not a cache.

Summary of Patterns

Pattern Use Case
mgr.GetAPIReader() in webhook ✅ Strongly-consistent cross-resource existence check
mgr.GetClient() (cache) in webhook ⚠️ May miss newly-created objects; fine for non-critical reads
ClientDisableCacheFor option For types where you always want uncached reads across controllers + webhooks
Retry by operator on failure ✅ Works but poor UX — better to fix the root cause

The fix is straightforward: swap mgr.GetClient() for mgr.GetAPIReader() in ServerMaintenanceValidator for the Server existence lookup.


🧠 Learnings used
Learnt from: nagadeesh-nagaraja
Repo: ironcore-dev/metal-operator PR: 761
File: docs/concepts/biosversion.md:19-19
Timestamp: 2026-03-30T08:23:27.355Z
Learning: When a Kubernetes controller (reconciler) errors out and does not proceed if a CRD spec field is nil/empty, that field should not be treated as optional in the API schema. Don’t mark such fields as `+optional`/`omitempty` (or otherwise make them “not required”) unless the webhook/API layer also tolerates nil and the controller can reconcile safely. Otherwise, add/adjust validation in the webhook/API (or the CRD schema) to reject creation/update when the field is missing, or remove the `+optional`/`omitempty` so the requiredness is enforced by the API.

) (admission.Warnings, error) {

// reuse create logic
return v.ValidateCreate(ctx, newObj)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: might be better to create a common function and use in both ValidateUpdate and ValidateCreate rather than calling ValidateCreate from Validate update..


// -------------------------
// UPDATE validation
// -------------------------
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: these comments could be removed.

Comment on lines +49 to +50
// 1. ServerRef must exist
if obj.Spec.ServerRef == nil || obj.Spec.ServerRef.Name == "" {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isnt this already done using kubebuilder logic in api definition?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

while serverRef itself is required, the name has no minLengh or required marker and can be "". the 2nd condition can actually trigger.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

Validate serverRef on serverMaintenances on creation

4 participants