Add ServerMaintenance validating webhook with tests and manifests#851
Add ServerMaintenance validating webhook with tests and manifests#851Pooja-Sangle wants to merge 1 commit into
Conversation
📝 WalkthroughWalkthroughThis PR adds a validating webhook for ChangesServerMaintenance Webhook Validation
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Review rate limit: 0/1 reviews remaining, refill in 60 minutes.Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
internal/webhook/v1alpha1/servermaintenance_webhook_test.go (1)
63-92: ⚡ Quick winAdd
ValidateUpdatecoverage for the registered UPDATE webhook path.Current tests only exercise
ValidateCreate; adding at least oneValidateUpdatefailure case would guard against regressions in update validation behavior.🧪 Suggested test addition
Context("ValidateCreate", func() { @@ }) + +Context("ValidateUpdate", func() { + It("should reject update when ServerRef is cleared", func() { + oldMaint := maint.DeepCopy() + newMaint := maint.DeepCopy() + newMaint.Spec.ServerRef = nil + + _, err := validator.ValidateUpdate(ctx, oldMaint, newMaint) + Expect(err).To(HaveOccurred()) + }) +})🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/webhook/v1alpha1/servermaintenance_webhook_test.go` around lines 63 - 92, Add tests covering the UPDATE webhook by creating a new Context("ValidateUpdate") block and at least one failure case that calls validator.ValidateUpdate: make an "old" copy of the maintenance object (e.g., old := maint.DeepCopy()), mutate maint.Spec (for example set maint.Spec.ServerRef.Name = "non-existent-server" or set maint.Spec.ServerRef = nil/empty) and then call validator.ValidateUpdate(ctx, maint, old) and assert an error is returned; reference the existing validator.ValidateCreate tests as a template and use the same validator.ValidateUpdate, maint, and ctx symbols so the new test guards update-path regressions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/webhook/v1alpha1/servermaintenance_webhook.go`:
- Around line 12-13: The import for logf
("sigs.k8s.io/controller-runtime/pkg/log") is unused and causing a build error;
remove the unused import entry (the "logf" import) from the import block in
servermaintenance_webhook.go (or replace with a used symbol if you intended to
use it), ensuring no other references to logf remain in the file so the package
compiles.
---
Nitpick comments:
In `@internal/webhook/v1alpha1/servermaintenance_webhook_test.go`:
- Around line 63-92: Add tests covering the UPDATE webhook by creating a new
Context("ValidateUpdate") block and at least one failure case that calls
validator.ValidateUpdate: make an "old" copy of the maintenance object (e.g.,
old := maint.DeepCopy()), mutate maint.Spec (for example set
maint.Spec.ServerRef.Name = "non-existent-server" or set maint.Spec.ServerRef =
nil/empty) and then call validator.ValidateUpdate(ctx, maint, old) and assert an
error is returned; reference the existing validator.ValidateCreate tests as a
template and use the same validator.ValidateUpdate, maint, and ctx symbols so
the new test guards update-path regressions.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 9af3a029-b4ec-4509-9945-4b831eaa02c5
⛔ Files ignored due to path filters (1)
dist/chart/templates/webhook/webhooks.yamlis excluded by!**/dist/**
📒 Files selected for processing (4)
cmd/main.goconfig/webhook/manifests.yamlinternal/webhook/v1alpha1/servermaintenance_webhook.gointernal/webhook/v1alpha1/servermaintenance_webhook_test.go
| logf "sigs.k8s.io/controller-runtime/pkg/log" | ||
| "sigs.k8s.io/controller-runtime/pkg/webhook/admission" |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify `logf` appears only in import and has no usage in the file body.
rg -n '\blogf\b' internal/webhook/v1alpha1/servermaintenance_webhook.goRepository: ironcore-dev/metal-operator
Length of output: 121
Remove the unused logf import (build blocker).
Line 12 imports logf, but there is no symbol usage in this file, which causes Go compilation to fail.
Minimal fix
import (
"context"
"fmt"
ctrl "sigs.k8s.io/controller-runtime"
"sigs.k8s.io/controller-runtime/pkg/client"
- logf "sigs.k8s.io/controller-runtime/pkg/log"
"sigs.k8s.io/controller-runtime/pkg/webhook/admission"📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| logf "sigs.k8s.io/controller-runtime/pkg/log" | |
| "sigs.k8s.io/controller-runtime/pkg/webhook/admission" | |
| "sigs.k8s.io/controller-runtime/pkg/webhook/admission" |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/webhook/v1alpha1/servermaintenance_webhook.go` around lines 12 - 13,
The import for logf ("sigs.k8s.io/controller-runtime/pkg/log") is unused and
causing a build error; remove the unused import entry (the "logf" import) from
the import block in servermaintenance_webhook.go (or replace with a used symbol
if you intended to use it), ensuring no other references to logf remain in the
file so the package compiles.
| err := v.Client.Get(ctx, client.ObjectKey{ | ||
| Name: obj.Spec.ServerRef.Name, | ||
| }, server) | ||
|
|
||
|
|
||
| if err != nil { | ||
| if apierrors.IsNotFound(err) { | ||
| return nil, fmt.Errorf("referenced Server %q does not exist", obj.Spec.ServerRef.Name) | ||
| } | ||
| return nil, err | ||
| } |
There was a problem hiding this comment.
What if there is a transient error or cache not updated on the newly added server?
retry will solve the issue, but its not a pleasant experience.
@coderabbitai what is a generic approach k8 uses for handling checking for referenced object before creating the current object patterns?
There was a problem hiding this comment.
@nagadeesh-nagaraja, great question! This is a well-known concern with admission webhooks that perform cross-resource lookups. Here's a breakdown of the problem and the idiomatic Kubernetes/controller-runtime approach:
The Core Problem: Cache-Backed Client in Webhooks
Controller-runtime controllers use a cache to subscribe to events from Kubernetes objects and to read those objects more efficiently by avoiding calls to the API. The client exposed via mgr.GetClient() is a cache-backed (delegating) client — meaning:
- Objects read from the cache can always be slightly out-of-date (i.e., stale) because the client has to first observe changes to API objects via watch events, which can intermittently lag behind by a second or even more.
- Reading from a controller-runtime cache or a cached controller-runtime client implicitly starts a watch for the given object kind under the hood, and reading a given object kind from the cache for the first time can take up to a few seconds depending on size and amount of objects as well as API server latency.
This is exactly the problem you've identified: if a Server is newly created and the ServerMaintenance is applied immediately after, the cache may not yet reflect the new Server, causing a false NotFound rejection even though the Server exists.
The Idiomatic Fix: Use mgr.GetAPIReader()
The canonical pattern in controller-runtime for admission webhooks that need strong consistency when looking up referenced objects is to use the API reader (direct API server read) instead of the cache-backed client.
The manager exposes mgr.GetAPIReader(), which is a client.Reader that bypasses the cache entirely and goes directly to the API server:
type ServerMaintenanceValidator struct {
Client client.Client // keep for non-critical reads if needed
APIReader client.Reader // direct API server reader for cross-ref validation
}
func SetupServerMaintenanceWebhookWithManager(mgr ctrl.Manager) error {
return ctrl.NewWebhookManagedBy(mgr, &metalv1alpha1.ServerMaintenance{}).
WithValidator(&ServerMaintenanceValidator{
Client: mgr.GetClient(),
APIReader: mgr.GetAPIReader(),
}).
Complete()
}Then in ValidateCreate, use APIReader instead of Client for the Server existence check:
func (v *ServerMaintenanceValidator) ValidateCreate(
ctx context.Context,
obj *metalv1alpha1.ServerMaintenance,
) (admission.Warnings, error) {
if obj.Spec.ServerRef == nil || obj.Spec.ServerRef.Name == "" {
return nil, fmt.Errorf("serverRef.name must be set")
}
server := &metalv1alpha1.Server{}
err := v.APIReader.Get(ctx, client.ObjectKey{Name: obj.Spec.ServerRef.Name}, server)
if err != nil {
if apierrors.IsNotFound(err) {
return nil, fmt.Errorf("referenced Server %q does not exist", obj.Spec.ServerRef.Name)
}
// Transient/unexpected errors: surface them clearly
return nil, fmt.Errorf("failed to look up referenced Server %q: %w", obj.Spec.ServerRef.Name, err)
}
return nil, nil
}Why This Is the Right Pattern
In order to allow more granular control over which object kinds should be cached and which calls should bypass the cache, controller-runtime offers mechanisms to further tweak the client/cache behavior — when creating a DelegatingClient, certain object kinds can be configured to always be read directly from the API instead of from the cache.
Using GetAPIReader() in webhooks is the standard approach because:
- No cache staleness — it reads directly from etcd via the API server, so a newly created
Serverwill always be visible. - Transient errors are still surfaced — non-
NotFounderrors are returned as admission errors. WithfailurePolicy: Fail, the API server will reject the admission request, and the client (e.g.,kubectl apply) will get a clear error message. The operator can retry theServerMaintenancecreation. - Consistent with how built-in K8s admission controllers work — for example, the
NamespaceExistsadmission controller checks all requests on namespaced resources and if the namespace referenced from a request doesn't exist, the request is rejected — using direct API server reads, not a cache.
Summary of Patterns
| Pattern | Use Case |
|---|---|
mgr.GetAPIReader() in webhook |
✅ Strongly-consistent cross-resource existence check |
mgr.GetClient() (cache) in webhook |
|
ClientDisableCacheFor option |
For types where you always want uncached reads across controllers + webhooks |
| Retry by operator on failure | ✅ Works but poor UX — better to fix the root cause |
The fix is straightforward: swap mgr.GetClient() for mgr.GetAPIReader() in ServerMaintenanceValidator for the Server existence lookup.
🧠 Learnings used
Learnt from: nagadeesh-nagaraja
Repo: ironcore-dev/metal-operator PR: 761
File: docs/concepts/biosversion.md:19-19
Timestamp: 2026-03-30T08:23:27.355Z
Learning: When a Kubernetes controller (reconciler) errors out and does not proceed if a CRD spec field is nil/empty, that field should not be treated as optional in the API schema. Don’t mark such fields as `+optional`/`omitempty` (or otherwise make them “not required”) unless the webhook/API layer also tolerates nil and the controller can reconcile safely. Otherwise, add/adjust validation in the webhook/API (or the CRD schema) to reject creation/update when the field is missing, or remove the `+optional`/`omitempty` so the requiredness is enforced by the API.
| ) (admission.Warnings, error) { | ||
|
|
||
| // reuse create logic | ||
| return v.ValidateCreate(ctx, newObj) |
There was a problem hiding this comment.
nitpick: might be better to create a common function and use in both ValidateUpdate and ValidateCreate rather than calling ValidateCreate from Validate update..
|
|
||
| // ------------------------- | ||
| // UPDATE validation | ||
| // ------------------------- |
There was a problem hiding this comment.
nit: these comments could be removed.
| // 1. ServerRef must exist | ||
| if obj.Spec.ServerRef == nil || obj.Spec.ServerRef.Name == "" { |
There was a problem hiding this comment.
isnt this already done using kubebuilder logic in api definition?
There was a problem hiding this comment.
while serverRef itself is required, the name has no minLengh or required marker and can be "". the 2nd condition can actually trigger.
Proposed Changes
Fixes #629
Summary by CodeRabbit
Release Notes
New Features
ENABLE_WEBHOOKSenvironment variable.Tests