Skip to content

Add BMC certificate management#836

Open
stefanhipfel wants to merge 4 commits into
mainfrom
worktree-bmc-certs
Open

Add BMC certificate management#836
stefanhipfel wants to merge 4 commits into
mainfrom
worktree-bmc-certs

Conversation

@stefanhipfel
Copy link
Copy Markdown
Contributor

@stefanhipfel stefanhipfel commented Apr 24, 2026

Implements automatic TLS certificate management for BMCs using Redfish CertificateService API and Kubernetes CSR API.

Overview

This PR adds automatic certificate management for BMCs with the following features:

  • CSR generation on BMC via Redfish CertificateService.GenerateCSR
  • Integration with Kubernetes CSR API for certificate signing
  • Automatic certificate installation on BMC
  • Certificate expiration monitoring and renewal
  • Operator-level configuration model (simplified)

Key Design Decisions

Operator-Level Configuration

All certificate configuration is set at the operator level via controller flags:

  • --certificate-management-enabled: Enable by default
  • --certificate-signer-name: Signer for CSRs
  • --certificate-approval-mode: Auto or External approval
  • --certificate-renewal-threshold: When to renew
  • --certificate-subject-*: Certificate subject fields

BMCs can only enable/disable certificate management via spec.certificateManagementPolicy: Automatic|Manual. This ensures consistent certificate policy across all BMCs.

Security

  • Private keys never leave BMC hardware (generated by BMC)
  • CSR validation (public key verification, subject validation)
  • Certificate validation (time bounds, SANs, key usage)
  • Race condition prevention using Patch pattern
  • Owner references for automatic cleanup

Certificate Workflow

  1. Controller calls BMC Redfish API to generate CSR
  2. Controller creates Kubernetes CertificateSigningRequest
  3. External signer (cert-manager, admin, custom) approves and signs
  4. Controller installs signed certificate on BMC
  5. Controller monitors expiration and triggers renewal

Testing

  • Unit tests with 63.9% coverage
  • Mock Redfish server for CSR generation
  • Certificate validation tests
  • Integration with envtest

Documentation

  • Sample BMC YAML with certificate configuration
  • Operator flag documentation in code
  • Complete design document: OPERATOR_LEVEL_CERTIFICATE_CONFIG.md

Fixes #218

Summary by CodeRabbit

  • New Features
    • Per-BMC HTTPS certificate lifecycle management with configurable Manual or Automatic policies
    • Automatic CSR generation, certificate installation, and renewal with status tracking
    • Real-time certificate observability including issuer, subject, validity dates, and SHA-256 thumbprint
    • Operator-level certificate management defaults configurable via CLI flags

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 24, 2026

Warning

Rate limit exceeded

@stefanhipfel has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 43 minutes and 56 seconds before requesting another review.

You’ve run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a3ac7cd9-fa79-4622-869a-19c84f74dc0e

📥 Commits

Reviewing files that changed from the base of the PR and between 4a43188 and f0b0c1c.

⛔ Files ignored due to path filters (2)
  • dist/chart/templates/crd/metal.ironcore.dev_bmcs.yaml is excluded by !**/dist/**
  • dist/chart/templates/rbac/role.yaml is excluded by !**/dist/**
📒 Files selected for processing (12)
  • api/v1alpha1/bmc_types.go
  • api/v1alpha1/zz_generated.deepcopy.go
  • bmc/bmc.go
  • bmc/mock/server/server.go
  • bmc/redfish.go
  • cmd/main.go
  • config/crd/bases/metal.ironcore.dev_bmcs.yaml
  • config/rbac/role.yaml
  • config/samples/metal_v1alpha1_bmc_with_certificate.yaml
  • internal/controller/bmc_controller.go
  • internal/controller/bmc_controller_test.go
  • internal/controller/suite_test.go
📝 Walkthrough

Walkthrough

This PR introduces per-BMC automatic HTTPS certificate management via Redfish. It adds a new optional spec.certificateManagementPolicy field to enable/disable certificate management per BMC, extends the BMC API with certificate lifecycle methods, implements Redfish certificate operations (CSR generation, installation, listing, deletion), and adds controller reconciliation logic for CSR creation, approval, certificate installation, and renewal.

Changes

Cohort / File(s) Summary
API Type Extensions
api/v1alpha1/bmc_types.go, api/v1alpha1/zz_generated.deepcopy.go
Introduces certificate management policy enum (Manual/Automatic), new types (CertificateApprovalPolicy, CertificateSubject, CertificateInfo), and extends BMCSpec and BMCStatus with certificate-related fields. Auto-generated deepcopy methods for new types.
BMC Interface & Redfish Implementation
bmc/bmc.go, bmc/redfish.go
Adds five new BMC interface methods: GetCertificateService, GenerateCSR, InstallCertificate, GetCertificates, DeleteCertificate. Redfish implementation constructs manager-specific URIs, handles CSR validation, certificate posting/retrieval, and deletion with error tolerance.
Mock Server
bmc/mock/server/server.go
Extends mock server with in-memory certificate storage (CertificateData struct, certificates map, csrCounter). Routes certificate endpoints for GET/POST/DELETE operations, validates request payloads, generates certificate metadata, and returns appropriate HTTP status codes.
Controller & Tests
internal/controller/bmc_controller.go, internal/controller/bmc_controller_test.go
Implements complete certificate lifecycle reconciliation: default policy application, CSR generation and K8s CertificateSigningRequest creation, optional auto-approval, certificate retrieval/installation, validation (CN/SAN/EKU/public key), renewal triggering via NotAfter comparison, status updates, and condition reporting. Adds integration tests verifying automatic/manual policy behavior and certificate info persistence.
Configuration & RBAC
cmd/main.go, config/crd/bases/metal.ironcore.dev_bmcs.yaml, config/rbac/role.yaml, config/samples/metal_v1alpha1_bmc_with_certificate.yaml
CLI flags for default certificate management (signer name, approval mode, renewal threshold, subject fields). CRD schema defines spec.certificateManagementPolicy enum and status certificate observability fields. RBAC rules for certificatesigningrequests (create/delete/read/watch/approve). Sample manifests demonstrate automatic, manual, and default-inherited certificate policies.
License Headers
third_party/expansion/expand.go, third_party/expansion/expand_test.go
Adds SPDX copyright and Apache-2.0 license identifiers; no functional changes.

Sequence Diagram(s)

sequenceDiagram
    participant BMC as BMC Controller
    participant K8sAPI as Kubernetes API
    participant CSRResource as CSR Resource
    participant RedfishAPI as Redfish BMC
    
    BMC->>RedfishAPI: GetCertificateService()
    RedfishAPI-->>BMC: Certificate Service Info
    
    BMC->>RedfishAPI: GenerateCSR(CSRRequest)
    RedfishAPI-->>BMC: CSRResponse (CSR string)
    
    BMC->>K8sAPI: Create CertificateSigningRequest
    K8sAPI-->>CSRResource: CSR Created
    
    Note over BMC,K8sAPI: Optional auto-approval
    alt Auto-Approval Enabled
        BMC->>K8sAPI: Approve CSR
        K8sAPI-->>CSRResource: CSR Approved
    end
    
    BMC->>K8sAPI: Wait for CSR Signed
    K8sAPI-->>CSRResource: CSR Status: Signed
    
    CSRResource-->>BMC: Certificate (signed)
    
    BMC->>RedfishAPI: InstallCertificate(cert, type)
    RedfishAPI-->>BMC: Success
    
    BMC->>RedfishAPI: GetCertificates()
    RedfishAPI-->>BMC: []CertificateInfo
    
    BMC->>K8sAPI: Create/Update TLS Secret
    BMC->>K8sAPI: Update BMC Status (CertificateInfo)
    K8sAPI-->>BMC: Status Updated
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs

Suggested labels

enhancement, api-change, documentation, size/XXL, area/metal-automation

Suggested reviewers

  • afritzler
  • nagadeesh-nagaraja
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title 'Add BMC certificate management' directly and clearly summarizes the main change: introducing certificate management capabilities for BMCs across the codebase.
Description check ✅ Passed The PR description provides a comprehensive overview including features, design decisions, security considerations, workflow, testing, and documentation. It references the linked issue (#218) but lacks explicit 'Proposed Changes' sections matching the template structure.
Linked Issues check ✅ Passed The PR fully addresses issue #218 objectives: implements Redfish CertificateService API integration for CSR generation, integrates with Kubernetes CSR API for certificate signing, supports certificate installation and renewal workflows, and establishes an operator-driven certificate management approach.
Out of Scope Changes check ✅ Passed All changes are directly related to certificate management: new API types, controller logic, mock server implementation, RBAC rules, configuration options, and documentation. Changes include ancillary updates (SPDX headers) that support the primary objective.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch worktree-bmc-certs

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

  • Generate code and open pull requests
  • Plan features and break down work
  • Investigate incidents and troubleshoot customer tickets together
  • Automate recurring tasks and respond to alerts with triggers
  • Summarize progress and report instantly

Built for teams:

  • Shared memory across your entire org—no repeating context
  • Per-thread sandboxes to safely plan and execute work
  • Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 11

🧹 Nitpick comments (7)
api/v1alpha1/bmc_types.go (2)

286-288: Prefer *v1.LocalObjectReference over *string for CertificateSigningRequestRef.

CertificateSigningRequest is cluster-scoped, so a name suffices, but using *v1.LocalObjectReference is idiomatic in this codebase and matches CertificateSecretRef. It also leaves room for future structured fields without a breaking API change.

♻️ Proposed change
 	// CertificateSigningRequestRef references the current CertificateSigningRequest.
 	// +optional
-	CertificateSigningRequestRef *string `json:"certificateSigningRequestRef,omitempty"`
+	CertificateSigningRequestRef *v1.LocalObjectReference `json:"certificateSigningRequestRef,omitempty"`

Remember to run make manifests and make generate after changing *_types.go.

As per coding guidelines: "Run make manifests and make generate after editing *_types.go files or kubebuilder markers".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/v1alpha1/bmc_types.go` around lines 286 - 288, Replace the
CertificateSigningRequestRef field's type from *string to
*v1.LocalObjectReference (to match CertificateSecretRef and idiomatic usage),
update the import to include "k8s.io/api/core/v1" if not present, adjust any
kubebuilder markers or comments if necessary to reflect the LocalObjectReference
type, and then run make manifests and make generate to update generated code and
manifests.

89-123: Consider relocating operator-only types out of the CRD API package.

CertificateApprovalPolicy and CertificateSubject are not referenced by any BMCSpec/BMCStatus field — per the PR design they're operator-level configuration consumed only by cmd/main.go and the reconciler. Placing them in api/v1alpha1 suggests they're part of the CRD surface and may be mistaken for such. Consider moving them under internal/controller (or a dedicated config package) so the CRD API package only contains types backing the CRD schema.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@api/v1alpha1/bmc_types.go` around lines 89 - 123, CertificateApprovalPolicy
and CertificateSubject are operator-only types currently declared in
api/v1alpha1 but not used in any BMCSpec/BMCStatus; move these declarations out
of the CRD API package into an operator-only package (e.g., internal/controller
or a new internal/config package), update all references/imports in cmd/main.go
and the reconciler to the new package path, and ensure package docs and go mod
imports are adjusted so the CRD API package only contains CRD-backed types.
bmc/mock/server/server.go (1)

767-789: Remove the no-op _ = csrID; the variable is actually used on line 789.

The comment claims it marks csrID as used for "potential future logging", but it's already passed to s.log.Info(..., "id", csrID, ...) a few lines below. The assignment is dead and misleading.

♻️ Proposed cleanup
 	csrString := `-----BEGIN CERTIFICATE REQUEST-----
 ...
 -----END CERTIFICATE REQUEST-----`
 
-	_ = csrID // Mark as used for potential future logging
-
 	certCollectionURI := "/redfish/v1/Managers/1/NetworkProtocol/HTTPS/Certificates"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bmc/mock/server/server.go` around lines 767 - 789, Remove the dead no-op
assignment `_ = csrID` in the CSR generation block; csrID is already used in the
logging call `s.log.Info(..., "id", csrID, ...)` so simply delete the `_ =
csrID` line to avoid misleading code and keep `csrID` intact for `s.log.Info`
and any future uses.
config/samples/metal_v1alpha1_bmc_with_certificate.yaml (1)

1-75: Include BMCSecret samples so the manifest is self-applicable.

The three BMC samples reference bmc-credentials, bmc-credentials-2, bmc-credentials-3, but no corresponding BMCSecret manifests are defined in this file. Applying this sample as-is will leave dangling bmcSecretRefs. Consider adding matching BMCSecret stubs (even with placeholder values) so the file is a complete, apply-able example.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@config/samples/metal_v1alpha1_bmc_with_certificate.yaml` around lines 1 - 75,
The sample BMC manifests reference BMCSecret names (bmc-credentials,
bmc-credentials-2, bmc-credentials-3) but do not include their definitions,
making the file non-applicable; add three corresponding BMCSecret stub resources
named exactly bmc-credentials, bmc-credentials-2, and bmc-credentials-3 (with
placeholder secretData/fields required by the BMCSecret CRD) so the examples for
BMCs bmc-with-certificate, bmc-manual-cert, and bmc-default are self-contained
and can be applied directly.
cmd/main.go (1)

466-478: Make approval-mode flag parsing case-insensitive and validate empty input.

The switch matches only lowercase "auto"/"external", but users may reasonably pass Auto/External (matching the API enum CertificateApprovalPolicyAuto/External), which currently causes a fatal exit. Also, if the flag is explicitly set to "", the default branch silently leaves certApprovalMode as the zero value. Lowercasing the input before matching and treating empty-with-enabled-management as an error will make this more robust.

♻️ Proposed fix
-	var certApprovalMode metalv1alpha1.CertificateApprovalPolicy
-	switch certificateApprovalMode {
-	case "auto":
-		certApprovalMode = metalv1alpha1.CertificateApprovalPolicyAuto
-	case "external":
-		certApprovalMode = metalv1alpha1.CertificateApprovalPolicyExternal
-	default:
-		if certificateApprovalMode != "" {
-			setupLog.Error(fmt.Errorf("invalid certificate-approval-mode: %s", certificateApprovalMode),
-				"Valid values are 'auto' or 'external'")
-			os.Exit(1)
-		}
-	}
+	var certApprovalMode metalv1alpha1.CertificateApprovalPolicy
+	switch strings.ToLower(certificateApprovalMode) {
+	case "auto":
+		certApprovalMode = metalv1alpha1.CertificateApprovalPolicyAuto
+	case "external":
+		certApprovalMode = metalv1alpha1.CertificateApprovalPolicyExternal
+	default:
+		setupLog.Error(fmt.Errorf("invalid certificate-approval-mode: %q", certificateApprovalMode),
+			"Valid values are 'auto' or 'external'")
+		os.Exit(1)
+	}

Don't forget to add "strings" to the imports.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cmd/main.go` around lines 466 - 478, Make parsing case-insensitive and reject
an explicit empty flag by first adding "strings" to imports, then validate
certificateApprovalMode != "" (log via setupLog.Error and os.Exit(1) if it is)
and perform the switch on strings.ToLower(certificateApprovalMode) so you match
metalv1alpha1.CertificateApprovalPolicyAuto and
metalv1alpha1.CertificateApprovalPolicyExternal regardless of case; keep the
existing error path in the default case using setupLog.Error with the invalid
value.
internal/controller/bmc_controller.go (2)

830-834: Minor: use ptr.To for the CSR-name pointer.

Taking &k8sCSR.Name works but ties the pointer's lifetime to the local CSR object and is inconsistent with ptr.To(...) used elsewhere in the reconciler (e.g., Line 685, 774). A small consistency fix:

-	bmcObj.Status.CertificateSigningRequestRef = &k8sCSR.Name
+	bmcObj.Status.CertificateSigningRequestRef = ptr.To(k8sCSR.Name)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/bmc_controller.go` around lines 830 - 834, The status
update currently takes the address of the local CSR name (&k8sCSR.Name) which
ties the pointer lifetime to the local k8sCSR variable; change this to use
ptr.To(k8sCSR.Name) when setting bmcObj.Status.CertificateSigningRequestRef
(keep the surrounding Patch logic that uses bmcBase and r.Status().Patch(ctx,
bmcObj, client.MergeFrom(bmcBase))); this matches the reconciler's existing
usage (see other occurrences around lines where ptr.To(...) is used) and ensures
a stable heap-allocated pointer.

683-687: Avoid mutating bmcObj.Spec to apply defaults.

applyCertificateDefaults mutates the in-memory BMC spec. This works today because the reconcile only patches Status, but it's a footgun: any future r.Update / spec Patch (or logging that includes the spec) would silently persist the operator-level default onto every BMC object. Prefer computing the effective policy locally without touching the spec.

♻️ Proposed shape
-func (r *BMCReconciler) applyCertificateDefaults(bmcObj *metalv1alpha1.BMC) {
-	if bmcObj.Spec.CertificateManagementPolicy == nil && r.DefaultCertificateManagementEnabled {
-		bmcObj.Spec.CertificateManagementPolicy = ptr.To(metalv1alpha1.CertificateManagementPolicyAutomatic)
-	}
-}
+func (r *BMCReconciler) effectiveCertificatePolicy(bmcObj *metalv1alpha1.BMC) metalv1alpha1.CertificateManagementPolicy {
+	if bmcObj.Spec.CertificateManagementPolicy != nil {
+		return *bmcObj.Spec.CertificateManagementPolicy
+	}
+	if r.DefaultCertificateManagementEnabled {
+		return metalv1alpha1.CertificateManagementPolicyAutomatic
+	}
+	return metalv1alpha1.CertificateManagementPolicyManual
+}

And update the gate in reconcileCertificate/checkCertificateExpiration accordingly.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/bmc_controller.go` around lines 683 - 687, The method
applyCertificateDefaults currently mutates bmcObj.Spec by writing the operator
default, which can accidentally persist operator-level defaults into the
resource; instead, remove that mutation and compute an
"effectiveCertificatePolicy" locally (e.g., a small helper that returns
metalv1alpha1.CertificateManagementPolicyAutomatic when
bmcObj.Spec.CertificateManagementPolicy is nil and
r.DefaultCertificateManagementEnabled is true, otherwise returns the spec
value). Update callers such as reconcileCertificate and
checkCertificateExpiration to call this helper and use the returned effective
policy for gating logic rather than reading or mutating bmcObj.Spec, and delete
the applyCertificateDefaults mutation or make it a pure computed function.
Ensure references to BMCReconciler.DefaultCertificateManagementEnabled are used
only to compute the effective policy.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@api/v1alpha1/bmc_types.go`:
- Around line 290-292: CertificateSecretRef on the cluster-scoped BMC is
namespace-ambiguous; replace the v1.LocalObjectReference pointer used in the BMC
struct (CertificateSecretRef) with a namespaced reference type (e.g.,
v1.ObjectReference/corev1.ObjectReference) so the field includes Namespace and
Name, or alternatively add explicit godoc stating the Secret is in the
controller-manager/known namespace; update any code that reads/writes
status.certificateSecretRef and tests to handle the new {namespace,name} shape
(or the documented fixed namespace) accordingly.

In `@bmc/bmc.go`:
- Around line 319-324: CertificateTypeHTTPS and CertificateTypeCA are both set
to the same underlying value "PEM", causing code that checks cert.Type (e.g.,
updateCertificateInfo's cert.Type == bmc.CertificateTypeHTTPS) to be unable to
distinguish HTTPS vs CA certificates; fix by giving these purpose constants
distinct values (e.g., "HTTPS" and "CA") or by separating purpose vs Redfish
encoding (introduce separate enums/constants for CertificatePurpose vs
CertificateEncoding and update usages such as CertificateTypeHTTPS,
CertificateTypeCA and any code reading Redfish's CertificateType to reference
the encoding constants instead).

In `@bmc/redfish.go`:
- Around line 1164-1167: The check that calls r.client.GetService().Managers()
should not wrap a nil error when managers is empty; update the logic in the
function containing the managers call and the similar sites in
InstallCertificate and GetCertificates so you return a wrapped error only when
err != nil (e.g., fmt.Errorf("failed to get managers: %w", err)), and return a
plain descriptive error when err == nil but len(managers) == 0 (e.g.,
fmt.Errorf("failed to get managers: no managers found")). Alternatively, extract
a helper like getManagersOrError() used by the functions to encapsulate this:
call GetService().Managers(), if err != nil return wrapped err, if len==0 return
a clear non-wrapped error, otherwise return managers.
- Line 1171: The certCollectionURI is being built with manager.UUID which yields
an invalid Redfish ManagerId; change all uses that build the certificates
collection URI (the variable certCollectionURI and the other occurrences
currently using manager.UUID) to derive the ManagerId from the manager's ODataID
(e.g., use manager.ODataID + "/NetworkProtocol/HTTPS/Certificates" or extract
the trailing resource id from manager.ODataID and format
"/redfish/v1/Managers/{id}/NetworkProtocol/HTTPS/Certificates") so the URI
matches other Redfish calls that use manager.ODataID.

In `@internal/controller/bmc_controller_test.go`:
- Around line 787-790: The Consistently assertion on Object(manualBMC) is using
Gomega's default ~100ms window which is too short to prove "no CSR was created";
change the call to include an explicit duration (e.g.,
Consistently(Object(manualBMC), 3*time.Second) or 5*time.Second) so the test
spans multiple reconcile cycles, keep the same expectation
HaveField("Status.CertificateSigningRequestRef", BeNil()), and add the time
import if not present.

In `@internal/controller/bmc_controller.go`:
- Around line 1036-1074: checkCertificateExpiration currently duplicates renewal
actions from reconcileCertificate/needsCertificateRenewal and performs a
Status().Patch to clear Status.CertificateSecretRef which can conflict with
other reconcile flows (e.g., reconcileCertificate, initiateCertificateRequest);
change checkCertificateExpiration to only set the CertificateExpiring condition
via updateConditions (use bmcCertificateExpiringConditionType,
bmcCertificateExpiringSoonReason and bmcCertificateValidReason) and remove the
branch that deep-copies bmcObj and patches Status.CertificateSecretRef (i.e.,
eliminate any code that clears Status.CertificateSecretRef or calls
r.Status().Patch in checkCertificateExpiration) so that
reconcileCertificate/needsCertificateRenewal remains the single source of truth
for initiating renewals.
- Around line 1076-1084: The approveCSR signer allow-list is hardcoded to
DefaultCertificateSignerName while CSRs use the configurable
r.DefaultCertificateSignerName; update approveCSR to build allowedSigners using
r.DefaultCertificateSignerName (with DefaultCertificateSignerName as a fallback)
so the reconciler honors the flag, and also update the kubebuilder RBAC marker
that references the signer name to reflect the configurable value or include
both names so RBAC matches the runtime signer configuration (refer to
approveCSR, r.DefaultCertificateSignerName, DefaultCertificateSignerName, and
CertificateApprovalPolicyAuto).
- Around line 759-776: The ObjectMeta name construction for k8sCSR is unsafe:
slicing bmcObj.UID with bmcObj.UID[:8] can panic and casting to string is
redundant, and concatenating full bmcObj.Name + prefix + UID can exceed the
253-char metadata.name limit. Fix by deriving a safe uid string (uidStr :=
string(bmcObj.UID)), then produce a short, stable suffix (either safe-truncate:
suffix := uidStr[:min(8,len(uidStr))] or better: compute a short hex hash of
uidStr and take first N chars), and ensure the final name is truncated or use a
truncated bmcObj.Name (e.g., limit namePart := bmcObj.Name[:maxNameLen]) so
fmt.Sprintf("bmc-%s-%s", namePart, suffix) always stays <=253 chars; update the
CertificateSigningRequest ObjectMeta.Name construction in the k8sCSR creation
accordingly.
- Around line 986-1034: The current flow calls bmcClient.InstallCertificate
before persisting status, so a partial failure later can cause re-install on
retry; fix by persisting the fact that the certificate is considered installed
before talking to the BMC: set bmcObj.Status.CertificateSecretRef =
&corev1.LocalObjectReference{Name: fmt.Sprintf("bmc-%s-cert", bmcObj.Name)} (and
clear CertificateSigningRequestRef) and call r.Status().Patch(ctx, bmcObj,
client.MergeFrom(bmcBase)) to persist that state, then proceed to call
bmcClient.InstallCertificate(...); if InstallCertificate fails, revert the
status patch or set a failure condition via r.updateConditions; alternatively
implement a fingerprint check using r.updateCertificateInfo /
bmcClient.GetCertificates to compare fingerprints and skip InstallCertificate
when the cert is already present.
- Around line 116-120: approveCSR() currently only allows the hardcoded
DefaultCertificateSignerName, causing auto-approval to reject CSRs when a custom
signer is configured; update the allowlist check inside approveCSR to include
r.DefaultCertificateSignerName (in addition to or instead of the constant
DefaultCertificateSignerName) so the configured signer is accepted, and adjust
the RBAC kubebuilder marker that references
resourceNames=metal.ironcore.dev/bmc-https to either permit wildcard signers
(e.g., resourceNames=metal.ironcore.dev/*) or document that
certificate-signer-name cannot be customized for auto-approval mode.
- Around line 1076-1100: In approveCSR, the code currently calls
r.Status().Update(ctx, csr) which targets the /status subresource and will be
rejected for approving CSRs; change the update to use the approval subresource
by calling r.SubResource("approval").Update(ctx, csr) after appending the
CertificateApproved condition to csr.Status.Conditions (keep the same condition
fields and metav1.Now() timestamps), so the CSR approval is applied via the
intended /approval subresource; ensure the RBAC update marker already present
for certificatesigningrequests/approval remains valid.

---

Nitpick comments:
In `@api/v1alpha1/bmc_types.go`:
- Around line 286-288: Replace the CertificateSigningRequestRef field's type
from *string to *v1.LocalObjectReference (to match CertificateSecretRef and
idiomatic usage), update the import to include "k8s.io/api/core/v1" if not
present, adjust any kubebuilder markers or comments if necessary to reflect the
LocalObjectReference type, and then run make manifests and make generate to
update generated code and manifests.
- Around line 89-123: CertificateApprovalPolicy and CertificateSubject are
operator-only types currently declared in api/v1alpha1 but not used in any
BMCSpec/BMCStatus; move these declarations out of the CRD API package into an
operator-only package (e.g., internal/controller or a new internal/config
package), update all references/imports in cmd/main.go and the reconciler to the
new package path, and ensure package docs and go mod imports are adjusted so the
CRD API package only contains CRD-backed types.

In `@bmc/mock/server/server.go`:
- Around line 767-789: Remove the dead no-op assignment `_ = csrID` in the CSR
generation block; csrID is already used in the logging call `s.log.Info(...,
"id", csrID, ...)` so simply delete the `_ = csrID` line to avoid misleading
code and keep `csrID` intact for `s.log.Info` and any future uses.

In `@cmd/main.go`:
- Around line 466-478: Make parsing case-insensitive and reject an explicit
empty flag by first adding "strings" to imports, then validate
certificateApprovalMode != "" (log via setupLog.Error and os.Exit(1) if it is)
and perform the switch on strings.ToLower(certificateApprovalMode) so you match
metalv1alpha1.CertificateApprovalPolicyAuto and
metalv1alpha1.CertificateApprovalPolicyExternal regardless of case; keep the
existing error path in the default case using setupLog.Error with the invalid
value.

In `@config/samples/metal_v1alpha1_bmc_with_certificate.yaml`:
- Around line 1-75: The sample BMC manifests reference BMCSecret names
(bmc-credentials, bmc-credentials-2, bmc-credentials-3) but do not include their
definitions, making the file non-applicable; add three corresponding BMCSecret
stub resources named exactly bmc-credentials, bmc-credentials-2, and
bmc-credentials-3 (with placeholder secretData/fields required by the BMCSecret
CRD) so the examples for BMCs bmc-with-certificate, bmc-manual-cert, and
bmc-default are self-contained and can be applied directly.

In `@internal/controller/bmc_controller.go`:
- Around line 830-834: The status update currently takes the address of the
local CSR name (&k8sCSR.Name) which ties the pointer lifetime to the local
k8sCSR variable; change this to use ptr.To(k8sCSR.Name) when setting
bmcObj.Status.CertificateSigningRequestRef (keep the surrounding Patch logic
that uses bmcBase and r.Status().Patch(ctx, bmcObj, client.MergeFrom(bmcBase)));
this matches the reconciler's existing usage (see other occurrences around lines
where ptr.To(...) is used) and ensures a stable heap-allocated pointer.
- Around line 683-687: The method applyCertificateDefaults currently mutates
bmcObj.Spec by writing the operator default, which can accidentally persist
operator-level defaults into the resource; instead, remove that mutation and
compute an "effectiveCertificatePolicy" locally (e.g., a small helper that
returns metalv1alpha1.CertificateManagementPolicyAutomatic when
bmcObj.Spec.CertificateManagementPolicy is nil and
r.DefaultCertificateManagementEnabled is true, otherwise returns the spec
value). Update callers such as reconcileCertificate and
checkCertificateExpiration to call this helper and use the returned effective
policy for gating logic rather than reading or mutating bmcObj.Spec, and delete
the applyCertificateDefaults mutation or make it a pure computed function.
Ensure references to BMCReconciler.DefaultCertificateManagementEnabled are used
only to compute the effective policy.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: bf950bbd-1aaf-4796-8293-293cf57333ee

📥 Commits

Reviewing files that changed from the base of the PR and between 33a0eb0 and 4a43188.

📒 Files selected for processing (13)
  • api/v1alpha1/bmc_types.go
  • api/v1alpha1/zz_generated.deepcopy.go
  • bmc/bmc.go
  • bmc/mock/server/server.go
  • bmc/redfish.go
  • cmd/main.go
  • config/crd/bases/metal.ironcore.dev_bmcs.yaml
  • config/rbac/role.yaml
  • config/samples/metal_v1alpha1_bmc_with_certificate.yaml
  • internal/controller/bmc_controller.go
  • internal/controller/bmc_controller_test.go
  • third_party/expansion/expand.go
  • third_party/expansion/expand_test.go

Comment thread api/v1alpha1/bmc_types.go
Comment thread bmc/bmc.go
Comment on lines +319 to +324
const (
// CertificateTypeHTTPS is the HTTPS/TLS certificate type
CertificateTypeHTTPS CertificateType = "PEM"
// CertificateTypeCA is the CA certificate type
CertificateTypeCA CertificateType = "PEM"
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Both certificate type constants share the same value "PEM" — filtering by type is broken.

CertificateTypeHTTPS and CertificateTypeCA are defined with the identical underlying value "PEM", so any consumer that distinguishes them (e.g. the controller's cert.Type == bmc.CertificateTypeHTTPS check in updateCertificateInfo) will treat CA and HTTPS certs as indistinguishable. Note also that Redfish's CertificateType property enumerates encoding formats (PEM, PEMchain, PKCS7) rather than purpose — if these constants are meant to express certificate purpose, they should be modeled separately from the Redfish wire value (and likely derived from the certificate's collection URI).

🐛 Minimal fix to differentiate the two values
 const (
 	// CertificateTypeHTTPS is the HTTPS/TLS certificate type
 	CertificateTypeHTTPS CertificateType = "PEM"
 	// CertificateTypeCA is the CA certificate type
-	CertificateTypeCA CertificateType = "PEM"
+	CertificateTypeCA CertificateType = "CA"
 )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const (
// CertificateTypeHTTPS is the HTTPS/TLS certificate type
CertificateTypeHTTPS CertificateType = "PEM"
// CertificateTypeCA is the CA certificate type
CertificateTypeCA CertificateType = "PEM"
)
const (
// CertificateTypeHTTPS is the HTTPS/TLS certificate type
CertificateTypeHTTPS CertificateType = "PEM"
// CertificateTypeCA is the CA certificate type
CertificateTypeCA CertificateType = "CA"
)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bmc/bmc.go` around lines 319 - 324, CertificateTypeHTTPS and
CertificateTypeCA are both set to the same underlying value "PEM", causing code
that checks cert.Type (e.g., updateCertificateInfo's cert.Type ==
bmc.CertificateTypeHTTPS) to be unable to distinguish HTTPS vs CA certificates;
fix by giving these purpose constants distinct values (e.g., "HTTPS" and "CA")
or by separating purpose vs Redfish encoding (introduce separate enums/constants
for CertificatePurpose vs CertificateEncoding and update usages such as
CertificateTypeHTTPS, CertificateTypeCA and any code reading Redfish's
CertificateType to reference the encoding constants instead).

Comment thread bmc/redfish.go
Comment thread bmc/redfish.go Outdated
Comment on lines +787 to +790
By("Verifying that no CSR was created")
Consistently(Object(manualBMC)).Should(
HaveField("Status.CertificateSigningRequestRef", BeNil()),
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Consistently without an explicit duration is too short to prove "no CSR is created".

Gomega's default Consistently window is ~100ms, which can elapse before the controller runs even one reconcile cycle. Negative-assertion tests ("controller does not do X") should use a duration long enough to guarantee at least a few reconcile iterations (e.g., a few seconds).

🐛 Proposed fix
 		By("Verifying that no CSR was created")
-		Consistently(Object(manualBMC)).Should(
+		Consistently(Object(manualBMC), "5s", "200ms").Should(
 			HaveField("Status.CertificateSigningRequestRef", BeNil()),
 		)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
By("Verifying that no CSR was created")
Consistently(Object(manualBMC)).Should(
HaveField("Status.CertificateSigningRequestRef", BeNil()),
)
By("Verifying that no CSR was created")
Consistently(Object(manualBMC), "5s", "200ms").Should(
HaveField("Status.CertificateSigningRequestRef", BeNil()),
)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/controller/bmc_controller_test.go` around lines 787 - 790, The
Consistently assertion on Object(manualBMC) is using Gomega's default ~100ms
window which is too short to prove "no CSR was created"; change the call to
include an explicit duration (e.g., Consistently(Object(manualBMC),
3*time.Second) or 5*time.Second) so the test spans multiple reconcile cycles,
keep the same expectation HaveField("Status.CertificateSigningRequestRef",
BeNil()), and add the time import if not present.

Comment thread internal/controller/bmc_controller.go
Comment thread internal/controller/bmc_controller.go Outdated
Comment thread internal/controller/bmc_controller.go Outdated
Comment thread internal/controller/bmc_controller.go
Comment thread internal/controller/bmc_controller.go
Copy link
Copy Markdown
Member

@maxmoehl maxmoehl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please rebase on the latest changes.

Comment thread bmc/redfish.go
if len(managers) == 0 {
return fmt.Errorf("no managers found")
}
manager := managers[0]
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won't work for all BMCs, see #825. It would probably make sense to wait for that PR to land and use the DiscoverManager function it introduces.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@maxmoehl I agree.
However, I dont think this function should not care about filtering the BMC which supports certificate or not. As it might get complicated to do those filtering.

This method should take the manager UUID or manager obj itself and then apply to it.

the BMC which dont support the CRD should disable the management through its Spec

Comment thread bmc/redfish.go

log.Info("Installing certificate on BMC", "uri", certCollectionURI)

resp, err := r.client.Post(certCollectionURI, payload)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you not using the gofish function to install / replace the certificate? (same comment applies to other raw usages of client)

Comment thread bmc/redfish.go
if len(managers) == 0 {
return nil, fmt.Errorf("no managers found")
}
manager := managers[0]
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as before, see #825

Comment thread bmc/redfish.go
if len(managers) == 0 {
return fmt.Errorf("no managers found")
}
manager := managers[0]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@maxmoehl I agree.
However, I dont think this function should not care about filtering the BMC which supports certificate or not. As it might get complicated to do those filtering.

This method should take the manager UUID or manager obj itself and then apply to it.

the BMC which dont support the CRD should disable the management through its Spec

Comment thread bmc/mock/server/server.go
Comment on lines +1138 to +1144
// Create a mock CSR string (PEM-encoded)
csrString := `-----BEGIN CERTIFICATE REQUEST-----
MIICvDCCAaQCAQAwdzELMAkGA1UEBhMCVVMxEzARBgNVBAgMCkNhbGlmb3JuaWEx
FjAUBgNVBAcMDVNhbiBGcmFuY2lzY28xFDASBgNVBAoMC0V4YW1wbGUgQ29ycDEX
MBUGA1UECwwOSW5mcmFzdHJ1Y3R1cmUwggEiMA0GCSqGSIb3DQEBAQUAA4IBDwAw
ggEKAoIBAQC5J3Q==
-----END CERTIFICATE REQUEST-----`
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we keep it in a file? under data folder?

Comment thread bmc/mock/server/server.go
Comment on lines +1216 to +1222
response := map[string]any{
"@odata.type": "#CertificateCollection.CertificateCollection",
"@odata.id": "/redfish/v1/Managers/1/NetworkProtocol/HTTPS/Certificates",
"Name": "Certificate Collection",
"Members": members,
"Members@odata.count": len(members),
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May be we keep this as a index.json in the data folder and read from it to follow same pattern as all the other mock datas..

Comment thread bmc/mock/server/server.go Outdated
Comment on lines +234 to +237
case strings.Contains(urlPath, "CertificateService/Actions/CertificateService.GenerateCSR"):
s.handleGenerateCSR(w, r, body)
case strings.Contains(urlPath, "/HTTPS/Certificates") && !strings.Contains(urlPath, "CertificateService"):
s.handleInstallCertificate(w, r, body)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might need a rebase :( i suspect conflict due to recent changes in mock server.

Comment thread bmc/mock/server/server.go
Comment on lines +1241 to +1258
response := map[string]any{
"@odata.type": "#Certificate.v1_0_0.Certificate",
"@odata.id": r.URL.Path,
"Id": cert.ID,
"Name": "Certificate " + cert.ID,
"CertificateString": cert.CertificateString,
"CertificateType": cert.CertificateType,
"Issuer": map[string]string{
"CommonName": cert.Issuer,
},
"Subject": map[string]string{
"CommonName": cert.Subject,
},
"ValidNotBefore": cert.ValidNotBefore,
"ValidNotAfter": cert.ValidNotAfter,
"SerialNumber": cert.SerialNumber,
"Fingerprint": cert.Fingerprint,
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would try to keep the base template in the data folder.. and merge that with the certificate we are returning..
just to follow the same pattern everywhere.

Implements automatic TLS certificate management for BMCs using:
- Redfish CertificateService API for CSR generation
- Kubernetes CSR API for certificate signing
- Operator-level configuration (simplified model)
- Automatic certificate renewal

All certificate configuration (signer, approval policy, renewal
threshold, subject fields) is set at operator level via controller
flags. BMCs can only enable/disable certificate management.

Fixes #218
- Fix CSR approval to use /approval subresource instead of /status
- Fix certificate URI construction to use ODataID instead of UUID
- Fix certificate type detection to use URI path instead of Type field
- Document CertificateSecretRef namespace in API comments
- Fix CSR name length to respect 253 char Kubernetes limit
- Fix certificate install race by persisting status before BMC call
- Fix signer allowlist to include configured signer name
- Simplify checkCertificateExpiration (remove unused error return)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

Manage certificates on BMCs via Redfish / ACME DNS-01

4 participants