Skip to content

Cleanup stale metrics and event subscriptions#889

Open
asergeant01 wants to merge 1 commit into
mainfrom
feat/improve-metric-report-subscription-handling
Open

Cleanup stale metrics and event subscriptions#889
asergeant01 wants to merge 1 commit into
mainfrom
feat/improve-metric-report-subscription-handling

Conversation

@asergeant01
Copy link
Copy Markdown
Contributor

@asergeant01 asergeant01 commented May 13, 2026

Fixes #886

Summary by CodeRabbit

  • New Features

    • Automatic detection and recreation of externally deleted Redfish event and metrics subscriptions; controller now validates links before creating new ones.
  • Bug Fixes

    • Stale subscription links are cleaned from controller status when the remote subscription no longer exists.
  • Tests

    • Added a test that simulates external deletion and verifies subscriptions are recreated while unaffected links remain stable.

Review Change Stack

@asergeant01 asergeant01 requested a review from a team as a code owner May 13, 2026 12:46
@asergeant01 asergeant01 force-pushed the feat/improve-metric-report-subscription-handling branch from 41bec4e to 1892ef5 Compare May 13, 2026 12:46
@xkonni xkonni changed the title handle stale metrics and event subscritpions handle stale metrics and event subscriptions May 13, 2026
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 13, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: d9be516d-5313-43ff-aa9d-dec8cbbbb7a7

📥 Commits

Reviewing files that changed from the base of the PR and between 5c3f881 and bfda8c6.

📒 Files selected for processing (5)
  • bmc/bmc.go
  • bmc/mock/server/server.go
  • bmc/redfish.go
  • internal/controller/bmc_controller.go
  • internal/controller/bmc_controller_test.go
🚧 Files skipped from review as they are similar to previous changes (4)
  • internal/controller/bmc_controller.go
  • bmc/bmc.go
  • bmc/redfish.go
  • bmc/mock/server/server.go

📝 Walkthrough

Walkthrough

Adds BMC.GetEventSubscription and a Redfish implementation that treats HTTP 404 as “not present”; controller now verifies stored metrics/events subscription links, clears stale status links, and recreates missing subscriptions; mock-server helper and integration test simulate external deletion and verify reconciliation.

Changes

Event Subscription Staleness Detection

Layer / File(s) Summary
Event Subscription Query Contract
bmc/bmc.go , bmc/redfish.go
Adds GetEventSubscription(ctx, uri) (bool, error) to the BMC interface and imports net/http for status-code checks.
Redfish Event Subscription Query Implementation
bmc/redfish.go
RedfishBaseBMC.GetEventSubscription fetches the subscription and returns (false, nil) for Redfish 404 to indicate the subscription no longer exists.
BMC Controller Stale Link Validation
internal/controller/bmc_controller.go
handleEventSubscriptions pre-checks Status.MetricsReportSubscriptionLink and Status.EventsSubscriptionLink with GetEventSubscription; clears stale links via status patch so the controller can recreate missing subscriptions.
Mock Server Subscription Deletion Helper
bmc/mock/server/server.go
MockServer.DeleteSubscription removes a subscription override and updates the parent collection to simulate external deletion on the BMC.
Stale Subscription Link Reconciliation Test
internal/controller/bmc_controller_test.go
Integration test deletes the metrics subscription on the mock server, forces reconcile, asserts the metrics link is recreated with a new value while the events link is unchanged, and performs cleanup.

Sequence Diagram

sequenceDiagram
  participant Controller as BMCReconciler
  participant BMCClient as RedfishBaseBMC
  participant Redfish as RedfishEventService
  participant Mock as MockServer

  Controller->>BMCClient: GetEventSubscription(ctx, metricsURI)
  BMCClient->>Redfish: ev.GetEventSubscription(metricsURI)
  Redfish-->>BMCClient: 404 (not found) / subscription object
  BMCClient-->>Controller: (false,nil) or (true,nil)
  alt subscription not found
    Controller->>Controller: clear status link (status patch)
    Controller->>BMCClient: create subscription
    BMCClient->>Redfish: CreateEventSubscription(...)
    Redfish-->>BMCClient: new subscription URI
    BMCClient-->>Controller: new URI
  else subscription exists
    Controller-->>Controller: keep status link
  end

  Note over Mock,Redfish: MockServer.DeleteSubscription simulates external deletion by removing the member from the subscription collection
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • ironcore-dev/metal-operator#494: Introduced subscription collection mutation logic and initial create/delete subscription handling that this PR builds on.

Suggested labels

api-change

Suggested reviewers

  • afritzler
  • stefanhipfel
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title 'Cleanup stale metrics and event subscriptions' directly reflects the main change: validating existing subscription links and recreating them when deleted externally, addressing issue #886.
Description check ✅ Passed The PR description is minimal but follows the repository template with the issue reference; however, it lacks details about changes, rationale, or implementation.
Linked Issues check ✅ Passed The code changes comprehensively address issue #886 by adding subscription existence validation, recreating stale subscriptions, and including test coverage.
Out of Scope Changes check ✅ Passed All changes are directly related to #886: adding GetEventSubscription interface/implementation, proactive validation in the controller, and test coverage for the scenario.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/improve-metric-report-subscription-handling

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@asergeant01 asergeant01 added this to the v0.6.0 milestone May 13, 2026
Copy link
Copy Markdown
Contributor

@xkonni xkonni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good, fixes issue, minor optimizations possible.

Comment thread internal/controller/bmc_controller.go Outdated
Comment thread internal/controller/bmc_controller.go Outdated
@asergeant01 asergeant01 force-pushed the feat/improve-metric-report-subscription-handling branch from 1892ef5 to 5c3f881 Compare May 13, 2026 13:25
@asergeant01
Copy link
Copy Markdown
Contributor Author

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 13, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@bmc/mock/server/server.go`:
- Around line 1313-1321: The code can silently overwrite an existing collection
with an empty one when type assertions fail or dataFS.ReadFile/json.Unmarshal
fail; update the loading logic around the variables col, cached, Collection,
dataFS.ReadFile, json.Unmarshal and collectionKey so that: (1) when hasOverride
is true but cached is not a Collection, do NOT replace col (leave original
cached value or bail); (2) when ReadFile returns an error or json.Unmarshal
fails, do not overwrite col with the empty default—propagate/handle the error or
keep the prior value; and (3) before the persistence step that writes col (the
save at the persistence call around collection handling), add a guard to only
persist when col is valid (e.g., col.ID is non-empty or len(col.Members) > 0);
if invalid, skip saving and return/log an error so you cannot accidentally wipe
members.

In `@internal/controller/bmc_controller.go`:
- Around line 567-580: Update the error string in the patch failure to reference
the BMC status rather than "server status": when handling
bmcObj.Status.MetricsReportSubscriptionLink and calling r.Status().Patch(ctx,
bmcObj, client.MergeFrom(bmcBase)), change the fmt.Errorf message used on error
return to something like "failed to patch BMC status to clear stale metrics
subscription link: %w" so it correctly references bmcObj (BMC) and its status.
- Around line 595-608: The error message used when the status patch fails
incorrectly references "server status"; update the fmt.Errorf call in the block
handling bmcObj.Status.EventsSubscriptionLink (around GetEventSubscription and
r.Status().Patch) to say "failed to patch BMC status to clear stale events
subscription link" so it correctly names the BMC status and use the existing
variables (bmcObj, bmcBase) and error wrapping as-is.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 11823367-8cf1-4fb0-bc23-9df7749fba1a

📥 Commits

Reviewing files that changed from the base of the PR and between bf8e075 and 5c3f881.

📒 Files selected for processing (5)
  • bmc/bmc.go
  • bmc/mock/server/server.go
  • bmc/redfish.go
  • internal/controller/bmc_controller.go
  • internal/controller/bmc_controller_test.go

Comment thread bmc/mock/server/server.go
Comment thread internal/controller/bmc_controller.go
Comment thread internal/controller/bmc_controller.go
Signed-off-by: Alan Sergeant <alan.sergeant@sap.com>
@asergeant01 asergeant01 force-pushed the feat/improve-metric-report-subscription-handling branch from 5c3f881 to bfda8c6 Compare May 13, 2026 15:13
@asergeant01 asergeant01 self-assigned this May 14, 2026
@afritzler afritzler changed the title handle stale metrics and event subscriptions Cleanup stale metrics and event subscriptions May 19, 2026
@afritzler afritzler added the enhancement New feature or request label May 19, 2026
@afritzler
Copy link
Copy Markdown
Member

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 19, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Comment on lines +576 to +578
if err := r.Status().Patch(ctx, bmcObj, client.MergeFrom(bmcBase)); err != nil {
return false, fmt.Errorf("failed to patch BMC status to clear stale metrics subscription link: %w", err)
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would just in memory clear the link but not patch the resource as we are doing later anyway. This will reduce the API calls.

Comment on lines +604 to +606
if err := r.Status().Patch(ctx, bmcObj, client.MergeFrom(bmcBase)); err != nil {
return false, fmt.Errorf("failed to patch BMC status to clear stale events subscription link: %w", err)
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would just in memory clear the link but not patch the resource as we are doing later anyway. This will reduce the API calls.

Comment on lines +569 to +571
if err != nil {
return false, fmt.Errorf("failed to check metrics report subscription for BMC %s (%s): %w", bmcObj.Name, bmcObj.Status.IP, err)
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about logging this error e.g. in case we run into a transient error? Otherwise the subscription logic will block the BMC reconciliation.

Comment on lines +596 to +599
exists, err := bmcClient.GetEventSubscription(ctx, bmcObj.Status.EventsSubscriptionLink)
if err != nil {
return false, fmt.Errorf("failed to check events subscription for BMC %s (%s): %w", bmcObj.Name, bmcObj.Status.IP, err)
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about logging this error e.g. in case we run into a transient error? Otherwise the subscription logic will block the BMC reconciliation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

BMC event/metrics subscriptions not recreated after external deletion

4 participants