Skip to content

[LFXV2-1701] fix: downgrade health check poll logs from error to warn#58

Merged
andrest50 merged 1 commit into
mainfrom
atobon/LFXV2-1701-reduce-health-check-log-noise
May 7, 2026
Merged

[LFXV2-1701] fix: downgrade health check poll logs from error to warn#58
andrest50 merged 1 commit into
mainfrom
atobon/LFXV2-1701-reduce-health-check-log-noise

Conversation

@andrest50
Copy link
Copy Markdown
Contributor

@andrest50 andrest50 commented May 7, 2026

Summary

  • Downgrade NATS health check log messages from Error to Warn in messaging_repository.go — these fire on every K8s probe poll (~10s) during a degraded state, flooding logs with repeated errors
  • Rename and downgrade OpenSearch health check log messages from Error to Warn in storage_repository.go, adding "OpenSearch" context to the message so the failing dependency is unambiguous
  • One-time lifecycle events (e.g. NATS connection permanently closed - max reconnects exhausted) remain at Error since they fire once and identify the root cause

Ticket

LFXV2-1701

🤖 Generated with Claude Code

Health check failures logged at ERROR on every K8s probe poll (~10s)
flooded logs during degraded state. Downgraded to WARN and renamed the
OpenSearch log messages to include the dependency name for clarity.
One-time lifecycle events (e.g. NATS permanently closed) remain ERROR.

Generated with [Claude Code](https://claude.ai/code)

Signed-off-by: Andres Tobon <andrest2455@gmail.com>
Copilot AI review requested due to automatic review settings May 7, 2026 19:05
@andrest50 andrest50 requested a review from a team as a code owner May 7, 2026 19:05
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR reduces noisy error-level logging from periodic dependency health checks (NATS and OpenSearch) so degraded states don’t flood logs during frequent Kubernetes probe polling, while preserving error logs for one-time lifecycle events.

Changes:

  • Downgrade NATS health check failure logs from Error to Warn.
  • Rename/downgrade OpenSearch health check logs to Warn and add explicit “OpenSearch” context in the messages.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
internal/infrastructure/storage/storage_repository.go Downgrades OpenSearch health check failure logs to Warn and clarifies messages with “OpenSearch” context.
internal/infrastructure/messaging/messaging_repository.go Downgrades NATS health check failure logs to Warn to reduce repeated error spam during outages.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 7, 2026

Review Change Stack

Walkthrough

Two infrastructure health-check methods were modified: MessagingRepository.HealthCheck and StorageRepository.HealthCheck. Both now log warnings instead of errors for transient failures while maintaining identical error return behavior. The messaging implementation also refactors error message formatting to use the health-check error constant.

Changes

Health-Check Error Handling

Layer / File(s) Summary
Health-Check Error Logging
internal/infrastructure/messaging/messaging_repository.go, internal/infrastructure/storage/storage_repository.go
Messaging and storage health-check failures log at warning level instead of error level. Messaging also refactors error formatting to use fmt.Errorf("%s: ...", constants.ErrHealthCheck, ...). Error return values remain unchanged.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately captures the main change: downgrading health check poll logs from error to warning level.
Description check ✅ Passed The description clearly explains the rationale for downgrading logs, provides specific details about which components are affected, and distinguishes between recurring health checks and one-time lifecycle events.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch atobon/LFXV2-1701-reduce-health-check-log-noise

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
internal/infrastructure/messaging/messaging_repository.go (1)

463-481: 💤 Low value

LGTM — log-level downgrade and error format alignment look correct.

All three failure paths consistently emit Warn instead of Error, and the returned errors now uniformly use constants.ErrHealthCheck as a prefix, matching the pattern already established in storage_repository.go. No logic changes were made.

One pre-existing observation not introduced here: the check at Line 476 (r.conn.Status() != nats.CONNECTED) is logically equivalent to !r.conn.IsConnected() in the NATS client (both read the same internal status field). The only scenario where Line 476 fires but Line 470 did not is a concurrent state change between the two reads (TOCTOU window). Consider collapsing the two checks into one for clarity if the race defence is not intentional.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/infrastructure/messaging/messaging_repository.go` around lines 463 -
481, The health-check contains two equivalent status checks
(r.conn.IsConnected() and r.conn.Status() != nats.CONNECTED) which can race and
are redundant; consolidate them into a single check using r.conn.IsConnected()
(or r.conn.Status() directly) and keep the diagnostic r.logger.Debug call and
the Warn + fmt.Errorf(constants.ErrHealthCheck...) behavior, removing the
duplicate branch that checks r.conn.Status() != nats.CONNECTED and preserving
its more detailed warning message (status and expected) in the remaining check
so the logic is clear and TOCTOU windows are avoided; reference
r.conn.IsConnected(), r.conn.Status(), and nats.CONNECTED when making the
change.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@internal/infrastructure/messaging/messaging_repository.go`:
- Around line 463-481: The health-check contains two equivalent status checks
(r.conn.IsConnected() and r.conn.Status() != nats.CONNECTED) which can race and
are redundant; consolidate them into a single check using r.conn.IsConnected()
(or r.conn.Status() directly) and keep the diagnostic r.logger.Debug call and
the Warn + fmt.Errorf(constants.ErrHealthCheck...) behavior, removing the
duplicate branch that checks r.conn.Status() != nats.CONNECTED and preserving
its more detailed warning message (status and expected) in the remaining check
so the logic is clear and TOCTOU windows are avoided; reference
r.conn.IsConnected(), r.conn.Status(), and nats.CONNECTED when making the
change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 35b73a2c-a38b-44de-b174-786c904af71e

📥 Commits

Reviewing files that changed from the base of the PR and between 13040f5 and 5e10a79.

📒 Files selected for processing (2)
  • internal/infrastructure/messaging/messaging_repository.go
  • internal/infrastructure/storage/storage_repository.go

@andrest50 andrest50 merged commit b24e82e into main May 7, 2026
13 checks passed
@andrest50 andrest50 deleted the atobon/LFXV2-1701-reduce-health-check-log-noise branch May 7, 2026 19:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants