[LFXV2-1701] fix: downgrade health check poll logs from error to warn #58
Health check failures logged at ERROR on every K8s probe poll (~10s) flooded logs during degraded state. Downgraded to WARN and renamed the OpenSearch log messages to include the dependency name for clarity. One-time lifecycle events (e.g. NATS permanently closed) remain ERROR. Generated with [Claude Code](https://claude.ai/code) Signed-off-by: Andres Tobon <andrest2455@gmail.com>
Pull request overview
This PR reduces noisy error-level logging from periodic dependency health checks (NATS and OpenSearch) so degraded states don’t flood logs during frequent Kubernetes probe polling, while preserving error logs for one-time lifecycle events.
Changes:
- Downgrade NATS health check failure logs from `Error` to `Warn`.
- Rename/downgrade OpenSearch health check logs to `Warn` and add explicit “OpenSearch” context in the messages.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| internal/infrastructure/storage/storage_repository.go | Downgrades OpenSearch health check failure logs to Warn and clarifies messages with “OpenSearch” context. |
| internal/infrastructure/messaging/messaging_repository.go | Downgrades NATS health check failure logs to Warn to reduce repeated error spam during outages. |
Walkthrough
Two infrastructure health-check methods were modified:
Changes
Health-Check Error Handling
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~8 minutes
Pre-merge checks: ✅ 5 passed
🧹 Nitpick comments (1)
internal/infrastructure/messaging/messaging_repository.go (1)
463-481: 💤 Low value
LGTM: the log-level downgrade and error format alignment look correct.
All three failure paths consistently emit `Warn` instead of `Error`, and the returned errors now uniformly use `constants.ErrHealthCheck` as a prefix, matching the pattern already established in `storage_repository.go`. No logic changes were made.
One pre-existing observation not introduced here: the check at Line 476 (`r.conn.Status() != nats.CONNECTED`) is logically equivalent to `!r.conn.IsConnected()` in the NATS client (both read the same internal status field). The only scenario where Line 476 fires but Line 470 did not is a concurrent state change between the two reads (TOCTOU window). Consider collapsing the two checks into one for clarity if the race defence is not intentional.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/infrastructure/messaging/messaging_repository.go` around lines 463 - 481, The health-check contains two equivalent status checks (r.conn.IsConnected() and r.conn.Status() != nats.CONNECTED) which can race and are redundant; consolidate them into a single check using r.conn.IsConnected() (or r.conn.Status() directly) and keep the diagnostic r.logger.Debug call and the Warn + fmt.Errorf(constants.ErrHealthCheck...) behavior, removing the duplicate branch that checks r.conn.Status() != nats.CONNECTED and preserving its more detailed warning message (status and expected) in the remaining check so the logic is clear and TOCTOU windows are avoided; reference r.conn.IsConnected(), r.conn.Status(), and nats.CONNECTED when making the change.
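The consolidation suggested in this comment could be sketched as below. To stay self-contained it does not import the NATS client: `connStatus`, `statusReader`, `fixedConn`, `checkConnected`, and `errHealthCheck` are hypothetical stand-ins for `nats.Status`, `*nats.Conn`, and `constants.ErrHealthCheck`, and the Warn log call is omitted for brevity. The point illustrated is reading the status once, so the decision and the diagnostic message come from the same snapshot.

```go
package main

import (
	"errors"
	"fmt"
)

// connStatus is a hypothetical stand-in for the client's connection status type.
type connStatus int

const (
	statusDisconnected connStatus = iota
	statusConnected
	statusReconnecting
)

// String renders the status for diagnostic messages.
func (s connStatus) String() string {
	switch s {
	case statusConnected:
		return "CONNECTED"
	case statusReconnecting:
		return "RECONNECTING"
	default:
		return "DISCONNECTED"
	}
}

// statusReader is a hypothetical abstraction over the connection object.
type statusReader interface {
	Status() connStatus
}

// errHealthCheck is a hypothetical stand-in for the shared health check error prefix.
var errHealthCheck = errors.New("health check failed")

// checkConnected reads the status exactly once, so there is no window in
// which two separate reads can disagree, and the detailed message (actual
// and expected status) is preserved in the single remaining branch.
func checkConnected(c statusReader) error {
	if s := c.Status(); s != statusConnected {
		return fmt.Errorf("%w: NATS status %s, expected %s", errHealthCheck, s, statusConnected)
	}
	return nil
}

// fixedConn is a test double that always reports the same status.
type fixedConn connStatus

func (f fixedConn) Status() connStatus { return connStatus(f) }

func main() {
	fmt.Println(checkConnected(fixedConn(statusConnected)))
	fmt.Println(checkConnected(fixedConn(statusReconnecting)))
}
```

Whether the original two-step check was a deliberate race defence is not stated in the PR, so this is only one way the single-check variant might be written.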
📒 Files selected for processing (2)
internal/infrastructure/messaging/messaging_repository.go
internal/infrastructure/storage/storage_repository.go
Summary
- Downgraded NATS health check failure logs from `Error` to `Warn` in `messaging_repository.go`; these fire on every K8s probe poll (~10s) during a degraded state, flooding logs with repeated errors.
- Downgraded OpenSearch health check failure logs from `Error` to `Warn` in `storage_repository.go`, adding "OpenSearch" context to the message so the failing dependency is unambiguous.
- One-time lifecycle events (e.g. `NATS connection permanently closed - max reconnects exhausted`) remain at `Error` since they fire once and identify the root cause.

Ticket
LFXV2-1701
🤖 Generated with Claude Code