fix(libp2p): make AutoNAT log messages actionable for operators#4260
- Per-probe failures (autonat::OutboundProbeEvent::Error) demoted from warn! to
debug!. Each failed probe is a single peer momentarily unable to dial back; the
operator cannot fix any individual probe and the WARN spam drowns out the
actually-useful aggregate signal.
- StatusChanged events promoted from debug! to actionable messages keyed off
NatStatus:
  - Public(addr): info! - confirms reachability and includes the address
  - Private: error! - the operator MUST act; the message names the flag and the
    network-level checks (firewall, NAT, security group)
  - Unknown: debug! - genuinely uncertain, not actionable
- Before this change, an operator behind a misconfigured NAT/NodePort saw a
flood of opaque "AutoNAT Probe failed to peer ... with error: OutboundRequest(
Io(Kind(UnexpectedEof)))" lines and no clear signal that their node was
unreachable. After this change they see a single ERROR line naming
--libp2p-advertise-address (env ESPRESSO_NODE_LIBP2P_ADVERTISE_ADDRESS) and
the firewall checks they need to perform.
- Drive-by: fix a "fales" typo in a comment and whitelist the `bimap` crate name
in .typos.toml so the pre-commit spell-check passes.
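A minimal sketch of the level mapping described above, under stated assumptions. `NatStatus` here is a local stand-in for libp2p's `autonat::NatStatus` (whose `Public` variant carries a `Multiaddr`, not a `String`), and `Level` stands in for the `tracing` macros, so the snippet compiles without either crate. The helper names and message wording are hypothetical, not the PR's actual code.

```rust
/// Local stand-in for libp2p's `autonat::NatStatus` (assumption: the real
/// `Public` variant holds a `Multiaddr`; a `String` keeps this self-contained).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum NatStatus {
    Public(String),
    Private,
    Unknown,
}

/// Stand-in for the `debug!` / `info!` / `error!` macros used in the PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Level {
    Debug,
    Info,
    Error,
}

/// Per-probe failures: one peer failing to dial back is not actionable,
/// so they are always logged at DEBUG.
pub fn probe_error_level() -> Level {
    Level::Debug
}

/// Hypothetical helper: maps the new NAT status from a StatusChanged event
/// to a log level and an operator-facing message.
pub fn status_changed_log(new: &NatStatus) -> (Level, String) {
    match new {
        NatStatus::Public(addr) => (
            Level::Info,
            format!("AutoNAT: node is publicly reachable at {addr}"),
        ),
        NatStatus::Private => (
            Level::Error,
            "AutoNAT: node is NOT publicly reachable. Check --libp2p-advertise-address \
             (env ESPRESSO_NODE_LIBP2P_ADVERTISE_ADDRESS) and make sure firewall, NAT \
             and security-group rules allow inbound libp2p connections."
                .to_string(),
        ),
        NatStatus::Unknown => (Level::Debug, "AutoNAT: reachability unknown".to_string()),
    }
}
```

This reflects the initial description, where a Private status logs at ERROR immediately; the follow-up commit later in this thread defers that error until AutoNAT's confidence is saturated.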
Since I'm not familiar with libp2p, one thing I don't yet know is when exactly the status changes. I will check that soon.
Code Review
This pull request updates the typo configuration, fixes a typo in a comment, and improves AutoNAT logging in the libp2p networking layer. The logging changes now distinguish between public, private, and unknown NAT statuses, providing actionable error messages for private nodes. I have kept the review comment regarding the tight coupling of consensus-specific terminology in the networking layer, as it identifies a significant architectural improvement opportunity.
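One way to loosen the coupling the review points at is sketched below, assuming the flag and environment-variable names are the consensus-specific terms in question. The binary that owns the CLI injects the remediation text, and the networking layer only formats it. `UnreachableHint` and `private_status_message` are hypothetical names, not existing code.

```rust
/// Hypothetical: operator-facing remediation hints supplied by the node binary,
/// so the libp2p networking layer never hard-codes application flag names.
#[derive(Debug, Clone)]
pub struct UnreachableHint {
    /// CLI flag that controls the advertised address.
    pub advertise_flag: String,
    /// Environment variable equivalent of that flag.
    pub advertise_env: String,
}

/// Builds the Private-status error message from the injected hint.
pub fn private_status_message(hint: &UnreachableHint) -> String {
    format!(
        "AutoNAT: node is not publicly reachable. Check {} (env {}) and firewall, \
         NAT and security-group rules.",
        hint.advertise_flag, hint.advertise_env
    )
}
```

The trade-off is one extra config field threaded into the networking layer in exchange for that layer staying reusable outside the consensus node.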
So it seems this would fire if the first check fails, which is bad; will fix.
- Demote the Unknown -> Private transition to warn!, since confidence is 0 on the first flip and a single transient probe failure would otherwise fire the loud error.
- Emit the operator-facing error only once confidence reaches confidence_max in Private status, which indicates that repeated probes reproduce the result.
- Latch the error per Private episode and reset the latch on transitions back to Public or Unknown, so that recovery followed by a second loss of reachability fires the error again.
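The three rules above amount to a small per-episode latch. The sketch below is a minimal model under stated assumptions: `NatLogLatch` is a hypothetical helper, `NatStatus` and `Level` are local stand-ins for libp2p's `autonat::NatStatus` and the `tracing` macros, and the caller is assumed to read the current status and confidence from the AutoNAT behaviour after each probe (libp2p's autonat v1 `Behaviour` exposes `nat_status()` and `confidence()`). It treats every transition into Private the same way, since the confidence argument applies to any first flip.

```rust
/// Local stand-in for libp2p's `autonat::NatStatus`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum NatStatus {
    Public(String),
    Private,
    Unknown,
}

/// Stand-in for the tracing levels used in the PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Level {
    Debug,
    Info,
    Warn,
    Error,
}

/// Hypothetical helper tracking whether the operator-facing error has already
/// fired during the current Private episode.
pub struct NatLogLatch {
    confidence_max: usize,
    error_emitted: bool,
}

impl NatLogLatch {
    pub fn new(confidence_max: usize) -> Self {
        Self { confidence_max, error_emitted: false }
    }

    /// Called on a StatusChanged event; returns the level to log it at.
    pub fn on_status_changed(&mut self, new: &NatStatus) -> Level {
        match new {
            // Confidence is 0 right after a flip, so only warn.
            NatStatus::Private => Level::Warn,
            // Leaving Private ends the episode: re-arm the error.
            NatStatus::Public(_) => {
                self.error_emitted = false;
                Level::Info
            }
            NatStatus::Unknown => {
                self.error_emitted = false;
                Level::Debug
            }
        }
    }

    /// Called after each probe with the current status and confidence.
    /// Returns `Some(Level::Error)` at most once per Private episode.
    pub fn on_probe(&mut self, status: &NatStatus, confidence: usize) -> Option<Level> {
        let saturated = confidence >= self.confidence_max;
        if matches!(status, NatStatus::Private) && saturated && !self.error_emitted {
            self.error_emitted = true;
            Some(Level::Error)
        } else {
            None
        }
    }
}
```

Keeping the latch outside the AutoNAT behaviour leaves libp2p's own confidence logic untouched; the latch only decides how loudly to report what AutoNAT already concluded.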
Nextest failures (1) in this run
See the step summary for flaky tests and slowest tests.