Optional heartbeat/keepalive to detect silent link failures on idle sessions #219

@SeanTAllen

Background

PR #218 routes lori's _on_closed through the session state machine so peer-initiated TCP close is detected in every session state. That covers the cases where the OS tells us the connection is gone — FIN (graceful close) or RST (abrupt close).

It does not cover cases where nobody tells us. If the server host loses power, a VM is hard-killed, a NAT box drops the connection from its tracking table, or a network partition drops packets without generating a RST, the client-side OS has no way to know the connection is dead. lori's read never returns, _on_closed never fires, and an idle session sitting in _QueryReady with nothing to send hangs indefinitely.

Why this is a low-priority enhancement, not a bug

In practice, this situation is self-limiting:

  • The next query attempt writes to the dead socket; once the OS gives up retransmitting, lori reports the failure and _on_closed fires.
  • If the application is shutting down, it doesn't matter — the actor is torn down anyway.

The only scenario where it bites is a long-idle session that the application wants to keep "warm" without issuing any traffic.

Possible approaches

  • OS-level TCP keepalive on the socket (simplest — kernel does the probing)
  • Application-level heartbeat (periodic no-op query or Sync message)
  • Idle timeout via lori's idle_timeout() mechanism
  • Some combination, configurable via ServerConnectInfo
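
As a rough illustration of the first option, here is a minimal Python sketch of enabling OS-level TCP keepalive on a socket. It is not lori or Pony code; `enable_keepalive` and `detection_budget_s` are hypothetical helper names, and the default values are arbitrary. The tuning options are platform-specific, so the sketch sets only the ones the running platform exposes.

```python
import socket


def enable_keepalive(sock, idle_s=60, interval_s=10, probes=5):
    """Ask the kernel to probe an idle TCP connection (hypothetical helper)."""
    # SO_KEEPALIVE is the portable on/off switch.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # Idle time before the first probe: TCP_KEEPIDLE on Linux,
    # TCP_KEEPALIVE on macOS. Skip it where neither constant exists.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
    elif hasattr(socket, "TCP_KEEPALIVE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPALIVE, idle_s)

    # Seconds between unanswered probes.
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)

    # Number of unanswered probes before the kernel drops the connection.
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
    return sock


def detection_budget_s(idle_s, interval_s, probes):
    """Approximate worst-case time to notice a silently dead peer."""
    return idle_s + interval_s * probes
```

With these settings the kernel, not the application, decides the connection is dead, and the failure then surfaces through the normal close path. The time to detection is roughly the idle time plus the probe interval multiplied by the probe count, so the three values together set how long a dead idle session can go unnoticed.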

The design should consider making this opt-in (default off, activated by passing a keepalive/heartbeat interval) to avoid changing behavior for existing users.
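
One way to picture the opt-in shape is the Python sketch below. It is only an illustration: `ConnectInfo` is a stand-in for ServerConnectInfo, and the field names `heartbeat_interval_s` and `heartbeat_timeout_s`, the function `heartbeat_action`, and its return values are all hypothetical. The point it shows is that leaving the interval unset keeps today's behavior exactly, while setting it turns on an application-level probe with its own reply deadline.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ConnectInfo:
    """Stand-in for a connection config; all heartbeat fields are hypothetical."""
    host: str
    port: int
    # None means the feature is off, which preserves existing behavior.
    heartbeat_interval_s: Optional[float] = None
    # How long to wait for a probe reply before treating the link as dead.
    heartbeat_timeout_s: float = 10.0


def heartbeat_action(info, idle_s, probe_outstanding_s=None):
    """Decide what an idle session should do on a timer tick.

    Returns "none" (feature off), "wait", "probe" (send a no-op query or
    Sync), or "close" (probe went unanswered for too long).
    """
    if info.heartbeat_interval_s is None:
        return "none"
    if probe_outstanding_s is not None:
        # A probe is in flight: either keep waiting or give up.
        if probe_outstanding_s >= info.heartbeat_timeout_s:
            return "close"
        return "wait"
    if idle_s >= info.heartbeat_interval_s:
        return "probe"
    return "wait"
```

A session configured without an interval never probes, however long it sits idle. A session configured with, say, a 30 second interval sends a probe once it has been idle that long and closes itself if the reply does not arrive within the timeout.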
