
Feat/auto-restart tunnels on ping failure (+ optional fallback tunnel)#1182

Open
naonak wants to merge 9 commits into wgtunnel:master from naonak:feat/auto-restart-v2

Conversation

naonak (Contributor) commented Feb 28, 2026

Auto-restart tunnel on ping failure

Summary

Adds an optional auto-restart mechanism that monitors the active WireGuard tunnel and automatically restarts it when ping monitoring detects sustained connectivity failure. Entirely opt-in, configurable under Settings → Tunnel Monitoring → Auto-restart. See #1036.


Problem

A WireGuard tunnel can silently stop passing traffic, leaving the configured ping target unreachable. Without manual intervention, the tunnel stays "Up" in the UI while being effectively dead.


What's new

Functional

  • Auto-restart on ping failure — restarts the tunnel after N consecutive ping-failure intervals reported by the existing ping monitor
  • Post-restart verification — 5 s after the tunnel comes back UP, performs a fresh ping to confirm recovery; cooldown only starts if verification fails
  • Early recovery during cooldown — periodic pings continue running during cooldown; if they succeed before the timer expires, the next restart is skipped
  • Exponential backoff — optionally doubles the cooldown between each attempt
  • Give-up action — after max attempts: either keep monitoring (do nothing) or stop the tunnel entirely
  • Recovery notifications — always active; snackbar when the tunnel recovers or when max attempts are exhausted (no per-setting toggle)
  • No restart when internet is unavailable — pings are skipped and marked NoConnectivity when connectivityManager.allNetworks reports no physical network with NET_CAPABILITY_VALIDATED; prevents spurious restarts during ISP outages or mobile data being disabled
  • Real-time status in tunnel list — the tunnel card shows live restart progress with attempt counter on every phase (restarting 1/3…, verifying 1/3…, restart 1/3 · next in 30s), plus a cumulative restart counter inline with uptime (uptime: 3m · ↺ 4)
  • Fallback tunnel — when max attempts are exhausted, optionally switch to a designated fallback tunnel instead of stopping or doing nothing. Configurable globally (default fallback for all tunnels) and per-tunnel (override). Emits a SwitchedToFallback notification. Self-reference is prevented to avoid restart loops.
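The exponential-backoff rule above can be sketched as a pure function (function and parameter names are illustrative, not the PR's actual code):

```kotlin
// Sketch of the cooldown-with-backoff rule: with backoff enabled, the
// cooldown doubles on each attempt (30s -> 60s -> 120s ...); otherwise it
// stays constant. `attempt` is 0-based here by assumption.
fun nextCooldownMs(baseCooldownMs: Long, attempt: Int, backoffEnabled: Boolean): Long =
    if (backoffEnabled) baseCooldownMs shl attempt else baseCooldownMs
```

With the default 30 s cooldown, attempts 1–3 would wait 30 s, 60 s, and 120 s respectively when backoff is on.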

Configuration

Setting                               Default      Description
Restart cooldown                      30 s         Minimum time between restart attempts
Consecutive failures before restart   3            Ping-failure streak required to trigger
Exponential backoff                   off          Double cooldown on each attempt
Max attempts                          5            Give up after N failed restart attempts
Give-up action                        Do nothing   Do nothing or stop the tunnel
Fallback tunnel                       off          Switch to a fallback tunnel after max attempts
Default fallback tunnel               —            Global fallback used unless overridden per-tunnel
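As a rough illustration, the defaults in the table map to a settings type like the following (field names are assumptions, not necessarily the PR's actual MonitoringSettings schema):

```kotlin
// Illustrative mirror of the configuration defaults above.
data class AutoRestartSettings(
    val enabled: Boolean = false,               // feature is entirely opt-in
    val restartCooldownMs: Long = 30_000L,      // 30 s between restart attempts
    val pingFailuresBeforeRestart: Int = 3,     // failure streak to trigger
    val exponentialBackoff: Boolean = false,    // double cooldown each attempt
    val maxAttempts: Int = 5,                   // give up after N attempts
    val stopTunnelOnGiveUp: Boolean = false,    // false = "Do nothing"
    val fallbackEnabled: Boolean = false,
    val defaultFallbackTunnelId: Long? = null,  // null = no global fallback
)
```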

Technical design

HandshakeRestartHandler

Core of the feature. One monitoring coroutine per active tunnel, started when the tunnel appears in activeTunnels and cancelled when it leaves (via StateFlow observation). A Mutex serialises job lifecycle to prevent races during rapid tunnel transitions.

Trigger logic (awaitPingFailures)
Waits for pingFailuresBeforeRestart consecutive ping cycles where all targets report unreachable, using distinctUntilChanged on pingStates to track actual new cycles rather than reacting to every stats emission.
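The counting rule behind this trigger can be modelled in isolation (the real handler operates on a pingStates flow with distinctUntilChanged; this sketch covers only the streak logic, with illustrative names):

```kotlin
// Count consecutive ping cycles where every target is unreachable;
// any successful cycle resets the streak.
class FailureStreak(private val threshold: Int) {
    private var streak = 0

    /** Feed one completed ping cycle; returns true when a restart should trigger. */
    fun onCycle(allTargetsUnreachable: Boolean): Boolean {
        streak = if (allTargetsUnreachable) streak + 1 else 0
        return streak >= threshold
    }
}
```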

Restart / verify / cooldown cycle

awaitPingFailures()   <- N consecutive failures

loop attempt++:

  [RESTARTING]  -- periodic pings suppressed
  stopTunnel -> delay(300 ms)
  guard: if another tunnel became active -> abort (auto-tunnel took over)
  startTunnel -> wait UP (30 s timeout)

  [VERIFYING]  -- periodic pings suppressed
  delay(5 s settle)
  direct ping
  ok  -> ConnectionRestored, attempt=0, re-arm awaitPingFailures
  fail -> ...

  if attempt >= maxAttempts ->
    if fallback enabled -> SwitchedToFallback, stop current, start fallback, return
    else -> ConnectionPermanentlyLost / stop

  [COOLDOWN]  -- periodic pings ACTIVE
  race(cooldownMs):
    pingStates all reachable -> ConnectionRestored, attempt=0, re-arm
    timeout -> loop (attempt++)

Ping suppression during restart
TunnelMonitorHandler checks restartProgress before issuing periodic pings and skips the cycle only while isRestarting or isVerifying. Pings remain active during cooldown so early recovery can be detected.
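The suppression predicate is small but asymmetric, which is worth spelling out (phase names here are illustrative):

```kotlin
// Periodic pings skip only the restarting and verifying phases; cooldown
// deliberately keeps pinging so early recovery can short-circuit the timer.
enum class RestartPhase { IDLE, RESTARTING, VERIFYING, COOLDOWN, AWAITING_RECOVERY }

fun shouldSkipPeriodicPing(phase: RestartPhase): Boolean =
    phase == RestartPhase.RESTARTING || phase == RestartPhase.VERIFYING
```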

Auto-tunnel coordination
After stopping the tunnel, before restarting it, the handler checks whether another tunnel became active (e.g. auto-tunnel switched to a mobile-data tunnel). If so, the restart is aborted cleanly — the auto-tunnel's decision takes priority.

Recovery flow

  1. Ping streak detected -> ConnectionDegrading notification (attempt N/max)
  2. Tunnel stopped + restarted -> 5 s settle -> verification ping
    • 2a. Ping succeeds -> ConnectionRestored, attempt counter resets, monitor re-arms
    • 2b. Ping fails -> cooldown (with live ping monitoring for early exit) -> loop
  3. Max attempts reached:
    • 3a. Fallback enabled -> SwitchedToFallback, stop current tunnel, start fallback, handler exits
    • 3b. DO_NOTHING -> ConnectionPermanentlyLost, suspends until natural ping recovery then re-arms
    • 3c. STOP_TUNNEL -> ConnectionPermanentlyLost, tunnel stopped, handler exits
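The three-way decision at step 3 can be sketched as a single function (types and names are hypothetical; the fallback id is assumed to be already resolved to an existing tunnel, per the behaviour described when the configured fallback no longer exists):

```kotlin
// Decision at max attempts, mirroring steps 3a–3c above.
enum class GiveUpAction { DO_NOTHING, STOP_TUNNEL }

sealed interface GiveUpOutcome {
    data class SwitchToFallback(val fallbackTunnelId: Long) : GiveUpOutcome // 3a
    object AwaitNaturalRecovery : GiveUpOutcome                             // 3b
    object StopTunnel : GiveUpOutcome                                       // 3c
}

fun onMaxAttempts(action: GiveUpAction, fallbackTunnelId: Long?): GiveUpOutcome = when {
    fallbackTunnelId != null -> GiveUpOutcome.SwitchToFallback(fallbackTunnelId)
    action == GiveUpAction.DO_NOTHING -> GiveUpOutcome.AwaitNaturalRecovery
    else -> GiveUpOutcome.StopTunnel
}
```

Note that a usable fallback takes priority over either give-up action, matching the rule that ConnectionPermanentlyLost is not emitted when a fallback is available.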

UI — restart progress sequence

restarting 1/3…             (pings suppressed)
verifying 1/3…              (pings suppressed — direct ping)
restart 1/3 · next in 30s   (pings active -> early recovery possible)
restarting 2/3…
verifying 2/3…
restart 2/3 · next in 60s   (backoff)
restarting 3/3…
verifying 3/3…
-> awaiting ping recovery     (pings active — natural recovery)
-> or tunnel stopped
-> or switched to fallback tunnel

TunnelRestartProgress is a pure in-memory domain type flowing HandshakeRestartHandler -> TunnelManager -> SharedAppViewModel -> TunnelsUiState -> TunnelList — not persisted.
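The labels in the sequence above suggest a rendering along these lines (a sketch; the actual TunnelRestartProgress fields and formatting code may differ):

```kotlin
// In-memory progress state plus the tunnel-list label derived from it.
enum class ProgressPhase { RESTARTING, VERIFYING, COOLDOWN, AWAITING_RECOVERY }

data class RestartProgress(
    val phase: ProgressPhase,
    val attempt: Int,
    val maxAttempts: Int,
    val nextRetryInSec: Int = 0, // only meaningful during COOLDOWN
)

fun label(p: RestartProgress): String = when (p.phase) {
    ProgressPhase.RESTARTING -> "restarting ${p.attempt}/${p.maxAttempts}…"
    ProgressPhase.VERIFYING -> "verifying ${p.attempt}/${p.maxAttempts}…"
    ProgressPhase.COOLDOWN -> "restart ${p.attempt}/${p.maxAttempts} · next in ${p.nextRetryInSec}s"
    ProgressPhase.AWAITING_RECOVERY -> "awaiting ping recovery"
}
```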

Database

  • MonitoringSettings entity extended with new fields (sane defaults via auto-migration)
  • TunnelConfig entity extended with fallbackTunnelId (DB v35)

Also included


Test plan

Happy path

  • Enable ping monitoring + auto-restart; block peer ICMP -> tunnel does not restart until pingFailuresBeforeRestart consecutive failure cycles are observed, then restarts
  • Progress visible in tunnel list: restarting 1/N… -> verifying 1/N… -> restart 1/N · next in Xs (countdown live) -> cleared on success
  • "Connection restored" notification emitted after successful verification ping; attempt counter resets, monitor re-arms
  • totalRestarts counter increments and is shown inline with uptime (uptime: 3m · ↺ 2) across multiple recovery cycles
  • Health dot forced to UNHEALTHY (red) throughout the restart cycle, even if WireGuard briefly reports healthy

Cooldown early recovery

  • Block endpoint -> restart -> verify fails -> during cooldown, unblock endpoint -> pings succeed -> "Connection restored" without triggering next restart

Exponential backoff

  • With backoff enabled, cooldown doubles each attempt: 30s -> 60s -> 120s…
  • With backoff disabled, cooldown stays constant

Max attempts — DO_NOTHING

  • After max attempts: ConnectionPermanentlyLost notification (indicates tunnel still running), progress freezes on awaiting ping recovery
  • No verifying… flash before settling on awaiting ping recovery (no false-positive race)
  • When connectivity naturally recovers: "Connection restored" notification, progress cleared, monitor re-arms automatically

Max attempts — STOP_TUNNEL

  • After max attempts: ConnectionPermanentlyLost notification (indicates tunnel stopped), tunnel is actually stopped, progress cleared
  • Handler exits; no further restart attempts

Max attempts — Fallback tunnel

  • After max attempts with fallback enabled: SwitchedToFallback notification, current tunnel stops, fallback tunnel starts
  • Per-tunnel fallback overrides the global default fallback
  • Setting the fallback to the same tunnel as the source is prevented (no restart loop)
  • ConnectionPermanentlyLost is NOT emitted when a fallback is available
  • awaiting recovery progress clears immediately when fallback switch begins (not after)
  • If the configured fallback tunnel no longer exists, falls back to give-up action (DO_NOTHING / STOP_TUNNEL)
  • Toggling the failing tunnel OFF during a fallback switch: restart loop aborted, tunnel stays off
  • Fallback tunnel screen shows current fallback name per tunnel; selecting a new one updates immediately
  • DB upgrade from v34: fallbackTunnelId defaults to null (no fallback) for all existing tunnels

Auto-tunnel interaction

  • With auto-tunnel enabled (WiFi->A, mobile->B): trigger restart on A, then switch to mobile data mid-restart -> B activates, A's restart handler aborts cleanly, no flip-flop

Settings changes mid-cycle

  • Disabling auto-restart (or disabling ping) mid-cycle: current restart cancelled, progress cleared immediately
  • Re-enabling auto-restart: monitoring re-arms on the next ping cycle

Manual intervention

  • Toggling the tunnel switch OFF during an active restart: restart cancelled cleanly, no phantom progress remains
  • Deleting the tunnel while restarting: job cancelled, no crash, no orphaned progress

DB migration

  • Upgrade from version 29: no crash, monitoring_settings created with all defaults (auto-restart off, cooldown 30s, max 5 attempts, DO_NOTHING)

naonak (Contributor, Author) commented Mar 1, 2026

Hey everyone 👋

The auto-restart feature is ready for broader testing — I've been running it for a while and can't reproduce any more bugs. If you've been waiting for a way to automatically recover from silent tunnel failures, now's a great time to give it a try.

You can find it under Settings → Tunnel Monitoring → Auto-restart (requires ping monitoring to be enabled first).

Any feedback — edge cases, unexpected behaviour, UI quirks — is welcome. Thanks!

naonak and others added 2 commits March 10, 2026 14:16
Introduces MonitoringSettings Room entity and domain model to persist
auto-restart configuration: enabled flag, ping failure threshold,
cooldown duration, max restart attempts, exponential backoff toggle,
and on-max-attempts action (keep waiting or stop tunnel).

BackendMessage sealed class defines typed tunnel lifecycle events:
ConnectionDegrading, ConnectionRestored, ConnectionPermanentlyLost.
TunnelRestartProgress domain state tracks the full restart lifecycle
(idle → restarting → verifying → cooldown → awaiting recovery).

DB migrated from version 29 to 35.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…fication

Implements HandshakeRestartHandler, a coroutine-based state machine that
monitors ping health and automatically restarts the tunnel when consecutive
ping failures exceed the configured threshold.

Restart flow:
1. N consecutive ping failures → stop + restart tunnel (attempt 1/max)
2. 5 s verification ping after tunnel comes UP confirms recovery
3. On verification failure → exponential (or fixed) cooldown, then retry
4. Pings remain active during cooldown → early recovery skips next restart
5. After max attempts: emit ConnectionPermanentlyLost; if DO_NOTHING,
   suspend until natural ping recovery then re-arm automatically
6. On successful verification or natural recovery → emit ConnectionRestored,
   reset counter, re-arm monitor

Edge cases handled:
- Abort restart cycle when auto-tunnel switches to a different tunnel
- Skip unnecessary restart when ping recovers during cooldown
- Always poll WireGuard stats regardless of Doze mode (prerequisite fix)

TunnelMonitoringHandler wires HandshakeRestartHandler alongside the existing
ping/handshake monitors. TunnelManager exposes restart progress state.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@naonak naonak force-pushed the feat/auto-restart-v2 branch from a9eee47 to 48e50c2 Compare March 10, 2026 13:17
naonak (Contributor, Author) commented Mar 10, 2026

I have kept testing and everything looks good up to this point. I grouped the commits to facilitate the code review process.

@naonak naonak force-pushed the feat/auto-restart-v2 branch 2 times, most recently from af67ffa to e2c65f2 Compare March 10, 2026 14:15
AutoRestartScreen: configures auto-restart (enable/disable, ping failures
before restart, cooldown, max attempts, exponential backoff, on-max-attempts
action). Accessible from Settings → Tunnel monitoring.

TunnelList: inline restart progress label below tunnel name shows the current
phase — "restarting 1/3…", "verifying 1/3…", "restart 1/3 · next in 28s",
"awaiting ping recovery" — and total restart counter alongside uptime
("uptime: 4m · ↺ 3"). Dot color forced to UNHEALTHY during active restart.

MonitoringViewModel bridges MonitoringSettings persistence and exposes
restartProgress state from TunnelManager to the UI layer.

Snackbar notifications emitted on ConnectionRestored and
ConnectionPermanentlyLost (always active, no per-setting toggle).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@naonak naonak force-pushed the feat/auto-restart-v2 branch from e2c65f2 to b39a60d Compare March 10, 2026 14:32
naonak added a commit to naonak/wgtunnel that referenced this pull request Mar 11, 2026
When all pings fail (timeout -> Icmp.PingResult.Failed), rttList stays
empty and stats.transmitted was never assigned, leaving it at 0.

Move stats.transmitted = count before the rttList.isNotEmpty() check so
it always reflects the number of attempted pings, matching the expected
semantics of "packets transmitted".

This unblocks HandshakeRestartHandler.awaitPingFailures() (introduced in
wgtunnel#1182) which requires transmitted > 0 to distinguish a real failure from
pings not routed through the tunnel.
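The described fix amounts to hoisting one assignment above an early-exit condition; a minimal sketch (stats type, field names, and function shape are assumptions):

```kotlin
// Assign `transmitted` before the empty-RTT check, so a run where every
// ping timed out still reports how many pings were attempted.
data class PingStats(var transmitted: Int = 0, var received: Int = 0, var avgRttMs: Double = 0.0)

fun summarize(count: Int, rttList: List<Double>): PingStats {
    val stats = PingStats()
    stats.transmitted = count          // moved up: set even when rttList is empty
    if (rttList.isNotEmpty()) {
        stats.received = rttList.size
        stats.avgRttMs = rttList.average()
    }
    return stats
}
```

With the assignment inside the `isNotEmpty()` branch (the old placement), an all-timeout run would leave `transmitted == 0`, which awaitPingFailures() could not distinguish from pings that never ran.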
Remove the cooldownMs > pingIntervalMs guard. The withTimeoutOrNull block
already handles both cases correctly — it expires after cooldownMs when no
recovery is detected, and exits early if pings succeed. This enables early
recovery detection even when cooldown <= pingInterval, at zero extra cost.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@naonak naonak force-pushed the feat/auto-restart-v2 branch from 24979f9 to ca72d74 Compare March 11, 2026 19:46
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@naonak naonak force-pushed the feat/auto-restart-v2 branch from 6370b11 to 3443313 Compare March 18, 2026 18:57
naonak and others added 2 commits March 19, 2026 09:47
- DB v35: add fallbackTunnelId to TunnelConfig, isFallbackEnabled and
  defaultFallbackTunnelId to MonitoringSettings
- HandshakeRestartHandler: switch to fallback on max failures, emit
  SwitchedToFallback notification; fix race condition (keep restarting=true
  until after startTunnel); prevent self-reference fallback loop;
  clear "awaiting recovery" progress before fallback switch; only emit
  ConnectionPermanentlyLost when no fallback available
- TunnelConfig.equals(): include fallbackTunnelId so StateFlow emits
  on fallback change and FallbackTunnelScreen recomposes correctly
- FallbackTunnelScreen: per-tunnel fallback picker with SurfaceRow
  expandedContent pattern
- AutoRestartScreen: global fallback toggle + default fallback dropdown,
  disabled state grays out dropdown
- DropdownSelector: add enabled param with disabled color
- Navigation: Route.FallbackTunnel + navbar state

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When restarting=true blocks job cancellation during a stop/start cycle,
the activeTunnels collector skips cleanup. After restarting=false the
collector won't re-fire since activeTunnels hasn't changed, leaving the
job alive to restart the tunnel indefinitely even after a user toggle-off.

Fix: after clearing restarting flag, check if the tunnel is still in
activeTunnels. If absent, it was stopped externally during the protected
window — clear progress and return to break the loop.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
naonak (Contributor, Author) commented Mar 19, 2026

I've added a fallback tunnel feature to this PR.

When max restart attempts are exhausted, the tunnel can now automatically switch to a designated fallback tunnel instead of stopping or doing nothing. Configurable globally (default fallback for all tunnels) and per-tunnel. Full details in the updated PR description.

I'm stopping improvements here — the feature set is complete and the code is ready for review.

@naonak naonak changed the title Feat/auto-restart tunnels on ping failure 1036 Feat/auto-restart tunnels on ping failure (+ optional fallback tunnel) Mar 21, 2026
@fiveseven7 commented:

Is it possible to also restart the tunnel when the log monitor detects failed handshake initiations?
