Feat/auto-restart tunnels on stale handshake or ping failure 1036 EXPERIMENTAL by naonak · Pull Request #1176 · wgtunnel/android

naonak · 2026-02-26T13:44:23Z

Auto-restart tunnels on stale handshake or ping failure (EXPERIMENTAL)

WORK IN PROGRESS.

Summary

Adds an optional auto-restart mechanism that monitors the active WireGuard tunnel and automatically restarts it when a connection problem is detected. The feature is entirely opt-in and configurable through a new screen under Settings → Tunnel Monitoring → Auto-restart.

Problem

WireGuard tunnels can silently stop passing traffic when:

The last handshake becomes stale (~3.5 min without a successful re-key)
All configured ping targets are unreachable for several consecutive intervals

Without manual intervention the tunnel stays "Up" in the UI while being effectively dead.

What's new

Functional

Auto-restart on stale handshake — restarts the tunnel when the WireGuard handshake threshold is exceeded, without requiring the user to toggle the tunnel manually
Restart on ping failure — optionally also restarts after N consecutive ping-failure intervals reported by the existing ping monitor
Pre-restart verification — when ping is enabled, performs a fresh ping series just before each restart attempt; skips the restart if any target is reachable (tunnel recovered on its own)
Exponential backoff — optionally doubles the cooldown between each attempt, up to a configurable number of attempts
Give-up action — after max attempts: either keep monitoring (do nothing) or stop the tunnel entirely
Recovery notifications — notifies the user when the tunnel that was restarting comes back healthy
Real-time status in tunnel list — the tunnel card shows live restart progress: attempt count, countdown to next retry, trigger reason (stale handshake / ping failure), and failing ping targets

Configuration

Setting	Default	Description
Restart cooldown	30s	Minimum time between restart attempts
Startup grace period	15s	Delay before first check after tunnel start
Restart on ping failure	off	Use ping failures as an additional trigger
Consecutive failures before restart	3	Ping failure streak required
Exponential backoff	off	Double cooldown on each attempt
Max attempts (backoff)	5	Give up after N attempts with backoff
Max attempts (no backoff)	10	Max attempts per hour without backoff
Give-up action	Do nothing	Do nothing or stop tunnel
Recovery notifications	on	Notify when tunnel recovers
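As a rough illustration, the defaults in the table could be carried by a single settings object. The sketch below reuses the field names listed in the data-model commit of this PR, but the class name, the field types, and the mapping of table rows to fields are assumptions rather than the actual MonitoringSettings definition.

```kotlin
// Hypothetical mirror of the configuration table; not the real MonitoringSettings entity.
enum class MaxAttemptsAction { DO_NOTHING, STOP_TUNNEL }

data class AutoRestartSettingsSketch(
    val isAutoRestartEnabled: Boolean = false,      // feature is opt-in, so assumed off by default
    val restartCooldownSeconds: Int = 30,           // "Restart cooldown"
    val startupGraceSeconds: Int = 15,              // "Startup grace period"
    val isPingMonitoringEnabled: Boolean = false,   // assumed to back "Restart on ping failure"
    val pingFailuresBeforeRestart: Int = 3,         // "Consecutive failures before restart"
    val isBackoffEnabled: Boolean = false,          // "Exponential backoff"
    val backoffMaxAttempts: Int = 5,                // "Max attempts (backoff)"
    val maxHandshakeRestartAttempts: Int = 10,      // assumed to back "Max attempts (no backoff)"
    val maxAttemptsAction: MaxAttemptsAction = MaxAttemptsAction.DO_NOTHING,
    val isRecoveryNotificationEnabled: Boolean = true,
)
```

A caller would override only what differs from the defaults, for example `AutoRestartSettingsSketch(isAutoRestartEnabled = true, isBackoffEnabled = true)`.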

Technical design

`HandshakeRestartHandler`

The core of the feature. A monitoring coroutine is started when the tunnel comes up and cancelled when it goes down (via activeTunnels StateFlow). A Mutex serialises job lifecycle.

Trigger logic (shouldTrigger)

Stale handshake — always checked first (WireGuard already waits ~3.5 min)
Ping failure — all attempted pings unreachable, gated on isPingMonitoringEnabled
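The ordering of the two triggers can be pictured with a small pure function. This is only a sketch: shouldTrigger and isPingMonitoringEnabled are named in the description, while the signature, the TunnelSnapshot and RestartReason types, and the 210 s threshold (standing in for "~3.5 min") are assumptions.

```kotlin
// Sketch only: everything except the names shouldTrigger and isPingMonitoringEnabled is hypothetical.
enum class RestartReason { STALE_HANDSHAKE, PING_FAILURE }

data class TunnelSnapshot(
    val lastHandshakeMillis: Long,   // time of the last successful handshake
    val pingResults: List<Boolean>,  // one entry per attempted ping target, true = reachable
)

fun shouldTrigger(
    snapshot: TunnelSnapshot,
    nowMillis: Long,
    isPingMonitoringEnabled: Boolean,
    staleAfterMillis: Long = 210_000L, // assumed value for "~3.5 min"
): RestartReason? {
    // The stale-handshake check always runs first.
    if (nowMillis - snapshot.lastHandshakeMillis > staleAfterMillis) {
        return RestartReason.STALE_HANDSHAKE
    }
    // A ping failure requires that pings were attempted and that every one of them failed.
    val allFailed = snapshot.pingResults.isNotEmpty() && snapshot.pingResults.none { it }
    if (isPingMonitoringEnabled && allFailed) {
        return RestartReason.PING_FAILURE
    }
    return null // healthy, or no trigger enabled for the observed failure
}
```

Returning a nullable reason instead of a Boolean lets the caller forward the trigger reason to the UI without a second check.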

False-positive protection

Startup grace — waits for the tunnel to reach a healthy state before the first check; prevents false triggers from stale kernel stats retained across tunnel restarts
Ping streak threshold — ping failures must repeat for N consecutive intervals; waits for an actual new ping cycle (not just any stats update) using lastPingAttemptMillis comparison
Pre-restart verification — re-pings targets fresh before committing to a restart, using NetworkUtils.pingWithStats()
Post-restart grace — mirrors startup grace after each restart; prevents rapid-fire loops when cooldown < WireGuard re-keying time, since isTunnelStale() can still fire on stale stats before the new handshake completes
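The ping streak rule above (count a failure only when a new ping cycle has actually run) might look roughly like this. The lastPingAttemptMillis name comes from the description; the PingStreakTracker class and its API are hypothetical.

```kotlin
// Sketch only: PingStreakTracker is a hypothetical helper, not a class from the PR.
class PingStreakTracker(private val failuresBeforeRestart: Int) {
    private var lastSeenAttemptMillis = 0L

    var streak = 0
        private set

    /** Feeds one stats update; returns true once the failure streak reaches the threshold. */
    fun onStats(lastPingAttemptMillis: Long, allTargetsFailed: Boolean): Boolean {
        // An unchanged timestamp means no new ping cycle ran, so this update is ignored.
        if (lastPingAttemptMillis != lastSeenAttemptMillis) {
            lastSeenAttemptMillis = lastPingAttemptMillis
            // Any successful cycle resets the streak.
            streak = if (allTargetsFailed) streak + 1 else 0
        }
        return streak >= failuresBeforeRestart
    }
}
```

Comparing timestamps rather than counting stats updates keeps a burst of unrelated stats emissions from inflating the streak.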

Rate limiting

Timestamps are recorded in an ArrayDeque<Long>
Without backoff: timestamps older than 1 hour are pruned on each check
With backoff: cooldown × 2^(attempt-1), capped at attempt 31 to prevent Long overflow
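Both rate-limiting modes can be sketched in a few lines. computeCooldown is named elsewhere in this description, but its signature here is assumed, and RestartRateLimiter is a hypothetical wrapper around the ArrayDeque<Long> of timestamps.

```kotlin
// Sketch only: signatures and the RestartRateLimiter class are assumptions.
fun computeCooldown(baseCooldownMillis: Long, attempt: Int, isBackoffEnabled: Boolean): Long {
    if (!isBackoffEnabled) return baseCooldownMillis
    // cooldown × 2^(attempt-1); the exponent is capped (attempt 31) so the multiplier stays bounded.
    val exponent = (attempt - 1).coerceIn(0, 30)
    return baseCooldownMillis * (1L shl exponent)
}

class RestartRateLimiter(private val maxAttemptsPerHour: Int) {
    private val oneHourMillis = 60 * 60 * 1000L
    private val timestamps = ArrayDeque<Long>()

    fun recordAttempt(nowMillis: Long) {
        timestamps.addLast(nowMillis)
    }

    fun canRestart(nowMillis: Long): Boolean {
        // Prune attempts older than one hour before counting.
        while (timestamps.isNotEmpty() && nowMillis - timestamps.first() > oneHourMillis) {
            timestamps.removeFirst()
        }
        return timestamps.size < maxAttemptsPerHour
    }
}
```

With a 30 s base cooldown and backoff enabled, attempts 1, 2 and 3 would wait 30 s, 60 s and 120 s respectively.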

Network change reactivity

networkChangeFlow observes connectivity state changes (WiFi ↔ Cellular ↔ Ethernet) and wakes the monitoring loop immediately after a 3 s grace, avoiding the full ~3.5 min stale-handshake wait after a network switch

Give-up

DO_NOTHING — suspends until the tunnel recovers or goes down, then resets timestamps and resumes monitoring
STOP_TUNNEL — calls stopTunnel(id) and returns (job terminates)

Database

MonitoringSettings entity extended with 9 new fields (all with sane defaults via auto-migration)
DB version 29 → 31 (two auto-migrations)
MaxAttemptsAction stored as a string enum via DatabaseConverters
TunnelRestartProgress is a pure domain state type — not persisted, lives only in memory
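Storing the enum as a string usually comes down to a pair of conversion functions. The sketch below is not the project's DatabaseConverters: in a Room project the two functions would live in a converters class and carry the @TypeConverter annotation, which is left out here so the snippet compiles without Room. The fallback to DO_NOTHING for unknown values is an assumption.

```kotlin
// Sketch only: plain functions standing in for Room type converters.
enum class MaxAttemptsAction { DO_NOTHING, STOP_TUNNEL }

fun maxAttemptsActionToString(action: MaxAttemptsAction): String = action.name

fun stringToMaxAttemptsAction(value: String): MaxAttemptsAction =
    // Unknown or legacy values fall back to DO_NOTHING (assumed to be the least disruptive choice).
    MaxAttemptsAction.values().firstOrNull { it.name == value } ?: MaxAttemptsAction.DO_NOTHING
```

Persisting the name rather than the ordinal keeps stored rows valid if enum entries are later reordered.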

UI

AutoRestartScreen exposes all settings through MonitoringViewModel (Orbit MVI pattern)
AutoRestartScreen shows a warning banner when battery optimization is enabled, with a direct tap-to-disable shortcut — battery optimization can prevent auto-restart from firing reliably on some devices
LabelledNumberDropdown added as a reusable component for numeric option lists
Backoff give-up dropdown shows estimated total wait time (e.g. 5 attempts (~4m35s)) computed from computeCooldown() so the user can reason about the effective timeout
TunnelRestartProgress flows from HandshakeRestartHandler → TunnelManager → SharedAppViewModel → TunnelsUiState → TunnelList
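The estimated total wait shown in the backoff dropdown can be approximated by summing the doubled cooldowns and formatting the result. This is a simplified sketch with hypothetical function names; the real calculation may count attempts or grace periods differently, so its output need not match the example figure quoted in the list above.

```kotlin
// Sketch only: sums cooldown × 2^(attempt-1) over all attempts, ignoring any grace periods.
fun estimateTotalWaitSeconds(baseCooldownSeconds: Long, attempts: Int): Long =
    (1..attempts).sumOf { attempt -> baseCooldownSeconds shl (attempt - 1).coerceAtMost(30) }

// Formats a duration in the "~XmYs" style used by the dropdown label.
fun formatWait(totalSeconds: Long): String {
    val minutes = totalSeconds / 60
    val seconds = totalSeconds % 60
    return if (minutes > 0) "~${minutes}m${seconds}s" else "~${seconds}s"
}
```

For a 30 s base cooldown and 5 attempts, this sketch sums 30 + 60 + 120 + 240 + 480 = 930 s, which `formatWait` renders as `~15m30s`.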

Also included

fix: reduce network change grace period from 10 s to 3 s

HandshakeRestartHandler observes network transitions (WiFi ↔ LTE ↔ Ethernet) and wakes the restart loop early to avoid waiting the full ~3.5 min stale-handshake window. The previous grace period of 10 s was longer than necessary: 3 s is enough to distinguish a real network switch from a momentary drop, while still reacting quickly enough to restart the tunnel before the user notices the outage.

fix: show battery optimization warning in auto-restart screen

If Android battery optimization is active, the app process can be throttled or delayed in the background, preventing auto-restart from firing reliably (especially on devices running Android < 14 with aggressive OEM power management). A contextual warning banner is now shown at the top of the Auto-restart screen whenever battery optimization is not disabled, with a tap action that opens the system exemption prompt directly.

Test plan

Enable auto-restart, disconnect network — tunnel restarts within cooldown + grace period
Enable ping failure trigger, block ICMP — restart triggers after N consecutive failures
Verify pre-restart verification skips restart when tunnel self-recovers mid-cooldown
Enable backoff — confirm cooldown doubles each attempt
Reach max attempts with STOP_TUNNEL action — tunnel stops, notification shown
Reach max attempts with DO_NOTHING — monitoring resumes after manual recovery
Toggle tunnel off manually during auto-restart — restart cancelled cleanly
Startup grace: toggle tunnel on/off rapidly — no spurious restart on startup
Recovery notification shown when tunnel comes back healthy

Introduces the data model for the auto-restart feature:
- MonitoringSettings entity/domain model with all configurable fields: isAutoRestartEnabled, restartCooldownSeconds, maxHandshakeRestartAttempts, startupGraceSeconds, isRecoveryNotificationEnabled, isPingMonitoringEnabled, pingFailuresBeforeRestart, isBackoffEnabled, backoffMaxAttempts, maxAttemptsAction
- MaxAttemptsAction enum: DO_NOTHING or STOP_TUNNEL when max attempts reached
- TunnelRestartProgress domain state for real-time UI feedback
- BackendMessage extended with restart-related events (restarting, recovered, max attempts reached)
- MonitoringSettingsMapper for entity ↔ domain conversion
- DatabaseConverters updated for new types
- AppDatabase bumped to v31 with auto-migrations (v29→30, v30→31)
- DB schema snapshots for v30 and v31

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

HandshakeRestartHandler runs a coroutine per active tunnel and handles all auto-restart logic. Key behaviours:

Restart triggers
- Stale handshake: restarts when last handshake exceeds the WireGuard threshold (always active when auto-restart is enabled)
- Ping failure streak: optionally restarts after N consecutive ping failures reported by the ping monitor (isPingMonitoringEnabled)

False-positive protection
- Startup grace period: skips restart checks for configurable seconds after the tunnel first starts, avoiding false triggers during the initial WireGuard handshake
- Post-restart grace period: waits after each restart before re-checking, preventing rapid-fire loops when the cooldown is shorter than the WireGuard re-keying time
- Pre-restart verification pings: when ping is enabled, performs a fresh ping series just before restarting; skips restart if any target is reachable (tunnel self-recovered)

Rate limiting & give-up
- Configurable cooldown between attempts (restartCooldownSeconds)
- Optional exponential backoff: doubles cooldown each attempt up to backoffMaxAttempts, then triggers maxAttemptsAction
- maxAttemptsAction: DO_NOTHING (keep monitoring) or STOP_TUNNEL

Observability
- Emits TunnelRestartProgress events consumed by the UI for real-time status display (countdown, attempt count, restart reason)
- Recovery notifications via NotificationMonitor when a tunnel that was restarting comes back healthy

Integration
- TunnelManager creates one HandshakeRestartHandler per tunnel start and cancels it on stop
- TunnelLifecycleManager and TunnelProvider updated to expose the required state flows

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

New AutoRestartScreen accessible from Settings > Tunnel Monitoring:
- Enable/disable auto-restart toggle
- Restart cooldown dropdown (5s → 5min)
- Startup grace period dropdown (0 → 60s)
- Restart on ping failure toggle (gated on ping being enabled)
- Consecutive ping failures threshold (1–5)
- Exponential backoff toggle with give-up attempts dropdown; dropdown label shows estimated total wait time for quick tuning
- Max attempts action: do nothing or stop tunnel
- Recovery notifications toggle

Navigation: added AutoRestart route, entry in MainActivity nav graph, and navbar state mapping.

MonitoringViewModel exposes all settings as state with individual update intents.

LabelledNumberDropdown added as a new reusable component for numeric option lists.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Each tunnel card now displays live restart progress when HandshakeRestartHandler is active:
- "Restarting… (attempt N)" during a restart
- "Next restart in Xs" countdown during cooldown
- Restart reason: stale handshake or ping failure
- Ping target when the trigger is a ping failure
- "Max attempts reached" when give-up action fires
- Status clears automatically on tunnel recovery

SharedAppViewModel collects TunnelRestartProgress from TunnelManager and exposes it as a StateFlow. TunnelsUiState carries the progress map keyed by tunnel ID. SettingsViewModel passes it through to the tunnels screen.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
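The mapping from restart progress to the tunnel-card labels listed in this commit message could be sketched as below. TunnelRestartProgress itself is not shown in the PR text, so the RestartProgressSketch type, its variants, and the way reason and ping target are combined into one line are assumptions; only the quoted label strings come from the commit message.

```kotlin
// Sketch only: a hypothetical stand-in for TunnelRestartProgress and its rendering.
enum class RestartReason(val label: String) {
    STALE_HANDSHAKE("stale handshake"),
    PING_FAILURE("ping failure"),
}

sealed class RestartProgressSketch {
    data class Restarting(val attemptNumber: Int) : RestartProgressSketch()
    data class CoolingDown(
        val secondsLeft: Int,
        val reason: RestartReason,
        val pingTarget: String? = null, // only set when the trigger was a ping failure
    ) : RestartProgressSketch()
    object MaxAttemptsReached : RestartProgressSketch()
}

fun statusText(progress: RestartProgressSketch?): String? = when (progress) {
    null -> null // no restart in progress, so the status line is cleared
    is RestartProgressSketch.Restarting -> "Restarting… (attempt ${progress.attemptNumber})"
    is RestartProgressSketch.CoolingDown -> {
        val target = progress.pingTarget?.let { ": $it" } ?: ""
        "Next restart in ${progress.secondsLeft}s (${progress.reason.label}$target)"
    }
    RestartProgressSketch.MaxAttemptsReached -> "Max attempts reached"
}
```

Modelling the states as a sealed class makes the `when` exhaustive, so adding a new progress state forces the label mapping to be updated.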

naonak · 2026-02-26T13:48:37Z

#1036

- AppDatabase: consolidate auto-migrations 31→32→33→34→35 into a single AutoMigration(31, 35); intermediate schema files were never committed so Room could not generate the migration code
- TunnelManager: remove `override val restartCounts` which was absent from the TunnelProvider interface (restartCounts is managed internally by HandshakeRestartHandler; attemptNumber in TunnelRestartProgress serves the same purpose externally)
- SharedAppViewModel: remove redundant restartCounts from the combine; use restartProgress only (which already contains attemptNumber)
- TunnelList: remove restartCount parameter from TunnelStatisticsRow call (parameter was removed from the composable signature)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Add description under "Restart on ping failure" (requires ping monitoring)
- Add description under "Startup grace period"
- Tune defaults: grace 30→10s, cooldown 30→15s, ping failures before restart 1→2

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

10 seconds of silent traffic failure on WiFi→LTE transitions was too aggressive. 3 seconds is sufficient to distinguish a real network switch from a brief drop, while limiting unnecessary downtime.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…creen

When auto-restart is active but battery optimization is NOT disabled, Android may restrict the monitoring process (especially on pre-Android-14 devices). A contextual banner now appears at the top of the screen with a direct link to the system battery optimization settings.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

naonak · 2026-02-28T13:50:47Z

new version #1182

naonak and others added 4 commits February 26, 2026 14:35

This was referenced Feb 26, 2026

Feat/auto restart stale handshake 1036 #1175

Closed

[FEATURE] - Restart tunnel after handshake exceeds certain time #1036

Open

naonak and others added 4 commits February 26, 2026 15:14

naonak mentioned this pull request Feb 27, 2026

fix(core): always poll WireGuard stats regardless of Doze mode #1177

Merged

naonak changed the title ~~Feat/auto-restart tunnels on stale handshake or ping failure 1036~~ Feat/auto-restart tunnels on stale handshake or ping failure 1036 EXPERIMENTAL Feb 27, 2026

naonak closed this Feb 28, 2026

naonak wants to merge 8 commits into wgtunnel:master from naonak:feat/auto-restart-pr
