If the server process is killed ungracefully (OOMKill, SIGKILL, container crash) while a sync is running, the sync record is left in IN_PROGRESS with no ended_at. On restart the record is preserved as-is, because startup initialization skips existing rows. ShouldSync returns early for any source in the Syncing state, with no timeout or age check, so the source is silently skipped on every subsequent coordinator cycle and never retried.
The fix should be a background watchdog goroutine, independent of the coordinator loop, that periodically resets IN_PROGRESS rows older than a threshold to FAILED. The threshold must be configurable because worst-case sync durations vary. Running the watchdog separately also covers the case where PerformSync hangs indefinitely in a live process and blocks the coordinator loop.