fix(jobscheduler): cancel periodic re-arm goroutines on shutdown by worstell · Pull Request #311 · block/cachew

worstell · 2026-05-15T18:50:09Z

The graceful-shutdown change in #307 exposed a latent bug: after SIGTERM the scheduler sets q.done=true and workers exit, but the in-process re-arm goroutine spawned by SubmitPeriodicJob is a bare go func() { time.Sleep(interval); ... }() with no awareness of ctx. It dies with the pod, and the next pod's warmExistingRepos re-registers each periodic job — but periodicDelay reads the bbolt lastRun and sleeps the remaining interval before the next firing. Net effect: each periodic job skips up to one full interval per pod restart, dropping background snapshot/repack/fetch throughput sharply during and after rolling deploys.
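The delay arithmetic described above can be sketched as follows. This is an illustration only: `remainingDelay` and `demoRemaining` are hypothetical names, not the repository's `periodicDelay`, and the sketch assumes that "the remaining interval" means the interval minus the time elapsed since the persisted `lastRun`.

```go
package main

import (
	"fmt"
	"time"
)

// remainingDelay is a hypothetical stand-in for periodicDelay: given the
// persisted last-run time, it returns how long to wait before the next firing.
// Assumption: a job that never ran, or is overdue, fires immediately.
func remainingDelay(lastRun, now time.Time, interval time.Duration) time.Duration {
	if lastRun.IsZero() {
		return 0 // never ran
	}
	elapsed := now.Sub(lastRun)
	if elapsed >= interval {
		return 0 // overdue
	}
	return interval - elapsed
}

// demoRemaining models a pod that restarts 15 minutes after a job's last run,
// with a one-hour interval.
func demoRemaining() time.Duration {
	lastRun := time.Date(2026, 5, 15, 12, 0, 0, 0, time.UTC)
	restart := lastRun.Add(15 * time.Minute)
	return remainingDelay(lastRun, restart, time.Hour)
}

func main() {
	fmt.Println(demoRemaining()) // 45m0s
}
```

Under these assumptions, the new pod waits out the rest of the interval rather than firing on startup, so any firing lost across the restart is not made up.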

- Store the cancellable ctx on RootScheduler.
- Replace time.Sleep in the re-arm and initial-delay goroutines with a select on q.ctx.Done() and a timer (new sleepThenSubmit helper).
- Drop Submit() calls once q.done is set so re-arm goroutines that win the race against ctx cancellation can't enqueue to a dead scheduler.
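The three changes above might look roughly like the sketch below. It is not the code from this PR: the `scheduler` type, its fields, and the signatures of `Submit` and `sleepThenSubmit` are assumptions made for illustration. Only the pattern itself (a select on `ctx.Done()` and a timer, plus a `done` guard in `Submit`) comes from the description.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// scheduler is a hypothetical, stripped-down stand-in for RootScheduler.
type scheduler struct {
	ctx  context.Context // cancelled on shutdown
	mu   sync.Mutex
	done bool     // set once shutdown has begun
	jobs []string // accepted submissions, for illustration only
}

// Submit drops the job once done is set, so a late re-arm cannot enqueue
// work on a scheduler whose workers have already exited.
func (q *scheduler) Submit(name string) bool {
	q.mu.Lock()
	defer q.mu.Unlock()
	if q.done {
		return false
	}
	q.jobs = append(q.jobs, name)
	return true
}

// sleepThenSubmit waits for delay or for shutdown, whichever comes first,
// and reports whether the job was actually submitted.
func (q *scheduler) sleepThenSubmit(delay time.Duration, name string) bool {
	timer := time.NewTimer(delay)
	defer timer.Stop()
	select {
	case <-q.ctx.Done():
		return false // shutdown: exit without re-arming
	case <-timer.C:
		return q.Submit(name)
	}
}

// runDemo covers normal expiry, cancellation during the wait, and a Submit
// issued after done is set. Each result reports whether a job was enqueued.
func runDemo() (onExpiry, onCancel, afterDone bool) {
	ctx, cancel := context.WithCancel(context.Background())
	q := &scheduler{ctx: ctx}

	onExpiry = q.sleepThenSubmit(time.Millisecond, "repack")

	result := make(chan bool, 1)
	go func() { result <- q.sleepThenSubmit(time.Hour, "fetch") }()
	cancel() // unblocks the hour-long wait immediately
	onCancel = <-result

	q.mu.Lock()
	q.done = true
	q.mu.Unlock()
	afterDone = q.Submit("snapshot")
	return
}

func main() {
	a, b, c := runDemo()
	fmt.Println(a, b, c) // true false false
}
```

The `done` check in `Submit` covers the window where the timer fires just before cancellation is observed. In that case the select may take the timer branch, and the guard is what keeps the job off the dead queue.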

Adds regression tests for both behaviours.

The graceful-shutdown change in #307 exposed a latent bug: after SIGTERM the scheduler sets q.done=true and workers exit, but the in-process re-arm goroutine spawned by SubmitPeriodicJob is a bare 'go func() { time.Sleep(interval); ... }()' with no awareness of ctx. It dies with the pod, and the next pod's warmExistingRepos re-registers each periodic job, but periodicDelay reads the bbolt lastRun and sleeps the remaining interval before the next firing. Net effect: each periodic job skips up to one full interval per pod restart, dropping background snapshot/repack/fetch throughput by ~90% during and after a rolling deploy.

Fix:
- Store the cancellable ctx on RootScheduler.
- Replace time.Sleep in the re-arm and initial-delay goroutines with a select on q.ctx.Done() and a timer.
- Drop Submit() calls once q.done is set so re-arm goroutines that win the race against ctx cancellation can't enqueue to a dead scheduler.

Adds regression tests for both behaviours.

Amp-Thread-ID: https://ampcode.com/threads/T-019e2cde-c5c3-729a-9364-202a128cf43e
Co-authored-by: Amp <amp@ampcode.com>

worstell marked this pull request as ready for review May 15, 2026 18:50

worstell requested a review from a team as a code owner May 15, 2026 18:50

worstell requested review from jrobotham-square and removed request for a team May 15, 2026 18:50

joshfriend approved these changes May 15, 2026

worstell merged commit 26f531a into main May 15, 2026
8 checks passed

worstell deleted the fix/scheduler-periodic-rearm-cancel branch May 15, 2026 18:55
