fix(durabletask): detect shutdown and break gRPC worker loop #1727

Open
javier-aliaga wants to merge 3 commits into dapr:master from javier-aliaga:fix/grpc-worker-shutdown-loop
Conversation

@javier-aliaga
Contributor

@javier-aliaga javier-aliaga commented Apr 14, 2026

Description

Summary

  • Fix DurableTaskGrpcWorker.startAndBlock() loop not exiting when close()/stop() is called while the gRPC stream is blocking
  • The while (true) loop only broke on InterruptedException during the 5s retry sleep; a CANCELLED status caused by channel shutdown was logged and then retried indefinitely

Changes

durabletask-client — DurableTaskGrpcWorker

  • Replace while (true) with while (!isNormalShutdown && !Thread.currentThread().isInterrupted()) to check shutdown signals each iteration
  • Break out of the retry path when CANCELLED is received during a normal shutdown, avoiding a stale 5s sleep cycle
  • Re-set the thread interrupt flag before breaking on InterruptedException to preserve the interrupt contract for upstream callers
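
The three changes above can be sketched in isolation. This is a minimal illustration, not the code from the PR: `ShutdownAwareLoopSketch`, `pollOnce()`, `retryDelayMillis` and the helper methods are hypothetical names, and an `IllegalStateException` stands in for the gRPC `CANCELLED` failure so the example runs without a sidecar.

```java
/** Hypothetical stand-in for the worker; not the real DurableTaskGrpcWorker. */
final class ShutdownAwareLoopSketch {
  private volatile boolean isNormalShutdown = false;
  private final long retryDelayMillis;

  ShutdownAwareLoopSketch(long retryDelayMillis) {
    this.retryDelayMillis = retryDelayMillis;
  }

  /** Simulates a stream call that always fails (assumption: stands in for CANCELLED). */
  private void pollOnce() {
    throw new IllegalStateException("simulated stream failure");
  }

  void startAndBlock() {
    // Re-check both shutdown signals on every iteration instead of while (true).
    while (!isNormalShutdown && !Thread.currentThread().isInterrupted()) {
      try {
        pollOnce();
      } catch (IllegalStateException e) {
        if (isNormalShutdown) {
          break; // failure was caused by shutdown: skip the retry sleep
        }
        try {
          Thread.sleep(retryDelayMillis);
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt(); // restore the flag for callers
          break;
        }
      }
    }
  }

  void close() {
    isNormalShutdown = true;
  }

  /** Returns true if the loop thread exits after close(). */
  static boolean exitsOnClose() {
    ShutdownAwareLoopSketch worker = new ShutdownAwareLoopSketch(10);
    Thread t = new Thread(worker::startAndBlock);
    t.start();
    return stopsWithin(t, worker::close);
  }

  /** Returns true if the loop thread exits after being interrupted. */
  static boolean exitsOnInterrupt() {
    ShutdownAwareLoopSketch worker = new ShutdownAwareLoopSketch(60_000);
    Thread t = new Thread(worker::startAndBlock);
    t.start();
    return stopsWithin(t, t::interrupt);
  }

  private static boolean stopsWithin(Thread t, Runnable stopSignal) {
    try {
      Thread.sleep(50);      // let the loop run a few iterations
      stopSignal.run();
      t.join(2_000);         // bounded wait so a regression cannot hang the demo
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
    return !t.isAlive();
  }

  public static void main(String[] args) {
    System.out.println("exits on close: " + exitsOnClose());
    System.out.println("exits on interrupt: " + exitsOnInterrupt());
  }
}
```

Note that with only a flag, `close()` still has to wait out whatever sleep is already in progress; the sketch uses a 10 ms delay for that case, whereas the real worker sleeps for 5s.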

Issue reference

We strive to have all PRs opened based on an issue, where the problem or feature has been discussed prior to implementation.

Please reference the issue this PR will close: #1728

Checklist

Please make sure you've completed the relevant tasks for this PR, out of the following list:

  • Code compiles correctly
  • Created/updated tests
  • Extended the documentation

@javier-aliaga javier-aliaga force-pushed the fix/grpc-worker-shutdown-loop branch from 60ba79d to 5634b8d Compare April 14, 2026 09:21
The worker loop ran `while (true)` and only exited if an
InterruptedException happened during the 5-second retry sleep.
If `close()` was called while the gRPC stream was blocking, the
CANCELLED exception was logged but the loop kept retrying.

- Replace `while (true)` with a check on `isNormalShutdown` and
  the thread interrupt flag so the loop exits promptly.
- Break out of the retry path on CANCELLED when `isNormalShutdown`
  is set, avoiding a misleading 5-second sleep after `close()`.
- Re-set the interrupt flag before breaking on InterruptedException
  to preserve the interrupt contract for callers higher up.

Signed-off-by: Javier Aliaga <javier@diagrid.io>
Contributor

Copilot AI left a comment

Pull request overview

Fixes DurableTaskGrpcWorker.startAndBlock() not terminating promptly during shutdown when the gRPC getWorkItems() stream is blocking, aligning behavior with issue #1728 expectations.

Changes:

  • Replaces the infinite worker loop with a loop that checks isNormalShutdown and thread interrupt status.
  • Exits the retry path on CANCELLED during normal shutdown and preserves interrupt status when sleep is interrupted.
Comments suppressed due to low confidence (1)

durabletask-client/src/main/java/io/dapr/durabletask/DurableTaskGrpcWorker.java:179

  • The outer loop now checks isNormalShutdown, but the worker can still remain inside the inner while (workItemStream.hasNext()) and keep pulling/dispatching work items after shutdown has been requested. This is especially problematic because close() shuts down workerPool before the gRPC channel is closed, so workerPool.submit(...) can start throwing RejectedExecutionException (uncaught) during shutdown. Consider checking isNormalShutdown/Thread.interrupted() inside the work-item loop and breaking out (or short-circuiting submissions) as soon as shutdown is requested, to avoid dispatching new work during shutdown and to exit promptly.
    while (!this.isNormalShutdown && !Thread.currentThread().isInterrupted()) {
      try {
        OrchestratorService.GetWorkItemsRequest getWorkItemsRequest = OrchestratorService.GetWorkItemsRequest
            .newBuilder().build();
        Iterator<OrchestratorService.WorkItem> workItemStream = this.sidecarClient.getWorkItems(getWorkItemsRequest);
        while (workItemStream.hasNext()) {
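
One way to read the suggestion above is sketched below. It is an illustration under assumptions, not the worker's actual code: `InnerLoopGuardSketch`, `drain()` and `handle()` are hypothetical, a plain `Iterator<String>` stands in for the gRPC work-item stream, and `isInterrupted()` is used rather than `Thread.interrupted()` so the interrupt flag is not cleared as a side effect.

```java
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch of a shutdown guard inside the work-item loop. */
final class InnerLoopGuardSketch {
  private volatile boolean isNormalShutdown = false;
  private final ExecutorService workerPool = Executors.newSingleThreadExecutor();
  final AtomicInteger handled = new AtomicInteger();

  private void handle(String item) {
    handled.incrementAndGet(); // placeholder for real work-item processing
  }

  /** Dispatches items until the stream ends or shutdown is requested; returns the number submitted. */
  int drain(Iterator<String> workItemStream) {
    int submitted = 0;
    while (workItemStream.hasNext()) {
      // Stop pulling new work as soon as shutdown or interrupt is observed.
      if (isNormalShutdown || Thread.currentThread().isInterrupted()) {
        break;
      }
      final String item = workItemStream.next();
      try {
        workerPool.submit(() -> { handle(item); });
        submitted++;
      } catch (RejectedExecutionException e) {
        // The pool was shut down between the check and the submit: stop quietly.
        break;
      }
    }
    return submitted;
  }

  void close() {
    isNormalShutdown = true;
    workerPool.shutdown(); // mirrors the ordering described above: pool first
  }

  public static void main(String[] args) {
    InnerLoopGuardSketch worker = new InnerLoopGuardSketch();
    System.out.println("before close: " + worker.drain(List.of("a", "b").iterator()));
    worker.close();
    System.out.println("after close: " + worker.drain(List.of("c").iterator()));
  }
}
```

The `catch` on `RejectedExecutionException` covers the window where the flag check passes but the pool is shut down before `submit` runs.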

- Capture workerThread in startAndBlock() so close() can interrupt
  the thread even when startAndBlock() is called directly (not via
  start()), fixing the case where the 5s sleep blocks shutdown.
- Add isNormalShutdown guard before the retry sleep so any exception
  code (UNAVAILABLE, CANCELLED, etc.) exits promptly during shutdown.
- Add DurableTaskGrpcWorkerShutdownTest with 3 scenarios:
  - start() + close() terminates the worker thread promptly
  - startAndBlock() on a separate thread exits on close()
  - startAndBlock() exits on thread interrupt

Signed-off-by: Javier Aliaga <javier@diagrid.io>
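
The thread-capture idea in this commit can be illustrated roughly as follows. The class and helper names are hypothetical and a long `Thread.sleep` stands in for the 5s retry sleep; the sketch only assumes what the commit message states, namely that `startAndBlock()` records its own thread so `close()` can interrupt it.

```java
/** Hypothetical sketch: startAndBlock() records its thread so close() can interrupt it. */
final class WorkerThreadCaptureSketch {
  private volatile boolean isNormalShutdown = false;
  // volatile so the thread calling close() sees the reference written by the worker thread
  private volatile Thread workerThread;

  void startAndBlock() {
    workerThread = Thread.currentThread(); // captured even when not launched via start()
    while (!isNormalShutdown && !Thread.currentThread().isInterrupted()) {
      try {
        Thread.sleep(60_000); // stands in for a blocking call or the retry sleep
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt status
        break;
      }
    }
  }

  void close() {
    isNormalShutdown = true;
    Thread t = workerThread;
    if (t != null) {
      t.interrupt(); // wake the worker out of its sleep immediately
    }
  }

  /** Runs startAndBlock() on a separate thread and reports whether close() stops it. */
  static boolean exitsOnClose() {
    WorkerThreadCaptureSketch worker = new WorkerThreadCaptureSketch();
    Thread caller = new Thread(worker::startAndBlock);
    caller.start();
    try {
      // Wait (bounded) until the worker thread has been captured.
      for (int i = 0; i < 2_000 && worker.workerThread == null; i++) {
        Thread.sleep(1);
      }
      worker.close();
      caller.join(2_000);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
    return !caller.isAlive();
  }

  public static void main(String[] args) {
    System.out.println("exits on close: " + exitsOnClose());
  }
}
```

If `close()` runs before the thread is captured, the flag is already set by the time the loop condition is first evaluated, so the loop body is never entered.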
@javier-aliaga javier-aliaga force-pushed the fix/grpc-worker-shutdown-loop branch from 31a1d32 to e00a33e Compare April 14, 2026 10:45
@javier-aliaga javier-aliaga marked this pull request as ready for review April 14, 2026 11:00
@javier-aliaga javier-aliaga requested review from a team as code owners April 14, 2026 11:00
@javier-aliaga javier-aliaga requested a review from Copilot April 14, 2026 11:38
Contributor

Copilot AI left a comment

Pull request overview

This PR fixes DurableTaskGrpcWorker.startAndBlock() not exiting promptly during shutdown by making the worker loop aware of normal shutdown and thread interrupts, and by avoiding retries when gRPC disconnects due to shutdown.

Changes:

  • Update the worker loop condition to stop on normal shutdown and on thread interrupt.
  • Break out of the retry path when gRPC returns CANCELLED during a normal shutdown, and preserve interrupt status when catching InterruptedException.
  • Add unit tests intended to validate shutdown/interrupt behavior.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 8 comments.

| File | Description |
| --- | --- |
| durabletask-client/src/main/java/io/dapr/durabletask/DurableTaskGrpcWorker.java | Adjusts the worker loop to exit on shutdown/interrupt and handles CANCELLED during shutdown. |
| durabletask-client/src/test/java/io/dapr/durabletask/DurableTaskGrpcWorkerShutdownTest.java | Adds tests for prompt termination on close() and on interrupt. |

- Mark workerThread as volatile for cross-thread visibility
- Remove unused imports (ManagedChannel, ManagedChannelBuilder)
- Fail test explicitly when reflection fails instead of silently
  returning null
- Assert interrupt status is preserved in startAndBlockExitsOnInterrupt

Signed-off-by: Javier Aliaga <javier@diagrid.io>
@javier-aliaga javier-aliaga force-pushed the fix/grpc-worker-shutdown-loop branch from 3578c26 to 944bf24 Compare April 14, 2026 13:41
@codecov

codecov bot commented Apr 14, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 72.90%. Comparing base (e5cf3f3) to head (944bf24).

Additional details and impacted files
@@            Coverage Diff            @@
##             master    #1727   +/-   ##
=========================================
  Coverage     72.90%   72.90%           
  Complexity     2257     2257           
=========================================
  Files           242      242           
  Lines          7415     7415           
  Branches        738      738           
=========================================
  Hits           5406     5406           
  Misses         1648     1648           
  Partials        361      361           


Development

Successfully merging this pull request may close these issues.

DurableTaskGrpcWorker loop does not exit on close() or stop()

2 participants