fix(kube-client): propagate abort signal through list requests by petersutter · Pull Request #2954 · gardener/dashboard

petersutter · 2026-05-11T17:54:08Z

How to categorize this PR?
/area robustness
/kind bug

What this PR does / why we need it:
Symptom observed: one of two dashboard pods failed to populate the Shoot backend cache on startup. The list request for Shoots never received a response and the reflector stalled silently - no error, no retry, no timeout.

List requests in the backend cache could silently stall forever when an HTTP/2 stream hung without emitting an error event. Two bugs combined to cause this:

- ListWatcher.list() never forwarded the abort signal to the list function, so a hung stream could not be cancelled - unlike watch(), which already passed the signal correctly.
- Client.fetch() awaited stream.getHeaders() indefinitely; the existing responseTimeout option was defined but never enforced.

The fix propagates the abort signal through ListWatcher.list() and applies responseTimeout (default 60 s) to the getHeaders() await via a setTimeout / try-finally guard. The previous default of 15 s was never enforced, so it had no practical effect; with enforcement now in place, the default is raised to 60 s to accommodate large list requests against busy clusters.
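
The setTimeout / try-finally guard can be sketched roughly as follows. This is a minimal illustration, not the code in `packages/request/lib/Client.js`: the helper name `getHeadersWithTimeout`, the local `TimeoutError` class, and the error message are assumptions, and the stream is assumed to reject its pending `getHeaders()` promise when `destroy(err)` is called.

```javascript
// Hypothetical stand-in for the TimeoutError exported by the request package.
class TimeoutError extends Error {
  constructor (message) {
    super(message)
    this.name = 'TimeoutError'
  }
}

// Waits for the response headers, but destroys the stream if they do not
// arrive within `responseTimeout` milliseconds.
// Assumption: stream.destroy(err) makes the pending getHeaders() promise reject with err.
async function getHeadersWithTimeout (stream, responseTimeout) {
  const timeoutId = setTimeout(() => {
    stream.destroy(new TimeoutError(`No response headers after ${responseTimeout} ms`))
  }, responseTimeout)
  try {
    return await stream.getHeaders()
  } finally {
    // Clear the timer on every path so it cannot fire after headers arrived
    // or after the request already failed for another reason.
    clearTimeout(timeoutId)
  }
}
```

Clearing the timer in `finally` keeps a late timer from destroying a stream whose headers have already been delivered.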

Additionally, this PR introduces a package-level KUBE_CLIENT_RESPONSE_TIMEOUT environment variable that sets the default responseTimeout for all kube-client instances (dashboard client, per-user clients, and derived kubeconfig clients). The Helm chart renders .Values.global.dashboard.kubeClient.responseTimeout into the container environment so operators can tune the timeout without code changes. Per-call options can still override the default.
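
One way to picture the environment handling and the override order is the sketch below. The function bodies are assumptions rather than the contents of `packages/kube-client/lib/index.js`; in particular, `buildClientOptions` is a hypothetical helper, and treating an empty string as "not set" is a guess.

```javascript
// Parses a positive-integer millisecond timeout. Returns undefined when the
// variable is not set, and throws on invalid values so misconfiguration
// surfaces at startup rather than at request time.
function parseResponseTimeout (value) {
  if (value === undefined || value === '') {
    return undefined // assumption: empty means "use the built-in default"
  }
  const timeout = Number(value)
  if (!Number.isInteger(timeout) || timeout <= 0) {
    throw new TypeError(`KUBE_CLIENT_RESPONSE_TIMEOUT must be a positive integer (ms), got "${value}"`)
  }
  return timeout
}

// Hypothetical helper: the env-derived default is spread first and the
// per-call options last, so explicit options take precedence.
function buildClientOptions (env, options = {}) {
  const responseTimeout = parseResponseTimeout(env.KUBE_CLIENT_RESPONSE_TIMEOUT)
  const defaultOptions = responseTimeout === undefined ? {} : { responseTimeout }
  return { ...defaultOptions, ...options }
}
```

Calling `buildClientOptions(process.env, { responseTimeout: 5000 })` would yield a 5000 ms timeout regardless of the environment value, which mirrors the "per-call options can still override" behaviour described above.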

Which issue(s) this PR fixes:
Fixes #

Special notes for your reviewer:
As a possible future follow-up, we could evaluate whether the package-level config-dependent singletons should be replaced by an explicit createClientSet() factory that receives responseTimeout and other transport options as constructor arguments. In that model, the backend would instantiate the client set at startup from its loaded config and inject it into route handlers, services, and hooks. This is intentionally not part of this PR: it would touch every backend module that imports from @gardener-dashboard/kube-client and should only be considered as a separate refactoring effort if we decide the reduced coupling is worth the larger change surface.

Release note:

Fix an issue where the dashboard backend cache could stop updating if Kubernetes API list requests became unresponsive.

Add `KUBE_CLIENT_RESPONSE_TIMEOUT` environment variable to configure the default Kubernetes API response-header timeout for dashboard backend requests. When using the Helm chart, this can be configured via `.Values.global.dashboard.kubeClient.responseTimeout`.

Summary by CodeRabbit

New Features
- Add configurable response-header timeout via environment/configuration (can be set per-deployment).
Bug Fixes
- List operations now receive abort signals like watch operations.
- Added response-header timeout enforcement and increased default timeout to reduce hung requests.
- Ensure request options default to an object to avoid undefined-parameter issues.
Tests
- Added tests for abort-signal handling and response-header timeout behavior.

gardener-prow · 2026-05-11T17:54:14Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign petersutter for approval. For more information see the Code Review Process.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

coderabbitai · 2026-05-11T17:54:32Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

📥 Commits

Reviewing files that changed from the base of the PR and between 373a442 and 88b3c9c.

📒 Files selected for processing (8)

charts/__tests__/gardener-dashboard/runtime/dashboard/deployment.spec.js
charts/gardener-dashboard/charts/runtime/templates/dashboard/deployment.yaml
charts/gardener-dashboard/values.yaml
packages/kube-client/__tests__/index.spec.js
packages/kube-client/lib/index.js
packages/request/__tests__/acceptance.spec.js
packages/request/__tests__/client.spec.js
packages/request/lib/Client.js

🚧 Files skipped from review as they are similar to previous changes (1)

packages/request/lib/Client.js

📝 Walkthrough

For list APIs, abort signals are validated, and ListWatcher.list forwards the signal to listFunc when one has been set. For requests, Client.fetch enforces a response-header timeout (destroying the stream with a TimeoutError on timeout) and SessionPool.request defaults options to an empty object. kube-client reads KUBE_CLIENT_RESPONSE_TIMEOUT from the environment, the Helm chart exposes it as a value, and tests cover the wiring.

Changes

Abort Signal Propagation in ListWatcher

| Layer / File(s) | Summary |
| --- | --- |
| Signal Validation in List Methods `packages/kube-client/lib/mixins.js` | `ClusterScoped.Readable.list`, `NamespaceScoped.Readable.list`, and `listAllNamespaces` call `assertSignal(signal)` before options/search-params validation and request execution. |
| ListWatcher Signal Forwarding `packages/kube-client/lib/cache/ListWatcher.js` | `ListWatcher.list(query)` conditionally attaches `this.signal` to `options` passed to `listFunc` when `setAbortSignal()` was called. |
| ListWatcher Signal Test `packages/kube-client/__tests__/cache.list-watcher.spec.js` | New unit test verifies `ListWatcher#list` with an abort signal calls `listFunc` with the `signal` and merged `searchParams`. |
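
The conditional signal forwarding summarized in the rows above might look roughly like this. `ListWatcherSketch` is a simplified, hypothetical stand-in for the real `ListWatcher`, and plain objects are assumed for `searchParams`; the actual class may store and merge them differently.

```javascript
// Simplified stand-in: only the parts relevant to signal forwarding are modeled.
class ListWatcherSketch {
  constructor (listFunc, { searchParams = {} } = {}) {
    this.listFunc = listFunc
    this.searchParams = searchParams
    this.signal = undefined
  }

  setAbortSignal (signal) {
    this.signal = signal
  }

  list (query = {}) {
    const options = {
      // Per-call query parameters are merged over the watcher's defaults.
      searchParams: { ...this.searchParams, ...query }
    }
    // Attach the signal only if one was set, so callers that never call
    // setAbortSignal() see the same options as before.
    if (this.signal) {
      options.signal = this.signal
    }
    return this.listFunc(options)
  }
}
```

With the signal on `options`, aborting the controller that owns it gives the list function a way to cancel a hung request, which is what `watch()` already relied on.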

Response Timeout Protection

| Layer / File(s) | Summary |
| --- | --- |
| TimeoutError Import `packages/request/lib/Client.js` | `Client.js` now imports `TimeoutError` from the errors module. |
| SessionPool Request Defaults `packages/request/lib/SessionPool.js` | `SessionPool.request(headers, options = {})` defaults `options` to `{}` so `session.request` never receives `undefined`. |
| Client Fetch Timeout Implementation `packages/request/lib/Client.js` | `Client.fetch()` starts a `responseTimeout` timer before `stream.getHeaders()`; if headers aren't received within `responseTimeout` the request stream is destroyed with a `TimeoutError`. The timer is cleared in `try/finally`. The default `responseTimeout` getter was increased to 60000 ms. |
| Client Timeout & Acceptance Tests `packages/request/__tests__/client.spec.js`, `packages/request/__tests__/acceptance.spec.js` | Unit and acceptance tests added/updated: unit tests assert default timeout and `fetch` rejects with `TimeoutError` when headers don't arrive; acceptance tests add `/delay` route and a real-connection timeout case. |

Kube-client responseTimeout env and Helm wiring

| Layer / File(s) | Summary |
| --- | --- |
| parseResponseTimeout and Client defaultOptions `packages/kube-client/lib/index.js` | Parses and validates `KUBE_CLIENT_RESPONSE_TIMEOUT` from env, merges it into `defaultOptions` passed to `Client`/`BaseClient`; `createClient` and `createDashboardClient` now default `options = {}` and `dashboardClient` is created via `createDashboardClient()`. |
| kube-client env propagation tests `packages/kube-client/__tests__/index.spec.js` | Tests assert `KUBE_CLIENT_RESPONSE_TIMEOUT` is propagated into `request.extend` for package-level/dashboard/derived clients, that per-call overrides take precedence, and invalid env values fail fast. |
| Helm values and deployment env injection `charts/gardener-dashboard/values.yaml`, `charts/gardener-dashboard/charts/runtime/templates/dashboard/deployment.yaml`, `charts/__tests__/gardener-dashboard/runtime/dashboard/deployment.spec.js` | Adds `global.dashboard.kubeClient.responseTimeout` value, conditionally injects `KUBE_CLIENT_RESPONSE_TIMEOUT` into container env when set, and adds a chart test verifying the env variable is rendered. |

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers:

holgerkoser
klocke-io

"I set a signal and watch the clock,
If headers lag, I give a knock.
Streams fold neat, timeouts named with care,
Env and charts pass settings everywhere. 🐰"

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The PR title accurately summarizes the main change: propagating the abort signal through list requests in ListWatcher. |
| Description check | ✅ Passed | The PR description is comprehensive and follows the template with all required sections completed, including area/kind categorization, clear problem explanation, solution details, and appropriate release notes. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/request/__tests__/client.spec.js`:
- Around line 247-282: The test is using Jest APIs that don't exist under
Vitest; replace jest.fn() with vi.fn() for mocking (e.g., the mock for
stream.getHeaders and stream.destroy) and replace jest.advanceTimersByTime(...)
with vi.advanceTimersByTime(...) so the timer fast-forward works under Vitest;
update any other jest.* usages in this test (references around client.fetch,
stream.getHeaders, stream.destroy) to their vi equivalents so the test runs with
Vitest.

ℹ️ Review info

📥 Commits

Reviewing files that changed from the base of the PR and between d2a7ab9 and 150fcc9.

📒 Files selected for processing (6)

packages/kube-client/__tests__/cache.list-watcher.spec.js
packages/kube-client/lib/cache/ListWatcher.js
packages/kube-client/lib/mixins.js
packages/request/__tests__/client.spec.js
packages/request/lib/Client.js
packages/request/lib/SessionPool.js

List requests silently stalled forever when the HTTP/2 stream hung because the abort signal was never forwarded by ListWatcher.list(), and stream.getHeaders() had no timeout.

- ListWatcher.list() now passes this.signal, matching watch()
- Client.fetch() enforces responseTimeout on getHeaders() via try/finally

Read a positive-integer millisecond timeout from the environment at module load and apply it as the default responseTimeout for all clients (dashboard, user, and derived kubeconfig clients). Per-call options can still override. Helm chart renders the value from .Values.global.dashboard.kubeClient.responseTimeout.

The previous 15 s default was never enforced, so it had no effect. With the timeout now applied to getHeaders(), 15 s is too aggressive for large list requests against busy clusters.

petersutter requested review from grolu, holgerkoser and klocke-io as code owners May 11, 2026 17:54

gardener-prow Bot added the area/robustness Robustness, reliability, resilience related label May 11, 2026

gardener-prow Bot added kind/bug Bug cla: yes Indicates the PR's author has signed the cla-assistant.io CLA. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels May 11, 2026

coderabbitai Bot reviewed May 11, 2026

Comment thread packages/request/__tests__/client.spec.js

petersutter force-pushed the fix/propagate-abort-signal-through-list-requests branch from 150fcc9 to f1f6bbb Compare May 12, 2026 07:21

petersutter force-pushed the fix/propagate-abort-signal-through-list-requests branch from f1f6bbb to 373a442 Compare May 12, 2026 08:30

petersutter added 2 commits May 14, 2026 12:57

fix(request): increase default response timeout to 60 s

88b3c9c

gardener-prow Bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels May 14, 2026

petersutter wants to merge 3 commits into master from fix/propagate-abort-signal-through-list-requests
