fix(kube-client): propagate abort signal through list requests #2954
petersutter wants to merge 3 commits into
Conversation
Actionable comments posted: 1
Inline comments:
In packages/request/__tests__/client.spec.js, around lines 247-282: the test uses Jest APIs that do not exist under Vitest. Replace jest.fn() with vi.fn() for the stream.getHeaders and stream.destroy mocks, and replace jest.advanceTimersByTime(...) with vi.advanceTimersByTime(...) so the timer fast-forward works. Update any other jest.* usages in this test to their vi equivalents so it runs under Vitest.
Force-pushed from 150fcc9 to f1f6bbb
List requests silently stalled forever when the HTTP/2 stream hung, because the abort signal was never forwarded by ListWatcher.list() and stream.getHeaders() had no timeout.

- ListWatcher.list() now passes this.signal, matching watch()
- Client.fetch() enforces responseTimeout on getHeaders() via try/finally
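The first change can be sketched roughly as follows. This is an illustrative model only, not the actual ListWatcher.js: the class shape, field names, and the assumption that listFunc/watchFunc take an options object with a signal field are all simplifications.

```javascript
// Illustrative sketch -- the real ListWatcher has more responsibilities.
// Assumption: listFunc/watchFunc accept an options object with a `signal` field.
class ListWatcher {
  constructor (listFunc, watchFunc, signal) {
    this.listFunc = listFunc
    this.watchFunc = watchFunc
    this.signal = signal
  }

  list (options = {}) {
    // Before the fix this was effectively `this.listFunc(options)`,
    // so an AbortController could never cancel a hung list stream.
    return this.listFunc({ ...options, signal: this.signal })
  }

  watch (options = {}) {
    // watch() already forwarded the signal; list() now matches it.
    return this.watchFunc({ ...options, signal: this.signal })
  }
}
```

With this shape, aborting the signal passed to the constructor cancels both in-flight list and watch requests, instead of only the watch.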
Force-pushed from f1f6bbb to 373a442
Read a positive-integer millisecond timeout from the environment at module load and apply it as the default responseTimeout for all clients (dashboard, user, and derived kubeconfig clients). Per-call options can still override. Helm chart renders the value from .Values.global.dashboard.kubeClient.responseTimeout.
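A minimal sketch of that env handling, assuming a 60 s fallback; the helper and constant names below are illustrative, not the actual kube-client module:

```javascript
// Illustrative sketch: read KUBE_CLIENT_RESPONSE_TIMEOUT at module load and
// use it as the default responseTimeout. Names here are assumptions.
const DEFAULT_RESPONSE_TIMEOUT = 60000 // 60 s

function parseResponseTimeout (value) {
  const ms = Number(value)
  // accept only positive integers (milliseconds); otherwise fall back
  return Number.isInteger(ms) && ms > 0 ? ms : DEFAULT_RESPONSE_TIMEOUT
}

// evaluated once at module load; per-call options can still override it
const responseTimeout = parseResponseTimeout(process.env.KUBE_CLIENT_RESPONSE_TIMEOUT)
```

Reading the value once at module load keeps every client constructed by the package consistent, while per-call options retain the last word.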
The previous 15 s default was never enforced, so it had no effect. With the timeout now applied to getHeaders(), 15 s is too aggressive for large list requests against busy clusters.
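The enforcement described above can be sketched as a setTimeout/try-finally guard around the headers await. This is a simplified model: TimeoutError, the helper name, and the fake stream are illustrative, not the actual Client.js internals.

```javascript
// Illustrative sketch of the responseTimeout guard around getHeaders().
class TimeoutError extends Error {}

async function getHeadersWithTimeout (stream, responseTimeout) {
  const timer = setTimeout(() => {
    // destroying the stream rejects the pending getHeaders() promise
    stream.destroy(new TimeoutError(`no response headers within ${responseTimeout} ms`))
  }, responseTimeout)
  try {
    return await stream.getHeaders()
  } finally {
    clearTimeout(timer) // runs on success and on timeout alike
  }
}

// Minimal fake stream for demonstration: getHeaders() settles either when
// headers "arrive" after delayMs or when destroy(err) is called first.
function makeFakeStream (headers, delayMs) {
  let reject
  const pending = new Promise((resolve, rej) => {
    reject = rej
    if (headers) setTimeout(() => resolve(headers), delayMs)
  })
  pending.catch(() => {}) // avoid unhandled-rejection noise in the sketch
  return {
    getHeaders: () => pending,
    destroy: err => reject(err),
  }
}
```

The finally block means the timer is always cleared, so a fast response never leaves a stray timeout behind, and a hung stream is destroyed with a TimeoutError instead of stalling the caller forever.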
How to categorize this PR?
/area robustness
/kind bug
What this PR does / why we need it:
Symptom observed: one of two dashboard pods failed to populate the Shoot backend cache on startup. The list request for Shoots never received a response and the reflector stalled silently: no error, no retry, no timeout.
List requests in the backend cache could silently stall forever when an HTTP/2 stream hung without emitting an error event. Two bugs combined to cause this:
- ListWatcher.list() never forwarded the abort signal to the list function, so a hung stream could not be cancelled, unlike watch(), which already passed the signal correctly.
- Client.fetch() awaited stream.getHeaders() indefinitely; the existing responseTimeout option was defined but never enforced.

The fix propagates the abort signal through ListWatcher.list() and applies responseTimeout (default 60 s) to the getHeaders() await via a setTimeout/try-finally guard. The previous default of 15 s was never enforced, so it had no practical effect; with enforcement now in place, the default is raised to 60 s to accommodate large list requests against busy clusters.

Additionally, this PR introduces a package-level KUBE_CLIENT_RESPONSE_TIMEOUT environment variable that sets the default responseTimeout for all kube-client instances (dashboard client, per-user clients, and derived kubeconfig clients). The Helm chart renders .Values.global.dashboard.kubeClient.responseTimeout into the container environment so operators can tune the timeout without code changes. Per-call options can still override the default.

Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
As a possible future follow-up, we could evaluate whether the package-level config-dependent singletons should be replaced by an explicit createClientSet() factory that receives responseTimeout and other transport options as constructor arguments. In that model, the backend would instantiate the client set at startup from its loaded config and inject it into route handlers, services, and hooks. This is intentionally not part of this PR: it would touch every backend module that imports from @gardener-dashboard/kube-client and should only be considered as a separate refactoring effort if we decide the reduced coupling is worth the larger change surface.

Release note: