fix(scaling): guard GetCurrentReplicas against nil ScaleTargetGVKR #7661

Open

ggarb wants to merge 1 commit into kedacore:main from ggarb:fix-getcurrentreplicas-nil-gvkr

Conversation


@ggarb ggarb commented Apr 17, 2026

Problem

At ~10k ScaledObjects being created at 10/s, the KEDA operator panics:

panic: runtime error: invalid memory address or nil pointer dereference
pkg/scaling/resolver/scale_resolvers.go:731 GetCurrentReplicas
pkg/scaling/executor/scale_scaledobjects.go:44 RequestScale
pkg/scaling/scale_handler.go:282 checkScalers
pkg/scaling/scale_handler.go:199 startScaleLoop

Root cause is the same informer-cache race described in #4389 / tracked
in #4955: scaledObject.Status.ScaleTargetGVKR can be nil when the
scale loop first invokes GetCurrentReplicas. The existing code
dereferences .Group / .Kind on the nil pointer, crashing the whole
operator process and taking down every other scale loop with it.

ResolveScaleTargetPodSpec already defends against this race
(scale_resolvers.go L103-L119). GetCurrentReplicas does not.
This PR applies the same pattern.

Fix

If Status.ScaleTargetGVKR is nil on entry, re-fetch the ScaledObject
via the client. If it is still nil after re-fetch, return a descriptive
error instead of panicking.
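A minimal sketch of the guard described above, using simplified stand-in types rather than the real `kedav1alpha1.ScaledObject` and controller-runtime client (names and shapes here are illustrative assumptions, not KEDA's actual API):

```go
package main

import (
	"errors"
	"fmt"
)

// Simplified stand-ins for the real KEDA types (illustration only).
type GroupVersionKindResource struct {
	Group string
	Kind  string
}

type ScaledObject struct {
	Name string
	// ScaleTargetGVKR can be nil right after creation, before the
	// status subresource is populated (the informer-cache race).
	ScaleTargetGVKR *GroupVersionKindResource
}

// ensureScaleTargetGVKR applies the pattern from the PR: if the cached
// object's GVKR is nil, re-fetch via the client; if it is still nil
// after the re-fetch, return a descriptive error instead of letting
// the caller dereference a nil pointer.
func ensureScaleTargetGVKR(so *ScaledObject, fetch func(name string) (*ScaledObject, error)) (*ScaledObject, error) {
	if so.ScaleTargetGVKR != nil {
		return so, nil
	}
	fresh, err := fetch(so.Name)
	if err != nil {
		return nil, fmt.Errorf("failed to re-fetch ScaledObject %q: %w", so.Name, err)
	}
	if fresh.ScaleTargetGVKR == nil {
		return nil, errors.New("scaleTargetGVKR is still nil after re-fetch, probably invalid ScaledObject cache")
	}
	return fresh, nil
}

func main() {
	// Stale cache entry: status not yet populated.
	stale := &ScaledObject{Name: "demo"}

	// Re-fetch returns the fully populated object.
	fetch := func(string) (*ScaledObject, error) {
		return &ScaledObject{
			Name:            "demo",
			ScaleTargetGVKR: &GroupVersionKindResource{Group: "apps", Kind: "Deployment"},
		}, nil
	}

	so, err := ensureScaleTargetGVKR(stale, fetch)
	if err != nil {
		panic(err)
	}
	fmt.Println(so.ScaleTargetGVKR.Kind) // Deployment
}
```

In the real code the `fetch` closure would be a `client.Get` against the API server, bypassing the stale informer cache, as `ResolveScaleTargetPodSpec` already does.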

Repro context

Observed during a 10k-ScaledObject KWOK load test at Netflix, after
raising --kube-api-qps/burst to eliminate client-go throttling
(previous 1k bottleneck). Once client-side throttling was gone, fast
ScaledObject creation widened the cache-race window enough that the
nil-pointer panic fired reliably before the 750th object was created.

Tests

  • Added TestGetCurrentReplicas_NilScaleTargetGVKR with three cases:
    • nil on input, re-fetch succeeds with populated GVKR → returns
      correct replica count (Deployment path)
    • nil on input, re-fetch also returns nil → returns a descriptive
      "probably invalid ScaledObject cache" error
    • nil on input, re-fetch fails (SO missing) → returns fetch error
  • All existing tests in ./pkg/scaling/... pass.

Fixes / refs

Refs: #4389, #4955, #6176

@ggarb ggarb requested a review from a team as a code owner April 17, 2026 17:32
@github-actions

Thank you for your contribution! 🙏

Please understand that we will do our best to review your PR and give you feedback as soon as possible, but please bear with us if it takes a little longer than expected.

While you are waiting, make sure to:

  • Add an entry in our changelog in alphabetical order and link related issue
  • Update the documentation, if needed
  • Add unit & e2e tests for your changes
  • GitHub checks are passing
  • Is the DCO check failing? Here is how you can fix DCO issues

Once the initial tests are successful, a KEDA member will ensure that the e2e tests are run. Once the e2e tests have been successfully completed, the PR may be merged at a later date. Please be patient.

Learn more about our contribution guide.

@keda-automation keda-automation requested a review from a team April 17, 2026 17:32

snyk-io Bot commented Apr 17, 2026

Snyk checks have passed. No issues have been found so far.

Status  Scan Engine           Critical  High  Medium  Low  Total
Open    Open Source Security  0         0     0       0    0 issues

💻 Catch issues earlier using the plugins for VS Code, JetBrains IDEs, Visual Studio, and Eclipse.

@ggarb ggarb force-pushed the fix-getcurrentreplicas-nil-gvkr branch from 3d1ea68 to 4414c85 Compare April 17, 2026 17:46
@ggarb ggarb force-pushed the fix-getcurrentreplicas-nil-gvkr branch 2 times, most recently from f55e568 to db90396 Compare April 23, 2026 15:11
Member

@JorTurFer JorTurFer left a comment


Thanks for the fix!
As this is now starting to appear in multiple places, does it make sense to extract that code into a function that we can reuse and that implicitly tracks all the usages of this hack?

When the informer cache races with ScaledObject creation,
scaledObject.Status.ScaleTargetGVKR can be nil at the point the scale
loop invokes GetCurrentReplicas. The current code then dereferences
.Group / .Kind on a nil pointer and panics, taking down the operator.

This applies the same defensive pattern already used in
ResolveScaleTargetPodSpec: re-fetch the ScaledObject via the client
when Status.ScaleTargetGVKR is nil, and if it is still nil after
re-fetch, return a descriptive error instead of panicking.

Observed in a 10k-ScaledObject KWOK load test where kube-burner
created ScaledObjects at 10/s; the cache-race window opened wide
enough that the panic fired reliably within the first 750 objects.

Refs: kedacore#4389, kedacore#4955, kedacore#6176

Signed-off-by: Greg Garber <ggarb@netflix.com>
@ggarb ggarb force-pushed the fix-getcurrentreplicas-nil-gvkr branch from db90396 to 07cd22a Compare May 4, 2026 17:20
@keda-automation keda-automation requested a review from a team May 4, 2026 17:21
@rickbrouwer rickbrouwer added Awaiting/2nd-approval This PR needs one more approval review merge-conflict This PR has a merge conflict labels May 7, 2026