
fix: skip redundant lastActiveTime updates to prevent cascading cooldown reset#7614

Closed
Fedosin wants to merge 1 commit into kedacore:main from Fedosin:fix-lastactivetime-redundant-update

Fedosin (Contributor) commented Apr 7, 2026

Summary

  • Add a guard in RequestScale that skips redundant lastActiveTime writes when the existing timestamp is still within the current polling interval
  • Under client-side API throttling, the operator can re-evaluate a ScaledObject before its previous status PATCH has been applied. The stale Active=True status then causes updateLastActiveTime to refresh the timestamp even though the trigger is already inactive, resetting the cooldown timer and delaying scale-to-zero indefinitely
  • The guard prevents this cascading cooldown reset: a write made within the current polling interval cannot meaningfully advance the cooldown window, so it is skipped

See also: #7610
Fixes: #7613

Test plan

  • New unit test TestSkipRedundantLastActiveTimeUpdate verifies that lastActiveTime is not updated when it was already set within the polling interval (set 5s ago, under the 30s default)
  • New unit test TestUpdateLastActiveTimeWhenExpired verifies that lastActiveTime IS updated once enough time has elapsed (set 60s ago, over the 30s default)
  • All existing executor tests pass unchanged

Fedosin requested a review from a team as a code owner April 7, 2026 16:27
snyk-io (Bot) commented Apr 7, 2026

Snyk checks have passed. No issues have been found so far.

Scan Engine            Critical  High  Medium  Low  Total (0)
Open Source Security   0         0     0       0    0 issues


keda-automation requested a review from a team April 7, 2026 16:27
github-actions (Bot) commented Apr 7, 2026

Thank you for your contribution! 🙏

Please understand that we will do our best to review your PR and give you feedback as soon as possible, but please bear with us if it takes a little longer than expected.

While you are waiting, make sure to:

  • Add an entry to our changelog in alphabetical order and link the related issue
  • Update the documentation, if needed
  • Add unit & e2e tests for your changes
  • Get the GitHub checks passing
  • Is the DCO check failing? Here is how you can fix DCO issues

Once the initial tests are successful, a KEDA member will ensure that the e2e tests are run. Once the e2e tests have been successfully completed, the PR may be merged at a later date. Please be patient.

Learn more about our contribution guide.

client.EXPECT().Status().Return(statusWriter).Times(3)
statusWriter.EXPECT().Patch(gomock.Any(), gomock.Any(), gomock.Any()).Times(3)

scaleExecutor.RequestScale(context.TODO(), &scaledObject, true, false, &ScaleExecutorOptions{})
semgrep-code-kedacore (Bot) commented Apr 7, 2026

Consider using a well-defined context

🚀 Fixed in commit e388b98 🚀

semgrep-code-kedacore (Bot) commented

Semgrep found 1 context-todo finding:

  • pkg/scaling/executor/scale_scaledobjects_test.go

Consider using a well-defined context

Fedosin force-pushed the fix-lastactivetime-redundant-update branch 2 times, most recently from 38a4dee to e388b98, on April 8, 2026 10:54
wozniakjan added the waiting-on-other-pr label (All PRs that are waiting for another PR which must be merged first) on Apr 9, 2026
wozniakjan (Member) commented

Is it OK if we wait for #7610 to merge first? Meanwhile, I'll review the logic and the guarantees promised by the lastActiveTime field.

Fedosin (Contributor, Author) commented Apr 10, 2026

@wozniakjan Sure thing! No rush 👍

rickbrouwer (Member) commented

FYI, the PR we were waiting for (#7610) has been merged.

rickbrouwer added the merge-conflict label (This PR has a merge conflict) and removed the waiting-on-other-pr label (All PRs that are waiting for another PR which must be merged first) on Apr 13, 2026
Fedosin force-pushed the fix-lastactivetime-redundant-update branch from e388b98 to 4fe1d05 on April 20, 2026 22:26
Every polling cycle where triggers are active, RequestScale unconditionally
sets result.LastActiveTime to time.Now(). Because the timestamp changes on
every cycle, handleResult's DeepEqual check never short-circuits, so every
active ScaledObject gets a status PATCH on every cycle, even when
lastActiveTime only moved by a few seconds.

Add a guard that skips the lastActiveTime write when the existing timestamp
is still within the current polling interval. When nothing else changed,
DeepEqual detects no diff and the PATCH is skipped entirely, reducing
unnecessary API server load at scale.

Note: the original race condition described in kedacore#7613 (where delayed PATCHes
could reset lastActiveTime after triggers went inactive) was fixed by kedacore#7610,
which serialized evaluations under a per-object mutex with a single atomic
status PATCH. This guard is now a performance optimization rather than a
correctness fix.

Signed-off-by: Mikhail Fedosin <mfedosin@redhat.com>
Fedosin force-pushed the fix-lastactivetime-redundant-update branch from 4fe1d05 to 7781c96 on April 20, 2026 22:39
Fedosin changed the title from "fix: skip redundant lastActiveTime updates to prevent cascading cooldown reset" to "fix: skip redundant lastActiveTime updates to reduce API server load" on Apr 20, 2026
Fedosin changed the title from "fix: skip redundant lastActiveTime updates to reduce API server load" back to "fix: skip redundant lastActiveTime updates to prevent cascading cooldown reset" on Apr 20, 2026
Fedosin (Contributor, Author) commented Apr 20, 2026

Closing this PR. The underlying race condition (cascading cooldown reset caused by multiple inline status PATCHes under API throttling) was fixed by #7610, which refactored the executor to return a ScaleResult and the handler to apply it as a single atomic status PATCH under a per-object mutex.

After retesting on the rebased branch, I can no longer reproduce the issue — evaluations are fully serialized and lastActiveTime stays consistent.

The guard proposed here would also introduce a minor correctness flaw: if the skip fires due to timing jitter and triggers go inactive before the next cycle, the cooldown could expire up to one polling interval early.

Fedosin closed this Apr 20, 2026

Labels

merge-conflict (This PR has a merge conflict)


Development

Successfully merging this pull request may close these issues.

Client-side API throttling silently resets lastActiveTime, delaying scale-to-zero at 100+ ScaledObjects

3 participants