
scheduler: requeue unschedulable bindings on cluster status changes#7369

Open
Tej-Katika wants to merge 1 commit into karmada-io:master from Tej-Katika:fix/scheduler-requeue-on-cluster-status-change

Conversation

@Tej-Katika
Contributor

What this PR does / why we need it

When PriorityBasedScheduling is enabled, bindings that fail with an UnschedulableError are placed in unschedulableBindings and only flushed back to activeQ by a 5-minute timer. This happens because updateCluster reacts only to ClusterSpec changes: ClusterStatus fields never increment Generation, so status-only updates are silently dropped.

Two concrete problems this fixes:

  1. ResourceSummary / Conditions — after a cluster frees capacity or transitions to Ready, affected bindings wait up to 5 minutes before retry instead of ~10 seconds.
  2. APIEnablements — after a CRD is installed on a member cluster, bindings stuck with "0/N clusters: API resource not found" are never retried at all (workaround today is to manually patch spec.rescheduleTriggeredAt).

Changes

  • updateCluster: adds a new case to the existing switch that calls clusterReconcileWorker.Add(newCluster) when ResourceSummary, Conditions, or APIEnablements change. This reuses the existing reconcileCluster → enqueueAffectedBindings → priorityQueue.Push → moveToActiveQ path, which already handles moving bindings out of unschedulableBindings.
  • addCluster: adds clusterReconcileWorker.Add(cluster) so bindings stuck waiting for a new cluster are retried immediately on join rather than after 5 minutes.

No changes to the SchedulingQueue interface are needed.

Which issue(s) this PR fixes

Part of #7344

Prerequisites

#7340 (already merged) — fixes %w wrapping so errors.As(err, &unschedulableErr) correctly identifies the error, ensuring bindings land in unschedulableBindings rather than backoffQ.

Special notes for your reviewer

The new status-change case is intentionally placed after the generation/labels cases in the switch. If spec and status change simultaneously, the generation case fires and the existing path already requeues affected bindings — no double-trigger.

Does this PR introduce a user-facing change?

NONE

@karmada-bot karmada-bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. do-not-merge/invalid-commit-message Indicates that a PR should not merge because it has an invalid commit message. labels Apr 8, 2026
@karmada-bot karmada-bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Apr 8, 2026
@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical issue where the scheduler's unschedulableBindings queue was not promptly reacting to changes in cluster status, leading to significant delays in retrying bindings. By introducing logic to trigger cluster reconciliation upon relevant status updates and new cluster additions, the system ensures that bindings are requeued and rescheduled much more efficiently, improving overall scheduling responsiveness and reducing bottlenecks.

Highlights

  • Improved Binding Requeue on Cluster Status Changes: Bindings that become unschedulable due to cluster status changes (e.g., ResourceSummary, Conditions, APIEnablements) will now be requeued immediately instead of waiting for a 5-minute timer, significantly reducing retry delays.
  • Immediate Requeue for Newly Added Clusters: Bindings waiting for a new cluster to join will now be retried immediately upon the cluster's addition, rather than experiencing a delay.

@Tej-Katika Tej-Katika force-pushed the fix/scheduler-requeue-on-cluster-status-change branch from e9e4414 to a2fea2f Compare April 8, 2026 00:22
@karmada-bot karmada-bot removed the do-not-merge/invalid-commit-message Indicates that a PR should not merge because it has an invalid commit message. label Apr 8, 2026

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request updates the scheduler's event handler to trigger cluster reconciliation when a cluster is added or when specific status fields, such as ResourceSummary, Conditions, and APIEnablements, are modified. It also includes comprehensive updates to the unit tests to verify these new reconciliation triggers. A review comment suggests that triggering a full reconciliation on every ResourceSummary update could lead to performance issues in large-scale environments, recommending that these updates be throttled or made more selective to reduce overhead.

Comment thread pkg/scheduler/event_handler.go Outdated
Comment on lines +310 to +313
case !equality.Semantic.DeepEqual(oldCluster.Status.ResourceSummary, newCluster.Status.ResourceSummary) ||
    !equality.Semantic.DeepEqual(oldCluster.Status.Conditions, newCluster.Status.Conditions) ||
    !equality.Semantic.DeepEqual(oldCluster.Status.APIEnablements, newCluster.Status.APIEnablements):
    s.clusterReconcileWorker.Add(newCluster)

medium

Triggering a full reconciliation of all bindings on every ResourceSummary update might lead to performance overhead in large-scale environments. In Karmada, ResourceSummary (specifically the Allocated field) can change frequently as pods are scheduled or removed in member clusters. Each such update now triggers enqueueAffectedBindings, which performs a full list and scan of all ResourceBinding and ClusterResourceBinding objects.

Consider if it's possible to:

  1. Throttle these updates in the clusterReconcileWorker.
  2. Be more selective: For example, only trigger a requeue when resources are freed (i.e., Allocated decreases or Allocatable increases), as those are the cases where an unschedulable binding is most likely to now fit.

@codecov-commenter

codecov-commenter commented Apr 8, 2026


Codecov Report

❌ Patch coverage is 61.53846% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 41.94%. Comparing base (773dcf0) to head (03485e0).
⚠️ Report is 34 commits behind head on master.

Files with missing lines | Patch % | Lines
pkg/scheduler/internal/queue/scheduling_queue.go | 0.00% | 5 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7369      +/-   ##
==========================================
- Coverage   42.15%   41.94%   -0.22%     
==========================================
  Files         875      879       +4     
  Lines       53618    54339     +721     
==========================================
+ Hits        22602    22790     +188     
- Misses      29315    29828     +513     
- Partials     1701     1721      +20     
Flag | Coverage Δ
unittests | 41.94% <61.53%> (-0.22%) ⬇️


@Tej-Katika Tej-Katika force-pushed the fix/scheduler-requeue-on-cluster-status-change branch from a2fea2f to d25796a Compare April 8, 2026 01:24
@Tej-Katika Tej-Katika marked this pull request as ready for review April 8, 2026 02:26
Copilot AI review requested due to automatic review settings April 8, 2026 02:26
@karmada-bot karmada-bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 8, 2026
Contributor

Copilot AI left a comment


Pull request overview

This PR implements event-driven requeue for unschedulable bindings when cluster status changes occur, complementing PR #7340 which fixed error type propagation. The changes enable the scheduler to immediately reprocess bindings that are waiting due to cluster resource unavailability or missing API enablements, instead of waiting up to 5 minutes for a timer-based flush.

Changes:

  • Modify addCluster to always queue cluster reconciliation, triggering immediate requeue of affected bindings when a new cluster joins
  • Add a new status-change case to updateCluster to detect ResourceSummary, Conditions, or APIEnablements changes and trigger cluster reconciliation
  • Expand test coverage to validate both estimator and reconcile worker behavior in cluster event handlers

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
pkg/scheduler/event_handler.go Adds cluster reconciliation to addCluster and status change detection to updateCluster switch statement
pkg/scheduler/event_handler_test.go Refactors and expands tests for cluster event handlers to cover new status change handling and worker interactions


@whitewindmills
Member

/assign

Contributor

@seanlaii seanlaii left a comment


Hi @Tej-Katika , thanks for the PR!
/assign

Contributor

@seanlaii seanlaii left a comment


One thing I'm slightly concerned about is the CPU cost when ResourceSummary changes frequently — since it updates every 10s by default and enqueueAffectedBindings does a full scan of all bindings, we could end up with a lot of unnecessary work: listing all bindings, doing affinity matching, and requeueing bindings that will just be popped and short-circuited in doScheduleBinding without any actual scheduling needed.

What do you think about adding a guard so we only trigger reconciliation when there are bindings that could actually benefit from it? Something like:

case !equality.Semantic.DeepEqual(oldCluster.Status.Conditions, newCluster.Status.Conditions) ||
    !equality.Semantic.DeepEqual(oldCluster.Status.APIEnablements, newCluster.Status.APIEnablements):
    s.clusterReconcileWorker.Add(newCluster)
case !equality.Semantic.DeepEqual(oldCluster.Status.ResourceSummary, newCluster.Status.ResourceSummary):
    if features.FeatureGate.Enabled(features.PriorityBasedScheduling) && s.priorityQueue.HasUnschedulableBindings() {
        s.clusterReconcileWorker.Add(newCluster)
    }

This way Conditions/APIEnablements changes (low frequency, semantically significant) always trigger reconciliation, while ResourceSummary changes (high frequency, often just noise) only trigger it when there are unschedulable bindings that might actually benefit from rescheduling.

@seanlaii
Contributor

Also, a few edge cases around the switch-case ordering might be worth adding as tests to guard against future regressions. For example:

  1. Generation + status change simultaneously — generation case should take precedence (expect 2 adds, not 3)
  2. DeletionTimestamp + status change simultaneously — deletion case should take precedence (expect 1 add)
  3. Identical non-nil status — DeepEqual returns true, should not trigger reconcile (expect 0 adds)

These would catch accidental reordering of the switch cases in the future.

for _, tt := range tests {
    t.Run(tt.name, func(t *testing.T) {
        mockWorker := &mockAsyncWorker{}
        estimatorWorker := &mockAsyncWorker{}
Contributor


Could you elaborate the reason of validating estimator here?

Contributor


It seems that this is not related to this PR, but it is a good enhancement. Maybe we can separate this to a new PR. WDYT?

Contributor Author


The original TestAddCluster already validated the estimator worker using a single mockWorker. When this PR introduced clusterReconcileWorker.Add(cluster) in addCluster, I needed a separate mock for each worker to independently assert their call counts — otherwise the counts would be conflated across both workers. Keeping the estimator assertions in the same test ensures the refactoring didn't accidentally break the existing estimator path while adding the reconcile path. It tests the full behavior of addCluster in one place rather than leaving the estimator path uncovered.

Contributor Author


Yeah, that's a fair point. The worker split was structurally necessary to test the new reconcile call independently, but the new estimator else-branch assertions (checking addCount == 0 when estimator is disabled) are purely defensive additions that weren't in the original test. I can strip those back out to keep the diff tightly scoped to this PR. I'll revert the estimator side to match the original assertion style, and only keep what's needed to validate the new clusterReconcileWorker.Add behavior.

@seanlaii
Contributor

cc @RainbowMango @zhzhuang-zju to take a look as well. Thanks!

@Tej-Katika
Contributor Author

ResourceSummary is updated by the cluster-status-controller on every sync cycle (~10s by default), so unconditionally running enqueueAffectedBindings on each update would be expensive in large-scale clusters with many bindings.
The proposed split makes sense. One thing to flag before implementing it: HasUnschedulableBindings() doesn't currently exist on the SchedulingQueue interface or prioritySchedulingQueue. We'd need to add it — something like:
// in SchedulingQueue interface:
HasUnschedulableBindings() bool

// on prioritySchedulingQueue:
func (bq *prioritySchedulingQueue) HasUnschedulableBindings() bool {
    bq.lock.RLock()
    defer bq.lock.RUnlock()
    return bq.unschedulableBindings.Len() > 0
}
Alternatively, since s.priorityQueue is already nil when PriorityBasedScheduling is disabled (it's only initialized inside the feature gate check in New()), the guard could be written as:

case !equality.Semantic.DeepEqual(oldCluster.Status.ResourceSummary, newCluster.Status.ResourceSummary):
    if s.priorityQueue != nil && s.priorityQueue.HasUnschedulableBindings() {
        s.clusterReconcileWorker.Add(newCluster)
    }
This is equivalent to the feature gate check but avoids importing the features package in event_handler.go. Does adding HasUnschedulableBindings() to the SchedulingQueue interface sound reasonable, or would you prefer a different approach?
I'll update the PR with the split cases + the new method once you confirm the direction.

@seanlaii
Contributor

seanlaii commented Apr 18, 2026

ResourceSummary is updated by the cluster-status-controller on every sync cycle (~10s by default), so unconditionally running enqueueAffectedBindings on each update would be expensive in large-scale clusters with many bindings. The proposed split makes sense. One thing to flag before implementing it: HasUnschedulableBindings() doesn't currently exist on the SchedulingQueue interface or prioritySchedulingQueue. We'd need to add it — something like: // in SchedulingQueue interface: HasUnschedulableBindings() bool

Adding HasUnschedulableBindings() sounds good to me.

if s.priorityQueue != nil

Make sense to me.

@karmada-bot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from seanlaii. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@Tej-Katika Tej-Katika force-pushed the fix/scheduler-requeue-on-cluster-status-change branch 2 times, most recently from 9a1096f to 8c15034 Compare April 18, 2026 20:04
@Tej-Katika
Contributor Author

cc @seanlaii @RainbowMango @zhzhuang-zju

Please take a look. Thanks!

Comment thread pkg/scheduler/event_handler.go Outdated
if s.enableSchedulerEstimator {
    s.schedulerEstimatorWorker.Add(cluster.Name)
}
s.clusterReconcileWorker.Add(cluster)
Member


addCluster is invoked for existing Cluster objects during informer initial cache population, not only for real new-cluster joins. This unconditional enqueue means a scheduler restart queues one cluster reconciliation per existing cluster; after cache sync each item scans all ResourceBindings/ClusterResourceBindings and may requeue matching bindings even though no cluster changed. In large installations this can turn startup into O(clusters * bindings) reconcile work. Consider gating this like the ResourceSummary path, for example only enqueueing when s.priorityQueue != nil && s.priorityQueue.HasUnschedulableBindings(), or otherwise distinguishing real post-start joins from initial informer replay.

Contributor Author


Addressed in 843cd37b — gated on s.priorityQueue != nil && s.priorityQueue.HasUnschedulableBindings() and dropped the unconditional reconcile-worker enqueue in favor of MoveAllToActive().

// Len returns the length of activeQ.
Len() int

// HasUnschedulableBindings reports whether the unschedulableBindings sub-queue is non-empty.
Member


Emm, to be precise, it is not a queue.

Comment thread pkg/scheduler/event_handler.go Outdated
    s.clusterReconcileWorker.Add(newCluster)
case !equality.Semantic.DeepEqual(oldCluster.Status.ResourceSummary, newCluster.Status.ResourceSummary):
    if s.priorityQueue != nil && s.priorityQueue.HasUnschedulableBindings() {
        s.clusterReconcileWorker.Add(newCluster)
Member


can we focus on unschedulableBindings instead of a full scan of all bindings?

Contributor Author


Both the Conditions/APIEnablements case and the ResourceSummary case now call MoveAllToActive() directly, so we no longer fan out into a full binding scan via clusterReconcileWorker.Add → enqueueAffectedBindings.

@Tej-Katika Tej-Katika force-pushed the fix/scheduler-requeue-on-cluster-status-change branch 2 times, most recently from 465e9fb to 843cd37 Compare April 29, 2026 04:45
@karmada-bot
Contributor

@Tej-Katika: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

Details

In response to this:

/retest


Len() int

// HasUnschedulableBindings reports whether the unschedulableBindings map is non-empty.
HasUnschedulableBindings() bool
Member


MoveAllToActive can be called safely, so do we still need this?

Contributor Author


Agreed. Since MoveAllToActive() is already safe to call when the unschedulable map is empty, the HasUnschedulableBindings() guard is redundant. I'll drop the interface method and its implementation, and simplify the three call sites in event_handler.go to just if s.priorityQueue != nil { s.priorityQueue.MoveAllToActive() }.

Contributor Author


Dropped HasUnschedulableBindings() and simplified the three call sites to if s.priorityQueue != nil { s.priorityQueue.MoveAllToActive() }. Updated the docstring on MoveAllToActive to note it's safe to call when the unschedulable map is empty.

When a cluster's status changes (conditions, API enablements, or
resource summary), previously unschedulable bindings may now be
schedulable. This commit adds event handlers so such changes flush
unschedulable bindings directly to activeQ via MoveAllToActive(),
avoiding a full scan of all ResourceBindings.

- addCluster: call MoveAllToActive() so bindings stuck waiting for
  a new cluster to join are retried immediately
- updateCluster: replace clusterReconcileWorker.Add with
  MoveAllToActive() for status-change cases; the direct flush is
  cheaper than enqueueing a full binding scan via reconcileWorker
- MoveAllToActive() is safe to call when unschedulableBindings is
  empty, so no separate guard is needed

Signed-off-by: Tejashwar Reddy Katika <tejashwar1029@gmail.com>
@Tej-Katika Tej-Katika force-pushed the fix/scheduler-requeue-on-cluster-status-change branch from 843cd37 to 03485e0 Compare April 30, 2026 13:10
