[Proposal] scheduler estimator reservation #7416
karmada-bot merged 1 commit into karmada-io:master
Summary of Changes
Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request addresses the resource over-commitment issue in the Karmada scheduler caused by the architectural latency between scheduling decisions and the actual reflection of resource consumption in the estimator's snapshot. By implementing a reservation mechanism that tracks in-flight scheduling decisions and deducts them from available cluster capacity, the system ensures more accurate resource estimation and prevents scheduling conflicts in high-throughput scenarios.
Code Review
This pull request introduces a proposal for multi-cluster workload scheduling with resource reservation to mitigate resource over-commitment during rapid scheduling. It details a scheduler-side reservation cache and estimator-side deduction using a First Fit algorithm. The review feedback points out a missing "Namespace" field in the "ReservedWorkload" structure required for namespaced resource quota checks and highlights an inconsistency in the documentation concerning the reservation release trigger.
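The estimator-side deduction mentioned in this review can be illustrated with a minimal sketch. It assumes a simplified two-dimensional resource model and hypothetical names (`Resources`, `deductFirstFit`, `maxReplicas`); the `ReservedWorkload` fields shown here, including the `Namespace` field the review asks for, are illustrative and not the proposal's actual definition. Each reserved replica is placed on the first node that can hold it, and the new workload is then estimated against what remains.

```go
package main

import "fmt"

// Resources is a simplified resource vector (hypothetical; the real
// estimator tracks more dimensions than CPU and memory).
type Resources struct {
	MilliCPU int64
	MemoryMi int64
}

// ReservedWorkload is a hypothetical shape for one in-flight reservation.
// Namespace is included because namespaced quota checks would need it.
type ReservedWorkload struct {
	Namespace  string
	Name       string
	Replicas   int
	PerReplica Resources
}

// fits reports whether req can be satisfied from free.
func fits(free, req Resources) bool {
	return free.MilliCPU >= req.MilliCPU && free.MemoryMi >= req.MemoryMi
}

// deductFirstFit places every reserved replica on the first node with
// enough free resources and returns the remaining capacity per node.
// The input slice is copied, so the caller's snapshot is not modified.
// Replicas that fit on no node are skipped.
func deductFirstFit(nodes []Resources, reserved []ReservedWorkload) []Resources {
	free := append([]Resources(nil), nodes...)
	for _, w := range reserved {
		for r := 0; r < w.Replicas; r++ {
			for i := range free {
				if fits(free[i], w.PerReplica) {
					free[i].MilliCPU -= w.PerReplica.MilliCPU
					free[i].MemoryMi -= w.PerReplica.MemoryMi
					break
				}
			}
		}
	}
	return free
}

// maxReplicas counts how many replicas of req fit into the given capacity.
// Dimensions with a zero request do not constrain the result.
func maxReplicas(free []Resources, req Resources) int {
	total := 0
	for _, n := range free {
		count := int64(-1) // -1: no dimension has constrained this node yet
		for _, d := range [][2]int64{{n.MilliCPU, req.MilliCPU}, {n.MemoryMi, req.MemoryMi}} {
			if d[1] <= 0 {
				continue
			}
			if c := d[0] / d[1]; count < 0 || c < count {
				count = c
			}
		}
		if count > 0 {
			total += int(count)
		}
	}
	return total
}

func main() {
	nodes := []Resources{{MilliCPU: 4000, MemoryMi: 8192}, {MilliCPU: 4000, MemoryMi: 8192}}
	reserved := []ReservedWorkload{{
		Namespace:  "default",
		Name:       "app-a",
		Replicas:   3,
		PerReplica: Resources{MilliCPU: 1000, MemoryMi: 2048},
	}}
	req := Resources{MilliCPU: 1000, MemoryMi: 2048}

	fmt.Println("without reservations:", maxReplicas(nodes, req))
	fmt.Println("with reservations:", maxReplicas(deductFirstFit(nodes, reserved), req))
}
```

In this example the three reserved replicas all land on the first node, so the estimate for the new workload drops from 8 replicas to 5. A real estimator would also have to account for scheduling constraints that this sketch ignores.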
Pull request overview
Adds a new scheduling proposal document describing a “reservation” mechanism to avoid stale resource snapshots in karmada-scheduler-estimator during rapid consecutive aggregated scheduling decisions (ref #6783).
Changes:
- Introduces a detailed design proposal for scheduler-side reservation caching and estimator-side deduction via extended gRPC requests.
- Documents reservation lifecycle, release strategy options, risks/mitigations, and a test plan.
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@ Coverage Diff @@
## master #7416 +/- ##
=======================================
Coverage 41.92% 41.92%
=======================================
Files 879 879
Lines 54328 54328
=======================================
+ Hits 22778 22779 +1
+ Misses 29828 29827 -1
Partials 1722 1722
1ff5134 to c8b8d4b
mszacillo left a comment
Just looked over this proposal and left a few thoughts. Thank you for writing this up, it is incredibly thorough!
Out of curiosity, do we think it's worth adding a note on how this feature will work with the node autoscaler support for the estimator being discussed in #7375? The implementation is probably not too different from the other estimators discussed in this proposal, but it is perhaps worth mentioning.
Thanks @mszacillo, let me do it.
c8b8d4b to 99e7bfc
/retest
99e7bfc to 08b05a6
08b05a6 to d0a5f03
/retest
d0a5f03 to 05b1b49
RainbowMango left a comment
Generally, it looks good to me.
To make sure we are all on the same page, here is a brief summary of what this proposal requires:
- Maintain a list of pending `ResourceBindings` in the `karmada-scheduler` cache.
  - Append a ResourceBinding to the list after its scheduling is completed.
  - Remove the ResourceBinding from the list when its status becomes healthy.
  - Set a TTL to ensure that the ResourceBinding won't stay in the list forever.
- When evaluating available replicas, the `karmada-scheduler` passes this list to the estimator.
- When the estimator evaluates resources, it first deducts the resources required by these pending ResourceBindings.
PS: Although several further optimization plans are listed in the secondary-verification-en.md file, none of them are ideal, so that attempt will be excluded from this proposal. (A separate proposal can be written if the need arises.)
@mszacillo Michas, do you have any further comments or concerns?
05b1b49 to 0c6dfe8
fb71dca to 378afb4
All comments have been updated ~
Did one more pass over the proposal, overall it looks good! Thank you for addressing all the comments @XiShanYongYe-Chang. I left a single nit comment, and will review the secondary verification proposal when ready. Let me know when you create the umbrella item for this proposal! I'd be happy to contribute.
378afb4 to 8a286f1
Signed-off-by: changzhen <changzhen5@huawei.com>
8a286f1 to fc8341f
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: RainbowMango
Hi @mszacillo, I created an umbrella issue to track the task. Welcome to claim the tasks you are interested in. If there are any missing tasks, please feel free to point them out. Thank you. :)
What type of PR is this?
/kind feature
/kind documentation
What this PR does / why we need it:
ref #6783
Which issue(s) this PR fixes:
Fixes #6783
Special notes for your reviewer:
Does this PR introduce a user-facing change?: