test: reproduce offering cache invalidation bug across nodeclasses after ICE#9000
Open
abhinavdahiya wants to merge 1 commit into aws:main
…ter ICE

When two EC2NodeClasses have subnets in different availability zones, the offering cache's lastUnavailableOfferingsSeqNum (keyed by instance type name only, not by the full cache key including zones hash) causes stale cache hits. After an InsufficientInstanceCapacity error, whichever nodeclass is queried first updates the shared seqNum, causing the other nodeclass to skip cache rebuild and return pre-ICE availability data.

Additionally, when a single-zone NodeClass has its only zone ICE'd, the launch path returns a generic CreateError instead of InsufficientCapacityError, causing NodeClaims to stay stuck instead of being deleted and retried.

These tests reproduce both issues observed after upgrading from v1.5.1 to v1.8.x.
88bfce8 to af53292
Summary
- `lastUnavailableOfferingsSeqNum` (keyed by instance type name only, not by the full cache key including zones hash) causes stale cache hits across different EC2NodeClasses after an InsufficientInstanceCapacity error.
- When a single-zone NodeClass has its only zone ICE'd, the launch path returns a generic `CreateError` instead of `InsufficientCapacityError`, causing NodeClaims to stay stuck instead of being deleted and retried.

Both issues were observed after upgrading from v1.5.1 to v1.8.x. The tests currently assert the buggy behavior and are annotated with comments indicating what the correct behavior should be once fixed.
xref: #8909
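To illustrate why the error type matters for the second issue, here is a rough sketch of a caller that only deletes and retries when it can identify an insufficient-capacity error. The two error types below are local stand-ins that borrow the names used in this PR; they are not the real Karpenter types, and `launch`, `shouldDeleteAndRetry`, and the `classifyICE` switch are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// InsufficientCapacityError is a local stand-in, not the real Karpenter type.
type InsufficientCapacityError struct{ Err error }

func (e *InsufficientCapacityError) Error() string { return "insufficient capacity: " + e.Err.Error() }
func (e *InsufficientCapacityError) Unwrap() error { return e.Err }

// CreateError is a local stand-in for a generic launch failure.
type CreateError struct{ Err error }

func (e *CreateError) Error() string { return "create failed: " + e.Err.Error() }
func (e *CreateError) Unwrap() error { return e.Err }

// launch fails when every zone of the nodeclass is ICE'd. classifyICE
// selects between the typed error and the generic one (hypothetical switch).
func launch(zones []string, iced map[string]bool, classifyICE bool) error {
	for _, z := range zones {
		if !iced[z] {
			return nil // at least one zone can still be tried
		}
	}
	cause := fmt.Errorf("no available zones among %v", zones)
	if classifyICE {
		return &InsufficientCapacityError{Err: cause}
	}
	return &CreateError{Err: cause}
}

// shouldDeleteAndRetry models a caller that only recycles the claim when
// the failure is recognizably a capacity problem.
func shouldDeleteAndRetry(err error) bool {
	var ice *InsufficientCapacityError
	return errors.As(err, &ice)
}

func main() {
	zones := []string{"us-west-2b"}
	iced := map[string]bool{"us-west-2b": true}
	fmt.Println("generic error retried:", shouldDeleteAndRetry(launch(zones, iced, false)))
	fmt.Println("typed error retried:", shouldDeleteAndRetry(launch(zones, iced, true)))
}
```

In this model the generic error is never matched by `errors.As`, so the claim is left in place, which mirrors the stuck NodeClaims described in the summary.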