test: reproduce offering cache invalidation bug across nodeclasses after ICE #9000

Open
abhinavdahiya wants to merge 1 commit into aws:main from abhinavdahiya:fix/offering-cache-ice-regression

@abhinavdahiya abhinavdahiya commented Mar 5, 2026

Summary

  • Adds tests reproducing a regression in v1.8.x: the offering cache's lastUnavailableOfferingsSeqNum is keyed by instance type name only, not by the full cache key (which includes the zones hash), so it causes stale cache hits across different EC2NodeClasses after an InsufficientInstanceCapacity (ICE) error.
  • Adds a test showing that when a single-zone NodeClass has its only zone ICE'd, the launch path returns a generic CreateError instead of InsufficientCapacityError, causing NodeClaims to stay stuck instead of being deleted and retried.

Both issues were observed after upgrading from v1.5.1 to v1.8.x. The tests currently assert the buggy behavior and are annotated with comments indicating what the correct behavior should be once fixed.
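
The stale-hit mechanism from the first bullet can be illustrated with a toy model. This is only a sketch under stated assumptions: `offeringCache`, `cacheKey`, `markICE`, `get`, and `staleAfterICE` are hypothetical names invented for illustration, not the provider's actual types, and the `perKey` flag shows just one possible way to key the sequence number by the full cache key.

```go
package main

import "fmt"

// Every name here is hypothetical; this models the described behavior and is
// not the provider's actual code.

// cacheKey is the full cache key: instance type plus a hash of the zones
// reachable from a NodeClass's subnets.
type cacheKey struct {
	instanceType string
	zonesHash    string
}

// offeringCache caches per-zone availability for each cacheKey.
type offeringCache struct {
	perKey    bool                         // false: last-seen seqNum shared per instance type (buggy)
	seqNum    map[string]uint64            // bumped on every ICE for an instance type
	iced      map[string]bool              // "instanceType/zone" -> ICE'd
	lastSeq   map[cacheKey]uint64          // seqNum seen at the last rebuild
	offerings map[cacheKey]map[string]bool // cached zone -> available
}

func newOfferingCache(perKey bool) *offeringCache {
	return &offeringCache{
		perKey:    perKey,
		seqNum:    map[string]uint64{},
		iced:      map[string]bool{},
		lastSeq:   map[cacheKey]uint64{},
		offerings: map[cacheKey]map[string]bool{},
	}
}

// markICE records an InsufficientInstanceCapacity error for one zone.
func (c *offeringCache) markICE(instanceType, zone string) {
	c.iced[instanceType+"/"+zone] = true
	c.seqNum[instanceType]++
}

// get returns zone availability, rebuilding only if the seqNum has moved
// since the last rebuild recorded under seqKey.
func (c *offeringCache) get(instanceType, zonesHash string, zones []string) map[string]bool {
	key := cacheKey{instanceType, zonesHash}
	seqKey := cacheKey{instanceType: instanceType} // shared across NodeClasses
	if c.perKey {
		seqKey = key
	}
	current := c.seqNum[instanceType]
	if cached, ok := c.offerings[key]; ok && c.lastSeq[seqKey] == current {
		return cached
	}
	rebuilt := map[string]bool{}
	for _, z := range zones {
		rebuilt[z] = !c.iced[instanceType+"/"+z]
	}
	c.offerings[key] = rebuilt
	c.lastSeq[seqKey] = current
	return rebuilt
}

// staleAfterICE reports whether NodeClass B still sees its ICE'd zone as available.
func staleAfterICE(perKey bool) bool {
	c := newOfferingCache(perKey)
	zonesA, zonesB := []string{"us-west-2a"}, []string{"us-west-2b"}
	c.get("m5.large", "hashA", zonesA) // warm both caches
	c.get("m5.large", "hashB", zonesB)
	c.markICE("m5.large", "us-west-2b")
	c.get("m5.large", "hashA", zonesA) // A is queried first and updates the seqNum
	return c.get("m5.large", "hashB", zonesB)["us-west-2b"]
}

func main() {
	fmt.Println("shared seqNum, stale:", staleAfterICE(false))
	fmt.Println("per-key seqNum, stale:", staleAfterICE(true))
}
```

In this model the shared sequence number reports `true` (NodeClass B still sees `us-west-2b` as available after the ICE), while tracking it per full cache key reports `false`.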

xref: #8909

@abhinavdahiya abhinavdahiya requested a review from a team as a code owner March 5, 2026 14:42
…ter ICE

When two EC2NodeClasses have subnets in different availability zones, the
offering cache's lastUnavailableOfferingsSeqNum (keyed by instance type name
only, not by the full cache key including zones hash) causes stale cache hits.
After an InsufficientInstanceCapacity error, whichever nodeclass is queried
first updates the shared seqNum, causing the other nodeclass to skip cache
rebuild and return pre-ICE availability data.

Additionally, when a single-zone NodeClass has its only zone ICE'd, the
launch path returns a generic CreateError instead of InsufficientCapacityError,
causing NodeClaims to stay stuck instead of being deleted and retried.

These tests reproduce both issues observed after upgrading from v1.5.1 to
v1.8.x.
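
The second issue comes down to which error type the caller sees. The following is a minimal sketch, assuming the caller deletes and retries a NodeClaim only when the error is classified as insufficient capacity. The `InsufficientCapacityError` and `CreateError` types below are local stand-ins named after the errors mentioned above, and `launch` and `shouldDeleteAndRetry` are hypothetical helpers, not the provider's launch path.

```go
package main

import (
	"errors"
	"fmt"
)

// Local stand-ins for the two error kinds named in the description.
type InsufficientCapacityError struct{ msg string }

func (e *InsufficientCapacityError) Error() string { return e.msg }

type CreateError struct{ msg string }

func (e *CreateError) Error() string { return e.msg }

// launch succeeds if any zone is still available. When every zone is ICE'd,
// classifyICE selects between the reported behavior (generic CreateError)
// and the behavior the PR describes as correct (InsufficientCapacityError).
func launch(zones []string, iced map[string]bool, classifyICE bool) error {
	for _, z := range zones {
		if !iced[z] {
			return nil
		}
	}
	if classifyICE {
		return &InsufficientCapacityError{msg: "no capacity in any configured zone"}
	}
	return &CreateError{msg: "no offerings available"}
}

// shouldDeleteAndRetry models a caller that only deletes and retries the
// NodeClaim on capacity errors; any other error leaves it in place.
func shouldDeleteAndRetry(err error) bool {
	var ice *InsufficientCapacityError
	return errors.As(err, &ice)
}

func main() {
	iced := map[string]bool{"us-west-2a": true}
	singleZone := []string{"us-west-2a"}
	fmt.Println("generic error, retried:", shouldDeleteAndRetry(launch(singleZone, iced, false)))
	fmt.Println("capacity error, retried:", shouldDeleteAndRetry(launch(singleZone, iced, true)))
}
```

With the generic error the modeled caller never retries, which corresponds to the stuck NodeClaims described above.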
@abhinavdahiya abhinavdahiya force-pushed the fix/offering-cache-ice-regression branch from 88bfce8 to af53292 Compare March 5, 2026 14:46

github-actions bot commented Mar 6, 2026

Preview deployment ready!

Preview URL: https://pr-9000.d18coufmbnnaag.amplifyapp.com

Built from commit af5329219e66a3ef52408e892d3cfcfa292ff337
