feat: expose AMI cache TTL as runtime flag #9052

Open
chrisdoherty4 wants to merge 1 commit into aws:main from chrisdoherty4:cpd-ami-cache-requeue-01
@chrisdoherty4 chrisdoherty4 commented Apr 3, 2026

Fixes #N/A

Description

Operators running large fleets (15,000 nodes across 50+ clusters) with tens of node classes can generate significant DescribeImages API call volume because the reconciler requeues periodically (on the order of 30s to 1m) and uses a hardcoded 1-minute cache TTL. This change makes the cache TTL configurable at runtime so users can choose an AMI cache duration appropriate for their use case:

| Flag | Env var | Default |
| --- | --- | --- |
| --ami-cache-ttl | AMI_CACHE_TTL | 1m |

The default preserves existing behavior.

How was this change tested?

  • Unit tests added to pkg/operator/options/suite_test.go covering CLI
    flag override, env var fallback, and rejection of non-positive values during validation.
  • All existing unit tests pass.

Does this change impact docs?

  • Yes, PR includes docs updates
  • Yes, issue opened: #
  • No

@chrisdoherty4 chrisdoherty4 requested a review from a team as a code owner April 3, 2026 02:52
@chrisdoherty4 chrisdoherty4 requested a review from ryan-mist April 3, 2026 02:52
@chrisdoherty4 chrisdoherty4 marked this pull request as draft April 3, 2026 14:17
chrisdoherty4 commented Apr 3, 2026

Looking deeper, it seems a handful of reconcilers set a shorter TTL than the minimum requeue time for the AMI reconciler, making the --ami-requeue-interval flag rather useless.

The cache TTL configurability does help reduce API calls, so it still feels like a worthwhile configuration option; longer cache windows are acceptable in our case.

@chrisdoherty4 chrisdoherty4 marked this pull request as ready for review April 3, 2026 16:43
Operators running large fleets can generate significant DescribeImages
API call volume due to frequent AMI reconciles. This change makes the
AMI cache TTL configurable so operators can tune it for their workload
without rebuilding.

  --ami-cache-ttl (env: AMI_CACHE_TTL, default: 1m)

The default preserves existing behaviour.
@chrisdoherty4 chrisdoherty4 force-pushed the cpd-ami-cache-requeue-01 branch from a25243a to d3b7986 Compare April 3, 2026 19:51
chrisdoherty4 commented Apr 3, 2026

Modified the PR to only expose the AMI cache TTL. Being able to tweak this for our use case greatly reduces API call volume and avoids hitting rate limits.

@chrisdoherty4 chrisdoherty4 changed the title from "feat: expose AMI cache TTL and requeue interval as runtime flags" to "feat: expose AMI cache TTL as runtime flags" Apr 7, 2026
@chrisdoherty4 chrisdoherty4 changed the title from "feat: expose AMI cache TTL as runtime flags" to "feat: expose AMI cache TTL as runtime flag" Apr 7, 2026