
feat: Integrate AWS Application Recovery Controller Zonal Shift with Karpenter#9042

Open
sagdhana wants to merge 1 commit into aws:main from sagdhana:zonalshiftintegration

@sagdhana

Fixes #N/A

Description

This change integrates AWS Application Recovery Controller (ARC) Zonal Shift with Karpenter. Availability Zones occasionally experience temporary outages. During these events, Karpenter's actions (for example, launching new capacity into the impaired zone) do not improve the cluster's availability and can sometimes make the situation worse. With this integration, Karpenter becomes aware of a customer's intention to shift traffic, and therefore scaling, away from a zone.

Zonal Shift Documentation: https://docs.aws.amazon.com/r53recovery/latest/dg/arc-zonal-shift.html
Using Zonal Shift with EKS: https://docs.aws.amazon.com/r53recovery/latest/dg/arc-zonal-shift.resource-types.eks.html
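Conceptually, the integration means Karpenter skips zones that are under an active zonal shift when choosing where to launch capacity. The sketch below illustrates only that filtering step and is not the PR's implementation. It assumes zones are identified by AZ ID (for example `use1-az1`) and uses a hypothetical `isShifted` callback in place of whatever lookup the provider actually exposes:

```go
package main

import "fmt"

// filterShiftedZones returns the candidate zone IDs that are NOT under an
// active zonal shift. isShifted is a hypothetical lookup, e.g. backed by a
// cache of ARC zonal shift statuses keyed by AZ ID.
func filterShiftedZones(candidates []string, isShifted func(zoneID string) bool) []string {
	allowed := make([]string, 0, len(candidates))
	for _, z := range candidates {
		if !isShifted(z) {
			allowed = append(allowed, z)
		}
	}
	return allowed
}

func main() {
	// Hypothetical state: traffic has been shifted away from use1-az2.
	shifted := map[string]bool{"use1-az2": true}
	zones := []string{"use1-az1", "use1-az2", "use1-az4"}
	fmt.Println(filterShiftedZones(zones, func(z string) bool { return shifted[z] }))
	// prints [use1-az1 use1-az4]
}
```

Passing the lookup as a function keeps the filter independent of how shift statuses are fetched and cached. If every candidate zone is shifted, the result is empty and the caller has to decide what to do; the sketch does not cover that case.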

How was this change tested?

Tested locally by deploying to an Amazon EKS cluster and validating that the integration works as expected.

Does this change impact docs?

  • [ ] Yes, PR includes docs updates
  • [X] Yes, issue opened: #
  • [ ] No

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

sagdhana requested a review from a team as a code owner on March 30, 2026, 21:11
@DerekFrank
Contributor

/assign

"github.com/awslabs/operatorpkg/option"
"sigs.k8s.io/controller-runtime/pkg/manager"

arczonalshiftProvider "github.com/aws/karpenter-provider-aws/pkg/providers/arczonalshift"
Contributor

Suggested change
- arczonalshiftProvider "github.com/aws/karpenter-provider-aws/pkg/providers/arczonalshift"
+ ZonalShiftProvider arczonalshiftprovider.Provider

We don't use camel case for import aliases.


type DefaultProvider struct {
	sync.RWMutex
	zonalShiftStatuses map[string]shiftStatus
Contributor

nit: when you have a map keyed by string, it's nice to add a comment that says what it maps from and to.

Suggested change
- zonalShiftStatuses map[string]shiftStatus
+ zonalShiftStatuses map[string]shiftStatus // map zoneid (string) -> shiftStatus

	sync.RWMutex
	zonalShiftStatuses map[string]shiftStatus

	client sdk.ARCZonalShiftAPI
Contributor

nit: follow the naming used for ec2API in other providers.

Suggested change
- client sdk.ARCZonalShiftAPI
+ arcZonalShiftAPI sdk.ARCZonalShiftAPI

	return nil
}

func (p *DefaultProvider) IsZonalShifted(ctx context.Context, zone string) bool {
Contributor

nit: we should be specific about whether this takes a zone ID or a zone name, and accept only one unless we need to accept both.
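One way to make the choice explicit, sketched under the assumption that zone IDs are the canonical key (AZ IDs such as `use1-az1` identify the same physical zone in every account, while names such as `us-east-1a` are mapped per account), is to give each form its own named type so the signature documents which one it accepts. `ZoneID`, `ZoneName`, and `shiftedZones` are hypothetical names:

```go
package main

import "fmt"

// ZoneID is an AZ ID such as "use1-az1" (consistent across accounts).
type ZoneID string

// ZoneName is an AZ name such as "us-east-1a" (mapped per account).
type ZoneName string

// shiftedZones is a hypothetical set of zone IDs under an active zonal shift.
type shiftedZones map[ZoneID]bool

// IsZonalShifted accepts only a ZoneID. Passing a variable of type
// ZoneName (or plain string) here is a compile-time error.
func (s shiftedZones) IsZonalShifted(id ZoneID) bool {
	return s[id]
}

func main() {
	s := shiftedZones{"use1-az2": true}
	fmt.Println(s.IsZonalShifted("use1-az2")) // true
	fmt.Println(s.IsZonalShifted("use1-az1")) // false
	// var name ZoneName = "us-east-1b"
	// s.IsZonalShifted(name) // does not compile: ZoneName is not ZoneID
}
```

Untyped string constants still convert implicitly, so the type only guards against passing a variable that already has type `ZoneName` or `string`, which is the mix-up that matters in practice.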

}), "failed to setup node instanceID indexer")
}

func GetAvailablityZoneMapping(ctx context.Context, ec2Api sdk.EC2API) map[string]string {
Contributor

This doesn't need to exist; we already have the zone mapping from subnets (more below).
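The point here is that the zone-name-to-zone-ID mapping can be derived from subnet data Karpenter already holds, rather than from a separate EC2 call. A minimal sketch, assuming a subnet record that carries both values; `subnetInfo` is a hypothetical, simplified stand-in (in aws-sdk-go-v2, `ec2/types.Subnet` exposes them as the `*string` fields `AvailabilityZone` and `AvailabilityZoneId`):

```go
package main

import "fmt"

// subnetInfo is a hypothetical, simplified subnet record.
type subnetInfo struct {
	AvailabilityZone   string // zone name, e.g. "us-east-1a"
	AvailabilityZoneID string // zone ID, e.g. "use1-az4"
}

// zoneNameToID builds a zone name -> zone ID map from subnets,
// skipping records that lack either value.
func zoneNameToID(subnets []subnetInfo) map[string]string {
	m := make(map[string]string, len(subnets))
	for _, s := range subnets {
		if s.AvailabilityZone == "" || s.AvailabilityZoneID == "" {
			continue
		}
		m[s.AvailabilityZone] = s.AvailabilityZoneID
	}
	return m
}

func main() {
	subnets := []subnetInfo{
		{AvailabilityZone: "us-east-1a", AvailabilityZoneID: "use1-az4"},
		{AvailabilityZone: "us-east-1b", AvailabilityZoneID: "use1-az6"},
		{AvailabilityZone: "us-east-1a", AvailabilityZoneID: "use1-az4"}, // second subnet in the same zone
	}
	fmt.Println(zoneNameToID(subnets)) // map[us-east-1a:use1-az4 us-east-1b:use1-az6]
}
```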

func (p *DefaultProvider) IsZonalShifted(ctx context.Context, zone string) bool {
	p.RLock()
	defer p.RUnlock()
	//if shift, ok := p.zonalShiftStatuses[zone]; ok {
Contributor

I'm assuming we'll rip these commented-out lines out :D

	}
}

func (p *DefaultProvider) FetchZonalShifts(ctx context.Context) (map[string]shiftStatus, error) {
Contributor

Maybe ListZonalShifts?

if getManagedResourceErr != nil {
	// Resource is not found/registered in Zonal Shift. Log a message and use the NoopProvider so we don't block starting up.
	log.FromContext(ctx).WithValues("Cluster", clusterArn).V(1).Info("Cluster not found in Zonal Shift")
	zsProvider = arczonalshiftProvider.NewNoopProvider()
Contributor

This should probably panic; I don't think we should handle this failure with just a log line.

_, getManagedResourceErr := arczonalshiftAPI.GetManagedResource(ctx, &inputGMR)
if getManagedResourceErr != nil {
	// Resource is not found/registered in Zonal Shift. Log a message and use the NoopProvider so we don't block starting up.
	log.FromContext(ctx).WithValues("Cluster", clusterArn).V(1).Info("Cluster not found in Zonal Shift")
Contributor

This should log at Error level, not Info.

}

func (c *Controller) Register(_ context.Context, m manager.Manager) error {
	return controllerruntime.NewControllerManagedBy(m).Named("zonalshift").WatchesRawSource(singleton.Source()).Complete(singleton.AsReconciler(c))
Contributor

nit: you can put each of these chained . calls on its own line.


Labels

None yet

Projects

None yet

2 participants