@@ -516,6 +516,31 @@ spec:
- optional
type: string
type: object
placementGroup:
description: |-
PlacementGroup configures the EC2 placement group that Karpenter should launch instances into.
The referenced placement group must already exist; Karpenter does not create or delete placement groups.
properties:
id:
description: ID is the placement group id in EC2. This must be used when launching into a shared placement group.
pattern: ^pg-[0-9a-z]+$
type: string
name:
description: Name is the name of the placement group in EC2.
maxLength: 255
type: string
partition:
description: |-
Partition is the partition number that instances should launch into.
Valid only for partition placement groups.
format: int32
maximum: 7
minimum: 1
type: integer
type: object
x-kubernetes-validations:
- message: expected exactly one of ['name', 'id']
rule: has(self.name) != has(self.id)
role:
description: |-
Role is the AWS identity that nodes use.
@@ -811,6 +836,41 @@ spec:
instanceProfile:
description: InstanceProfile contains the resolved instance profile for the role
type: string
placementGroup:
description: PlacementGroup contains the current placement group that is available to this NodeClass under the placementGroup reference.
properties:
id:
description: ID of the placement group.
pattern: ^pg-[0-9a-z]+$
type: string
name:
description: Name of the placement group.
type: string
partitionCount:
description: PartitionCount is the number of partitions configured on the placement group.
format: int32
type: integer
spreadLevel:
description: SpreadLevel determines how instances are spread when the placement group strategy is spread.
enum:
- host
- rack
type: string
state:
description: State of the placement group.
type: string
strategy:
description: Strategy of the placement group.
enum:
- cluster
- spread
- partition
type: string
required:
- id
- name
- strategy
type: object
securityGroups:
description: |-
SecurityGroups contains the current security group values that are available to the
1 change: 1 addition & 0 deletions cmd/controller/main.go
@@ -76,6 +76,7 @@ func main() {
cloudProvider,
op.SubnetProvider,
op.SecurityGroupProvider,
op.PlacementGroupProvider,
op.InstanceProfileProvider,
op.InstanceProvider,
op.PricingProvider,
84 changes: 84 additions & 0 deletions designs/placement-groups.md
@@ -0,0 +1,84 @@
# Placement Group Support

## Context

Amazon EC2 placement groups let operators influence how instances are placed on the underlying hardware: `cluster` groups pack instances close together for low-latency networking, `partition` groups isolate instances across failure domains, and `spread` groups place a small number of critical instances on distinct hardware. The long-standing request in https://github.com/aws/karpenter-provider-aws/issues/3324 is to make these groups usable from `EC2NodeClass`.

Karpenter already treats `EC2NodeClass` as launch configuration for existing AWS resources such as subnets, security groups, AMIs, and instance profiles. Placement groups fit best when modeled the same way.

## Problem

Users can select the subnets, security groups, and capacity reservations that Karpenter-managed nodes launch with, but cannot direct those nodes into an existing placement group. This blocks workloads that already rely on EC2 placement-group semantics, for example:

- tightly-coupled clusters that need cluster placement-group networking
- replicated systems that want partition placement-group isolation
- small critical workloads that want spread placement-group separation

The previously proposed design in #5389 focused on Karpenter creating placement groups. That adds a new EC2 resource lifecycle to reconcile and exposes strategy-specific creation APIs that users may rely on long term.

## Options

### Option 1: Karpenter creates and owns placement groups

Pros:

- users can describe strategy directly in `EC2NodeClass`
- Karpenter could validate strategy-specific configuration at reconciliation time

Cons:

- introduces new lifecycle ownership for EC2 resources outside the current launch path
- expands the stable API surface with strategy creation details such as `cluster`, `spread`, `partition`, partition count, and spread level
- complicates shared placement groups and future AWS-specific variants
- makes rollback and drift semantics harder because the placement group becomes a controller-managed dependency

### Option 2: Karpenter references an existing placement group

Pros:

- matches how `EC2NodeClass` already models other AWS launch dependencies
- keeps the API small: identify the group and optionally pin a partition
- works for user-managed, shared, and externally tagged placement groups
- avoids inventing a placement-group controller lifecycle before demand is proven

Cons:

- users must provision the placement group out of band
- Karpenter cannot configure placement-group strategy on behalf of the user

## Recommendation

Add an optional `spec.placementGroup` field on `EC2NodeClass`:

```yaml
spec:
  placementGroup:
    name: analytics-partition
    partition: 2
```

Behavior:

- `name` or `id` identifies the existing placement group; the fields are mutually exclusive
- `id` supports shared placement groups, which require `GroupId` during launch
- `partition` is optional and only meaningful for partition placement groups
- Karpenter resolves the configured group into `status.placementGroup`
- launch templates include the placement-group reference so both `CreateFleet` and `RunInstances` honor it
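
The field constraints above can be sketched as a plain Go check. This is a minimal illustration, not the PR's implementation: `PlacementGroupRef` and `Validate` are hypothetical names, and the limits (exactly one of `name`/`id`, the `pg-` ID pattern, a 255-character name, partitions 1 through 7) are copied from the CRD schema in this change.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
	"unicode/utf8"
)

// PlacementGroupRef is a hypothetical stand-in for spec.placementGroup.
type PlacementGroupRef struct {
	Name      string
	ID        string
	Partition *int32
}

// idPattern mirrors the CRD pattern for placement group IDs.
var idPattern = regexp.MustCompile(`^pg-[0-9a-z]+$`)

// Validate applies the same constraints that the CRD schema declares.
func Validate(ref PlacementGroupRef) error {
	// Exactly one of name or id must be set: has(self.name) != has(self.id).
	if (ref.Name != "") == (ref.ID != "") {
		return errors.New("expected exactly one of ['name', 'id']")
	}
	if utf8.RuneCountInString(ref.Name) > 255 {
		return errors.New("name must be at most 255 characters")
	}
	if ref.ID != "" && !idPattern.MatchString(ref.ID) {
		return fmt.Errorf("id %q must match %s", ref.ID, idPattern.String())
	}
	if ref.Partition != nil && (*ref.Partition < 1 || *ref.Partition > 7) {
		return fmt.Errorf("partition %d must be between 1 and 7", *ref.Partition)
	}
	return nil
}

func main() {
	partition := int32(2)
	// A name plus an optional partition is accepted.
	fmt.Println(Validate(PlacementGroupRef{Name: "analytics-partition", Partition: &partition}))
	// Setting both name and id is rejected.
	fmt.Println(Validate(PlacementGroupRef{Name: "analytics-partition", ID: "pg-0123456789abcdef0"}))
}
```

In the actual change these rules are enforced by the API server through the CRD schema and the CEL rule, so a check like this would only be useful in tooling that validates manifests before they are applied.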

## Key Decisions

- Karpenter does not create, tag, delete, or mutate placement groups in this design
- placement-group strategy remains an operator concern because it belongs to the EC2 placement-group resource, not the instance launch request
- partition selection is the only launch-time knob worth exposing initially because AWS applies it at instance launch and it is useful even when the placement group is created elsewhere
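
One way to picture the read-only stance is a resolver whose only dependency is a lookup. The sketch below is an assumption-laden illustration: `Ref`, `Resolved`, `Describer`, `Resolve`, and `fakeEC2` are hypothetical names, and the in-memory fake stands in for an EC2 describe call. The interface deliberately has no create, tag, or delete method.

```go
package main

import (
	"errors"
	"fmt"
)

// Ref is a hypothetical stand-in for spec.placementGroup.
type Ref struct {
	Name string
	ID   string
}

// Resolved is a hypothetical stand-in for status.placementGroup.
type Resolved struct {
	ID       string
	Name     string
	Strategy string // cluster, spread, or partition
	State    string
}

// ErrNotFound is returned when the referenced group does not exist.
var ErrNotFound = errors.New("placement group not found")

// Describer is a hypothetical read-only lookup. It exposes no way to
// create, tag, delete, or mutate a placement group.
type Describer interface {
	Describe(ref Ref) ([]Resolved, error)
}

// Resolve looks up the referenced group and returns it for status reporting.
func Resolve(d Describer, ref Ref) (*Resolved, error) {
	groups, err := d.Describe(ref)
	if err != nil {
		return nil, fmt.Errorf("describing placement group: %w", err)
	}
	if len(groups) == 0 {
		return nil, ErrNotFound
	}
	return &groups[0], nil
}

// fakeEC2 is an in-memory Describer used only for this illustration.
type fakeEC2 []Resolved

func (f fakeEC2) Describe(ref Ref) ([]Resolved, error) {
	var out []Resolved
	for _, g := range f {
		if (ref.ID != "" && g.ID == ref.ID) || (ref.Name != "" && g.Name == ref.Name) {
			out = append(out, g)
		}
	}
	return out, nil
}

func main() {
	ec2 := fakeEC2{{ID: "pg-0123456789abcdef0", Name: "analytics-partition", Strategy: "partition", State: "available"}}
	group, err := Resolve(ec2, Ref{Name: "analytics-partition"})
	fmt.Println(group, err)
}
```

Because the strategy is read back from the existing group rather than declared in the spec, it remains an attribute of the EC2 resource and not part of the launch request.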

## User Guidance

- Use `name` for placement groups in the same account and `id` for shared placement groups
- Pair cluster placement groups with subnet or topology constraints that keep launches in a single Availability Zone
- Omit `partition` to let EC2 distribute instances across partitions, or set it when the workload needs explicit partition affinity
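
The guidance above could translate into launch-time placement fields roughly as follows. This is a sketch under stated assumptions: `Ref`, `LaunchPlacement`, and `ToLaunchPlacement` are hypothetical names, and `LaunchPlacement` is a simplified local mirror of the group name, group ID, and partition number that EC2 accepts in a placement request, not a type from the AWS SDK or from this PR.

```go
package main

import "fmt"

// Ref is a hypothetical stand-in for spec.placementGroup.
type Ref struct {
	Name      string
	ID        string
	Partition *int32
}

// LaunchPlacement is a hypothetical, simplified mirror of the placement
// section of a launch request.
type LaunchPlacement struct {
	GroupName       *string
	GroupID         *string
	PartitionNumber *int32
}

// ToLaunchPlacement maps a reference onto launch-time placement fields.
// A nil reference yields nil so callers can leave placement unset.
func ToLaunchPlacement(ref *Ref) *LaunchPlacement {
	if ref == nil {
		return nil
	}
	out := &LaunchPlacement{}
	switch {
	case ref.ID != "":
		// Shared placement groups are addressed by ID.
		id := ref.ID
		out.GroupID = &id
	case ref.Name != "":
		name := ref.Name
		out.GroupName = &name
	}
	if ref.Partition != nil {
		// Leaving the partition unset lets EC2 choose one.
		n := *ref.Partition
		out.PartitionNumber = &n
	}
	return out
}

func main() {
	partition := int32(2)
	p := ToLaunchPlacement(&Ref{Name: "analytics-partition", Partition: &partition})
	fmt.Println(*p.GroupName, *p.PartitionNumber)
}
```

Copying the values instead of aliasing the spec's pointers keeps the launch request independent of later mutations to the node class object.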

## Future Work

- richer status surfacing for placement-group strategy and readiness
- strategy-aware validation and scheduling hints
- a separate proposal for Karpenter-managed placement-group lifecycle if real demand justifies the larger API
38 changes: 38 additions & 0 deletions examples/v1/placement-group.yaml
@@ -0,0 +1,38 @@
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: placement-group
spec:
  template:
    spec:
      requirements:
        - key: topology.kubernetes.io/zone
          operator: In
          values:
            - us-west-2a
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: placement-group
---
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: placement-group
spec:
  amiFamily: AL2023
  role: "KarpenterNodeRole-${CLUSTER_NAME}"
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: "${CLUSTER_NAME}"
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: "${CLUSTER_NAME}"
  amiSelectorTerms:
    - alias: al2023@latest
  placementGroup:
    # Use `name` for placement groups in the same account.
    # Use `id` instead when launching into a shared placement group.
    name: analytics-partition
    # Optional, only valid for partition placement groups.
    partition: 2
1 change: 1 addition & 0 deletions kwok/main.go
@@ -91,6 +91,7 @@ func main() {
cloudProvider,
op.SubnetProvider,
op.SecurityGroupProvider,
op.PlacementGroupProvider,
op.InstanceProfileProvider,
op.InstanceProvider,
op.PricingProvider,
4 changes: 4 additions & 0 deletions kwok/operator/operator.go
@@ -61,6 +61,7 @@ import (
"github.com/aws/karpenter-provider-aws/pkg/providers/instanceprofile"
"github.com/aws/karpenter-provider-aws/pkg/providers/instancetype"
"github.com/aws/karpenter-provider-aws/pkg/providers/launchtemplate"
"github.com/aws/karpenter-provider-aws/pkg/providers/placementgroup"
"github.com/aws/karpenter-provider-aws/pkg/providers/pricing"
"github.com/aws/karpenter-provider-aws/pkg/providers/securitygroup"
ssmp "github.com/aws/karpenter-provider-aws/pkg/providers/ssm"
@@ -83,6 +84,7 @@ type Operator struct {
RecreationCache *cache.Cache
SubnetProvider subnet.Provider
SecurityGroupProvider securitygroup.Provider
PlacementGroupProvider placementgroup.Provider
InstanceProfileProvider instanceprofile.Provider
AMIProvider amifamily.Provider
AMIResolver amifamily.Resolver
@@ -138,6 +140,7 @@ func NewOperator(ctx context.Context, operator *operator.Operator) (context.Cont
cfg.Region,
false,
)
placementGroupProvider := placementgroup.NewDefaultProvider(ec2api, cache.New(awscache.DefaultTTL, awscache.DefaultCleanupInterval))
versionProvider := version.NewDefaultProvider(operator.KubernetesInterface, eksapi)
// Ensure we're able to hydrate the version before starting any reliant controllers.
// Version updates are hydrated asynchronously after this, in the event of a failure
@@ -205,6 +208,7 @@ func NewOperator(ctx context.Context, operator *operator.Operator) (context.Cont
RecreationCache: recreationCache,
SubnetProvider: subnetProvider,
SecurityGroupProvider: securityGroupProvider,
PlacementGroupProvider: placementGroupProvider,
InstanceProfileProvider: instanceProfileProvider,
AMIProvider: amiProvider,
AMIResolver: amiResolver,
60 changes: 60 additions & 0 deletions pkg/apis/crds/karpenter.k8s.aws_ec2nodeclasses.yaml
@@ -513,6 +513,31 @@ spec:
- optional
type: string
type: object
placementGroup:
description: |-
PlacementGroup configures the EC2 placement group that Karpenter should launch instances into.
The referenced placement group must already exist; Karpenter does not create or delete placement groups.
properties:
id:
description: ID is the placement group id in EC2. This must be used when launching into a shared placement group.
pattern: ^pg-[0-9a-z]+$
type: string
name:
description: Name is the name of the placement group in EC2.
maxLength: 255
type: string
partition:
description: |-
Partition is the partition number that instances should launch into.
Valid only for partition placement groups.
format: int32
maximum: 7
minimum: 1
type: integer
type: object
x-kubernetes-validations:
- message: expected exactly one of ['name', 'id']
rule: has(self.name) != has(self.id)
role:
description: |-
Role is the AWS identity that nodes use.
@@ -808,6 +833,41 @@ spec:
instanceProfile:
description: InstanceProfile contains the resolved instance profile for the role
type: string
placementGroup:
description: PlacementGroup contains the current placement group that is available to this NodeClass under the placementGroup reference.
properties:
id:
description: ID of the placement group.
pattern: ^pg-[0-9a-z]+$
type: string
name:
description: Name of the placement group.
type: string
partitionCount:
description: PartitionCount is the number of partitions configured on the placement group.
format: int32
type: integer
spreadLevel:
description: SpreadLevel determines how instances are spread when the placement group strategy is spread.
enum:
- host
- rack
type: string
state:
description: State of the placement group.
type: string
strategy:
description: Strategy of the placement group.
enum:
- cluster
- spread
- partition
type: string
required:
- id
- name
- strategy
type: object
securityGroups:
description: |-
SecurityGroups contains the current security group values that are available to the
25 changes: 25 additions & 0 deletions pkg/apis/v1/ec2nodeclass.go
@@ -52,6 +52,11 @@ type EC2NodeClassSpec struct {
	// +kubebuilder:validation:MaxItems:=30
	// +optional
	CapacityReservationSelectorTerms []CapacityReservationSelectorTerm `json:"capacityReservationSelectorTerms" hash:"ignore"`
	// PlacementGroup configures the EC2 placement group that Karpenter should launch instances into.
	// The referenced placement group must already exist; Karpenter does not create or delete placement groups.
	// +kubebuilder:validation:XValidation:message="expected exactly one of ['name', 'id']",rule="has(self.name) != has(self.id)"
	// +optional
	PlacementGroup *PlacementGroup `json:"placementGroup,omitempty" hash:"ignore"`
	// AssociatePublicIPAddress controls if public IP addresses are assigned to instances that are launched with the nodeclass.
	// +optional
	AssociatePublicIPAddress *bool `json:"associatePublicIPAddress,omitempty"`
@@ -199,6 +204,26 @@ type CapacityReservationSelectorTerm struct {
InstanceMatchCriteria string `json:"instanceMatchCriteria,omitempty"`
}

// PlacementGroup defines placement-group membership for instances launched with this node class.
type PlacementGroup struct {
	// Name is the name of the placement group in EC2.
	// Mutually exclusive with ID.
	// +kubebuilder:validation:MaxLength:=255
	// +optional
	Name string `json:"name,omitempty"`
	// ID is the placement group id in EC2. This must be used when launching into a shared placement group.
	// Mutually exclusive with Name.
	// +kubebuilder:validation:Pattern:="^pg-[0-9a-z]+$"
	// +optional
	ID string `json:"id,omitempty"`
	// Partition is the partition number that instances should launch into.
	// Valid only for partition placement groups.
	// +kubebuilder:validation:Minimum:=1
	// +kubebuilder:validation:Maximum:=7
	// +optional
	Partition *int32 `json:"partition,omitempty"`
}

// AMISelectorTerm defines selection logic for an ami used by Karpenter to launch nodes.
// If multiple fields are used for selection, the requirements are ANDed.
type AMISelectorTerm struct {
6 changes: 6 additions & 0 deletions pkg/apis/v1/ec2nodeclass_hash_test.go
@@ -88,6 +88,9 @@ var _ = Describe("Hash", func() {
Entry("DetailedMonitoring", "14187487647319890991", v1.EC2NodeClass{Spec: v1.EC2NodeClassSpec{DetailedMonitoring: aws.Bool(true)}}),
Entry("InstanceStorePolicy", "4160809219257698490", v1.EC2NodeClass{Spec: v1.EC2NodeClassSpec{InstanceStorePolicy: lo.ToPtr(v1.InstanceStorePolicyRAID0)}}),
Entry("AssociatePublicIPAddress", "4469320567057431454", v1.EC2NodeClass{Spec: v1.EC2NodeClassSpec{AssociatePublicIPAddress: lo.ToPtr(true)}}),
Entry("PlacementGroup Name", "3719706974731311089", v1.EC2NodeClass{Spec: v1.EC2NodeClassSpec{PlacementGroup: &v1.PlacementGroup{Name: "analytics-cluster"}}}),
Entry("PlacementGroup ID", "18122240702898781533", v1.EC2NodeClass{Spec: v1.EC2NodeClassSpec{PlacementGroup: &v1.PlacementGroup{ID: "pg-0123456789abcdef0"}}}),
Entry("PlacementGroup Partition", "4265179377147301792", v1.EC2NodeClass{Spec: v1.EC2NodeClassSpec{PlacementGroup: &v1.PlacementGroup{Name: "analytics-cluster", Partition: lo.ToPtr(int32(1))}}}),
Entry("MetadataOptions HTTPEndpoint", "1277386558528601282", v1.EC2NodeClass{Spec: v1.EC2NodeClassSpec{MetadataOptions: &v1.MetadataOptions{HTTPEndpoint: lo.ToPtr("enabled")}}}),
Entry("MetadataOptions HTTPProtocolIPv6", "14697047633165484196", v1.EC2NodeClass{Spec: v1.EC2NodeClassSpec{MetadataOptions: &v1.MetadataOptions{HTTPProtocolIPv6: lo.ToPtr("enabled")}}}),
Entry("MetadataOptions HTTPPutResponseHopLimit", "2086799014304536137", v1.EC2NodeClass{Spec: v1.EC2NodeClassSpec{MetadataOptions: &v1.MetadataOptions{HTTPPutResponseHopLimit: lo.ToPtr(int64(10))}}}),
@@ -138,6 +141,9 @@ var _ = Describe("Hash", func() {
Entry("DetailedMonitoring", v1.EC2NodeClass{Spec: v1.EC2NodeClassSpec{DetailedMonitoring: aws.Bool(true)}}),
Entry("InstanceStorePolicy", v1.EC2NodeClass{Spec: v1.EC2NodeClassSpec{InstanceStorePolicy: lo.ToPtr(v1.InstanceStorePolicyRAID0)}}),
Entry("AssociatePublicIPAddress", v1.EC2NodeClass{Spec: v1.EC2NodeClassSpec{AssociatePublicIPAddress: lo.ToPtr(true)}}),
Entry("PlacementGroup Name", v1.EC2NodeClass{Spec: v1.EC2NodeClassSpec{PlacementGroup: &v1.PlacementGroup{Name: "analytics-cluster"}}}),
Entry("PlacementGroup ID", v1.EC2NodeClass{Spec: v1.EC2NodeClassSpec{PlacementGroup: &v1.PlacementGroup{ID: "pg-0123456789abcdef0"}}}),
Entry("PlacementGroup Partition", v1.EC2NodeClass{Spec: v1.EC2NodeClassSpec{PlacementGroup: &v1.PlacementGroup{Name: "analytics-cluster", Partition: lo.ToPtr(int32(1))}}}),
Entry("MetadataOptions HTTPEndpoint", v1.EC2NodeClass{Spec: v1.EC2NodeClassSpec{MetadataOptions: &v1.MetadataOptions{HTTPEndpoint: lo.ToPtr("enabled")}}}),
Entry("MetadataOptions HTTPProtocolIPv6", v1.EC2NodeClass{Spec: v1.EC2NodeClassSpec{MetadataOptions: &v1.MetadataOptions{HTTPProtocolIPv6: lo.ToPtr("enabled")}}}),
Entry("MetadataOptions HTTPPutResponseHopLimit", v1.EC2NodeClass{Spec: v1.EC2NodeClassSpec{MetadataOptions: &v1.MetadataOptions{HTTPPutResponseHopLimit: lo.ToPtr(int64(10))}}}),