feat: add CPU options with nested virtualization and instance-type filtering #9043
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,99 @@ | ||
| # CPU Options and Nested Virtualization | ||
|
|
||
| ## Overview | ||
|
|
||
| AWS announced nested virtualization support on virtual EC2 instances in February 2026, | ||
| enabling KVM-based workloads (container sandboxes, microVMs, development VMs) without | ||
| bare-metal instances. The feature is configured via the `CpuOptions.NestedVirtualization` | ||
| field in the EC2 `RunInstances` and `CreateLaunchTemplate` APIs. | ||
|
|
||
| Karpenter needs to expose this capability on `EC2NodeClass` so users can request nodes | ||
| with nested virtualization enabled. It must also filter out instance types that do not | ||
| support the feature to avoid launch failures. | ||
|
|
||
| ## Goals | ||
|
|
||
| - Expose `cpuOptions` on `EC2NodeClass.spec` with `coreCount`, `threadsPerCore`, and | ||
| `nestedVirtualization` fields. | ||
| - Pass `CpuOptions` through to the EC2 launch template. | ||
| - Filter instance types to only those reporting `nested-virtualization` in | ||
| `ProcessorInfo.SupportedFeatures` from `DescribeInstanceTypes`. | ||
| - Validate that `nestedVirtualization` is mutually exclusive with `coreCount` and | ||
| `threadsPerCore` (EC2 API constraint). | ||
| - Cache `UnsupportedOperation` fleet errors as unfulfillable capacity. | ||
|
|
||
| ## API Updates | ||
|
|
||
| ### EC2NodeClass Spec | ||
|
|
||
| ```yaml | ||
| apiVersion: karpenter.k8s.aws/v1 | ||
| kind: EC2NodeClass | ||
| metadata: | ||
| name: nested-virt | ||
| spec: | ||
| cpuOptions: | ||
| nestedVirtualization: enabled | ||
| # ... other fields | ||
| ``` | ||
|
|
||
| ### CPUOptions Struct | ||
|
|
||
| ```go | ||
| type CPUOptions struct { | ||
| CoreCount *int32 `json:"coreCount,omitempty"` | ||
| ThreadsPerCore *int32 `json:"threadsPerCore,omitempty"` | ||
| NestedVirtualization *string `json:"nestedVirtualization,omitempty"` | ||
| } | ||
| ``` | ||
|
|
||
| CEL validation enforces that `nestedVirtualization: enabled` cannot be combined with | ||
| `coreCount` or `threadsPerCore` (EC2 rejects the combination). | ||
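The mutual-exclusivity rule can be restated in plain Go as a minimal sketch. The `validateCPUOptions` helper below is hypothetical and only mirrors the constraint described above; the enforcement proposed in this document is the CEL rule on the CRD, not this function.

```go
package main

import (
	"errors"
	"fmt"
)

// CPUOptions mirrors the struct proposed above.
type CPUOptions struct {
	CoreCount            *int32  `json:"coreCount,omitempty"`
	ThreadsPerCore       *int32  `json:"threadsPerCore,omitempty"`
	NestedVirtualization *string `json:"nestedVirtualization,omitempty"`
}

// validateCPUOptions is a hypothetical helper that restates the CEL rule:
// nestedVirtualization: enabled cannot be combined with coreCount or
// threadsPerCore.
func validateCPUOptions(o *CPUOptions) error {
	if o == nil || o.NestedVirtualization == nil || *o.NestedVirtualization != "enabled" {
		return nil // nothing to check unless nested virtualization is enabled
	}
	if o.CoreCount != nil || o.ThreadsPerCore != nil {
		return errors.New("nestedVirtualization: enabled cannot be combined with coreCount or threadsPerCore")
	}
	return nil
}

func main() {
	enabled := "enabled"
	cores := int32(4)
	// Accepted: nested virtualization on its own.
	fmt.Println(validateCPUOptions(&CPUOptions{NestedVirtualization: &enabled}))
	// Rejected: nested virtualization combined with coreCount.
	fmt.Println(validateCPUOptions(&CPUOptions{NestedVirtualization: &enabled, CoreCount: &cores}))
}
```

Setting `coreCount` or `threadsPerCore` without `nestedVirtualization: enabled` passes this check, which matches the constraint as stated.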
|
|
||
| ### Instance Type Label | ||
|
Contributor
I don't know if we want to include a label unless we use that label to drive the configuration of the instance. This would be consistent with other features, such as instance-tenancy. For that feature, the label can take on two values ( These are the options in my order of preference. I heavily lean towards 1 or 2, though all three options have precedent in the project.
|
||
|
|
||
| A new well-known label `karpenter.k8s.aws/instance-nested-virtualization` is populated | ||
| from `ProcessorInfo.SupportedFeatures` during instance type resolution. Instance types | ||
| that report `nested-virtualization` in their supported features receive the label value | ||
| `"true"`. | ||
|
Comment on lines +55 to +58
Contributor
If we include a label, I don't think we want to set it to true just because an instance is compatible with nested virtualization. It should only be true if the feature is actually enabled. You can imagine that application owners may configure their pods with a label selector like
||
|
|
||
| As of March 2026, only the `*8i*` families support this feature: c8i, c8i-flex, m8i, | ||
| m8i-flex, r8i, r8i-flex (54 instance types total). No ARM, Xen, or bare-metal instances | ||
| support it. | ||
|
|
||
| ## Launch Behavior | ||
|
|
||
| ### Instance Type Filtering | ||
|
|
||
| When an `EC2NodeClass` sets `cpuOptions.nestedVirtualization: enabled`, a | ||
| `NestedVirtualizationFilter` in the instance filter chain rejects any instance type | ||
| lacking the `instance-nested-virtualization=true` label. This runs after the | ||
| `CompatibleAvailableFilter` and before capacity reservation filters. | ||
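The filtering step described above can be sketched as follows. This is an illustration under stated assumptions: the `instanceType` struct and `filterNestedVirtualization` function are hypothetical stand-ins, not the filter interface used in the PR, and the instance type names in `main` are only examples.

```go
package main

import "fmt"

// Label key taken from the "Instance Type Label" section.
const nestedVirtLabel = "karpenter.k8s.aws/instance-nested-virtualization"

// instanceType is a simplified, hypothetical stand-in for Karpenter's
// instance type representation.
type instanceType struct {
	Name   string
	Labels map[string]string
}

// filterNestedVirtualization keeps only instance types labeled as supporting
// nested virtualization when the EC2NodeClass requests it. When the feature
// is not requested, all candidates pass through unchanged.
func filterNestedVirtualization(requested bool, candidates []instanceType) []instanceType {
	if !requested {
		return candidates
	}
	kept := make([]instanceType, 0, len(candidates))
	for _, it := range candidates {
		if it.Labels[nestedVirtLabel] == "true" {
			kept = append(kept, it)
		}
	}
	return kept
}

func main() {
	candidates := []instanceType{
		{Name: "m8i.large", Labels: map[string]string{nestedVirtLabel: "true"}},
		{Name: "m5.large", Labels: map[string]string{}},
	}
	for _, it := range filterNestedVirtualization(true, candidates) {
		fmt.Println(it.Name)
	}
}
```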
|
|
||
| ### Launch Template | ||
|
|
||
| The `cpuOptions()` converter maps the `CPUOptions` struct to | ||
| `LaunchTemplateCpuOptionsRequest`. It returns `nil` when all fields are nil (avoiding an | ||
| empty `CpuOptions` block in the API call). The `NestedVirtualization` string is cast to | ||
| the SDK enum type `ec2types.NestedVirtualizationSpecification`. | ||
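A minimal sketch of the converter behaviour described above. To stay runnable without the AWS SDK, `launchTemplateCPUOptionsRequest` and `nestedVirtualizationSpecification` are local stand-ins for the SDK types `LaunchTemplateCpuOptionsRequest` and `ec2types.NestedVirtualizationSpecification`; their field layout here is an assumption, and the PR's actual code may differ.

```go
package main

import "fmt"

// CPUOptions mirrors the struct proposed in this document.
type CPUOptions struct {
	CoreCount            *int32
	ThreadsPerCore       *int32
	NestedVirtualization *string
}

// Local stand-ins for the SDK types (assumed shape, for illustration only).
type nestedVirtualizationSpecification string

type launchTemplateCPUOptionsRequest struct {
	CoreCount            *int32
	ThreadsPerCore       *int32
	NestedVirtualization nestedVirtualizationSpecification
}

// cpuOptions returns nil when no field is set, so the launch template request
// carries no empty CpuOptions block.
func cpuOptions(o *CPUOptions) *launchTemplateCPUOptionsRequest {
	if o == nil || (o.CoreCount == nil && o.ThreadsPerCore == nil && o.NestedVirtualization == nil) {
		return nil
	}
	out := &launchTemplateCPUOptionsRequest{
		CoreCount:      o.CoreCount,
		ThreadsPerCore: o.ThreadsPerCore,
	}
	if o.NestedVirtualization != nil {
		// Cast the user-supplied string to the enum-like type.
		out.NestedVirtualization = nestedVirtualizationSpecification(*o.NestedVirtualization)
	}
	return out
}

func main() {
	enabled := "enabled"
	fmt.Println(cpuOptions(&CPUOptions{}) == nil)
	fmt.Println(cpuOptions(&CPUOptions{NestedVirtualization: &enabled}).NestedVirtualization)
}
```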
|
|
||
| ### Error Handling | ||
|
Contributor
I agree with this approach since it ensures we'll always be able to make provisioning progress, but is this just defensive or are there existing gaps in the filter that you're aware of?
||
|
|
||
| `UnsupportedOperation` is added to the `unfulfillableCapacityErrorCodes` set so that | ||
| launches against incompatible instance types (if they bypass the filter) are cached as | ||
| unavailable rather than retried indefinitely. | ||
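The caching behaviour above can be sketched as below. Only the set name `unfulfillableCapacityErrorCodes` and the `UnsupportedOperation` code come from this document; the other set entry, the `unavailableCache` type, and `recordLaunchError` are hypothetical and simplified (the real cache is presumably keyed by more than the instance type name).

```go
package main

import "fmt"

// unfulfillableCapacityErrorCodes lists fleet error codes treated as
// "this offering cannot be fulfilled right now".
var unfulfillableCapacityErrorCodes = map[string]struct{}{
	"InsufficientInstanceCapacity": {}, // illustrative existing entry (assumption)
	"UnsupportedOperation":         {}, // added by this proposal
}

// unavailableCache is a hypothetical, simplified cache keyed by instance type.
type unavailableCache map[string]bool

// recordLaunchError marks the instance type unavailable when the error code
// is in the unfulfillable set, and reports whether it did so.
func (c unavailableCache) recordLaunchError(instanceType, code string) bool {
	if _, ok := unfulfillableCapacityErrorCodes[code]; !ok {
		return false
	}
	c[instanceType] = true
	return true
}

func main() {
	cache := unavailableCache{}
	fmt.Println(cache.recordLaunchError("m5.large", "UnsupportedOperation"))
	fmt.Println(cache.recordLaunchError("m8i.large", "SomeOtherError"))
	fmt.Println(cache["m5.large"], cache["m8i.large"])
}
```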
|
|
||
| ## Instance Type Compatibility | ||
|
|
||
| The authoritative signal is `ProcessorInfo.SupportedFeatures` from `DescribeInstanceTypes`: | ||
|
|
||
| ```bash | ||
| aws ec2 describe-instance-types \ | ||
| --filters "Name=processor-info.supported-features,Values=nested-virtualization" \ | ||
| --query 'InstanceTypes[*].InstanceType' | ||
| ``` | ||
|
|
||
| This returns only the families that actually support the feature, avoiding heuristic-based | ||
| filtering (e.g., checking architecture + hypervisor) which would be both over-inclusive | ||
| (allowing older Intel families that don't support it) and fragile (breaking when AWS adds | ||
| support to new families). | ||
Is support for `coreCount` and `threadsPerCore` a requirement for this integration? I think there's room for integration with these features in Karpenter, but it adds additional design ambiguity that nested virtualization doesn't necessarily have.

A straightforward implementation of these two features would filter out instances which don't support the configured values (we should do this if we go forward with it). The downside of this approach is that it severely restricts the diversity of NodePools. We may want to introduce a way to dynamically scale or select these values based on the selected instance type so that we can retain NodePool diversity. Nested virtualization doesn't have the same concern because it's on or off. Additionally, you likely don't want or need to mix instance types with nested virtualization enabled in the same NodePool.