Commit f373310

Pulling in do-not-disrupt grace period
1 parent 6613010 commit f373310

4 files changed: +83 -11 lines changed

go.mod

Lines changed: 3 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -123,7 +123,7 @@ require (
123123
golang.org/x/oauth2 v0.34.0 // indirect
124124
golang.org/x/sys v0.40.0 // indirect
125125
golang.org/x/term v0.39.0 // indirect
126-
golang.org/x/text v0.33.0 // indirect
126+
golang.org/x/text v0.34.0 // indirect
127127
golang.org/x/time v0.14.0 // indirect
128128
golang.org/x/tools v0.41.0 // indirect
129129
gomodules.xyz/jsonpatch/v2 v2.5.0 // indirect
@@ -140,3 +140,5 @@ require (
140140
sigs.k8s.io/randfill v1.0.0 // indirect
141141
sigs.k8s.io/structured-merge-diff/v6 v6.3.1 // indirect
142142
)
143+
144+
replace sigs.k8s.io/karpenter => github.com/AndrewMitchell25/karpenter v0.0.0-20260302230637-bb8c524f9de3

go.sum

Lines changed: 4 additions & 4 deletions
Original file line number | Diff line number | Diff line change
@@ -1,3 +1,5 @@
1+
github.com/AndrewMitchell25/karpenter v0.0.0-20260302230637-bb8c524f9de3 h1:cj7ydUP0ELUINLi40EbYaeW4odjs05+fFScPCBj0H0w=
2+
github.com/AndrewMitchell25/karpenter v0.0.0-20260302230637-bb8c524f9de3/go.mod h1:7HVTLcR8uNwHcnwjfaCqV2ICF3aOPvngK/J8CBXZraU=
13
github.com/Masterminds/semver/v3 v3.4.0 h1:Zog+i5UMtVoCU8oKka5P7i9q9HgrJeGzI9SA1Xbatp0=
24
github.com/Masterminds/semver/v3 v3.4.0/go.mod h1:4V+yj/TJE1HU9XfppCwVMZq3I84lprf4nC11bSS5beM=
35
github.com/Pallinder/go-randomdata v1.2.0 h1:DZ41wBchNRb/0GfsePLiSwb0PHZmT67XY00lCDlaYPg=
@@ -323,8 +325,8 @@ golang.org/x/text v0.13.0/go.mod h1:TvPlkZtksWOMsz7fbANvkp4WM8x/WCo/om8BMLbz+aE=
323325
golang.org/x/text v0.14.0/go.mod h1:18ZOQIKpY8NJVqYksKHtTdi31H5itFRjB5/qKTNYzSU=
324326
golang.org/x/text v0.15.0/go.mod h1:18ZOQIKpY8NJVqYksKHtTdi31H5itFRjB5/qKTNYzSU=
325327
golang.org/x/text v0.21.0/go.mod h1:4IBbMaMmOPCJ8SecivzSH54+73PCFmPWxNTLm+vZkEQ=
326-
golang.org/x/text v0.33.0 h1:B3njUFyqtHDUI5jMn1YIr5B0IE2U0qck04r6d4KPAxE=
327-
golang.org/x/text v0.33.0/go.mod h1:LuMebE6+rBincTi9+xWTY8TztLzKHc/9C1uBCG27+q8=
328+
golang.org/x/text v0.34.0 h1:oL/Qq0Kdaqxa1KbNeMKwQq0reLCCaFtqu2eNuSeNHbk=
329+
golang.org/x/text v0.34.0/go.mod h1:homfLqTYRFyVYemLBFl5GgL/DWEiH5wcsQ5gSh1yziA=
328330
golang.org/x/time v0.14.0 h1:MRx4UaLrDotUKUdCIqzPC48t1Y9hANFKIRpNx+Te8PI=
329331
golang.org/x/time v0.14.0/go.mod h1:eL/Oa2bBBK0TkX57Fyni+NgnyQQN4LitPmob2Hjnqw4=
330332
golang.org/x/tools v0.0.0-20180917221912-90fa682c2a6e/go.mod h1:n7NCudcB/nEzxVGmLbDWY5pfWTLqBcC2KZ6jyYvM4mQ=
@@ -375,8 +377,6 @@ sigs.k8s.io/controller-runtime v0.22.4 h1:GEjV7KV3TY8e+tJ2LCTxUTanW4z/FmNB7l327U
375377
sigs.k8s.io/controller-runtime v0.22.4/go.mod h1:+QX1XUpTXN4mLoblf4tqr5CQcyHPAki2HLXqQMY6vh8=
376378
sigs.k8s.io/json v0.0.0-20250730193827-2d320260d730 h1:IpInykpT6ceI+QxKBbEflcR5EXP7sU1kvOlxwZh5txg=
377379
sigs.k8s.io/json v0.0.0-20250730193827-2d320260d730/go.mod h1:mdzfpAEoE6DHQEN0uh9ZbOCuHbLK5wOm7dK4ctXE9Tg=
378-
sigs.k8s.io/karpenter v1.9.1-0.20260220232539-5e12af134257 h1:Z7WZW+Hw8Naj3kOcHIZbHyIKwTDtzQzm0N9tgqdGZbY=
379-
sigs.k8s.io/karpenter v1.9.1-0.20260220232539-5e12af134257/go.mod h1:5NVeUwDmwHGnGIiqZhYCVfRx1uE5f9zdZsUYI34isIo=
380380
sigs.k8s.io/randfill v1.0.0 h1:JfjMILfT8A6RbawdsK2JXGBR5AQVfd+9TbzrlneTyrU=
381381
sigs.k8s.io/randfill v1.0.0/go.mod h1:XeLlZ/jmk4i1HRopwe7/aU3H5n1zNUcX6TM94b3QxOY=
382382
sigs.k8s.io/structured-merge-diff/v6 v6.3.1 h1:JrhdFMqOd/+3ByqlP2I45kTOZmTRLBUm5pvRjeheg7E=

website/content/en/preview/concepts/disruption.md

Lines changed: 51 additions & 4 deletions
Original file line number | Diff line number | Diff line change
@@ -360,20 +360,44 @@ In this scenario, Karpenter cannot voluntary disrupt the node because:
360360
361361
As seen in this example, the more PDBs there are affecting a Node, the more difficult it will be for Karpenter to find an opportunity to perform voluntary disruption actions.
362362
363-
Secondly, you can block Karpenter from voluntarily disrupting and draining pods by adding the `karpenter.sh/do-not-disrupt: "true"` annotation to the pod.
364-
You can treat this annotation as a single-pod, permanently blocking PDB.
363+
Secondly, you can block Karpenter from voluntarily disrupting and draining pods by adding the `karpenter.sh/do-not-disrupt` annotation to the pod.
364+
The annotation supports two formats:
365+
- **Boolean format**: `karpenter.sh/do-not-disrupt: "true"` - Provides permanent protection from disruption
366+
- **Duration format**: `karpenter.sh/do-not-disrupt: "30m"` - Provides time-based protection for the specified duration after the pod starts running
367+
368+
#### Duration-Based Protection
369+
370+
When using the duration format, pods are protected from disruption for the specified time period after they start running (based on `pod.status.startTime`).
371+
Once the grace period expires, the pod becomes eligible for disruption like any other pod.
372+
This is useful for workloads that need protection during startup or critical phases but can be safely disrupted later.
373+
374+
Supported duration formats include:
375+
- `"5m"` - 5 minutes
376+
- `"1h"` - 1 hour
377+
- `"2h30m"` - 2 hours and 30 minutes
378+
- `"24h"` - 24 hours
379+
380+
###TODO REWRITE THIS SECTION
381+
If an invalid duration is specified, the pod will be treated as permanently protected (equivalent to `"true"`).
382+
383+
#### Behavior and Consequences
384+
385+
You can treat this annotation as a single-pod, blocking PDB (permanent when using boolean format, or temporary when using duration format).
365386
This has the following consequences:
366387
- Nodes with `karpenter.sh/do-not-disrupt` pods will be excluded from [Consolidation]({{<ref "#consolidation" >}}), and conditionally excluded from [Drift]({{<ref "#drift" >}}).
367388
- If the Node's owning NodeClaim has a [`terminationGracePeriod`]({{<ref "#terminationgraceperiod" >}}) configured, it will still be eligible for disruption via drift.
368389
- Like pods with a blocking PDB, pods with the `karpenter.sh/do-not-disrupt` annotation will **not** be gracefully evicted by the [Termination Controller]({{<ref "#termination-controller">}}).
369390
Karpenter will not be able to complete termination of the node until one of the following conditions is met:
370391
- All pods with the `karpenter.sh/do-not-disrupt` annotation are removed.
371392
- All pods with the `karpenter.sh/do-not-disrupt` annotation have entered a [terminal phase](https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-phase) (`Succeeded` or `Failed`).
393+
- For duration-based annotations, the grace period has expired and the pod is no longer protected.
372394
- The owning NodeClaim's [`terminationGracePeriod`]({{<ref "#terminationgraceperiod" >}}) has elapsed.
373395
374-
This is useful for pods that you want to run from start to finish without disruption.
375-
Examples of pods that you might want to opt-out of disruption include an interactive game that you don't want to interrupt or a long batch job (such as you might have with machine learning) that would need to start over if it were interrupted.
396+
#### Examples
376397
398+
This is useful for pods that you want to run from start to finish without disruption, or that need protection during critical startup phases.
399+
400+
**Permanent protection** - useful for interactive games or long-running batch jobs:
377401
```yaml
378402
apiVersion: apps/v1
379403
kind: Deployment
@@ -384,6 +408,29 @@ spec:
384408
karpenter.sh/do-not-disrupt: "true"
385409
```
386410
411+
**Time-based protection** - useful for workloads with critical startup phases:
412+
```yaml
413+
apiVersion: apps/v1
414+
kind: Deployment
415+
spec:
416+
template:
417+
metadata:
418+
annotations:
419+
# Protect for 30 minutes after pod starts running
420+
karpenter.sh/do-not-disrupt: "30m"
421+
```
422+
423+
```yaml
424+
apiVersion: batch/v1
425+
kind: Job
426+
spec:
427+
template:
428+
metadata:
429+
annotations:
430+
# Protect for 2 hours after pod starts running
431+
karpenter.sh/do-not-disrupt: "2h"
432+
```
433+
387434
{{% alert title="Note" color="primary" %}}
388435
The `karpenter.sh/do-not-disrupt` annotation does **not** exclude nodes from the forceful disruption methods: [Expiration]({{<ref "#expiration" >}}), [Interruption]({{<ref "#interruption" >}}), [Node Repair](<ref "#node-repair" >), and manual deletion (e.g. `kubectl delete node ...`).
389436
While both interruption and node repair have implicit upper-bounds on termination time, expiration and manual termination do not.

website/content/en/preview/troubleshooting.md

Lines changed: 25 additions & 2 deletions
Original file line number | Diff line number | Diff line change
@@ -482,9 +482,32 @@ Review what [disruptions are](https://kubernetes.io/docs/concepts/workloads/pods
482482

483483
#### `karpenter.sh/do-not-disrupt` Annotation
484484

485-
If a pod exists with the annotation `karpenter.sh/do-not-disrupt: true` on a node, and a request is made to delete the node, Karpenter will not drain any pods from that node or otherwise try to delete the node. Nodes that have pods with a `do-not-disrupt` annotation are not considered for consolidation, though their unused capacity is considered for the purposes of running pods from other nodes which can be consolidated.
485+
The `karpenter.sh/do-not-disrupt` annotation can be used to protect pods from voluntary disruption. It supports two formats:
486486

487-
If you want to terminate a node with a `do-not-disrupt` pod, you can simply remove the annotation and the deprovisioning process will continue.
487+
**Boolean format** - Permanent protection:
488+
```yaml
489+
karpenter.sh/do-not-disrupt: "true"
490+
```
491+
492+
**Duration format** - Time-based protection:
493+
```yaml
494+
# Protect for 30 minutes after pod starts running
495+
karpenter.sh/do-not-disrupt: "30m"
496+
497+
# Protect for 2 hours after pod starts running
498+
karpenter.sh/do-not-disrupt: "2h"
499+
500+
# Protect for 1 hour and 30 minutes after pod starts running
501+
karpenter.sh/do-not-disrupt: "1h30m"
502+
```
503+
504+
If a pod exists with the `do-not-disrupt` annotation on a node, and a request is made to delete the node, Karpenter will not drain any pods from that node or otherwise try to delete the node. Nodes that have pods with a `do-not-disrupt` annotation are not considered for consolidation, though their unused capacity is considered for the purposes of running pods from other nodes which can be consolidated.
505+
506+
For duration-based annotations, protection expires after the specified time period from when the pod starts running (`pod.status.startTime`). Once expired, the pod becomes eligible for disruption.
507+
508+
If you want to terminate a node with a `do-not-disrupt` pod, you can either:
509+
- Remove the annotation and the deprovisioning process will continue
510+
- Wait for duration-based annotations to expire naturally
488511

489512
#### Scheduling Constraints (Consolidation Only)
490513
