website/content/en/preview/concepts/disruption.md
51 additions & 4 deletions
@@ -360,20 +360,44 @@ In this scenario, Karpenter cannot voluntarily disrupt the node because:
360
360
361
361
As seen in this example, the more PDBs there are affecting a Node, the more difficult it will be for Karpenter to find an opportunity to perform voluntary disruption actions.
362
362
363
-
Secondly, you can block Karpenter from voluntarily disrupting and draining pods by adding the `karpenter.sh/do-not-disrupt: "true"` annotation to the pod.
364
-
You can treat this annotation as a single-pod, permanently blocking PDB.
363
+
Secondly, you can block Karpenter from voluntarily disrupting and draining pods by adding the `karpenter.sh/do-not-disrupt` annotation to the pod. The annotation supports two formats:
- **Boolean format**: `karpenter.sh/do-not-disrupt: "true"` - Provides permanent protection
- **Duration format**: `karpenter.sh/do-not-disrupt: "30m"` - Provides time-based protection for the specified duration after the pod starts running
367
+
368
+
#### Duration-Based Protection
369
+
370
+
When using the duration format, pods are protected from disruption for the specified time period after they start running (based on `pod.status.startTime`).
371
+
Once the protection period expires, the pod becomes eligible for disruption like any other pod.
372
+
This is useful for workloads that need protection during startup or critical phases but can be safely disrupted later.
373
+
374
+
Supported duration formats include:
375
+
- `"5m"` - 5 minutes
376
+
- `"1h"` - 1 hour
377
+
- `"2h30m"` - 2 hours and 30 minutes
378
+
- `"24h"` - 24 hours
379
+
380
+
381
+
If an invalid duration is specified, the pod will be treated as permanently protected (equivalent to `"true"`).
382
+
383
+
#### Behavior and Consequences
384
+
385
+
You can treat this annotation as a single-pod, blocking PDB (permanent when using boolean format, or temporary when using duration format).
365
386
This has the following consequences:
366
387
- Nodes with `karpenter.sh/do-not-disrupt` pods will be excluded from [Consolidation]({{<ref "#consolidation" >}}), and conditionally excluded from [Drift]({{<ref "#drift" >}}).
367
388
- If the Node's owning NodeClaim has a [`terminationGracePeriod`]({{<ref "#terminationgraceperiod">}}) configured, it will still be eligible for disruption via drift.
368
389
- Like pods with a blocking PDB, pods with the `karpenter.sh/do-not-disrupt` annotation will **not** be gracefully evicted by the [Termination Controller]({{<ref "#termination-controller">}}).
369
390
Karpenter will not be able to complete termination of the node until one of the following conditions is met:
370
391
- All pods with the `karpenter.sh/do-not-disrupt` annotation are removed.
371
392
- All pods with the `karpenter.sh/do-not-disrupt` annotation have entered a [terminal phase](https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-phase) (`Succeeded` or `Failed`).
393
+
- For duration-based annotations, the specified protection duration has elapsed and the pod is no longer protected.
372
394
- The owning NodeClaim's [`terminationGracePeriod`]({{<ref "#terminationgraceperiod" >}}) has elapsed.
373
395
374
-
This is useful for pods that you want to run from start to finish without disruption.
375
-
Examples of pods that you might want to opt-out of disruption include an interactive game that you don't want to interrupt or a long batch job (such as you might have with machine learning) that would need to start over if it were interrupted.
396
+
#### Examples
376
397
398
+
The annotation is useful for pods that you want to run from start to finish without disruption, or that need protection during critical startup phases.
399
+
400
+
**Permanent protection** - useful for interactive games or long-running batch jobs:
377
401
```yaml
378
402
apiVersion: apps/v1
379
403
kind: Deployment
@@ -384,6 +408,29 @@ spec:
384
408
karpenter.sh/do-not-disrupt: "true"
385
409
```
386
410
411
+
**Time-based protection** - useful for workloads with critical startup phases:
412
+
```yaml
413
+
apiVersion: apps/v1
414
+
kind: Deployment
415
+
spec:
416
+
template:
417
+
metadata:
418
+
annotations:
419
+
# Protect for 30 minutes after pod starts running
420
+
karpenter.sh/do-not-disrupt: "30m"
421
+
```
422
+
423
+
```yaml
424
+
apiVersion: batch/v1
425
+
kind: Job
426
+
spec:
427
+
template:
428
+
metadata:
429
+
annotations:
430
+
# Protect for 2 hours after pod starts running
431
+
karpenter.sh/do-not-disrupt: "2h"
432
+
```
433
+
387
434
{{% alert title="Note" color="primary" %}}
388
435
The `karpenter.sh/do-not-disrupt` annotation does **not** exclude nodes from the forceful disruption methods: [Expiration]({{<ref "#expiration" >}}), [Interruption]({{<ref "#interruption" >}}), [Node Repair]({{<ref "#node-repair" >}}), and manual deletion (e.g. `kubectl delete node ...`).
389
436
While both interruption and node repair have implicit upper bounds on termination time, expiration and manual termination do not.
website/content/en/preview/troubleshooting.md
25 additions & 2 deletions
@@ -482,9 +482,32 @@ Review what [disruptions are](https://kubernetes.io/docs/concepts/workloads/pods
482
482
483
483
#### `karpenter.sh/do-not-disrupt` Annotation
484
484
485
-
If a pod exists with the annotation `karpenter.sh/do-not-disrupt: true` on a node, and a request is made to delete the node, Karpenter will not drain any pods from that node or otherwise try to delete the node. Nodes that have pods with a `do-not-disrupt` annotation are not considered for consolidation, though their unused capacity is considered for the purposes of running pods from other nodes which can be consolidated.
485
+
The `karpenter.sh/do-not-disrupt` annotation can be used to protect pods from voluntary disruption. It supports two formats:
486
486
487
-
If you want to terminate a node with a `do-not-disrupt` pod, you can simply remove the annotation and the deprovisioning process will continue.
487
+
**Boolean format** - Permanent protection:
488
+
```yaml
489
+
karpenter.sh/do-not-disrupt: "true"
490
+
```
491
+
492
+
**Duration format** - Time-based protection:
493
+
```yaml
494
+
# Protect for 30 minutes after pod starts running
495
+
karpenter.sh/do-not-disrupt: "30m"
496
+
497
+
# Protect for 2 hours after pod starts running
498
+
karpenter.sh/do-not-disrupt: "2h"
499
+
500
+
# Protect for 1 hour and 30 minutes after pod starts running
501
+
karpenter.sh/do-not-disrupt: "1h30m"
502
+
```
503
+
504
+
If a pod exists with the `do-not-disrupt` annotation on a node, and a request is made to delete the node, Karpenter will not drain any pods from that node or otherwise try to delete the node. Nodes that have pods with a `do-not-disrupt` annotation are not considered for consolidation, though their unused capacity is considered for the purposes of running pods from other nodes which can be consolidated.
505
+
506
+
For duration-based annotations, protection expires after the specified time period from when the pod starts running (`pod.status.startTime`). Once expired, the pod becomes eligible for disruption.
507
+
508
+
If you want to terminate a node with a `do-not-disrupt` pod, you can either:
509
+
- Remove the annotation, after which the deprovisioning process will continue
510
+
- Wait for duration-based annotations to expire naturally