Commit 1b4f89e

KEP: introduce Federated Stateful Rollout (Coordinated Blue-Green Migration)
1 parent 6012227 commit 1b4f89e

1 file changed

Lines changed: 104 additions & 0 deletions

@@ -0,0 +1,104 @@
---
title: "KEP: Federated Stateful Rollout (Coordinated Blue-Green Migration)"
authors:
- "@liwang0513"
reviewers:
- "@RainbowMango"
- "@XiShanYongYe-Chang"
- "@zhzhuang-zju"
approvers:
- "@RainbowMango"
creation-date: 2026-04-04
---

# Federated Stateful Rollout: Coordinated Blue-Green Migration for Flink

## Summary
The **Federated Stateful Rollout** feature introduces a proactive orchestration mechanism for stateful workloads across multiple clusters. The existing `StatefulFailover` reacts to unplanned outages; this feature manages planned operations such as regional rebalancing, cluster maintenance, and safe image upgrades. By coordinating a "Suspend-Capture-Resume" lifecycle around a synchronous Savepoint, it aims for **Zero Data Loss** and removes **Reprocessing Lag**, because the target resumes from the exact state at which the source stopped.

## Motivation
Standard multi-cluster failover in Karmada has three technical gaps for streaming applications:
* **Reprocessing Lag:** Recovering from a stale periodic checkpoint forces the job to catch up on the backlog accumulated since that checkpoint, which adds downstream latency.
* **Topology Friction:** Image or DAG updates often break compatibility with old checkpoints; safe upgrades require coordinated Savepoints.
* **Safety Gap:** There is no "Validation Gate". In standard failover, the source instance is often deleted before the target is confirmed healthy.

### Goals
* **Coordinated Handoff:** Ensure an atomic "baton pass" of state between clusters.
* **Zero Reprocessing:** Use synchronous Savepoints to start the target exactly where the source stopped.
* **Validation Gate:** Keep the source cluster as a "hot standby" until the target is `RUNNING`.
* **Transparent Orchestration:** Automate the manipulation of `ResourceBindings` and `Overrides` within ClusterSets.

### Non-Goals
* Replacing reactive `StatefulFailover`.
* Managing underlying storage (S3/GCS) bucket permissions.

## Proposal: WorkloadTransitionController
We propose a new controller in `karmada-controller-manager` that orchestrates the migration state machine.

### Transition State Machine
| Phase | Action | Visibility |
| :--- | :--- | :--- |
| **1. Trigger** | User taints a cluster or adds a migration annotation. | User-Visible |
| **2. Discovery** | Controller identifies active cluster via `ResourceBinding` status. | Transparent |
| **3. Expansion** | Controller patches `ResourceBinding` (replicas: 2) and adds a Finalizer. | Transparent |
| **4. Hold** | Controller applies `ClusterOverridePolicy` to Target (state: `suspended`). | Transparent |
| **5. Capture** | Controller patches Source to `suspended`, triggers Savepoint. | Transparent |
| **6. Handoff** | Controller injects Savepoint URL into Target Override and flips to `running`. | Transparent |
| **7. Cleanup** | Controller removes Source from `ResourceBinding` and deletes Overrides. | Transparent |

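
To make the Trigger phase concrete, the two user-visible options might look like the fragments below. This is only a sketch: the taint key, value, and effect, as well as the annotation key, are hypothetical placeholders that this proposal does not define, and the `Cluster` object is shown partially (only `spec.taints`).

```yaml
# Option A (sketch): taint the source cluster on the Karmada control plane.
# Fragment only; a real Cluster object carries additional required fields.
apiVersion: cluster.karmada.io/v1alpha1
kind: Cluster
metadata:
  name: spaas-kaas-tt-dev02
spec:
  taints:
    - key: planned-maintenance      # hypothetical key
      value: "true"                 # hypothetical value
      effect: NoSchedule            # assumption; the effect to watch is not specified here
---
# Option B (sketch): annotate the workload to request a migration.
apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  name: hbase-demo
  namespace: s-spaasapi
  annotations:
    example.karmada.io/migrate-to: spaas-kaas-pw-dev02   # hypothetical annotation key
```
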
## Design Details

### The "Hold" Pattern via ClusterOverridePolicy
To keep the target cluster from starting the job prematurely, the controller uses a `ClusterOverridePolicy` to "hold" the deployment in a suspended state while the `ResourceBinding` is expanded.

**Example Hold Override:**
```yaml
apiVersion: policy.karmada.io/v1alpha1
kind: ClusterOverridePolicy
metadata:
  name: flink-migration-hold-pw
spec:
  resourceSelectors:
    - apiVersion: flink.apache.org/v1beta1
      kind: FlinkDeployment
      name: hbase-demo
      namespace: s-spaasapi
  targetCluster:
    clusterNames: ["spaas-kaas-pw-dev02"]
  overriders:
    plaintext:
      - path: "/spec/job/state"
        operator: replace
        value: "suspended"
```
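
For comparison, the Handoff phase could update the same policy roughly as follows. This is a sketch under assumptions: it presumes the Savepoint location is passed through the Flink Kubernetes Operator's `spec.job.initialSavepointPath` field, and the `s3://` value is a placeholder rather than a real location. The `add` operator is used for the Savepoint path because that field may not yet exist on the resource template.

```yaml
apiVersion: policy.karmada.io/v1alpha1
kind: ClusterOverridePolicy
metadata:
  name: flink-migration-hold-pw
spec:
  resourceSelectors:
    - apiVersion: flink.apache.org/v1beta1
      kind: FlinkDeployment
      name: hbase-demo
      namespace: s-spaasapi
  targetCluster:
    clusterNames: ["spaas-kaas-pw-dev02"]
  overriders:
    plaintext:
      # Assumption: the captured Savepoint is handed over via initialSavepointPath.
      - path: "/spec/job/initialSavepointPath"
        operator: add
        value: "s3://<bucket>/<savepoint-path>"   # placeholder, filled in by the controller
      # Release the hold so the target job starts from the Savepoint.
      - path: "/spec/job/state"
        operator: replace
        value: "running"
```
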
### ResourceBinding Manipulation
In environments where spread constraints set `maxGroups: 1`, the controller must expand the `ResourceBinding` itself so that the source and target can coexist during the handoff. A Migration Finalizer is added to prevent the Scheduler from reverting the expansion during the transition window.

```yaml
# Internal ResourceBinding Patch
spec:
  clusters:
    - name: spaas-kaas-tt-dev02   # Source
      replicas: 1
    - name: spaas-kaas-pw-dev02   # Target
      replicas: 1
  replicas: 2
```

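
For context, a `maxGroups: 1` environment typically comes from a `PropagationPolicy` whose spread constraints limit the workload to a single cluster. The fragment below is an illustrative sketch; the policy name and the cluster affinity list are assumptions, not part of this proposal.

```yaml
apiVersion: policy.karmada.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: hbase-demo-propagation    # hypothetical name
  namespace: s-spaasapi
spec:
  resourceSelectors:
    - apiVersion: flink.apache.org/v1beta1
      kind: FlinkDeployment
      name: hbase-demo
  placement:
    clusterAffinity:
      clusterNames:               # assumed candidate clusters
        - spaas-kaas-tt-dev02
        - spaas-kaas-pw-dev02
    spreadConstraints:
      - spreadByField: cluster
        minGroups: 1
        maxGroups: 1              # the scheduler selects exactly one cluster
```

With such a policy the scheduler alone would never place the job on two clusters at once, which is why the controller has to patch the binding directly during the transition.
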
## User Stories
### Story 1: Planned Cluster Maintenance (0 RPO)
An SRE taints cluster `tt` for a Kubernetes upgrade. The controller detects the intent, captures a synchronous Savepoint in `tt`, and hands it to cluster `pw`. The job resumes in `pw` with zero backlog, maintaining real-time processing.

### Story 2: Atomic Image Upgrade
A developer updates the `FlinkDeployment` image. The controller orchestrates a Blue-Green move. If the new image fails to initialize in the target cluster, the controller aborts and resumes the original job in the source cluster, providing an automated safety net.

## Risks and Mitigations
- Risk: Split-Brain. Multiple clusters write to the same sink at the same time.

- Mitigation: Enforce a strict "Suspend-before-Resume" sequence. The target is resumed only after `ResourceInterpreter` status aggregation confirms that the source is suspended.

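
As an illustration of the check behind this mitigation, the aggregated status on the `ResourceBinding` might look like the fragment below just before Handoff. This is a sketch that assumes the interpreter for `FlinkDeployment` reflects the operator's `lifecycleState` field into the aggregated status; which status fields are actually reflected depends on the interpreter customization in use.

```yaml
# Illustrative ResourceBinding status fragment (not a complete object).
status:
  aggregatedStatus:
    - clusterName: spaas-kaas-tt-dev02   # Source
      applied: true
      status:
        lifecycleState: SUSPENDED        # assumed reflected field; source has stopped writing
    - clusterName: spaas-kaas-pw-dev02   # Target
      applied: true
      status:
        lifecycleState: SUSPENDED        # still held by the override
```

Under this assumption, the controller would flip the target to `running` only once the source entry reports a suspended state.
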
## Alternatives Considered
- Manual Scripting: Rejected as error-prone and unsafe for Exactly-Once requirements.

- New Federated CRD: Rejected to avoid API sprawl. Using the standard `FlinkDeployment` with Karmada Overrides is more sustainable.
