diff --git a/docs/mimo/README.md b/docs/mimo/README.md
index d21ffec52e0..fc60ba51dc3 100644
--- a/docs/mimo/README.md
+++ b/docs/mimo/README.md
@@ -20,5 +20,5 @@ The Actuator will treat these Maintenance Manifests as a work queue, taking ones
 After running each, a state will be written into the Manifest (with optional free-form status text) with the result of the ran Task.
 Manifests past their start-before times are marked as having a "timed out" state and not ran.
 
-Currently, Manifests are created by the Admin API.
-In the future, the Scheduler will create some these Manifests depending on cluster state/version and wall-clock time, providing the ability to perform tasks like rotations of secrets autonomously.
+Manifests are created either manually via the Admin API or automatically by the [Scheduler](./scheduler.md).
+The Scheduler creates Manifests on a recurring basis according to configured [MaintenanceSchedules](./scheduler.md#maintenanceschedule), using [calendar expressions and selectors](./scheduler-calendar-and-selectors.md) to determine when and where tasks run across the fleet.
diff --git a/docs/mimo/admin-api.md b/docs/mimo/admin-api.md
index 96504dfd362..2dd9a58503c 100644
--- a/docs/mimo/admin-api.md
+++ b/docs/mimo/admin-api.md
@@ -13,10 +13,13 @@ Creates a new manifest. Returns the created manifest.
 ### Example
 
 ```sh
-curl -X PUT -k "https://localhost:8443/admin/subscriptions/fe16a035-e540-4ab7-80d9-373fa9a3d6ae/resourcegroups/v4-westeurope/providers/microsoft.redhatopenshift/openshiftclusters/abrownmimom1test/maintenancemanifests?api-version
-=admin" -d '{"maintenanceTaskID": "b41749fc-af26-4ab7-b5a1-e03f3ee4cba6"}' --header "Content-Type: application/json"
+curl -X PUT -k "https://localhost:8443/admin/subscriptions/SUBSCRIPTION_ID/resourcegroups/RESOURCE_GROUP/providers/microsoft.redhatopenshift/openshiftclusters/CLUSTER_NAME/maintenancemanifests?api-version=admin" \
+    --header "Content-Type: application/json" \
+    -d '{"maintenanceTaskID": "b41749fc-af26-4ab7-b5a1-e03f3ee4cba6"}'
 ```
 
+Replace `SUBSCRIPTION_ID`, `RESOURCE_GROUP`, and `CLUSTER_NAME` with the target cluster's values. Registered task IDs are defined in [`pkg/mimo/const.go`](../../pkg/mimo/const.go).
+
 ## GET /admin/RESOURCE_ID/maintenancemanifests/MANIFEST_ID
 
 Returns a manifest.
@@ -39,8 +42,26 @@ Returns a list of all schedules.
 
 ## PUT /admin/maintenanceschedules
 
-Creates/updates a schedule. Returns the created schedule.
+Creates or updates a schedule. Returns `201 Created` for new schedules and `200 OK` for updates.
+
+The schedule `id` field is the schedule's own unique identifier (distinct from `maintenanceTaskID`). If `id` is omitted from the request body, a new schedule is created with an auto-generated ID. If `id` is provided and matches an existing schedule, that schedule is updated. If `id` is provided but does not match any existing schedule, a new schedule is created with that ID.
+
+See [Scheduler Calendar and Selectors](./scheduler-calendar-and-selectors.md) for the calendar expression format and selector syntax.
+
+### Example
+
+```sh
+curl -X PUT -k "https://localhost:8443/admin/maintenanceschedules?api-version=admin" \
+    --header "Content-Type: application/json" \
+    -d '{"state": "Enabled", "maintenanceTaskID": "9b741734-6505-447f-8510-85eb0ae561a2", "schedule": "Mon *-*-* 00:00", "lookForwardCount": 4, "scheduleAcross": "24h", "selectors": [{"key": "subscriptionState", "operator": "in", "values": ["Registered"]}]}'
+```
+
+Task IDs are defined in [`pkg/mimo/const.go`](../../pkg/mimo/const.go).
 
 ## GET /admin/maintenanceschedules/SCHEDULE_ID
 
 Returns a schedule.
+
+## GET /admin/RESOURCE_ID/selectors
+
+Returns the selector key-value pairs for a specific cluster. Use this to verify which clusters a schedule's selectors will match.
diff --git a/docs/mimo/scheduler-calendar-and-selectors.md b/docs/mimo/scheduler-calendar-and-selectors.md
new file mode 100644
index 00000000000..af9ad766d01
--- /dev/null
+++ b/docs/mimo/scheduler-calendar-and-selectors.md
@@ -0,0 +1,321 @@
+# Scheduler Calendar and Selectors Reference
+
+## Calendar Format
+
+The Scheduler uses a calendar expression format based on [systemd calendar events](https://www.freedesktop.org/software/systemd/man/latest/systemd.time.html#Calendar%20Events) to define when maintenance schedules trigger.
+
+### Syntax
+
+```
+[Weekday] Year-Month-Day Hour:Minute[:Second]
+```
+
+The seconds component is optional. If provided, it must be `00`. Per-second granularity is not supported.
+
+| Component | Position | Required | Format |
+|-----------|----------|----------|--------|
+| Weekday | Prefix (space-separated from date) | No | Three-letter abbreviation(s), comma-separated |
+| Year | Before first `-` | Yes | Four-digit year, `*` for any, or comma-separated list |
+| Month | Between first and second `-` | Yes | 1-12, `*` for any, or comma-separated list |
+| Day | After second `-` | Yes | 1-31, `*` for any, or comma-separated list |
+| Hour | Before first `:` | Yes | 0-23, `*` for any, or comma-separated list |
+| Minute | After first `:` | Yes | `0`, `15`, `30`, or `45`, or `*` for any, or comma-separated list of allowed values |
+| Second | After second `:` (if present) | No | Must be `00` if provided; per-second granularity is unsupported |
+
+### Field Ranges
+
+| Field | Min | Max | Notes |
+|-------|-----|-----|-------|
+| Year | 2026 | -- | Years before 2026 are rejected. There is no enforced upper bound. Matching is exact: specifying `2026` matches only 2026, not subsequent years. Use `*` for any year. |
+| Month | 1 | 12 | |
+| Day | 1 | 31 | Days beyond the month's range are handled gracefully |
+| Hour | 0 | 23 | |
+| Minute | 0, 15, 30, 45 | | Only these four values are allowed. The wildcard `*` matches any minute. |
+
+### Weekday Abbreviations
+
+| Abbreviation | Day |
+|-------------|-----|
+| `Mon` | Monday |
+| `Tue` | Tuesday |
+| `Wed` | Wednesday |
+| `Thu` | Thursday |
+| `Fri` | Friday |
+| `Sat` | Saturday |
+| `Sun` | Sunday |
+
+### Wildcards and Lists
+
+- **Wildcard (`*`)**: Matches any value for that field. For example, `*-*-* 00:00` means "every day at midnight."
+- **Comma-separated list**: Matches any of the listed values. For example, `Mon,Wed,Fri` means Monday, Wednesday, or Friday. Multiple values can also be used in numeric fields: `*-*-1,15 00:00` means the 1st and 15th of every month. For minutes, only `0`, `15`, `30`, and `45` may appear in the list.
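As a rough illustration of the syntax, ranges, and list rules described above, the following Python sketch validates an expression and checks whether a given UTC time matches it. It is not the Scheduler's parser, which lives in the Go codebase and may behave differently; `parse_calendar` and `matches` are hypothetical names, and details the text leaves open (such as treating the weekday and the date as both required to match) are assumptions.

```python
from datetime import datetime, timezone

WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
ALLOWED_MINUTES = {0, 15, 30, 45}


def _field(text, lo, hi, allowed=None):
    """Parse one field: '*' (any) or a comma-separated list of integers."""
    if text == "*":
        return None  # None stands for the wildcard
    values = set()
    for part in text.split(","):
        if not part.isdigit():
            raise ValueError(f"invalid value {part!r}")
        n = int(part)
        if n < lo or (hi is not None and n > hi):
            raise ValueError(f"{n} is out of range")
        if allowed is not None and n not in allowed:
            raise ValueError(f"{n} is not an allowed value")
        values.add(n)
    return values


def parse_calendar(expr):
    """Validate an expression; return its fields as sets (None = wildcard)."""
    parts = expr.split()
    if len(parts) == 3:
        days = parts[0].split(",")
        if any(d not in WEEKDAYS for d in days):
            raise ValueError(f"invalid weekday list {parts[0]!r}")
        weekdays = {WEEKDAYS.index(d) for d in days}
        date_part, time_part = parts[1], parts[2]
    elif len(parts) == 2:
        weekdays = None
        date_part, time_part = parts
    else:
        raise ValueError("expected '[Weekday] Year-Month-Day Hour:Minute[:Second]'")

    date_fields = date_part.split("-")
    time_fields = time_part.split(":")
    if len(date_fields) != 3 or len(time_fields) not in (2, 3):
        raise ValueError("malformed date or time")
    if len(time_fields) == 3 and time_fields[2] != "00":
        raise ValueError("seconds must be 00 if provided")

    return {
        "weekday": weekdays,
        "year": _field(date_fields[0], 2026, None),  # no upper bound on years
        "month": _field(date_fields[1], 1, 12),
        "day": _field(date_fields[2], 1, 31),
        "hour": _field(time_fields[0], 0, 23),
        "minute": _field(time_fields[1], 0, 59, ALLOWED_MINUTES),
    }


def matches(spec, when):
    """Return True if the UTC datetime `when` satisfies every field of `spec`."""
    actual = {
        "weekday": when.weekday(),  # Monday == 0, same order as WEEKDAYS
        "year": when.year, "month": when.month, "day": when.day,
        "hour": when.hour, "minute": when.minute,
    }
    # Assumption: weekday and date fields are combined with AND.
    return all(spec[k] is None or actual[k] in spec[k] for k in spec)


spec = parse_calendar("Mon,Thu *-*-* 00:00")
# 5 January 2026 is a Monday, so this prints True.
print(matches(spec, datetime(2026, 1, 5, 0, 0, tzinfo=timezone.utc)))
```

Representing a wildcard as `None` keeps matching a simple membership test per field; expressions such as `*-*-* 00:10` or `Mon..Fri *-*-* 00:00` raise `ValueError` in this sketch because they fall outside the documented format.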
+
+### Calendar Examples
+
+| Expression | Meaning |
+|------------|---------|
+| `*-*-* 00:00` | Every day at midnight UTC |
+| `*-*-* 06:00` | Every day at 06:00 UTC |
+| `*-*-* *:00` | Every hour on the hour |
+| `*-*-* *:15` | Every hour at quarter past |
+| `Mon *-*-* 00:00` | Every Monday at midnight UTC |
+| `Mon,Thu *-*-* 00:00` | Every Monday and Thursday at midnight UTC |
+| `Mon,Wed,Fri *-*-* 12:00` | Monday, Wednesday, Friday at noon UTC |
+| `*-*-1 00:00` | First day of every month at midnight UTC |
+| `*-*-1,15 06:00` | 1st and 15th of every month at 06:00 UTC |
+| `*-1-* 00:00` | Every day in January at midnight UTC |
+| `*-3,6,9,12-1 00:00` | First day of each quarter at midnight UTC |
+| `2026-*-* 00:00` | Every day in 2026 only, at midnight UTC |
+| `Sat,Sun *-*-* 02:00` | Every weekend at 02:00 UTC |
+| `*-*-* 00:00:00` | Equivalent to `*-*-* 00:00` (seconds are optional) |
+| `*-*-* *:0,30` | Every hour at the top and bottom of the hour |
+
+### Time Zone Handling
+
+All times are in UTC. The Scheduler does not support time zone specifications in calendar expressions. If you need to target a specific local time, convert to UTC before defining the schedule.
+
+| Target | Local Time | UTC Equivalent (Winter) | UTC Equivalent (Summer/DST) |
+|--------|-----------|------------------------|---------------------------|
+| US East business end | 18:00 EST | 23:00 UTC | 22:00 UTC (EDT) |
+| US West business end | 18:00 PST | 02:00 UTC (+1 day) | 01:00 UTC (+1 day, PDT) |
+| EU West business end | 18:00 CET | 17:00 UTC | 16:00 UTC (CEST) |
+
+### Differences from systemd Calendar Events
+
+The Scheduler's calendar format is inspired by but not identical to systemd's `OnCalendar` syntax:
+
+| Feature | systemd | Scheduler |
+|---------|---------|-----------|
+| Weekday prefix | Supported | Supported |
+| Comma-separated values | Supported | Supported |
+| Seconds field | Optional; defaults to `00` if omitted | Optional; must be `00` if present |
+| Minute granularity | Any value 0-59 | Restricted to `0`, `15`, `30`, `45` |
+| Range syntax (e.g., `Mon..Fri`) | Supported | Not supported |
+| Repeat syntax (e.g., `*-*-* *:00/15:00`) | Supported | Not supported |
+| Multiple expressions (separated by `;`) | Supported | Not supported |
+| `~` (last day of month) | Supported | Not supported |
+
+If you need functionality not supported by the Scheduler's calendar format, create multiple schedules with different expressions.
+
+## Selectors
+
+Selectors determine which clusters a `MaintenanceSchedule` applies to. They filter the fleet based on cluster and subscription properties.
+
+### Selector Syntax
+
+Each selector is a JSON object with the following fields:
+
+```json
+{
+  "key": "SELECTOR_KEY",
+  "operator": "OPERATOR",
+  "value": "SINGLE_VALUE",
+  "values": ["VALUE_1", "VALUE_2"]
+}
+```
+
+| Field | Required | Description |
+|-------|----------|-------------|
+| `key` | Yes | The cluster property to match against (see [Well-Known Selector Keys](#well-known-selector-keys)) |
+| `operator` | Yes | The comparison operator (see [Available Operators](#available-operators)) |
+| `value` | Conditional | Single string value; required for `eq` operator, must not be provided for `in`/`notin` |
+| `values` | Conditional | Array of string values; required for `in`/`notin` operators, must not be provided for `eq` |
+
+Replace `SELECTOR_KEY`, `OPERATOR`, `SINGLE_VALUE`, `VALUE_1`, and `VALUE_2` with appropriate values.
+
+### Available Operators
+
+| Operator | Description | Requires |
+|----------|-------------|----------|
+| `eq` | Exact string equality match | `value` (single string) |
+| `in` | Value is contained in the provided list | `values` (string array, at least one element) |
+| `notin` | Value is not contained in the provided list | `values` (string array, at least one element) |
+
+### Well-Known Selector Keys
+
+The following keys are defined in the Scheduler's cluster cache (see [`pkg/mimo/scheduler/selectors.go`](../../pkg/mimo/scheduler/selectors.go)). All values are strings.
+
+| Key | Description | Example Values |
+|-----|-------------|---------------|
+| `resourceID` | Full ARM resource ID of the cluster (lowercased) | `/subscriptions/.../openshiftclusters/mycluster` |
+| `subscriptionID` | Azure subscription ID containing the cluster | `00000000-0000-0000-0000-000000000000` |
+| `subscriptionState` | Registration state of the subscription | `Registered`, `Warned`, `Suspended` |
+| `authenticationType` | Cluster authentication mechanism | `WorkloadIdentity`, `ServicePrincipal` |
+| `architectureVersion` | Cluster architecture version (integer as string) | `1`, `2` |
+| `provisioningState` | Current provisioning state of the cluster | `Succeeded`, `Failed`, `Creating` |
+| `outboundType` | Network outbound routing type | `Loadbalancer`, `UserDefinedRouting` |
+| `APIServerVisibility` | API server endpoint visibility | `Public`, `Private` |
+| `isManagedDomain` | Whether the cluster uses an ARO-managed domain | `true`, `false` |
+
+The per-cluster selectors diagnostic endpoint (`GET /admin/RESOURCE_ID/selectors`) can be used to inspect the actual selector values for a given cluster. See [Admin API](./admin-api.md).
+
+### Selector Examples
+
+#### Match all clusters in registered subscriptions
+
+```json
+[
+  {
+    "key": "subscriptionState",
+    "operator": "in",
+    "values": ["Registered"]
+  }
+]
+```
+
+#### Match clusters in registered or warned subscriptions
+
+```json
+[
+  {
+    "key": "subscriptionState",
+    "operator": "in",
+    "values": ["Registered", "Warned"]
+  }
+]
+```
+
+#### Exclude a specific subscription
+
+```json
+[
+  {
+    "key": "subscriptionState",
+    "operator": "in",
+    "values": ["Registered"]
+  },
+  {
+    "key": "subscriptionID",
+    "operator": "notin",
+    "values": ["EXCLUDED_SUBSCRIPTION_ID"]
+  }
+]
+```
+
+Replace `EXCLUDED_SUBSCRIPTION_ID` with the subscription to exclude.
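To make the operator semantics concrete, here is a minimal Python sketch of how a selector list like the one above might be evaluated against a single cluster's key-value pairs. It is an illustration only: the actual logic is in the Go code referenced earlier and may differ, the function names are hypothetical, and the sample cluster values are made up. The sketch follows the documented behaviour that all selectors must match, that comparisons are exact string matches, and that an unknown key is an error.

```python
def selector_matches(selector, cluster):
    """Evaluate one selector against a dict of a cluster's selector values."""
    key = selector["key"]
    if key not in cluster:
        # Documented behaviour: unknown keys are an error; the caller would
        # skip the cluster and log it.
        raise KeyError(f"unknown selector key {key!r}")
    actual = cluster[key]
    operator = selector["operator"]
    if operator == "eq":
        return actual == selector["value"]
    if operator == "in":
        return actual in selector["values"]
    if operator == "notin":
        return actual not in selector["values"]
    raise ValueError(f"unsupported operator {operator!r}")


def cluster_matches(selectors, cluster):
    """AND all selectors together; an empty list matches nothing."""
    if not selectors:
        return False
    return all(selector_matches(s, cluster) for s in selectors)


# Hypothetical cluster data, shaped like the selector key-value pairs.
cluster = {
    "subscriptionState": "Registered",
    "subscriptionID": "00000000-0000-0000-0000-000000000000",
}
selectors = [
    {"key": "subscriptionState", "operator": "in", "values": ["Registered"]},
    {"key": "subscriptionID", "operator": "notin", "values": ["EXCLUDED_SUBSCRIPTION_ID"]},
]
print(cluster_matches(selectors, cluster))  # True: both selectors match
```

Because comparisons are case-sensitive, a selector value of `registered` would not match `Registered` in this sketch.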
+
+#### Target a single specific cluster
+
+```json
+[
+  {
+    "key": "resourceID",
+    "operator": "eq",
+    "value": "/subscriptions/SUBSCRIPTION_ID/resourcegroups/RESOURCE_GROUP/providers/microsoft.redhatopenshift/openshiftclusters/CLUSTER_NAME"
+  }
+]
+```
+
+Replace `SUBSCRIPTION_ID`, `RESOURCE_GROUP`, and `CLUSTER_NAME` with the target cluster's values. The resource ID must be lowercased.
+
+#### Target clusters in a specific subscription
+
+```json
+[
+  {
+    "key": "subscriptionID",
+    "operator": "eq",
+    "value": "TARGET_SUBSCRIPTION_ID"
+  }
+]
+```
+
+Replace `TARGET_SUBSCRIPTION_ID` with the target subscription ID.
+
+#### Target only Workload Identity clusters on managed domains
+
+```json
+[
+  {
+    "key": "subscriptionState",
+    "operator": "in",
+    "values": ["Registered"]
+  },
+  {
+    "key": "authenticationType",
+    "operator": "eq",
+    "value": "WorkloadIdentity"
+  },
+  {
+    "key": "isManagedDomain",
+    "operator": "eq",
+    "value": "true"
+  }
+]
+```
+
+#### Exclude clusters in a non-terminal provisioning state
+
+```json
+[
+  {
+    "key": "subscriptionState",
+    "operator": "in",
+    "values": ["Registered"]
+  },
+  {
+    "key": "provisioningState",
+    "operator": "notin",
+    "values": ["Creating", "Deleting"]
+  }
+]
+```
+
+### Selector Evaluation Rules
+
+1. **All selectors use AND logic.** A cluster must match every selector in the list to be included.
+2. **Empty selectors match no clusters.** A schedule with zero selectors is rejected by the API with `400 Bad Request`.
+3. **Unknown keys cause an error.** If a selector references a key not present in the cluster's selector data, the cluster is skipped and an error is logged.
+4. **String comparison is exact.** All comparisons are case-sensitive string matches. The `resourceID` key is always lowercased in the cluster cache.
+5. **Selectors are evaluated per cluster, per schedule.** Each Scheduler poll cycle re-evaluates selectors against the current cluster cache, so changes to cluster or subscription state are reflected on the next cycle.
+
+## Combining Calendar and Selectors
+
+When designing a schedule, consider both the timing (calendar) and targeting (selectors) together. The task IDs used in these examples are defined in [`pkg/mimo/const.go`](../../pkg/mimo/const.go).
+
+### Example: Conservative Weekly Rollout
+
+A schedule that runs TLS certificate rotation every Monday, spread across 24 hours, targeting only registered subscriptions:
+
+```json
+{
+  "state": "Enabled",
+  "maintenanceTaskID": "9b741734-6505-447f-8510-85eb0ae561a2",
+  "schedule": "Mon *-*-* 00:00",
+  "lookForwardCount": 4,
+  "scheduleAcross": "24h",
+  "selectors": [
+    {
+      "key": "subscriptionState",
+      "operator": "in",
+      "values": ["Registered"]
+    }
+  ]
+}
+```
+
+The result: each Monday at midnight UTC, the Scheduler begins creating manifests. Each cluster's manifest has a `runAfter` time calculated as Monday 00:00 UTC plus its deterministic offset within the 24-hour `scheduleAcross` window. The [Actuator](./actuator.md) executes each manifest after its `runAfter` time.
+
+### Example: Testing on a Single Cluster
+
+A schedule targeting a single cluster for validation before fleet-wide rollout:
+
+```json
+{
+  "state": "Enabled",
+  "maintenanceTaskID": "9b741734-6505-447f-8510-85eb0ae561a2",
+  "schedule": "*-*-* 12:00",
+  "lookForwardCount": 1,
+  "scheduleAcross": "0s",
+  "selectors": [
+    {
+      "key": "resourceID",
+      "operator": "eq",
+      "value": "/subscriptions/SUBSCRIPTION_ID/resourcegroups/RESOURCE_GROUP/providers/microsoft.redhatopenshift/openshiftclusters/CLUSTER_NAME"
+    }
+  ]
+}
+```
+
+Replace `SUBSCRIPTION_ID`, `RESOURCE_GROUP`, and `CLUSTER_NAME` with your test cluster's values.
+
+The result: a manifest is created daily at noon UTC for the single specified cluster with `scheduleAcross` of `0s` (no spread, immediate execution).
diff --git a/docs/mimo/scheduler.md b/docs/mimo/scheduler.md
index 8a457798a25..15ec6fd6d3e 100644
--- a/docs/mimo/scheduler.md
+++ b/docs/mimo/scheduler.md
@@ -1,3 +1,87 @@
-# MIMO Scheduler
+# Managed Infrastructure Maintenance Operator: Scheduler
 
-The MIMO Scheduler is a planned component, but is not yet implemented.
+The Scheduler is the MIMO component responsible for autonomously creating [Maintenance Manifests](./maintenance-manifest-lifecycle.md) based on configured `MaintenanceSchedule` objects. While the [Actuator](./actuator.md) executes maintenance tasks on individual clusters, the Scheduler determines when and where those tasks should run across the fleet.
+
+## Architecture
+
+The Scheduler runs alongside the Actuator and shares key infrastructure:
+
+- **Bucket Partitioning**: The Scheduler uses the same bucket partitioning scheme as the Actuator (see [`pkg/util/bucket/`](../../pkg/util/bucket/)). Each Scheduler instance owns the same subset of cluster buckets as its co-located Actuator.
+- **Shared Database**: Both components share the same database. The Scheduler writes `MaintenanceManifest` documents that the Actuator subsequently reads and executes.
+- **Cluster Cache**: The Scheduler maintains a cache of cluster and subscription metadata, updated via database change notifications. This cache is used for selector evaluation.
+
+The Scheduler exposes a `/healthz/ready` endpoint for readiness probes.
+
+## Key Concepts
+
+### MaintenanceSchedule
+
+A `MaintenanceSchedule` defines a recurring maintenance operation applied across the fleet. Each schedule specifies:
+
+- **Which task** to run (`maintenanceTaskID`)
+- **Which clusters** it applies to (via `selectors`)
+- **When to run** (via `schedule` in [calendar format](./scheduler-calendar-and-selectors.md#calendar-format))
+- **How quickly** to roll out across matching clusters (`scheduleAcross`)
+- **How far ahead** to pre-create manifests (`lookForwardCount`)
+
+Schedules have two states: `Enabled` (actively creating manifests) and `Disabled` (skipped by the Scheduler).
+
+See the [Admin API](./admin-api.md) for schedule management endpoints.
+
+### Calendar Format
+
+Schedules use a systemd-style calendar event format:
+
+```
+[Weekday] YYYY-MM-DD HH:MM[:SS]
+```
+
+Minutes are restricted to `0`, `15`, `30`, or `45`. The seconds component is optional and must be `00` if provided. Wildcards (`*`) and comma-separated lists are supported.
+
+Examples:
+- `Mon *-*-* 00:00` -- Every Monday at midnight
+- `*-*-* 06:00` -- Every day at 06:00 UTC
+- `*-*-* *:0,30` -- Every hour at the top and bottom of the hour
+
+See [Scheduler Calendar and Selectors](./scheduler-calendar-and-selectors.md) for the full format specification.
+
+### Selectors
+
+Selectors determine which clusters a schedule applies to, using a syntax similar to Kubernetes label selectors. A schedule must have at least one selector. Supported operators are `eq`, `in`, and `notin`. Nine selector keys are available, covering cluster identity, subscription state, authentication type, architecture version, provisioning state, network configuration, and domain type.
+
+See [Scheduler Calendar and Selectors](./scheduler-calendar-and-selectors.md#selectors) for the full list of keys and operators.
+
+### Schedule Across (Thundering Herd Prevention)
+
+The `scheduleAcross` field defines a time window over which manifest execution is spread. Each cluster's position within the window is calculated deterministically:
+
+```
+position = CRC32(lowercase(clusterResourceID)) / MaxUint32
+runAfter = scheduleStartTime + (position * scheduleAcross)
+```
+
+This ensures the same cluster always runs at the same relative offset, clusters are evenly distributed, and no coordination between Scheduler instances is required.
+
+### Look Forward Count
+
+The `lookForwardCount` field controls how many future schedule periods the Scheduler pre-creates manifests for. For a weekly schedule with `lookForwardCount: 5`, manifests are created for the next 5 weeks. This is count-based, not duration-based.
+
+## Data Flow
+
+```mermaid
+graph LR;
+    AdminAPI[Admin API] -->|write| Schedules[MaintenanceSchedules]
+    Schedules -->|read| Scheduler
+    ClustersAndSubs[Clusters + Subscriptions] -->|cache| Scheduler
+    Scheduler -->|create| Manifests[MaintenanceManifests]
+    Manifests -->|read| Actuator
+```
+
+1. An operator creates a `MaintenanceSchedule` via the [Admin API](./admin-api.md)
+2. The Scheduler polls for enabled schedules
+3. For each schedule, the Scheduler iterates over its owned cluster buckets
+4. For each cluster matching the schedule's selectors, the Scheduler calculates the next `lookForwardCount` run times
+5. If no manifest already exists for a given run time, the Scheduler creates a `MaintenanceManifest` with the calculated `runAfter` timestamp
+6. The Actuator picks up the manifest and executes the task at the appropriate time
+
+The Scheduler is idempotent: if interrupted mid-cycle, the next iteration produces the same result. When a schedule is updated, the Scheduler cancels any pending manifests whose timing no longer matches the current settings.
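
The `scheduleAcross` formula in the Scheduler documentation can be tried out with a short Python sketch. This is a sketch under stated assumptions rather than the Scheduler's code: the text says only "CRC32", so the IEEE polynomial used by `zlib.crc32` is an assumption, and `run_after` is a hypothetical helper name. The resource ID below is a made-up example.

```python
import zlib
from datetime import datetime, timedelta, timezone

MAX_UINT32 = 0xFFFFFFFF


def run_after(cluster_resource_id, schedule_start, schedule_across):
    """Spread a cluster deterministically across the scheduleAcross window."""
    # Assumption: CRC-32 with the IEEE polynomial over the lowercased ID.
    checksum = zlib.crc32(cluster_resource_id.lower().encode("utf-8"))
    position = checksum / MAX_UINT32  # value in [0, 1]
    return schedule_start + schedule_across * position


start = datetime(2026, 1, 5, 0, 0, tzinfo=timezone.utc)  # a Monday, 00:00 UTC
window = timedelta(hours=24)
resource_id = (
    "/subscriptions/00000000-0000-0000-0000-000000000000/resourcegroups/example-rg"
    "/providers/microsoft.redhatopenshift/openshiftclusters/example"
)

first = run_after(resource_id, start, window)
second = run_after(resource_id.upper(), start, window)
print(first == second)                   # True: casing does not change the offset
print(start <= first <= start + window)  # True: always inside the window
```

Because the offset depends only on the resource ID and the window, any Scheduler instance computes the same `runAfter` for a cluster without talking to the others, which is the property the documentation relies on.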