Switch pod-scheduler configs from JSON to YAML; add JSON Schema#2170
Switch pod-scheduler configs from JSON to YAML; add JSON Schema#2170LoopedBard3 wants to merge 5 commits intoaspnet:mainfrom
Conversation
The three pod configs in build/ were JSON, which made them hard to read and didn't allow inline comments. This converts them to YAML, ships a JSON Schema describing the shape, and changes scenario.type from a magic 1/2/3 integer to a self-documenting single/dual/triple string. * build/benchmarks_ci_pods.yml, benchmarks_ci_azure_pods.yml, and benchmarks_ci_cobalt_pods.yml replace their .json predecessors. Each opens with a yaml-language-server # yaml-language-server: schema=... directive so VS Code / Cursor / the YAML LSP provide autocomplete, hover docs, and inline validation while editing. * scripts/pod-scheduler/pod-config.schema.json (new) is the schema. The schema descriptions also serve as reliable grounding for LLM-driven edits. * config_loader.py now dispatches on file extension and accepts both YAML and JSON. scenario.type accepts single|dual|triple (case-insensitive) plus the legacy integer 1|2|3 for back-compat; bools (which YAML parsers happily produce from yes/no) are rejected explicitly. Unknown extensions and unknown type strings raise ConfigError so typos can't silently drop scenarios. * Generated benchmarks-ci-*.yml files only change the embedded regen command in the file header (.json -> .yml). Schedule data is byte-identical, verified by the existing snapshot test. * Tests: 5 new cases covering YAML loading, type-string aliases, type back-compat, bool/invalid rejection, and unknown-extension rejection. Default fixture writer now produces YAML so the YAML path is exercised on every test. * Docs: README.md, build/README.md, and scripts/pod-scheduler/README.md are updated with YAML examples and the schema-directive convention. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This PR migrates the pod-scheduler’s source-of-truth configs in build/ from JSON to YAML to improve readability (comments) and editor support, and introduces a JSON Schema to validate/configure those YAML files while keeping JSON loading for backward compatibility.
Changes:
- Converted the three
build/benchmarks_ci_*_pods.jsonconfigs to.ymlequivalents and updated references/docs accordingly. - Added
scripts/pod-scheduler/pod-config.schema.jsonand adoptedyaml-language-server$schemadirectives in shipped configs. - Updated the loader to dispatch by extension (YAML/JSON) and to accept
scenario.typeassingle|dual|triple(case-insensitive) while still allowing legacy1|2|3.
Reviewed changes
Copilot reviewed 22 out of 22 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| scripts/pod-scheduler/config_loader.py | Load YAML/JSON by extension; add scenario type aliasing and validation. |
| scripts/pod-scheduler/pod-config.schema.json | New JSON Schema describing the YAML/JSON config shape for LSP validation. |
| scripts/pod-scheduler/models.py | Update docs/comments to reflect YAML/JSON config sources. |
| scripts/pod-scheduler/generator.py | Update generated header regen command to reference .yml. |
| scripts/pod-scheduler/scheduler.py | Docstring wording updated from JSON-specific to config-generic. |
| scripts/pod-scheduler/main.py | CLI help/usage updated to YAML/JSON. |
| scripts/pod-scheduler/README.md | Docs updated with YAML examples and schema-driven editing guidance. |
| scripts/pod-scheduler/tests/test_config_loader.py | Add YAML-path tests + scenario.type alias/back-compat coverage. |
| scripts/pod-scheduler/tests/test_generator.py | Update expected filenames/paths from .json to .yml. |
| scripts/pod-scheduler/tests/test_snapshots.py | Snapshot inputs updated to new .yml configs. |
| build/benchmarks_ci_pods.yml | New YAML config replacing JSON for on-prem CI pods. |
| build/benchmarks_ci_azure_pods.yml | New YAML config replacing JSON for Azure CI pods. |
| build/benchmarks_ci_cobalt_pods.yml | New YAML config replacing JSON for Cobalt CI pods. |
| build/benchmarks_ci_pods.json | Removed legacy JSON config (replaced by YAML). |
| build/benchmarks_ci_azure_pods.json | Removed legacy JSON config (replaced by YAML). |
| build/benchmarks_ci_cobalt_pods.json | Removed legacy JSON config (replaced by YAML). |
| build/benchmarks-ci-01.yml | Regenerated header updated to .yml config path. |
| build/benchmarks-ci-02.yml | Regenerated header updated to .yml config path. |
| build/benchmarks-ci-azure.yml | Regenerated header updated to .yml config path. |
| build/benchmarks-ci-cobalt.yml | Regenerated header updated to .yml config path. |
| build/README.md | Docs updated to reference YAML configs and schema directive. |
| README.md | Top-level doc updated to say pipelines are generated from YAML configs. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Three of the review comments on aspnet#2170 flagged the same class of issue around YAML/Python edge cases. Addressing them all here: * scripts/pod-scheduler/requirements.txt (new) declares the runtime PyYAML dependency that landed with the YAML migration. Without this, a clean checkout fails with ModuleNotFoundError. * scripts/pod-scheduler/README.md gains a Prerequisites subsection pointing at requirements.txt so contributors know what to install. * config_loader._reject_bool / _coerce_int / _coerce_float (new helpers) explicitly reject Python bools in numeric fields. Previously `timeout: yes` would slip through `int(timeout)` and resolve to 1 because bool is a subclass of int. The new helpers also enforce range minimums (target_yaml_count >= 1, schedule_offset_hours >= 0, timeout >= 1, estimated_runtime >= 0) so out-of-band values fail loudly at load time. * test_config_loader.py adds 6 cases covering bool rejection on each numeric field plus negative-runtime and below-minimum target-count. All 55 tests pass. The schema already used "type": "integer" for these fields (which in JSON Schema 2020-12 explicitly excludes booleans), so editor-side validation was already correct; this commit closes the runtime gap. Generated pipeline YAML still regenerates byte-identical to HEAD. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Per @sebastienros's review feedback on aspnet#2170, the named-key form for `machines` / `profiles` was visually noisy when almost every pod follows the same SUT-load-DB convention. Both blocks now accept either: * a positional list of 1-3 entries (`[sut]` / `[sut, load]` / `[sut, load, db]`), preferred for new pods, or * the existing named-dict form (`{sut: ..., load: ..., db: ...}`), still supported. The two blocks for a single pod must use the same form and declare the same set of roles; mixing list and dict, or differing lengths, raises `ConfigError` at load time so the role mapping can't drift silently. Schema (pod-config.schema.json) gets a `oneOf` for both blocks; the YAML LSP catches shape errors at edit time. * `config_loader._normalize_roles` resolves either form to a canonical ordered dict and rejects unknown roles, bools, non-strings, and empty values. The pod loader then enforces same-shape and same-role-set between machines and profiles before constructing the Pod. * All three committed configs (`benchmarks_ci_pods.yml`, `benchmarks_ci_azure_pods.yml`, `benchmarks_ci_cobalt_pods.yml`) switch to the shorthand form. Each gets a header comment above its pods block explaining the convention. Pods that re-use a machine across roles (e.g. `cobalt-cloud-lin-azl3-dual` puts the db box in the `load` slot) keep the inline comment that flags the reuse. * `scripts/pod-scheduler/README.md` gains a Pod Definition section showing both forms and recommending the list form for new pods. * test_config_loader.py adds 10 cases covering shorthand happy paths (1/2/3 entries), mixed-shape rejection, length mismatch, empty/too-long lists, bool entries, and unknown-role keys. All 65 tests pass. Generated pipeline YAMLs are byte-identical to HEAD: this is purely an input-format change, the schedule data is unchanged. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
| machines: { sut: intel-lin, load: intel-load, db: intel-db } | ||
| profiles: { sut: intel-lin-app, load: intel-load-load, db: intel-db-db } | ||
| machines: [intel-lin, intel-load, intel-db] | ||
| profiles: [intel-lin-app, intel-load-load, intel-db-db] |
There was a problem hiding this comment.
What I meant is that the one of these should not need the role name (application, load, db), scheduling doesn't care about it.
| # of the same stage when both run a triple-type scenario. | ||
| machines: { sut: intel-win, load: intel-load2, db: intel-db } | ||
| profiles: { sut: intel-win-app, load: intel-load2-load, db: intel-db-db } | ||
| machines: [intel-win, intel-load2, intel-db] |
There was a problem hiding this comment.
Feels weird to use the json syntax in yml (though totally valid)
machines:
- intel-win
- intel-load2
- intel-db
Also note that YML "IS" JSON, so a JSON document is valid yaml, in case in python you'd want to only handle the yml parsing and not check the extension, just say it's yaml.
Two simplifications from @sebastienros's round 3 review on aspnet#2170: * Drop the per-extension dispatch in config_loader._load_raw. YAML is a strict superset of JSON, so yaml.safe_load handles both. Old .json configs still load via the YAML parser, and the loader no longer cares about the file extension. _load_raw is a one-liner now. * Reshape pod definitions into parallel arrays. Both `machines` and `profiles` are flat 1-3 entry lists; machines[i] runs profiles[i]. The role of slot i is implicit by length (1=SUT, 2=SUT+load, 3=SUT+load+db) so neither list carries explicit `sut`/`load`/`db` keys. The named-dict form is no longer accepted. The loader now enforces, per pod: * len(machines) == len(profiles), so each machine pairs with exactly one profile. * machines and profiles are both lists of 1-3 unique non-empty strings. * bools (yes/no/true/false from YAML) are rejected explicitly. Per-pod uniqueness is enforced; global `machine -> single profile` isn't, because real configs legitimately reuse a physical box across roles in different pods (e.g. azure2-db is db for one pod and load for another). Test `test_global_machine_can_have_different_profiles_across_pods` documents that as intended. Configs: * All three .yml configs reformatted from inline flow style `[a, b, c]` to block style and converted to the parallel-array form. Each retains the existing inline comment for pods that re-use machines across roles. The header comment over each file's pods block now describes the parallel-array semantics. * Schema (`pod-config.schema.json`) tightens machines/profiles to `array of unique strings, minItems 1, maxItems 3`; the old oneOf with the dict variant is gone. * README's Pod Definition section rewritten around the new shape. Tests: * Drop tests for removed behavior (extension rejection, dict form, named-role rejection, mixed-shape rejection). * Add `TestPodRoles` covering single/dual/triple happy paths, length-mismatch, machines uniqueness, profiles uniqueness, empty/too- long lists, dict form rejection, bool entries, and the global same-machine/different-profile case. * 66 tests pass (was 65). Generated pipeline YAMLs are byte-identical to HEAD; this is purely an input-format simplification. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Future-proofing tweak following the round-3 review on aspnet#2170. Removes three of the four `maxItems: 3` hardcoding sites so adding a fourth role later (e.g. a second load machine) only requires a schema bump. * models.ScenarioType IntEnum is gone. `Scenario.type` is now a plain `int` (1/2/3 == single/dual/triple). `SCENARIO_TYPE_SINGLE` / `DUAL` / `TRIPLE` module constants stay for readable references. `DEFAULT_RUNTIMES` is keyed by int. * `Pod` no longer has `sut` / `load` / `db` / `*_profile` fields. The dataclass stores `machines: List[str]` and `profiles: List[str]` (parallel arrays). Backward-compat properties (`pod.sut`, `pod.load`, etc.) read from the lists, so call sites that need a named slot (the conflicts diagnostic) keep working unchanged. * `pod.machines_for_type(t)` and `profiles_for_type(t)` are slices (`self.machines[:t]`); `validate(t)` is just `t > len(machines)`. No more 2/3 magic numbers branching on which fields are populated. * `__post_init__` enforces `len(machines) == len(profiles)` and non-empty machines, so an invalid Pod can't be constructed. * `main.print_pod_conflicts` iterates `pod.machines` against `ROLE_NAMES` (a new module-level constant) for the diagnostic output, so adding a 4th role just needs a name appended to `ROLE_NAMES` -- everything else falls out of `len(pod.machines)`. * All call sites and tests updated. `test_models` gains 4 cases exercising the new constructor invariants and the backward-compat property accessors. 70 tests pass (was 66). The schema's `maxItems: 3` is now the single remaining `3` hardcoding -- that's the user-facing constraint, which is the right place for it. Generated pipeline YAMLs are byte-identical to HEAD; this is purely an internal model refactor. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Why
Follow-up to #2167. The pod-scheduler configs introduced there are JSON, which makes them awkward to read and edit — there are no comments, lots of quote/comma noise, and
type: 3is a magic integer you have to grepmodels.pyto understand. This is also rough for LLM-driven edits, which tend to revert to literal JSON and don't have anywhere to anchor docs.This PR switches the configs to YAML, ships a JSON Schema, and makes
scenario.typea self-documenting string.What
Format
build/benchmarks_ci{,_azure,_cobalt}_pods.json→.yml. Each opens with:# yaml-language-server: $schema=../scripts/pod-scheduler/pod-config.schema.jsongold-linandgold-winsharegold-db, or that the-28cobalt-hosted pods reuse the same physical machines as their non-28 siblings.Self-documenting
typetype: 1|2|3→type: single|dual|triple(case-insensitive). Integer form is still accepted so any local.jsoncopies keep loading; bools (the YAMLyes/notrap) and unknown strings raiseConfigError.Schema
scripts/pod-scheduler/pod-config.schema.json(Draft 2020-12). Schema descriptions double as grounding for LLM agents — they should hallucinate fewer keys and wrongtypevalues.jsonschemapackage.Loader
config_loader.pydispatches on file extension:.yml/.yamlvia PyYAML,.jsonvia stdlib (back-compat), unknown extensions →ConfigError.Generated pipelines
benchmarks-ci-01.yml,-02.yml,-azure.yml,-cobalt.ymlregenerated. Only the embedded regen-command header line changed (.json→.yml); the snapshot test confirms scheduling output is byte-identical.Tests
typewith case variations.type.typerejection..jsonback-compat happy path (default fixture writer is now YAML, so every existing test exercises the YAML path).Docs
README.md,build/README.md, andscripts/pod-scheduler/README.mdupdated with YAML examples and a section on the schema directive.Verification
The snapshot test enforces that regenerating from the new YAML configs produces byte-identical pipeline YAML, so this is a config/format refactor with no behavioral change.
Notes for reviewers
_format_source_pathrepo-relative URL behavior; it now expects.ymlpaths in the test, which matches whatmain.pyis invoked with from CI._parse_scenario_typeexplicitly rejectsboolbecauseyaml.safe_loadhappily turnsyes/no/true/falseinto Python bools, which would otherwise quietly resolve to1/0via the integer alias. Better to fail loudly.additionalProperties: falseon pods/scenarios so a misspelled key (e.g.mahcines:instead ofmachines:) is caught at edit-time by the YAML LSP.