Skip to content

Switch pod-scheduler configs from JSON to YAML; add JSON Schema#2170

Draft
LoopedBard3 wants to merge 5 commits intoaspnet:mainfrom
LoopedBard3:loopedbard3/pod-config-format-options
Draft

Switch pod-scheduler configs from JSON to YAML; add JSON Schema#2170
LoopedBard3 wants to merge 5 commits intoaspnet:mainfrom
LoopedBard3:loopedbard3/pod-config-format-options

Conversation

@LoopedBard3
Copy link
Copy Markdown
Contributor

@LoopedBard3 LoopedBard3 commented May 7, 2026

Why

Follow-up to #2167. The pod-scheduler configs introduced there are JSON, which makes them awkward to read and edit — there are no comments, lots of quote/comma noise, and type: 3 is a magic integer you have to grep models.py to understand. This is also rough for LLM-driven edits, which tend to revert to literal JSON and don't have anywhere to anchor docs.

This PR switches the configs to YAML, ships a JSON Schema, and makes scenario.type a self-documenting string.

What

Format

  • Move build/benchmarks_ci{,_azure,_cobalt}_pods.json.yml. Each opens with:
    # yaml-language-server: $schema=../scripts/pod-scheduler/pod-config.schema.json
    so VS Code / Cursor / the Red Hat YAML extension provide autocomplete, hover docs, and inline validation while editing.
  • Inline comments now document things that used to live only in PR review threads — e.g., that gold-lin and gold-win share gold-db, or that the -28 cobalt-hosted pods reuse the same physical machines as their non-28 siblings.

Self-documenting type

  • type: 1|2|3type: single|dual|triple (case-insensitive). Integer form is still accepted so any local .json copies keep loading; bools (the YAML yes/no trap) and unknown strings raise ConfigError.

Schema

  • New scripts/pod-scheduler/pod-config.schema.json (Draft 2020-12). Schema descriptions double as grounding for LLM agents — they should hallucinate fewer keys and wrong type values.
  • Schema and configs validated locally with the jsonschema package.

Loader

  • config_loader.py dispatches on file extension: .yml/.yaml via PyYAML, .json via stdlib (back-compat), unknown extensions → ConfigError.

Generated pipelines

  • benchmarks-ci-01.yml, -02.yml, -azure.yml, -cobalt.yml regenerated. Only the embedded regen-command header line changed (.json.yml); the snapshot test confirms scheduling output is byte-identical.

Tests

  • 49 unit tests pass. 5 new cases:
    • All three string aliases for type with case variations.
    • Integer back-compat for type.
    • Bool and unknown-string type rejection.
    • .json back-compat happy path (default fixture writer is now YAML, so every existing test exercises the YAML path).
    • Unknown-extension rejection.

Docs

  • Repo README.md, build/README.md, and scripts/pod-scheduler/README.md updated with YAML examples and a section on the schema directive.

Verification

cd scripts/pod-scheduler
python -m unittest discover tests
# Ran 49 tests in 0.344s — OK

The snapshot test enforces that regenerating from the new YAML configs produces byte-identical pipeline YAML, so this is a config/format refactor with no behavioral change.

Notes for reviewers

  • I kept the _format_source_path repo-relative URL behavior; it now expects .yml paths in the test, which matches what main.py is invoked with from CI.
  • _parse_scenario_type explicitly rejects bool because yaml.safe_load happily turns yes/no/true/false into Python bools, which would otherwise quietly resolve to 1/0 via the integer alias. Better to fail loudly.
  • Schema asserts additionalProperties: false on pods/scenarios so a misspelled key (e.g. mahcines: instead of machines:) is caught at edit-time by the YAML LSP.

The three pod configs in build/ were JSON, which made them hard to read
and didn't allow inline comments. This converts them to YAML, ships a
JSON Schema describing the shape, and changes scenario.type from a
magic 1/2/3 integer to a self-documenting single/dual/triple string.

* build/benchmarks_ci_pods.yml, benchmarks_ci_azure_pods.yml, and
  benchmarks_ci_cobalt_pods.yml replace their .json predecessors. Each
  opens with a yaml-language-server # yaml-language-server: schema=...
  directive so VS Code / Cursor / the YAML LSP provide autocomplete,
  hover docs, and inline validation while editing.
* scripts/pod-scheduler/pod-config.schema.json (new) is the schema. The
  schema descriptions also serve as reliable grounding for LLM-driven
  edits.
* config_loader.py now dispatches on file extension and accepts both
  YAML and JSON. scenario.type accepts single|dual|triple
  (case-insensitive) plus the legacy integer 1|2|3 for back-compat;
  bools (which YAML parsers happily produce from yes/no) are rejected
  explicitly. Unknown extensions and unknown type strings raise
  ConfigError so typos can't silently drop scenarios.
* Generated benchmarks-ci-*.yml files only change the embedded regen
  command in the file header (.json -> .yml). Schedule data is
  byte-identical, verified by the existing snapshot test.
* Tests: 5 new cases covering YAML loading, type-string aliases, type
  back-compat, bool/invalid rejection, and unknown-extension rejection.
  Default fixture writer now produces YAML so the YAML path is
  exercised on every test.
* Docs: README.md, build/README.md, and scripts/pod-scheduler/README.md
  are updated with YAML examples and the schema-directive convention.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR migrates the pod-scheduler’s source-of-truth configs in build/ from JSON to YAML to improve readability (comments) and editor support, and introduces a JSON Schema to validate/configure those YAML files while keeping JSON loading for backward compatibility.

Changes:

  • Converted the three build/benchmarks_ci_*_pods.json configs to .yml equivalents and updated references/docs accordingly.
  • Added scripts/pod-scheduler/pod-config.schema.json and adopted yaml-language-server $schema directives in shipped configs.
  • Updated the loader to dispatch by extension (YAML/JSON) and to accept scenario.type as single|dual|triple (case-insensitive) while still allowing legacy 1|2|3.

Reviewed changes

Copilot reviewed 22 out of 22 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
scripts/pod-scheduler/config_loader.py Load YAML/JSON by extension; add scenario type aliasing and validation.
scripts/pod-scheduler/pod-config.schema.json New JSON Schema describing the YAML/JSON config shape for LSP validation.
scripts/pod-scheduler/models.py Update docs/comments to reflect YAML/JSON config sources.
scripts/pod-scheduler/generator.py Update generated header regen command to reference .yml.
scripts/pod-scheduler/scheduler.py Docstring wording updated from JSON-specific to config-generic.
scripts/pod-scheduler/main.py CLI help/usage updated to YAML/JSON.
scripts/pod-scheduler/README.md Docs updated with YAML examples and schema-driven editing guidance.
scripts/pod-scheduler/tests/test_config_loader.py Add YAML-path tests + scenario.type alias/back-compat coverage.
scripts/pod-scheduler/tests/test_generator.py Update expected filenames/paths from .json to .yml.
scripts/pod-scheduler/tests/test_snapshots.py Snapshot inputs updated to new .yml configs.
build/benchmarks_ci_pods.yml New YAML config replacing JSON for on-prem CI pods.
build/benchmarks_ci_azure_pods.yml New YAML config replacing JSON for Azure CI pods.
build/benchmarks_ci_cobalt_pods.yml New YAML config replacing JSON for Cobalt CI pods.
build/benchmarks_ci_pods.json Removed legacy JSON config (replaced by YAML).
build/benchmarks_ci_azure_pods.json Removed legacy JSON config (replaced by YAML).
build/benchmarks_ci_cobalt_pods.json Removed legacy JSON config (replaced by YAML).
build/benchmarks-ci-01.yml Regenerated header updated to .yml config path.
build/benchmarks-ci-02.yml Regenerated header updated to .yml config path.
build/benchmarks-ci-azure.yml Regenerated header updated to .yml config path.
build/benchmarks-ci-cobalt.yml Regenerated header updated to .yml config path.
build/README.md Docs updated to reference YAML configs and schema directive.
README.md Top-level doc updated to say pipelines are generated from YAML configs.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread scripts/pod-scheduler/config_loader.py
Comment thread scripts/pod-scheduler/config_loader.py Outdated
Comment thread scripts/pod-scheduler/README.md
Comment thread build/benchmarks_ci_azure_pods.yml Outdated
LoopedBard3 and others added 2 commits May 7, 2026 14:45
Three of the review comments on aspnet#2170 flagged the same
class of issue around YAML/Python edge cases. Addressing them all here:

* scripts/pod-scheduler/requirements.txt (new) declares the runtime
  PyYAML dependency that landed with the YAML migration. Without this,
  a clean checkout fails with ModuleNotFoundError.
* scripts/pod-scheduler/README.md gains a Prerequisites subsection
  pointing at requirements.txt so contributors know what to install.
* config_loader._reject_bool / _coerce_int / _coerce_float (new
  helpers) explicitly reject Python bools in numeric fields. Previously
  `timeout: yes` would slip through `int(timeout)` and resolve to
  1 because bool is a subclass of int. The new helpers also enforce
  range minimums (target_yaml_count >= 1, schedule_offset_hours >= 0,
  timeout >= 1, estimated_runtime >= 0) so out-of-band values fail
  loudly at load time.
* test_config_loader.py adds 6 cases covering bool rejection on each
  numeric field plus negative-runtime and below-minimum target-count.
  All 55 tests pass.

The schema already used "type": "integer" for these fields (which in
JSON Schema 2020-12 explicitly excludes booleans), so editor-side
validation was already correct; this commit closes the runtime gap.

Generated pipeline YAML still regenerates byte-identical to HEAD.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Per @sebastienros's review feedback on aspnet#2170, the
named-key form for `machines` / `profiles` was visually noisy when
almost every pod follows the same SUT-load-DB convention. Both blocks
now accept either:

* a positional list of 1-3 entries (`[sut]` / `[sut, load]` /
  `[sut, load, db]`), preferred for new pods, or
* the existing named-dict form (`{sut: ..., load: ..., db: ...}`),
  still supported.

The two blocks for a single pod must use the same form and declare the
same set of roles; mixing list and dict, or differing lengths, raises
`ConfigError` at load time so the role mapping can't drift silently.
Schema (pod-config.schema.json) gets a `oneOf` for both blocks; the
YAML LSP catches shape errors at edit time.

* `config_loader._normalize_roles` resolves either form to a canonical
  ordered dict and rejects unknown roles, bools, non-strings, and empty
  values. The pod loader then enforces same-shape and same-role-set
  between machines and profiles before constructing the Pod.
* All three committed configs (`benchmarks_ci_pods.yml`,
  `benchmarks_ci_azure_pods.yml`, `benchmarks_ci_cobalt_pods.yml`)
  switch to the shorthand form. Each gets a header comment above its
  pods block explaining the convention. Pods that re-use a machine
  across roles (e.g. `cobalt-cloud-lin-azl3-dual` puts the db box in
  the `load` slot) keep the inline comment that flags the reuse.
* `scripts/pod-scheduler/README.md` gains a Pod Definition section
  showing both forms and recommending the list form for new pods.
* test_config_loader.py adds 10 cases covering shorthand happy paths
  (1/2/3 entries), mixed-shape rejection, length mismatch, empty/too-long
  lists, bool entries, and unknown-role keys. All 65 tests pass.

Generated pipeline YAMLs are byte-identical to HEAD: this is purely an
input-format change, the schedule data is unchanged.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Comment thread build/benchmarks_ci_pods.yml Outdated
machines: { sut: intel-lin, load: intel-load, db: intel-db }
profiles: { sut: intel-lin-app, load: intel-load-load, db: intel-db-db }
machines: [intel-lin, intel-load, intel-db]
profiles: [intel-lin-app, intel-load-load, intel-db-db]
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I meant is that the one of these should not need the role name (application, load, db), scheduling doesn't care about it.

Comment thread build/benchmarks_ci_pods.yml Outdated
# of the same stage when both run a triple-type scenario.
machines: { sut: intel-win, load: intel-load2, db: intel-db }
profiles: { sut: intel-win-app, load: intel-load2-load, db: intel-db-db }
machines: [intel-win, intel-load2, intel-db]
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feels weird to use the json syntax in yml (though totally valid)

machines:
- intel-win
- intel-load2
- intel-db

Also note that YML "IS" JSON, so a JSON document is valid yaml, in case in python you'd want to only handle the yml parsing and not check the extension, just say it's yaml.

LoopedBard3 and others added 2 commits May 7, 2026 15:48
Two simplifications from @sebastienros's round 3 review on
aspnet#2170:

* Drop the per-extension dispatch in config_loader._load_raw. YAML is a
  strict superset of JSON, so yaml.safe_load handles both. Old .json
  configs still load via the YAML parser, and the loader no longer cares
  about the file extension. _load_raw is a one-liner now.
* Reshape pod definitions into parallel arrays. Both `machines` and
  `profiles` are flat 1-3 entry lists; machines[i] runs profiles[i].
  The role of slot i is implicit by length (1=SUT, 2=SUT+load,
  3=SUT+load+db) so neither list carries explicit `sut`/`load`/`db`
  keys. The named-dict form is no longer accepted.

The loader now enforces, per pod:

* len(machines) == len(profiles), so each machine pairs with exactly one
  profile.
* machines and profiles are both lists of 1-3 unique non-empty strings.
* bools (yes/no/true/false from YAML) are rejected explicitly.

Per-pod uniqueness is enforced; global `machine -> single profile`
isn't, because real configs legitimately reuse a physical box across
roles in different pods (e.g. azure2-db is db for one pod and load for
another). Test `test_global_machine_can_have_different_profiles_across_pods`
documents that as intended.

Configs:

* All three .yml configs reformatted from inline flow style
  `[a, b, c]` to block style and converted to the parallel-array
  form. Each retains the existing inline comment for pods that re-use
  machines across roles. The header comment over each file's pods block
  now describes the parallel-array semantics.
* Schema (`pod-config.schema.json`) tightens machines/profiles to
  `array of unique strings, minItems 1, maxItems 3`; the old oneOf
  with the dict variant is gone.
* README's Pod Definition section rewritten around the new shape.

Tests:

* Drop tests for removed behavior (extension rejection, dict form,
  named-role rejection, mixed-shape rejection).
* Add `TestPodRoles` covering single/dual/triple happy paths,
  length-mismatch, machines uniqueness, profiles uniqueness, empty/too-
  long lists, dict form rejection, bool entries, and the global
  same-machine/different-profile case.
* 66 tests pass (was 65).

Generated pipeline YAMLs are byte-identical to HEAD; this is purely an
input-format simplification.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Future-proofing tweak following the round-3 review on
aspnet#2170. Removes three of the four `maxItems: 3`
hardcoding sites so adding a fourth role later (e.g. a second load
machine) only requires a schema bump.

* models.ScenarioType IntEnum is gone. `Scenario.type` is now a plain
  `int` (1/2/3 == single/dual/triple). `SCENARIO_TYPE_SINGLE` /
  `DUAL` / `TRIPLE` module constants stay for readable references.
  `DEFAULT_RUNTIMES` is keyed by int.
* `Pod` no longer has `sut` / `load` / `db` / `*_profile`
  fields. The dataclass stores `machines: List[str]` and
  `profiles: List[str]` (parallel arrays). Backward-compat properties
  (`pod.sut`, `pod.load`, etc.) read from the lists, so call sites
  that need a named slot (the conflicts diagnostic) keep working
  unchanged.
* `pod.machines_for_type(t)` and `profiles_for_type(t)` are slices
  (`self.machines[:t]`); `validate(t)` is just `t > len(machines)`.
  No more 2/3 magic numbers branching on which fields are populated.
* `__post_init__` enforces `len(machines) == len(profiles)` and
  non-empty machines, so an invalid Pod can't be constructed.
* `main.print_pod_conflicts` iterates `pod.machines` against
  `ROLE_NAMES` (a new module-level constant) for the diagnostic
  output, so adding a 4th role just needs a name appended to
  `ROLE_NAMES` -- everything else falls out of `len(pod.machines)`.
* All call sites and tests updated. `test_models` gains 4 cases
  exercising the new constructor invariants and the backward-compat
  property accessors. 70 tests pass (was 66).

The schema's `maxItems: 3` is now the single remaining `3`
hardcoding -- that's the user-facing constraint, which is the right
place for it.

Generated pipeline YAMLs are byte-identical to HEAD; this is purely an
internal model refactor.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants