Move cobalt cloud jobs from benchmarks-ci-azure-eastus2 into benchmarks-ci-azure #2166

Merged
LoopedBard3 merged 1 commit into aspnet:main from LoopedBard3:MoveEastUS2ToMainAzure on Apr 29, 2026
LoopedBard3 (Contributor) commented Apr 28, 2026

The cobalt cloud machines were moved from eastus2 to westus2, where they are now co-located with the other Azure benchmark machines. This PR merges the separate eastus2 pipeline back into the main Azure CI pipeline.

Changes

  • Merge cobalt cloud machines into benchmarks_ci_azure.json with machine_group support for proper scheduling
  • Regenerate benchmarks-ci-azure.yml using the crank-scheduler (14 groups with intelligent dependsOn to avoid machine conflicts)
  • Update benchmarks.template.liquid header with crank-scheduler usage instructions
  • Remove EAST US 2 MACHINES section header from azure.profile.yml
  • Delete separate eastus2 files: benchmarks-ci-azure-eastus2.yml, benchmarks.matrix.azure.eastus2.yml, benchmarks_ci_azure_eastus2.json
  • Remove benchmarks.matrix.azure.yml (replaced by JSON + crank-scheduler approach)
  • Remove cobaltcloud service bus queue — cobalt jobs now use the existing azure/azurearm64 queues

How the generated pipeline works (uses the approach from #2106)

  • The crank-scheduler reads benchmarks_ci_azure.json (machines + scenarios) and produces benchmarks-ci-azure.yml via the liquid template
  • Machine groups ensure cobalt cloud machines only pair with each other (shared client/db machines)
  • Jobs sharing machines (e.g. cobalt-cloud-lin-server and cobalt-cloud-lin-server-azure-linux3 both use the same client and db) are serialized via dependsOn
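
The serialization described above can be sketched roughly as follows. This is a minimal illustration, not the crank-scheduler's actual code: the function `serialize_shared_jobs` and the machine name `cobalt-cloud-lin-client` are assumptions made for the example, and the real scheduler may derive `dependsOn` differently.

```python
from typing import Dict, List, Set


def serialize_shared_jobs(jobs: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Hypothetical helper: build a dependsOn list for each job.

    A job depends on the most recent earlier job that used any of the
    same machines, so jobs sharing a client or db never overlap.
    """
    last_user: Dict[str, str] = {}        # machine -> last job that claimed it
    depends_on: Dict[str, List[str]] = {}
    for job, machines in jobs.items():    # dicts keep insertion order
        blockers = sorted({last_user[m] for m in machines if m in last_user})
        depends_on[job] = blockers
        for machine in machines:
            last_user[machine] = job
    return depends_on


# Two cobalt jobs share the same client and db (machine names are assumed).
jobs = {
    "cobalt-cloud-lin-server": {
        "cobalt-cloud-lin-server",
        "cobalt-cloud-lin-client",
        "cobalt-cloud-lin-db",
    },
    "cobalt-cloud-lin-server-azure-linux3": {
        "cobalt-cloud-lin-server-azure-linux3",
        "cobalt-cloud-lin-client",
        "cobalt-cloud-lin-db",
    },
}
result = serialize_shared_jobs(jobs)
```

In this sketch the second job ends up with a single `dependsOn` entry pointing at the first, while the first job has none and can start immediately.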

Notes

  • IP addresses were already updated in "Update cobalt cloud VM IP addresses for new Azure region" (#2165)
  • Relay profiles in azure.profile.yml still reference cobaltcloudlin* relay endpoints on aspnetperf.servicebus.windows.net — these are Azure Relay Hybrid Connection URLs that require infrastructure-level reconfiguration

@LoopedBard3 LoopedBard3 force-pushed the MoveEastUS2ToMainAzure branch 3 times, most recently from 66cef0b to 0cf702f Compare April 28, 2026 23:02
- Merge cobalt cloud machines into benchmarks_ci_azure.json with machine_group support
- Regenerate benchmarks-ci-azure.yml using crank-scheduler (14 groups, handles machine conflicts)
- Update benchmarks.template.liquid header with scheduler instructions
- Remove separate eastus2 pipeline files (benchmarks-ci-azure-eastus2.yml, benchmarks.matrix.azure.eastus2.yml, benchmarks_ci_azure_eastus2.json)
- Remove benchmarks.matrix.azure.yml (replaced by JSON + scheduler approach)
- Remove cobaltcloud service bus queue (cobalt jobs now use azure/azurearm64 queues)
- Remove EAST US 2 MACHINES header from azure.profile.yml

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@LoopedBard3 LoopedBard3 force-pushed the MoveEastUS2ToMainAzure branch from 0cf702f to ba9a434 Compare April 28, 2026 23:08
LoopedBard3 (Contributor, Author) commented Apr 29, 2026

Verified that the pipeline still runs: https://dev.azure.com/dnceng/internal/_build/results?buildId=2963441&view=results

As part of this, I will also have the benchmarks-ci-azure-eastus2 pipeline deleted.

@LoopedBard3 LoopedBard3 marked this pull request as ready for review April 29, 2026 17:48
@LoopedBard3 LoopedBard3 self-assigned this Apr 29, 2026
Copilot AI left a comment

Pull request overview

Consolidates the former benchmarks-ci-azure-eastus2 (cobalt cloud) pipeline into the main Azure CI pipeline by moving cobalt machines/scenarios into benchmarks_ci_azure.json, regenerating the combined benchmarks-ci-azure.yml, and removing the now-redundant eastus2/matrix files.

Changes:

  • Merge cobalt cloud machines + scenarios into build/benchmarks_ci_azure.json using machine_group.
  • Regenerate build/benchmarks-ci-azure.yml to include cobalt jobs and updated scheduling/serialization.
  • Remove separate eastus2 pipeline/config/matrix YAML files and clean up azure.profile.yml section header.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| build/benchmarks_ci_azure_eastus2.json | Deleted legacy eastus2-only scheduler config. |
| build/benchmarks_ci_azure.json | Adds machine_group to existing machines and introduces cobalt cloud machines + scenario targeting. |
| build/benchmarks.template.liquid | Updates header comments to document crank-scheduler-based YAML generation. |
| build/benchmarks.matrix.azure.yml | Deleted legacy matrix definition (superseded by JSON + scheduler). |
| build/benchmarks.matrix.azure.eastus2.yml | Deleted legacy eastus2 matrix definition. |
| build/benchmarks-ci-azure.yml | Regenerated combined Azure pipeline including cobalt jobs and new dependsOn sequencing. |
| build/benchmarks-ci-azure-eastus2.yml | Deleted legacy eastus2 pipeline. |
| build/azure.profile.yml | Removes the "EAST US 2 MACHINES" section header comment. |

Comment thread build/benchmarks-ci-azure.yml
Comment thread build/benchmarks-ci-azure.yml
Comment thread build/benchmarks-ci-azure.yml
Comment thread build/benchmarks-ci-azure.yml
Comment thread build/benchmarks.template.liquid
@LoopedBard3 LoopedBard3 merged commit 8be8c1b into aspnet:main Apr 29, 2026
6 checks passed
@LoopedBard3 LoopedBard3 deleted the MoveEastUS2ToMainAzure branch April 29, 2026 19:33
LoopedBard3 added a commit to LoopedBard3/Benchmarks that referenced this pull request Apr 29, 2026
Reflects main branch changes from PR aspnet#2166:
- Merged cobalt-cloud-lin pods (eastus2) into azure config
- Removed separate benchmarks_ci_azure_eastus2_pods.json
- Fixed IDNA pod load profiles to match updated main:
  - idna-amd-lin now uses idna-amd-win as load
  - idna-intel-lin now uses idna-intel-win as load
  - idna-amd-win now uses idna-intel-lin as load
  - idna-intel-win now uses idna-amd-lin as load
- Added cobalt-cloud-lin-azl3-dual pod for type-2 scenarios
  (uses cobalt-cloud-lin-db as load instead of client)
- Total runs: 26 (matches main azure pipeline)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
LoopedBard3 added a commit to LoopedBard3/Benchmarks that referenced this pull request Apr 29, 2026
Reflects main branch changes from PR aspnet#2166:
- Merged cobalt-cloud-lin pods (eastus2) into azure config
- Removed separate benchmarks_ci_azure_eastus2_pods.json
- Kept IDNA pod load profiles on linux machines (load jobs
  require linux), reverting the main branch profile change
- Added cobalt-cloud-lin-azl3-dual pod for type-2 scenarios
  (uses cobalt-cloud-lin-db as load instead of client)
- Total runs: 26 (matches main azure pipeline)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
LoopedBard3 added a commit that referenced this pull request May 4, 2026
)

* Add pod-based crank scheduler prototype

Simplified alternative to PR #2106's full crank-scheduler. Uses a pod
model where machines are fixed groups (SUT + load + DB) instead of
individual machines with capability scoring and preferred partners.

Key simplifications:
- Pods define fixed machine groupings (no role priority/scoring)
- Shared machines between pods handled via collision detection
- Same greedy longest-job-first bin-packing algorithm
- Same Liquid template YAML generation
- ~570 lines vs ~2000 lines in the full scheduler
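
One way to read "greedy longest-job-first with collision detection" is a first-fit pass over runs sorted by descending runtime. The sketch below is an illustration under that assumption, not the pod-scheduler's code: `PodRun`, `greedy_schedule`, and the machine names are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class PodRun:
    """Hypothetical model: one scenario run on one pod."""
    name: str
    machines: FrozenSet[str]   # every machine the pod occupies (SUT + load + DB)
    runtime: int               # estimated minutes


def greedy_schedule(runs: List[PodRun]) -> List[List[PodRun]]:
    """Pack runs into stages so that no two runs in a stage share a machine."""
    stages: List[List[PodRun]] = []
    # Longest first; the name breaks ties so the output is deterministic.
    for run in sorted(runs, key=lambda r: (-r.runtime, r.name)):
        for stage in stages:
            busy = set().union(*(r.machines for r in stage))
            if not (run.machines & busy):   # collision check
                stage.append(run)
                break
        else:
            stages.append([run])            # no existing stage fits
    return stages


runs = [
    PodRun("a", frozenset({"sut1", "load", "db"}), 60),
    PodRun("b", frozenset({"sut2", "load", "db"}), 90),
    PodRun("c", frozenset({"sut3", "load2", "db2"}), 30),
]
stages = greedy_schedule(runs)
stage_names = [[r.name for r in stage] for stage in stages]
```

Here `a` and `b` share their load and db machines, so they land in different stages, while `c` uses separate machines and can run alongside `b`.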

Includes:
- scripts/pod-scheduler/ (5 Python files + README)
- build/benchmarks_ci_pods.json (pod-based config for CI benchmarks)

* Add azure, azure-eastus2, and cobalt pod configs

Pod-based configurations for all three additional CI environments:
- benchmarks_ci_azure_pods.json: 6 pods, 14 runs (matches main)
- benchmarks_ci_azure_eastus2_pods.json: 2 pods, 12 runs (matches main)
- benchmarks_ci_cobalt_pods.json: 4 pods, 44 runs (matches main)

Notable pod patterns:
- Azure IDNA pods cross-use each other as load machines
- Cobalt hosted has 28-core variant pods sharing physical machines
  with full-core pods (handled by collision detection)
- Azure eastus2 pods share load/db, serialized automatically

Also fixes unicode bar chars for Windows compatibility.

* Update azure pod config: merge eastus2, keep IDNA on linux loads

Reflects main branch changes from PR #2166:
- Merged cobalt-cloud-lin pods (eastus2) into azure config
- Removed separate benchmarks_ci_azure_eastus2_pods.json
- Kept IDNA pod load profiles on linux machines (load jobs
  require linux), reverting the main branch profile change
- Added cobalt-cloud-lin-azl3-dual pod for type-2 scenarios
  (uses cobalt-cloud-lin-db as load instead of client)
- Total runs: 26 (matches main azure pipeline)

* Regenerate pipeline YAMLs from pod-scheduler configs

Generated via:
  python ./scripts/pod-scheduler/main.py --config ./build/benchmarks_ci_pods.json --template ./build/benchmarks.template.liquid --yaml-output ./build
  python ./scripts/pod-scheduler/main.py --config ./build/benchmarks_ci_azure_pods.json --template ./build/benchmarks.template.liquid --yaml-output ./build --base-name benchmarks-ci-azure
  python ./scripts/pod-scheduler/main.py --config ./build/benchmarks_ci_cobalt_pods.json --template ./build/benchmarks.template.liquid --yaml-output ./build --base-name benchmarks-ci-cobalt

* Cap timeoutInMinutes at 240 (max 2x old 120 default)

Formula is now max(120, min(240, 2 * estimated_runtime)).
This prevents scenarios with long runtimes (e.g. Proxies at 150min)
from setting unreasonably high timeouts compared to previous values.

Resulting timeouts: 120 (default), 140 (Grpc), 180 (PGO/Containers), 240 (Proxies)
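
The clamp above is small enough to check directly. In this sketch the function name `timeout_in_minutes` is an assumption; only the formula and the 150-minute Proxies figure come from the commit message.

```python
def timeout_in_minutes(estimated_runtime: int) -> int:
    """Double the estimated runtime, then clamp to the range [120, 240]."""
    return max(120, min(240, 2 * estimated_runtime))


# Proxies: 2 * 150 = 300, capped at 240.
proxies_timeout = timeout_in_minutes(150)
# A short scenario (hypothetical 30-minute estimate) keeps the 120 floor.
short_timeout = timeout_in_minutes(30)
```

Working backwards from the listed results, a 140-minute timeout would correspond to a 70-minute estimate and 180 to a 90-minute estimate. Those inputs are inferred from the formula, not stated in the source.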

* Address review feedback
- Fix 4 incorrect template filenames in benchmarks_ci_pods.json:
  crossgen-scenarios -> crossgen2-scenarios,
  custom-proxies-scenarios -> proxies-custom-scenarios,
  single-file-scenarios -> singlefile-scenarios,
  websockets-scenarios -> websocket-scenarios
- Fix machine utilization calculation bug (was inflating totals for
  machines not in current stage)
- Remove unused imports (sys, Any, Dict, json, Pod)
- Remove dead render_with_liquid function and --template CLI arg
- Add guard against empty queues (ZeroDivisionError)
- Update README and docstrings to reflect removed template arg

Code:
- Validate cron schedules at load time and raise on unsupported hour fields instead of silently no-op'ing the offset for split YAMLs
- Add optional 'timeout' override per scenario; fall back to the runtime-derived formula when absent
- Move pipeline plumbing (pool, service-bus connection/namespace) into JSON metadata.pipeline with the previous hardcoded values as defaults
- Strict validation of duplicate pods, duplicate scenario.pods entries, empty queues; default scheduler to fail-fast on unknown/invalid pod references with a --lenient opt-out
- Stricter job-id sanitization (handles '.', '/', parens, leading digits, unicode) and explicit duplicate detection in generated YAML
- Replace id(stage) bookkeeping in split_schedule with explicit indices; add stable name tie-breaker to create_schedule for deterministic output
- Use Run.job_name in the generator instead of duplicating the regex
- Drop stale '--template' arg from generated YAML headers and README
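
A rough sketch of what the stricter job-id sanitization and duplicate detection could look like. It assumes job identifiers should contain only ASCII letters, digits, and underscores and should not start with a digit. The helpers `sanitize_job_id` and `unique_job_ids` are hypothetical and do not reproduce the generator's `Run.job_name`.

```python
import re
from typing import Dict, Iterable, List


def sanitize_job_id(name: str) -> str:
    """Map an arbitrary scenario/pod name to a conservative job identifier."""
    # Replace '.', '/', parens, spaces, non-ASCII characters, etc.
    cleaned = re.sub(r"[^A-Za-z0-9_]", "_", name)
    # Prefix an underscore when the id is empty or starts with a digit.
    if not cleaned or cleaned[0].isdigit():
        cleaned = "_" + cleaned
    return cleaned


def unique_job_ids(names: Iterable[str]) -> List[str]:
    """Sanitize every name and fail loudly if two names collapse to one id."""
    seen: Dict[str, str] = {}
    ids: List[str] = []
    for name in names:
        job_id = sanitize_job_id(name)
        if job_id in seen:
            raise ValueError(
                f"duplicate job id {job_id!r} from {seen[job_id]!r} and {name!r}"
            )
        seen[job_id] = name
        ids.append(job_id)
    return ids


example_id = sanitize_job_id("Grpc (v2.0)/linux")
```

Raising on collisions, instead of silently appending a suffix, keeps two differently named runs from quietly overwriting each other in the generated YAML.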

Tests:
- 41 unit + snapshot tests covering models, config loader, scheduler, generator, and YAML parity with the committed *_pods.json configs

Cleanup:
- Revert benchmarks.template.liquid and benchmarks_ci_azure.json to main; the deleted crank-scheduler does not consume them
- Regenerate all four pipeline YAMLs against the new generator

* Remove unused benchmarks.template.liquid
The Liquid template was only consumed by the deleted crank-scheduler. The pod-scheduler renders pipeline YAML directly via Python, and grep confirms no other script, pipeline, or build step reads this file.

* Remove orphaned benchmarks.yml and benchmarks.matrix.0[12].yml
These were artifacts of the old hand-driven matrix.yml -> json -> Liquid template -> benchmarks.yml workflow. Their only inbound references were stale documentation comments cross-pointing between each other; nothing in the repo (no script, no pipeline) consumed them.

* Document pod-scheduler flow across READMEs and YAML headers
- Generated YAML headers now embed the exact regen command (with the source config and base name) and a pointer to scripts/pod-scheduler/README.md, so each file documents how to reproduce itself
- New build/README.md maps each *_pods.json config to the YAML it produces, lists the hand-maintained scenario templates, and explains the typical edit/regenerate workflow
- Top-level README.md gains a 'Continuous benchmarking pipelines' section linking to the pod-scheduler and build/ docs
- pod-scheduler README's Quick Start now uses repo-root-relative commands and points at the snapshot tests for verification
- Tests cover the new _format_source_path helper and the snapshot test passes the source config so headers stay verified

* Remove orphaned crank-scheduler JSON configs

benchmarks_ci.json, benchmarks_ci_azure.json, and benchmarks_ci_cobalt.json used the old 'machines + capabilities' format consumed by the deleted crank-scheduler. Their replacements (benchmarks_ci_pods.json, benchmarks_ci_azure_pods.json, benchmarks_ci_cobalt_pods.json) drive the pod-scheduler. grep finds zero inbound references for any of the three across scripts, pipelines, docs, and tests.

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Parker Bibus <parker.bibus@microsoft.com>
3 participants