Gate H100.8 CI workflows behind ciflow/h100.8 label #3016
H100.8 runners are scarce and the job queue is long. Instead of running H100 tests on every PR event, require maintainers to explicitly add the `ciflow/h100.8` label. Push-to-main and cron schedules are unaffected.
- integration_test_8gpu_h100.yaml: PR trigger changed to `labeled` only, with paths-ignore for experiments/ and job-level label check
- integration_test_8gpu_graph_trainer_h100.yaml: PR trigger changed to `labeled` only, with paths filter for graph_trainer/ and job-level label check
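As a rough illustration of the first bullet, the trigger block and job-level check might look like the sketch below. This is not the actual workflow file from the PR: the workflow name, job name, cron expression, runner label, and the exact `experiments/**` glob are all assumptions made for illustration.

```yaml
# Hypothetical sketch of integration_test_8gpu_h100.yaml; names and globs are assumed.
name: 8 GPU H100 Integration Tests  # assumed name

on:
  push:
    branches: [main]
    paths-ignore:
      - 'experiments/**'  # assumed glob for "experiments/"
  pull_request:
    types: [labeled]      # PR runs start only when a label is added
    paths-ignore:
      - 'experiments/**'
  schedule:
    - cron: '0 */6 * * *'  # placeholder schedule, not taken from the PR

jobs:
  h100-tests:  # assumed job name
    # Push and schedule events always run; PR events run only if the label is present.
    if: >-
      github.event_name != 'pull_request' ||
      contains(github.event.pull_request.labels.*.name, 'ciflow/h100.8')
    runs-on: ubuntu-latest  # placeholder; the real workflow targets an H100 runner
    steps:
      - run: echo "H100 integration tests would run here"
```

The job-level `if` is needed because the `labeled` event fires for any label, not just `ciflow/h100.8`.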
Warning: Unknown label
Please add the new label to .github/pytorch-probot.yml
@@ -1,3 +1,4 @@
ciflow_push_tags:
- ciflow/8gpu
- ciflow/h100.8
I am not sure what this does.
Add `synchronize` back to pull_request types so that new commits pushed to a PR with the ciflow/h100.8 label re-trigger the H100 CI. The job-level label check still prevents runs on unlabeled PRs.
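A minimal sketch of the change this commit describes, assuming the same hypothetical job name as before. On a `synchronize` event the payload still carries the PR's current labels, so the same job-level check keeps unlabeled PRs from running.

```yaml
# Hypothetical fragment; only the trigger types and the label check come from the commit message.
on:
  pull_request:
    types: [labeled, synchronize]  # re-run when new commits are pushed to the PR

jobs:
  h100-tests:  # assumed job name
    # Skipped on unlabeled PRs, whichever of the two events fired.
    if: contains(github.event.pull_request.labels.*.name, 'ciflow/h100.8')
    runs-on: ubuntu-latest  # placeholder runner
    steps:
      - run: echo "H100 integration tests would run here"
```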
How much do we actually save if the workflows still run when a PR is merged into main?
Tag-based triggering ignores path filters, so revert to pull_request types [labeled, synchronize] with job-level label check. Also remove stale ciflow/8gpu/* tag trigger from h100.yaml.
fegin
left a comment
Thanks for the improvement. Can we verify that this PR is not related to the failing CI? I don't think it is, but I just want to confirm. Thanks!
   pull_request:
-    types: [opened, synchronize, reopened, ready_for_review]
+    types: [labeled]
I would propose adding `on: push: tags: ciflow/h100.8/*` as a dispatch trigger rather than the complex condition below.
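For reference, the tag-based trigger proposed here would look roughly like the fragment below. It assumes a bot pushes a tag under `ciflow/h100.8/` when the label is applied, which the `ciflow_push_tags` entry suggests but this thread does not spell out.

```yaml
# Sketch of the suggested alternative trigger; not what the PR ended up shipping.
on:
  push:
    tags:
      - ciflow/h100.8/*  # assumed to be pushed by the ciflow bot for labeled PRs
```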
Claude: Tag-based triggering and path filters are incompatible — GitHub ignores
paths/paths-ignore for tag pushes. To get both label gating and path filtering,
we need to go back to the pull_request approach.
I don't know how true this is, but what I have now seems to work, so I am shipping this.
Verified, not related.
Summary
H100.8 runners are scarce and the job queue is very long, blocking PRs. This PR gates H100 CI workflows behind a `ciflow/h100.8` label so they only run on PRs when a maintainer explicitly requests it.

Changes to 2 workflows:
- `integration_test_8gpu_h100.yaml`: PR trigger narrowed from `[opened, synchronize, reopened, ready_for_review]` to `[labeled]` only, with a job-level `if` checking for the `ciflow/h100.8` label
- `integration_test_8gpu_graph_trainer_h100.yaml`: same trigger change, replacing the draft-PR check with the `ciflow/h100.8` label check

Trigger behavior after this PR:
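| Trigger | `h100.yaml` | `graph_trainer_h100.yaml` |
|---|---|---|
| Push to `main` (merge) | Runs (`paths-ignore: experiments/`) | Runs (`paths: graph_trainer/`) |
| PR with `ciflow/h100.8` label | Runs | Runs for `graph_trainer/` files |

Note: The `ciflow/h100.8` label needs to be created in the repo settings.

Test plan
- Adding the `ciflow/h100.8` label to a PR triggers the appropriate H100 workflow(s)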
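To make the `graph_trainer_h100.yaml` behavior concrete, here is a hedged sketch of how a `paths` filter and the label check could combine. The `**/graph_trainer/**` glob and the job name are assumptions, since the thread only gives `graph_trainer/` and never shows the file. The trigger types follow the later commits in this thread, not the `[labeled]`-only version in the summary.

```yaml
# Hypothetical sketch of integration_test_8gpu_graph_trainer_h100.yaml triggers.
on:
  push:
    branches: [main]
    paths:
      - '**/graph_trainer/**'  # assumed glob; the thread only says "graph_trainer/"
  pull_request:
    types: [labeled, synchronize]
    paths:
      - '**/graph_trainer/**'  # PR runs only when matching files changed

jobs:
  graph-trainer-h100-tests:  # assumed job name
    # On PRs, both conditions must hold: matching paths (above) and the label (here).
    if: >-
      github.event_name != 'pull_request' ||
      contains(github.event.pull_request.labels.*.name, 'ciflow/h100.8')
    runs-on: ubuntu-latest  # placeholder; the real workflow targets an H100 runner
    steps:
      - run: echo "graph_trainer H100 tests would run here"
```

Keeping the filter on the `pull_request` trigger is what lets path filtering and label gating coexist, which is the reason given above for not using tag pushes.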