flowchart TD
A[New CI job] --> B{Feasible to run\non every commit?}
B -->|No| SCHED[Scheduled]
B -->|Yes| C{Must block\nbad code from landing?}
C -->|No| D{When is the\nfeedback valuable?}
D -->|During review| PR_SA[PR standalone]
D -->|Per merged commit| PUSH_SA[Push standalone]
C -->|Yes| E{Deterministic?\nZero false positives?}
E -->|No — flaky| PR_CG[PR via ci-gate]
E -->|Yes| F{Easily reproducible\nlocally?}
F -->|No — quick job| PR_CG
F -->|No — slow job| PR_CG
F -->|Yes| MQ_CG
PR_SA --> G{Needs cache warming\non push to main/release?}
PR_CG --> G
MQ_CG --> G
G -->|Yes| PUSH_SA2[+ Push standalone\ncompile-only]
G -->|No| DONE[Done]
Jobs in ci-gate that must survive cancellation (e.g. long external calls) should use the standalone variant for that trigger instead.
push triggers below assume the protected branches: main, release/**, performance, and performance-stable.
| CI gate | Trigger | Typical duration | When to use |
|---|---|---|---|
| yes | pull_request |
< 15 min | Default for most PR checks. Lint, cross-platform builds. |
| yes | merge_group |
15–30 min | Default correctness gate. Full test suite, race tests. Must be deterministic — no flaky tests. |
| yes | push |
N/A | CI gate workflow is required for PR and merge queue. It should not run on push. |
| no | pull_request |
< 1h | Non-blocking feedback useful during review — e.g. deployment previews, flaky tests for author visibility, or jobs that must not be restarted on every push. If it's very long, make it dispatch instead. |
| no | push |
varies | Cache warming for PRs and merge queues. Actions that occur only on the finalized commit, like deploying docs, publishing artifacts, triggering downstream pipelines. |
| no | schedule |
1–4+ hours | Very long-running suites, flaky test discovery by repetition, QA regression runs. Not feasible on every commit. |
| no | workflow_dispatch |
> 1h | Rare, or long-running jobs triggered manually for specific cases. |
When a workflow appears in both ci-gate rows and standalone rows, the ci-gate path handles pull_request/merge_group via workflow_call. Standalone triggers such as push/schedule may live either in that workflow's own file or in a separate top-level workflow that calls the reusable workflow (for example, centralized cache warming) — both patterns share one job definition.
CI gate (ci-gate) is the workflow required to pass for merging PRs. It is required for passing pull requests and merge groups.
If a job uses if: ${{ !github.event.pull_request.draft }} to skip on draft PRs, the
workflow must include ready_for_review in its pull_request event types:
on:
pull_request:
types:
- opened
- reopened
- synchronize
- ready_for_reviewWithout ready_for_review, converting a draft PR to ready-for-review fires an event
the workflow doesn't subscribe to, so the job never runs — the PR appears to have
skipped CI until the next push.
Merge queue checks are the correctness gate. Their contract is strict:
- A failure means the code is wrong. If something fails in the merge queue, merging it would break the target branch. There are no false positives.
- A pass means the code is correct. Code that passes the merge queue must not then
fail on
mainor a release branch. There are no false negatives.
This means only deterministic, reproducible checks belong here. A flaky test does not belong in the merge queue — by definition it cannot provide a reliable signal.
Belongs here:
- The full unit and integration test suite (things developers can reproduce with
make test-allor equivalent) - Race detector tests (
-race), which are expensive but deterministic - Any check where a failure must block the merge
Does not belong here:
- Flaky tests — they produce false positives and erode trust in the gate
- Checks that are not locally reproducible (those belong in the PR bucket)
GitHub's merge queue supports running jobs multiple times before deciding on a result. This can mask flakiness in the short term but is not a long-term solution — it increases queue latency and still occasionally lets flaky failures block the queue. The right fix is to move flaky tests to the PR bucket and fix them there.
Merge queue batching — multiple PRs can be grouped into a single merge-group run, reducing CI cost proportional to batch size. This works well when the merge queue gate is fast and reliable; a flaky gate undermines batching by causing entire batches to be re-queued.
Due to GitHub Actions cache scoping by branch, caches for use in PR and merge queue jobs should be made available from the base branch. This means running parts of jobs that allow caches to be generated on a push trigger to those branches.
Scheduled jobs cover work that is too long-running or too expensive to attach to every commit, or that is specifically designed to discover problems through repetition.
Belongs here:
- Very long-running test suites (multi-hour jobs) where per-commit triggering is not feasible
- Repeated runs of the test suite to surface flaky tests by statistical observation
- QA and regression detection workflows that aggregate results over time
Go's test cache keys each package result by the compiled test binary hash plus the
mtime/size of every file the test opens at runtime. A single file with a wrong mtime
invalidates the cache for the entire package.
Main repo fixtures — run git restore-mtime over testdata paths so each file's
mtime equals the commit that last modified it. This is deterministic and
content-sensitive.
Submodule fixtures — shallow clones (--depth 1) don't have enough history for
git restore-mtime. Instead, set every file's mtime to the submodule's HEAD commit
timestamp. All files in a submodule share the same mtime, which is stable as long as
the pinned commit doesn't change.
Directory mtimes — normalize all directories to a fixed epoch (e.g.
200902132331.30). Git doesn't track directory mtimes, so without this step they
reflect checkout time, which varies between runs.
Go caches test results at the package level. A package with hundreds of test cases that reads many fixture files will miss the cache if any fixture changes. Splitting it into sub-packages (one per fixture set) means unrelated fixtures don't cause unnecessary re-runs.
This is also why the execution/tests/ package is broken into focused sub-packages
rather than one monolithic package.
Some test packages allocate large databases or hold many files open simultaneously. Running too many of them in parallel can exhaust RAM or IOPS and cause OOM kills or spurious timeouts.
Use -p to limit the number of packages tested in parallel (default: GOMAXPROCS),
and -parallel to limit concurrency within a single package (default: GOMAXPROCS):
# At most 2 packages at a time, at most 4 subtests in parallel within each
go test -p 2 -parallel 4 ./...These flags can be passed via GO_FLAGS in the Makefile:
make test-all GO_FLAGS="-p 2 -parallel 4"The purpose of make test-bench in CI is to verify that benchmarks compile and
execute at least one iteration — not to produce meaningful performance numbers.
Benchmarks sized for profiling work can take minutes per iteration. Use
testing.Short() to trim parameter sweeps in CI:
if testing.Short() {
totalSteps = 10 // instead of 200+
keyCount = 10_000 // instead of 1_000_000
}make test-bench passes -short so these guards are active in CI.
go test forces benchmark packages to run serially regardless of the -p flag —
the serialization is enforced at the action-graph level when -bench is set. Reducing
per-iteration work (via testing.Short()) is the only effective way to reduce wall time.
Every CI job should have a local equivalent. If you're changing code in execution/,
you should be able to run the corresponding test group locally
(make test-group TEST_GROUP=execution-tests) and get the same result as CI.
CI workflows should use the same Makefile targets that developers use locally rather than inline shell commands. This keeps CI and local runs as similar as possible.
Re-run with debug logging:
gh run rerun <run-id> --debugOr in the UI: "Re-run jobs" → enable "Debug logging".
Raw logs include per-line timestamps useful for profiling slow steps:
gh run view <run-id> --log