Feat: Signed rebase, stuck DCO recovery, and per-repo dispatch lock#277
Open
ModeSevenIndustrialSolutions wants to merge 13 commits into
Open
Feat: Signed rebase, stuck DCO recovery, and per-repo dispatch lock#277ModeSevenIndustrialSolutions wants to merge 13 commits into
ModeSevenIndustrialSolutions wants to merge 13 commits into
Conversation
There was a problem hiding this comment.
Pull request overview
This PR adds a local git-based rebase path intended to preserve commit verification when updating “behind” PR branches, and (stacked from #270) expands the merge flow to enable auto-merge + wait (Step 5.5) for PRs that are temporarily blocked/behind/unstable while required checks settle.
Changes:
- Add a signature-preserving “local clone + rebase + force-with-lease push” option (
--rebase-local/--no-rebase-local) and gating logic for when to use it. - Add/extend Step 5.5 auto-merge enablement + bounded wait behavior, including a parallel merge wait-status ticker.
- Add substantial test coverage for the new gating/dispatch behavior and Step 5.5 routing outcomes.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
src/dependamerge/merge_manager.py |
Implements local-rebase gating + local git workflow; adds Step 5.5 wait/ticker plumbing and refines auto-merge/skip-gate behavior. |
src/dependamerge/cli.py |
Wires the new --rebase-local/--no-rebase-local option into the merge context and manager construction. |
tests/test_local_rebase_signing.py |
Adds new unit/integration-style tests covering the local-rebase gate, dispatch, and URL auth helper. |
tests/test_auto_merge.py |
Updates and extends Step 5.5/skip-gate tests for blocked/behind/unstable behavior and short timeouts. |
tests/test_behind_pr_handling.py |
Updates expectations for behind + --no-fix to allow Step 5.5 routing. |
scripts/demo_wait_ticker.py |
Adds a manual demo harness for the wait-status ticker behavior without GitHub API calls. |
docs/TESTING_WAIT_STATUS_TICKER.md |
Documents manual verification options and recommends the new demo script. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
006273d to
2e4f068
Compare
This was referenced May 12, 2026
When a PR is `behind` the base branch, dependamerge previously
called the GitHub REST `PUT /repos/{owner}/{repo}/pulls/{n}/
update-branch` endpoint to bring it up to date. That endpoint
creates a server-side merge commit whose committer is the calling
token's GitHub user — and that committer is not signed with the
user's local SSH/GPG key. On repositories whose branch protection
requires verified signatures, the resulting commit loses its
`Verified` badge and the PR becomes un-mergeable until a human
intervenes.
`pre-commit-ci[bot]` PRs are particularly affected because that
bot has no comment macro for recreating a PR with a re-signed
commit (pre-commit-ci/issues#41), so any
pre-commit-ci PR we break this way is stuck until somebody closes
it manually.
This commit adds an opt-out (`--no-rebase-local`) local-git rebase
path that:
- Clones the head repo into a secure temp workspace using the
existing `git_ops` helpers (the same ones the
`dependamerge blocked --fix` flow already uses for interactive
conflict resolution).
- Fetches the base branch (from upstream when the PR is from a
fork, from origin otherwise).
- Runs `git rebase` non-interactively. Because we shell out to
the user's `git` binary, the user's `~/.gitconfig`
(`commit.gpgsign`, `gpg.format`, `user.signingkey`) is honoured
automatically and rebased commits remain verified.
- Force-pushes-with-lease back to the head repo.
- Cleans up the workspace.
A new `_should_use_local_rebase()` gate decides between this
local path and the legacy REST path. The local path activates
when:
- `rebase_local` is True (CLI default; `--no-rebase-local`
opts out), AND
- either the PR author is `pre-commit-ci[bot]` (always — no
recovery macro available), OR
- `requires_commit_signatures()` returns True for the base
branch (classic protection or rulesets) AND
`check_pr_commit_signatures()` confirms the current PR head
is verified (so REST update-branch *would* break it).
The gate uses `is True` rather than truthy comparison so
`AsyncMock` defaults can't accidentally route real PRs into the
network-touching local-rebase path in unit tests. If the
signature requirement check raises, we fail safely to the REST
path. If the PR head is not currently verified, REST
update-branch can't make verification any worse, so we use the
cheaper REST path.
When the gate says use_local, Step 5 ALWAYS skips REST
`update-branch` — even when the local rebase fails (no `git` on
PATH, conflict during rebase, network error, push rejected).
Falling back to REST in the failure case would defeat the
feature's entire purpose. Instead, we mark the PR as having been
through Step 5 (so Step 5.5 doesn't double the configured
`merge_timeout`) and let Step 5.5 enable auto-merge — auto-merge
produces a properly verified merge commit signed by GitHub
itself when checks pass.
CLI:
- New `--rebase-local` / `--no-rebase-local` flag on `merge`
(default: enabled). The help text explains the trade-off
between signature preservation and the per-PR clone cost.
Tests (15 new in `tests/test_local_rebase_signing.py`):
- `TestShouldUseLocalRebaseGate` (8 tests): pin every branch of
the gating decision tree — `--no-rebase-local`, pre-commit-ci,
signed base + signed head, signed base + unsigned head,
unsigned base, signature check failure, no GitHub client, and
the `is True` strictness check that prevents AsyncMock leakage.
- `TestStep5DispatchLocalRebase` (4 tests): pin the integration
invariant that REST `update_branch()` is NEVER called when
`use_local` is True, regardless of whether the local rebase
succeeds or fails. Also pins backward compatibility for
unsigned-base PRs and the `--no-rebase-local` opt-out.
- `TestAuthedCloneUrl` (3 tests): pin token-injection behaviour
for HTTPS / SSH / git:// URLs.
Existing tests:
- 877/877 pre-existing tests still pass (no regressions).
- Tightened the existing
`test_actual_run_performs_rebase_for_behind_pr`
was already passing without modification thanks to the
`is True` strictness check on `requires_commit_signatures` —
AsyncMock defaults no longer accidentally route into the local
path.
Co-authored-by: Claude <noreply@anthropic.com>
Signed-off-by: Matthew Watkins <mwatkins@linuxfoundation.org>
The local-rebase helper imports create_secure_tempdir via the git_ops module namespace already (git_ops.create_secure_tempdir(...)), so the duplicate import in the destructuring tuple is unused and trips ruff's F401 check. Co-authored-by: Claude <noreply@anthropic.com> Signed-off-by: Matthew Watkins <mwatkins@linuxfoundation.org>
Moves every code path dependamerge uses to bring a behind PR up
to date with its base branch into a new ``src/dependamerge/rebase.py``
module:
- ``should_use_local_rebase()`` (was ``_should_use_local_rebase``)
- ``local_rebase_pr()`` (was ``AsyncMergeManager._local_git_rebase_pr``)
- ``authed_clone_url()`` (was ``AsyncMergeManager._authed_clone_url``;
also duplicated as ``FixOrchestrator._authed_url``, deduplicated
in a follow-up)
- ``perform_step5_rebase()`` (new top-level dispatcher)
- ``rest_rebase_and_poll()`` + post-rebase polling helpers (was
inline in ``_merge_single_pr`` Step 5)
Step 5 in ``_merge_single_pr`` is now a ~25-line dispatcher call
to ``rebase.perform_step5_rebase(ctx=..., ...)``, replacing the
~280-line inline block that prompted basedpyright's
``Code is too complex to analyze`` warning. The warning is
resolved structurally rather than squelched.
Design notes:
- The dispatcher takes a ``RebaseContext`` dataclass (logger,
console, GitHub client, callbacks, configuration, mutable
``_rebased_prs`` set) instead of a full ``AsyncMergeManager``
reference. This keeps the rebase logic decoupled from manager
internals and lets the new tests exercise it without standing
up the whole merge pipeline.
- The post-rebase polling loop is split into:
- ``_poll_post_rebase()`` — runs the loop, updates
``pr_info.head_sha`` from refresh payloads.
- ``_poll_should_continue()`` — pure per-state decision
function. The four ``mergeable_state`` cases (``clean``,
``behind``, ``blocked``, ``None``) become four short
branches instead of nested ``if check_attempt < ...``
blocks.
- ``_log_blocked_timeout()`` — extracted user-facing log line.
- ``_log_post_rebase_status()`` — extracted final status line.
- Local imports of ``git_ops`` symbols are now top-level since
this whole module is rebase-related; no need to defer the cost
to the call site any more.
- Local rebase signature-preservation invariant is preserved:
when ``should_use_local_rebase`` returns True, REST
``update-branch`` is *never* called regardless of whether the
local rebase succeeds or fails, so we never destroy a verified
signature.
Tests:
- Updated ``tests/test_local_rebase_signing.py`` to call the new
module-level functions instead of the removed manager methods.
- ``rebase_module.should_use_local_rebase(...)`` replaces
``mgr._should_use_local_rebase(...)``.
- ``patch("dependamerge.rebase.local_rebase_pr", ...)`` replaces
``patch.object(mgr, "_local_git_rebase_pr", ...)``.
- ``rebase_module.authed_clone_url(...)`` replaces
``AsyncMergeManager._authed_clone_url(...)``.
- All 15 tests still pass; 892/892 pass across the full suite.
Tooling:
- ruff: clean (auto-fixed import-from-collections-abc).
- mypy: clean.
- basedpyright: 0 errors, 0 warnings (the complexity warning is
gone).
Co-authored-by: Claude <noreply@anthropic.com>
Signed-off-by: Matthew Watkins <mwatkins@linuxfoundation.org>
Three substantive fixes plus a missing test, all on the new ``rebase`` module added in 06e7801. 1. **Local-rebase path now enables auto-merge.** Previously, marking ``_rebased_prs`` after a local rebase made Step 5.5 skip the PR (its ``base_should_wait`` predicate excludes already-rebased PRs to avoid doubling ``merge_timeout``). Without auto-merge enablement, Step 6's skip gate also failed to fire (it requires ``pr_key in _auto_merge_enabled``), so the flow fell through to a manual merge attempt that 405'd against pending checks \u2014 exactly the failure mode this whole feature exists to prevent. ``_run_local_path()`` now calls ``ctx.enable_auto_merge()`` immediately after the local rebase returns, regardless of whether the rebase itself succeeded or failed. On success, GitHub recomputes mergeability against the new head and auto-merge fires when checks pass. On failure, auto-merge stands ready to fire when the PR is brought up to date through some other channel (third-party rebase, human update). In both cases Step 6 sees ``pr_key in _auto_merge_enabled`` and routes correctly to ``AUTO_MERGE_PENDING``. 2. **``check_pr_commit_signatures()`` failures now fail closed.** Previously the gate returned ``True`` (use local path) when the PR-head signature check raised, on the rationale that we couldn't see whether the signature needed preserving. That conflicts with the documented gate ("base requires signatures AND PR head is verified") and triggers network- touching local clones on transient API errors. Now returns ``False`` (use REST). When verification isn't established, REST update-branch can't make things any worse than they already are, and we avoid the unintended side effect of cloning + force-pushing on transient errors. 3. **``authed_clone_url()`` docstring corrected.** The previous wording claimed the token "never appears in command-line arguments", but injecting the token into the HTTPS clone URL means it *does* appear in the URL passed to ``git`` \u2014 visible to ``ps`` / ``/proc/<pid>/cmdline`` for the duration of the git invocation. Log output is separately redacted by ``git_ops._redact``, but no equivalent protection exists for ``ps``-style introspection. New docstring states the constraint accurately and suggests SSH keys / credential helpers as alternatives for callers that need stronger guarantees. 4. **Added ``test_pr_signature_check_failure_uses_rest``** covering ``check_pr_commit_signatures()`` raising. The existing ``test_signature_check_failure_uses_rest`` only covered ``requires_commit_signatures()`` raising, which is a distinct branch in the gate. Updated ``test_pre_commit_ci_pr_skips_rest_update_branch_on_failure`` to assert the new (correct) outcome: the PR ends up ``AUTO_MERGE_PENDING`` rather than ``MERGED``, and is in ``_auto_merge_enabled``. The previous expectation was that the manual merge would proceed anyway (because the test mocked it to succeed); the new behaviour properly defers to auto-merge. All 893 tests pass (was 892, +1 new). ruff / mypy / basedpyright clean. Co-authored-by: Claude <noreply@anthropic.com> Signed-off-by: Matthew Watkins <mwatkins@linuxfoundation.org>
Three more substantive fixes following the second review pass. 1. **Dead `try/except GitError` around `git_ops.rebase()`.** ``git_ops.rebase()`` invokes ``run_git`` with ``check=False``, so a non-zero exit code does *not* raise ``GitError`` \u2014 the surrounding ``try/except GitError`` was dead code. Removed it. The non-zero-returncode path is now the sole error handler, and the debug log now includes ``stderr`` and ``stdout`` from the failed rebase so anyone investigating can see the actual cause (conflicts, corrupt index, invalid ref, etc.) rather than always guessing "conflicts". 2. **Unused ``fix_out_of_date`` field on ``RebaseContext``.** The field was carried into the dataclass but never read in ``rebase.py``; the call-site gate (``mergeable_state == "behind" and self.fix_out_of_date``) lives in ``_merge_single_pr``, which only constructs a ``RebaseContext`` when the gate already passed. Dropped the field from both the dataclass and the construction site in ``merge_manager.py``. 3. **Test convention violation: passing ``mgr._github_client`` instead of the typed ``client`` mock returned by ``_make_mgr()``.** ``tests/conftest.py`` documents the typed- mock pattern: tests should configure and reference the returned ``client`` AsyncMock to keep mock usage consistent and avoid optional-member-access warnings. Updated all eight gate tests to pass ``github_client=client`` instead. The one intentionally-None case (``test_no_github_client_returns_false``) now passes ``github_client=None`` directly rather than nulling out ``mgr._github_client`` and then dereferencing it. All 893 tests still pass; ruff, mypy, and basedpyright remain clean. Co-authored-by: Claude <noreply@anthropic.com> Signed-off-by: Matthew Watkins <mwatkins@linuxfoundation.org>
Two doc-only fixes: - rebase.py module docstring referenced a rest_rebase_and_poll function that doesn't exist (the REST path is implemented by the private _run_rest_path + _poll_post_rebase helpers). Replaced with an accurate description that names the actual public dispatcher (perform_step5_rebase) and explains how it delegates to the private helpers. - tests/test_local_rebase_signing.py module docstring and section header referenced _should_use_local_rebase (the old manager method name) instead of should_use_local_rebase (the new module-level function). Updated both. No behaviour change. All 893 tests pass; ruff clean. Co-authored-by: Claude <noreply@anthropic.com> Signed-off-by: Matthew Watkins <mwatkins@linuxfoundation.org>
Two related Copilot comments flagged the same issue: when local rebase succeeds but auto-merge cannot be enabled (repo doesn't allow it, branch protection blocks it, etc.), unconditionally marking ``_rebased_prs`` made Step 5.5 skip the PR. Step 6 then fired a manual merge while GitHub was still recomputing mergeability after the force-push, which 405'd transiently \u2014 exactly the failure mode this whole feature exists to prevent. The fix: - Capture ``enable_auto_merge``'s return value (previously discarded). Naming the result ``auto_merge_ok``. - Only add the PR to ``ctx.rebased_prs`` when ``auto_merge_ok`` is True. When auto-merge is unavailable, leave the marker unset so Step 5.5 still runs its bounded poll loop \u2014 GitHub gets time to settle and Step 6 can act against a stable state. The signature-preservation invariant is unchanged: REST ``update-branch`` is still never called from the local-rebase path, regardless of any outcome. Updated docstring to spell out the linked decision: > Auto-merge enablement and ``_rebased_prs`` marking are linked: > > - If auto-merge gets enabled, mark ``_rebased_prs`` so Step > 5.5 skips this PR; auto-merge will handle the bounded wait > server-side and Step 6's skip gate routes to > ``AUTO_MERGE_PENDING``. > - If auto-merge cannot be enabled (repo doesn't allow it, > branch protection blocks it, etc.), do not mark > ``_rebased_prs`` so Step 5.5 still runs its bounded poll > loop. Without this, GitHub may still be recomputing > mergeability after the force-push when Step 6 fires and a > manual merge attempt would 405 transiently. Tests: - New ``test_auto_merge_unavailable_leaves_pr_unmarked`` pins the new branch: ``enable_auto_merge`` returns False, REST ``update-branch`` still doesn't fire, and the PR is NOT in ``_rebased_prs``. - All 894 tests pass; ruff clean. Co-authored-by: Claude <noreply@anthropic.com> Signed-off-by: Matthew Watkins <mwatkins@linuxfoundation.org>
Two more substantive fixes from the latest review. 1. **Fail closed when head repo identity is unknown.** ``local_rebase_pr()`` previously synthesised ``head_clone_url`` from ``base_full`` when both ``head_repo_clone_url`` and ``head_repo_full_name`` were missing. ``PullRequestInfo`` objects from the merge workflow don't populate those fields (they're only set in the interactive fix flow), so for a fork PR this would have force-pushed the rebased branch to the *base* repo instead of the fork \u2014 a dangerous mis-target that could create or overwrite a branch on someone else's repository. Fix: when both fields are unset, return False immediately so the caller falls through to the auto-merge path. The existing tests already populate both fields, so this is a safety improvement against the production state, not a change in observed behaviour for code paths that today reach ``local_rebase_pr()``. 2. **Clamp wait-loop sleep duration to non-negative.** The Step 5.5 wait loop computed ``remaining = wait_deadline - time()`` after the ``while ... < wait_deadline`` check. The two reads aren't atomic, so a near-deadline crossing could produce a tiny negative ``remaining`` which would raise ``ValueError`` from ``asyncio.sleep`` and abort the merge worker. Fix: clamp via ``max(0.0, ...)`` so ``asyncio.sleep`` always gets a non-negative argument. Worst case: a no-op sleep right before the loop exits naturally on the next iteration. Tests: - New ``test_missing_head_repo_identity_returns_false`` pins the new fail-closed branch in ``local_rebase_pr()``. 895/895 tests pass; ruff clean. Co-authored-by: Claude <noreply@anthropic.com> Signed-off-by: Matthew Watkins <mwatkins@linuxfoundation.org>
Two related Copilot comments addressed: 1. **Populate head/base repo metadata in PullRequestInfo constructors** so the local-rebase fail-closed gate (added in d36d6e7) doesn't unnecessarily refuse the local path for real-world PRs. Before this change, ``PullRequestInfo`` objects from the merge workflow had ``head_repo_full_name``, ``head_repo_clone_url``, ``base_repo_full_name``, ``base_repo_clone_url``, and ``is_fork`` left unset (those fields were only populated in the interactive ``blocked --fix`` flow). The fail-closed gate would then refuse the local path even for non-fork PRs that could safely be rebased. Two constructor sites updated: - ``GitHubClient.get_pull_request_info`` (REST source PR): pulls the fields straight out of the existing ``head.repo`` / ``base.repo`` substructures of the REST PR payload. - ``GitHubService.to_pull_request_info`` (GraphQL similar PRs): added ``headRepository`` / ``baseRepository`` queries to ``REPO_OPEN_PRS_PAGE`` and ``ORG_REPOS_WITH_OPEN_PRS``, then propagated the response fields. The GraphQL ``url`` field doesn't include the ``.git`` suffix that the REST ``clone_url`` does, so we synthesise the canonical ``.git`` form for parity. 2. **Refactor ``_make_mgr()`` to wrap the shared ``tests.conftest.make_merge_manager`` helper** instead of reimplementing it. The repo's test guidelines centralise the typed-mock convention there, and reimplementing it diverged the ``token`` default and the ``mgr._github_client`` assignment style. The new wrapper adds only the ``rebase_local=True`` default specific to this test module. All 895 tests pass; ruff clean. The fail-closed gate added in d36d6e7 remains in place as a defensive guard against any future code path that constructs a ``PullRequestInfo`` without populating the head/base repo fields. Co-authored-by: Claude <noreply@anthropic.com> Signed-off-by: Matthew Watkins <mwatkins@linuxfoundation.org>
Previously the function caught all exceptions and returned (True, []) on error, on the rationale that fail-open avoids blocking merges unnecessarily. That default was wrong for the new signature-preservation gate in ``rebase.py``, which interprets ``all_verified=True`` as a positive confirmation that the PR head is signed. An API failure could route a real PR into the network-touching local-rebase path even though we couldn't actually confirm anything about the signature state. Two callers, two correct behaviours: - ``_trigger_dependabot_recreate``: already wraps in ``try/except`` and returns ``None`` (don't recreate) on error \u2014 the existing handler matches a fail-safe interpretation. - ``should_use_local_rebase``: wraps in ``try/except`` and returns `(False, ...)`` (fail-closed) on error \u2014 already added in d36d6e7 to match the documented gate invariant. So both call sites now do the right thing for their own semantics, and neither relies on the conflated default. Updated ``test_api_error_returns_verified`` \u2192 ``test_api_error_propagates`` to assert the new contract: the exception surfaces to callers via ``pytest.raises``. 895/895 tests pass. Co-authored-by: Claude <noreply@anthropic.com> Signed-off-by: Matthew Watkins <mwatkins@linuxfoundation.org>
Two more fixes from the latest review pass.
1. **Fork detection no longer collapses head_full to base_full
too early.** The previous logic did ``head_full = head_full or
base_full`` *before* the fork comparison
``(head_full or base_full) != base_full``, which always
evaluated False whenever head_full was missing - incorrectly
treating fork PRs as non-forks. The base branch would then be
fetched from origin (the fork) instead of upstream, and the
fork-vs-non-fork remote routing got it backwards.
New decision order, all defensive:
1. ``pr_info.is_fork`` from the API (most reliable)
2. Direct full_name comparison
3. Direct clone_url comparison
4. Default to non-fork
The ``head_full = head_full or base_full`` collapse now
happens *after* ``is_fork`` is computed, so the fallback
only affects clone-URL synthesis (safe - guarded by the
missing-head_repo identity check that fails closed earlier).
2. **Unshallow + retry on rebase failure.** The default depth=50
shallow clone can miss the merge base for PRs whose branch
point is older than 50 commits, causing ``git rebase`` to fail
with a non-zero exit. Since this code path deliberately doesn't
fall back to REST update-branch, those PRs would be stuck
forever on the old code.
New recovery: when the first rebase exits non-zero, abort the
rebase, unshallow both remotes (origin always; upstream when
is_fork), and retry once. If that also fails, the cause is
genuine (conflicts, corrupt index, etc.) and we return False
as before. The cost of the unshallow only hits us on the
rare PRs where the shallow depth wasn't enough, not the
common case.
895/895 tests pass; ruff clean.
Co-authored-by: Claude <noreply@anthropic.com>
Signed-off-by: Matthew Watkins <mwatkins@linuxfoundation.org>
Real-world failure on lfreleng-actions/lftools-uv#342 \u2014 a dependabot PR sat blocked on a DCO check that GitHub never completed. DCO normally finishes in seconds; when it stalls the only reliable recovery is to ask dependabot to recreate the PR so the check fires again on a fresh head SHA. This class of failure previously required manual intervention. Adds a stuck-check detector and routes it into the existing `@dependabot recreate` path so the recovery is automatic. Detection (`AsyncMergeManager._detect_stuck_dco`): * Examines both the modern `/check-runs` API and the older `/commits/{sha}/status` contexts (some DCO bots use the status API). * Matches DCO-shaped check names: `DCO`, `dco/dco`, `dcobot`, any `dco/...`, and any name containing `signoff` / `sign-off` / `signed-off`. * Treats a check as stuck only when: - status is `queued` / `in_progress` / `pending`, AND - the check has been pending \u2265 60s (new module constant `STUCK_DCO_THRESHOLD_SECONDS`), AND - the PR itself was created \u2265 60s ago AND last updated \u2265 60s ago. * The PR-level age + idle floors prevent false positives on PRs caught immediately after open or force-push, where the DCO check is simply running normally. * All API failures degrade to `(False, None, 0.0)` so a network blip cannot fire a recreate. Wiring (`AsyncMergeManager._merge_single_pr`): The existing recreate-eligibility check (which fired only on "branch protection" failure summaries) now also fires when `_detect_stuck_dco` returns True for a dependabot PR, routing through the same `_trigger_dependabot_recreate` flow. That path already posts the `@dependabot recreate` comment, polls for the old PR to close, finds the replacement, and waits for its checks to land \u2014 so the "park in the background while other PRs continue" behaviour comes for free. The new detection log line is printed with `markup=False` so a check name containing characters that look like Rich markup (e.g. `[required]`) is not silently swallowed by the console parser. Tests (`tests/test_stuck_dco_detection.py`, 10 new): * `TestDetectStuckDcoCheckRuns` (6): positive detection, sub-threshold age, completed check, non-DCO check name, PR-level age floor, `dco/dco` name variant. * `TestDetectStuckDcoStatusContexts` (1): detection via the older status API. * `TestDetectStuckDcoRobustness` (3): PR fetch failure, unparseable timestamps, no client \u2014 each must return the safe default. Full suite: 905/905 pass (892 baseline + 13 new). Co-authored-by: Claude <claude@anthropic.com> Signed-off-by: Matthew Watkins <mwatkins@linuxfoundation.org>
In repo-scoped runs the worker concurrency was forced to 1 to avoid races between back-to-back merge_pull_request calls on the same repo. That made any PR sitting in Step 5.5's auto-merge wait loop head-of-line block every other PR in the batch with the default 5-minute merge_timeout, a single slow PR could stall a batch of N PRs for up to N x 5 minutes. This commit splits the two concerns: * Workers run in parallel again (default cap of 5 for repo runs, capped by PR count). * The actual merge_pull_request API call is serialised per repository via a new _get_merge_dispatch_lock(owner, repo) helper. Workers on the same repo queue on the lock; workers on different repos hold distinct locks and dispatch in parallel. * Approve, rebase polling, Step 5.5's auto-merge wait, and DCO status polling all run outside the lock, so a PR parked waiting for required checks no longer blocks anything else. Wiring: * New self._merge_dispatch_locks: dict[str, asyncio.Lock] on AsyncMergeManager, guarded by _merge_dispatch_locks_lock for safe lazy creation. * New async helper _get_merge_dispatch_lock(owner, repo). * Both merge_pull_request call sites in _merge_single_pr (main path and recreated-PR path) now wrap the call in async with await self._get_merge_dispatch_lock(...). * cli.py's two repo-scoped _run_parallel_merge call sites raise concurrency=1 to min(5, len(prs)) or 1, with the rationale recorded inline. Tests (tests/test_merge_dispatch_lock.py, 5 new): * TestMergeDispatchLockIdentity (3): same repo returns the same lock, different repos return distinct locks, and concurrent first-acquire on the same key returns one shared lock (proving the outer lazy-init guard works). * TestMergeDispatchLockSerialisation (2): same-repo workers serialise (peak == 1), different-repo workers run in parallel (peak == N). Full suite: 910/910 pass (905 baseline + 5 new). Co-authored-by: Claude <claude@anthropic.com> Signed-off-by: Matthew Watkins <mwatkins@linuxfoundation.org>
f6202a0 to
7a86dea
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Three related recovery / throughput fixes for repo-scoped batch
merges:
Signature-preserving local rebase — when GitHub's REST
update-branchwould break commit verification (because theresulting commit is signed by the calling token's user, not the
user's local SSH/GPG key), dependamerge now clones, rebases,
and force-pushes locally instead.
Stuck DCO recovery — when a DCO check has been pending for
≥ 60s on a dependabot PR that itself was created and last
updated ≥ 60s ago, dependamerge now treats the check as stuck
and routes through the existing
@dependabot recreateflowinstead of failing. Previously this required manual
intervention (real example: Chore: Bump lfit/releng-reusable-workflows/.github/workflows/reuse-sonarqube-cloud.yaml from 0.3.3 to 0.3.4 lfreleng-actions/lftools-uv#342).
Decouple wait from dispatch — repo-scoped runs no longer
force
concurrency=1. The actualmerge_pull_requestAPIcall is serialised per repository via a new dispatch lock,
while approve / rebase / Step 5.5 wait loops run in parallel.
A single PR parked waiting for required checks no longer
head-of-line blocks the rest of the batch.
All three plug into existing background-park infrastructure, so
recovery happens without blocking other PRs in the worker pool.
1. Local rebase (
Feat(merge))Motivation
Real-world failure on
lfreleng-actions/gerrit-review-action#98:pre-commit-ci[bot]autoupdate PR was behind base.PUT /repos/.../pulls/98/update-branch.user — that identity isn't registered at GitHub.
Verifiedbadge → branch protection (whichrequires verified signatures) blocked the merge.
pre-commit-ci[bot]has norecreate/rebasemacro(pre-commit-ci/issues#41),
so the PR was stuck until manually closed.
Approach
Two new helpers on
AsyncMergeManager:_should_use_local_rebase()→(use_local, reason).Activates when
rebase_localis enabled (CLI default) ANDeither:
pre-commit-ci[bot](no recovery macro), ORrequires_commit_signatures()isTrueAND
check_pr_commit_signatures()confirms the current PRhead is verified.
_local_git_rebase_pr()— secure tempdir + clone + fetchgit rebase+git push --force-with-lease, using existinggit_opshelpers. Workspace cleaned up infinally.When the gate says
use_local, RESTupdate_branch()isnever called — even on local-rebase failure. Falling back
would defeat the feature.
CLI
New
--rebase-local/--no-rebase-localflag onmerge(default: enabled).
2. Stuck DCO recovery (
Fix(merge))Motivation
Real-world failure on
lfreleng-actions/lftools-uv#342— adependabot PR sat blocked on a DCO check that GitHub never
completed. DCO normally finishes in seconds; when it stalls,
the only reliable recovery for dependabot PRs is to ask
dependabot to recreate the PR so the check fires again on a
fresh head SHA.
Approach
New
AsyncMergeManager._detect_stuck_dco(pr_info)returning(is_stuck, check_name, age_seconds):/check-runsAPI and the older/commits/{sha}/statuscontexts.DCO,dco/dco,dcobot, anydco/..., plus any name containingsignoff/sign-off/signed-off.queued/in_progress/pending, ANDSTUCK_DCO_THRESHOLD_SECONDS(60s, new module constant), AND
ago.
caught immediately after open or force-push.
(False, None, 0.0).Wiring: the existing recreate-eligibility check in
_merge_single_pr(which fired only onbranch protectionfailure summaries) now also fires when
_detect_stuck_dcoreturns True for a dependabot PR, routing through the same
_trigger_dependabot_recreateflow. The detection log lineuses
markup=Falseso a check name containing[required]-stylebrackets is not silently swallowed by Rich's markup parser.
3. Decouple wait from dispatch (
Fix(merge))Motivation
Real-world output from a 3-PR batch where one PR was waiting on
required checks:
PRs #2 and #3 couldn't even start because the worker concurrency
was forced to
1for repo-scoped runs (to avoid races betweenback-to-back merges on the same repo). With the default 5-minute
merge_timeout, a single slow PR stalled the entire batch for upto N × 5 minutes.
Approach
Split the two concerns the old
concurrency=1was conflating:runs, capped by PR count).
merge_pull_requestAPI call is serialised perrepository via a new
AsyncMergeManager._get_merge_dispatch_lock(owner, repo)helper. Workers on the same repo queue on the lock; workers
on different repos hold distinct locks and dispatch in
parallel.
status polling all run outside the lock, so a PR parked
waiting for required checks no longer blocks anything else.
The
cli.pyrepo-scoped_run_parallel_mergecall sites raiseconcurrency=1tomin(5, len(prs)) or 1, with the rationalerecorded inline.
Tests
Local rebase — 15 tests in
tests/test_local_rebase_signing.py.Stuck DCO — 10 tests in
tests/test_stuck_dco_detection.py:detection via
check-runsand the older status API, sub-thresholdage, completed check, non-DCO name, PR age floor, name variants,
and three robustness paths (PR fetch failure, unparseable
timestamps, no client).
Dispatch lock — 5 tests in
tests/test_merge_dispatch_lock.py:TestMergeDispatchLockIdentity(3): same repo returns thesame lock, different repos return distinct locks, concurrent
first-acquire on the same key returns one shared lock
(proving the outer lazy-init guard works).
TestMergeDispatchLockSerialisation(2): same-repo workersserialise (
peak == 1), different-repo workers run inparallel (
peak == N).Full suite: 910/910 pass (892 baseline + 18 new across the
three features).
Limitations
Local rebase: requires
giton PATH. Per-PR shallow clonehas a small disk + network cost (sub-second for typical
autoupdate PRs). Concurrent rebases get separate temp
workspaces.
Stuck DCO: 60s threshold is fixed; could become a CLI flag
later if needed. Detection requires the check name to match
the allow-list — uncommon DCO bot variants would need the
matcher extended.
Dispatch lock: the per-repo lock is per-process. Multiple
concurrent dependamerge invocations against the same repo
could still race; this is no worse than the previous behaviour
and is generally avoided by GitHub's own rate limiting.