Skip to content

Feat: Signed rebase, stuck DCO recovery, and per-repo dispatch lock#277

Open
ModeSevenIndustrialSolutions wants to merge 13 commits into
lfit:mainfrom
modeseven-lfit:fix/preserve-signatures-on-rebase
Open

Feat: Signed rebase, stuck DCO recovery, and per-repo dispatch lock#277
ModeSevenIndustrialSolutions wants to merge 13 commits into
lfit:mainfrom
modeseven-lfit:fix/preserve-signatures-on-rebase

Conversation

@ModeSevenIndustrialSolutions
Copy link
Copy Markdown
Contributor

@ModeSevenIndustrialSolutions ModeSevenIndustrialSolutions commented May 6, 2026

Summary

Three related recovery / throughput fixes for repo-scoped batch
merges:

  1. Signature-preserving local rebase — when GitHub's REST
    update-branch would break commit verification (because the
    resulting commit is signed by the calling token's user, not the
    user's local SSH/GPG key), dependamerge now clones, rebases,
    and force-pushes locally instead.

  2. Stuck DCO recovery — when a DCO check has been pending for
    ≥ 60s on a dependabot PR that itself was created and last
    updated ≥ 60s ago, dependamerge now treats the check as stuck
    and routes through the existing @dependabot recreate flow
    instead of failing. Previously this required manual
    intervention (real example: Chore: Bump lfit/releng-reusable-workflows/.github/workflows/reuse-sonarqube-cloud.yaml from 0.3.3 to 0.3.4 lfreleng-actions/lftools-uv#342).

  3. Decouple wait from dispatch — repo-scoped runs no longer
    force concurrency=1. The actual merge_pull_request API
    call is serialised per repository via a new dispatch lock,
    while approve / rebase / Step 5.5 wait loops run in parallel.
    A single PR parked waiting for required checks no longer
    head-of-line blocks the rest of the batch.

All three plug into existing background-park infrastructure, so
recovery happens without blocking other PRs in the worker pool.


1. Local rebase (Feat(merge))

Motivation

Real-world failure on lfreleng-actions/gerrit-review-action#98:

  1. pre-commit-ci[bot] autoupdate PR was behind base.
  2. dependamerge called PUT /repos/.../pulls/98/update-branch.
  3. The server-side merge commit was signed by the calling token's
    user — that identity isn't registered at GitHub.
  4. PR lost its Verified badge → branch protection (which
    requires verified signatures) blocked the merge.
  5. pre-commit-ci[bot] has no recreate / rebase macro
    (pre-commit-ci/issues#41),
    so the PR was stuck until manually closed.

Approach

Two new helpers on AsyncMergeManager:

  • _should_use_local_rebase()(use_local, reason).
    Activates when rebase_local is enabled (CLI default) AND
    either:
    • the PR author is pre-commit-ci[bot] (no recovery macro), OR
    • the base branch's requires_commit_signatures() is True
      AND check_pr_commit_signatures() confirms the current PR
      head is verified.
  • _local_git_rebase_pr() — secure tempdir + clone + fetch
    • git rebase + git push --force-with-lease, using existing
      git_ops helpers. Workspace cleaned up in finally.

When the gate says use_local, REST update_branch() is
never called — even on local-rebase failure. Falling back
would defeat the feature.

CLI

New --rebase-local / --no-rebase-local flag on merge
(default: enabled).


2. Stuck DCO recovery (Fix(merge))

Motivation

Real-world failure on lfreleng-actions/lftools-uv#342 — a
dependabot PR sat blocked on a DCO check that GitHub never
completed. DCO normally finishes in seconds; when it stalls,
the only reliable recovery for dependabot PRs is to ask
dependabot to recreate the PR so the check fires again on a
fresh head SHA.

Approach

New AsyncMergeManager._detect_stuck_dco(pr_info) returning
(is_stuck, check_name, age_seconds):

  • Examines both the modern /check-runs API and the older
    /commits/{sha}/status contexts.
  • Matches DCO-shaped names: DCO, dco/dco, dcobot, any
    dco/..., plus any name containing signoff / sign-off /
    signed-off.
  • Treats a check as stuck only when all of:
    • status is queued / in_progress / pending, AND
    • the check has been pending ≥ STUCK_DCO_THRESHOLD_SECONDS
      (60s, new module constant), AND
    • the PR itself was created ≥ 60s ago AND last updated ≥ 60s
      ago.
  • The PR-level age + idle floors prevent false positives on PRs
    caught immediately after open or force-push.
  • All API failures degrade to (False, None, 0.0).

Wiring: the existing recreate-eligibility check in
_merge_single_pr (which fired only on branch protection
failure summaries) now also fires when _detect_stuck_dco
returns True for a dependabot PR, routing through the same
_trigger_dependabot_recreate flow. The detection log line
uses markup=False so a check name containing [required]-style
brackets is not silently swallowed by Rich's markup parser.


3. Decouple wait from dispatch (Fix(merge))

Motivation

Real-world output from a 3-PR batch where one PR was waiting on
required checks:

🚀 Merging 3 pull requests...
🔀 Merging PRs in lfreleng-actions/http-api-tool-docker (0/3 PRs, 0%)
   ⏳ Waiting for 1 PR to complete checks [217s]
   ⏱️  Elapsed: 1m 27s

PRs #2 and #3 couldn't even start because the worker concurrency
was forced to 1 for repo-scoped runs (to avoid races between
back-to-back merges on the same repo). With the default 5-minute
merge_timeout, a single slow PR stalled the entire batch for up
to N × 5 minutes.

Approach

Split the two concerns the old concurrency=1 was conflating:

  • Workers run in parallel again (default cap of 5 for repo
    runs, capped by PR count).
  • The actual merge_pull_request API call is serialised per
    repository
    via a new
    AsyncMergeManager._get_merge_dispatch_lock(owner, repo)
    helper. Workers on the same repo queue on the lock; workers
    on different repos hold distinct locks and dispatch in
    parallel.
  • Approve, rebase polling, Step 5.5's auto-merge wait, and DCO
    status polling all run outside the lock
    , so a PR parked
    waiting for required checks no longer blocks anything else.

The cli.py repo-scoped _run_parallel_merge call sites raise
concurrency=1 to min(5, len(prs)) or 1, with the rationale
recorded inline.


Tests

Local rebase — 15 tests in tests/test_local_rebase_signing.py.

Stuck DCO — 10 tests in tests/test_stuck_dco_detection.py:
detection via check-runs and the older status API, sub-threshold
age, completed check, non-DCO name, PR age floor, name variants,
and three robustness paths (PR fetch failure, unparseable
timestamps, no client).

Dispatch lock — 5 tests in tests/test_merge_dispatch_lock.py:

  • TestMergeDispatchLockIdentity (3): same repo returns the
    same lock, different repos return distinct locks, concurrent
    first-acquire on the same key returns one shared lock
    (proving the outer lazy-init guard works).
  • TestMergeDispatchLockSerialisation (2): same-repo workers
    serialise (peak == 1), different-repo workers run in
    parallel (peak == N).

Full suite: 910/910 pass (892 baseline + 18 new across the
three features).

Limitations

  • Local rebase: requires git on PATH. Per-PR shallow clone
    has a small disk + network cost (sub-second for typical
    autoupdate PRs). Concurrent rebases get separate temp
    workspaces.

  • Stuck DCO: 60s threshold is fixed; could become a CLI flag
    later if needed. Detection requires the check name to match
    the allow-list — uncommon DCO bot variants would need the
    matcher extended.

  • Dispatch lock: the per-repo lock is per-process. Multiple
    concurrent dependamerge invocations against the same repo
    could still race; this is no worse than the previous behaviour
    and is generally avoided by GitHub's own rate limiting.

Copilot AI review requested due to automatic review settings May 6, 2026 06:17
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds a local git-based rebase path intended to preserve commit verification when updating “behind” PR branches, and (stacked from #270) expands the merge flow to enable auto-merge + wait (Step 5.5) for PRs that are temporarily blocked/behind/unstable while required checks settle.

Changes:

  • Add a signature-preserving “local clone + rebase + force-with-lease push” option (--rebase-local/--no-rebase-local) and gating logic for when to use it.
  • Add/extend Step 5.5 auto-merge enablement + bounded wait behavior, including a parallel merge wait-status ticker.
  • Add substantial test coverage for the new gating/dispatch behavior and Step 5.5 routing outcomes.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
src/dependamerge/merge_manager.py Implements local-rebase gating + local git workflow; adds Step 5.5 wait/ticker plumbing and refines auto-merge/skip-gate behavior.
src/dependamerge/cli.py Wires the new --rebase-local/--no-rebase-local option into the merge context and manager construction.
tests/test_local_rebase_signing.py Adds new unit/integration-style tests covering the local-rebase gate, dispatch, and URL auth helper.
tests/test_auto_merge.py Updates and extends Step 5.5/skip-gate tests for blocked/behind/unstable behavior and short timeouts.
tests/test_behind_pr_handling.py Updates expectations for behind + --no-fix to allow Step 5.5 routing.
scripts/demo_wait_ticker.py Adds a manual demo harness for the wait-status ticker behavior without GitHub API calls.
docs/TESTING_WAIT_STATUS_TICKER.md Documents manual verification options and recommends the new demo script.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/dependamerge/merge_manager.py Outdated
Comment thread src/dependamerge/merge_manager.py Outdated
Comment thread src/dependamerge/merge_manager.py Outdated
Comment thread src/dependamerge/merge_manager.py Outdated
Comment thread tests/test_local_rebase_signing.py
Copilot AI review requested due to automatic review settings May 6, 2026 09:03

This comment was marked as resolved.

This comment was marked as resolved.

This comment was marked as resolved.

This comment was marked as resolved.

This comment was marked as resolved.

This comment was marked as resolved.

This comment was marked as resolved.

This comment was marked as resolved.

@ModeSevenIndustrialSolutions ModeSevenIndustrialSolutions requested review from a team and Copilot May 6, 2026 15:47
@ModeSevenIndustrialSolutions ModeSevenIndustrialSolutions force-pushed the fix/preserve-signatures-on-rebase branch from 006273d to 2e4f068 Compare May 11, 2026 14:40
@ModeSevenIndustrialSolutions ModeSevenIndustrialSolutions changed the title Feat(merge): Preserve commit signatures via local rebase Feat: Local rebase signing and stuck DCO recovery for dependabot PRs May 11, 2026
@ModeSevenIndustrialSolutions ModeSevenIndustrialSolutions changed the title Feat: Local rebase signing and stuck DCO recovery for dependabot PRs Feat: Signed rebase, stuck DCO recovery, and per-repo dispatch lock May 12, 2026
When a PR is `behind` the base branch, dependamerge previously
called the GitHub REST `PUT /repos/{owner}/{repo}/pulls/{n}/
update-branch` endpoint to bring it up to date. That endpoint
creates a server-side merge commit whose committer is the calling
token's GitHub user — and that committer is not signed with the
user's local SSH/GPG key. On repositories whose branch protection
requires verified signatures, the resulting commit loses its
`Verified` badge and the PR becomes un-mergeable until a human
intervenes.

`pre-commit-ci[bot]` PRs are particularly affected because that
bot has no comment macro for recreating a PR with a re-signed
commit (pre-commit-ci/issues#41), so any
pre-commit-ci PR we break this way is stuck until somebody closes
it manually.

This commit adds an opt-out (`--no-rebase-local`) local-git rebase
path that:

- Clones the head repo into a secure temp workspace using the
  existing `git_ops` helpers (the same ones the
  `dependamerge blocked --fix` flow already uses for interactive
  conflict resolution).
- Fetches the base branch (from upstream when the PR is from a
  fork, from origin otherwise).
- Runs `git rebase` non-interactively. Because we shell out to
  the user's `git` binary, the user's `~/.gitconfig`
  (`commit.gpgsign`, `gpg.format`, `user.signingkey`) is honoured
  automatically and rebased commits remain verified.
- Force-pushes-with-lease back to the head repo.
- Cleans up the workspace.

A new `_should_use_local_rebase()` gate decides between this
local path and the legacy REST path. The local path activates
when:

- `rebase_local` is True (CLI default; `--no-rebase-local`
  opts out), AND
- either the PR author is `pre-commit-ci[bot]` (always — no
  recovery macro available), OR
- `requires_commit_signatures()` returns True for the base
  branch (classic protection or rulesets) AND
  `check_pr_commit_signatures()` confirms the current PR head
  is verified (so REST update-branch *would* break it).

The gate uses `is True` rather than truthy comparison so
`AsyncMock` defaults can't accidentally route real PRs into the
network-touching local-rebase path in unit tests. If the
signature requirement check raises, we fail safely to the REST
path. If the PR head is not currently verified, REST
update-branch can't make verification any worse, so we use the
cheaper REST path.

When the gate says use_local, Step 5 ALWAYS skips REST
`update-branch` — even when the local rebase fails (no `git` on
PATH, conflict during rebase, network error, push rejected).
Falling back to REST in the failure case would defeat the
feature's entire purpose. Instead, we mark the PR as having been
through Step 5 (so Step 5.5 doesn't double the configured
`merge_timeout`) and let Step 5.5 enable auto-merge — auto-merge
produces a properly verified merge commit signed by GitHub
itself when checks pass.

CLI:

- New `--rebase-local` / `--no-rebase-local` flag on `merge`
  (default: enabled). The help text explains the trade-off
  between signature preservation and the per-PR clone cost.

Tests (15 new in `tests/test_local_rebase_signing.py`):

- `TestShouldUseLocalRebaseGate` (8 tests): pin every branch of
  the gating decision tree — `--no-rebase-local`, pre-commit-ci,
  signed base + signed head, signed base + unsigned head,
  unsigned base, signature check failure, no GitHub client, and
  the `is True` strictness check that prevents AsyncMock leakage.
- `TestStep5DispatchLocalRebase` (4 tests): pin the integration
  invariant that REST `update_branch()` is NEVER called when
  `use_local` is True, regardless of whether the local rebase
  succeeds or fails. Also pins backward compatibility for
  unsigned-base PRs and the `--no-rebase-local` opt-out.
- `TestAuthedCloneUrl` (3 tests): pin token-injection behaviour
  for HTTPS / SSH / git:// URLs.

Existing tests:

- 877/877 pre-existing tests still pass (no regressions).
- Tightened the existing
  `test_actual_run_performs_rebase_for_behind_pr`
  was already passing without modification thanks to the
  `is True` strictness check on `requires_commit_signatures` —
  AsyncMock defaults no longer accidentally route into the local
  path.

Co-authored-by: Claude <noreply@anthropic.com>
Signed-off-by: Matthew Watkins <mwatkins@linuxfoundation.org>
The local-rebase helper imports create_secure_tempdir via the
git_ops module namespace already (git_ops.create_secure_tempdir(...)),
so the duplicate import in the destructuring tuple is unused and
trips ruff's F401 check.

Co-authored-by: Claude <noreply@anthropic.com>
Signed-off-by: Matthew Watkins <mwatkins@linuxfoundation.org>
Moves every code path dependamerge uses to bring a behind PR up
to date with its base branch into a new ``src/dependamerge/rebase.py``
module:

- ``should_use_local_rebase()`` (was ``_should_use_local_rebase``)
- ``local_rebase_pr()`` (was ``AsyncMergeManager._local_git_rebase_pr``)
- ``authed_clone_url()`` (was ``AsyncMergeManager._authed_clone_url``;
  also duplicated as ``FixOrchestrator._authed_url``, deduplicated
  in a follow-up)
- ``perform_step5_rebase()`` (new top-level dispatcher)
- ``rest_rebase_and_poll()`` + post-rebase polling helpers (was
  inline in ``_merge_single_pr`` Step 5)

Step 5 in ``_merge_single_pr`` is now a ~25-line dispatcher call
to ``rebase.perform_step5_rebase(ctx=..., ...)``, replacing the
~280-line inline block that prompted basedpyright's
``Code is too complex to analyze`` warning. The warning is
resolved structurally rather than squelched.

Design notes:

- The dispatcher takes a ``RebaseContext`` dataclass (logger,
  console, GitHub client, callbacks, configuration, mutable
  ``_rebased_prs`` set) instead of a full ``AsyncMergeManager``
  reference. This keeps the rebase logic decoupled from manager
  internals and lets the new tests exercise it without standing
  up the whole merge pipeline.
- The post-rebase polling loop is split into:
  - ``_poll_post_rebase()`` — runs the loop, updates
    ``pr_info.head_sha`` from refresh payloads.
  - ``_poll_should_continue()`` — pure per-state decision
    function. The four ``mergeable_state`` cases (``clean``,
    ``behind``, ``blocked``, ``None``) become four short
    branches instead of nested ``if check_attempt < ...``
    blocks.
  - ``_log_blocked_timeout()`` — extracted user-facing log line.
  - ``_log_post_rebase_status()`` — extracted final status line.
- Local imports of ``git_ops`` symbols are now top-level since
  this whole module is rebase-related; no need to defer the cost
  to the call site any more.
- Local rebase signature-preservation invariant is preserved:
  when ``should_use_local_rebase`` returns True, REST
  ``update-branch`` is *never* called regardless of whether the
  local rebase succeeds or fails, so we never destroy a verified
  signature.

Tests:

- Updated ``tests/test_local_rebase_signing.py`` to call the new
  module-level functions instead of the removed manager methods.
- ``rebase_module.should_use_local_rebase(...)`` replaces
  ``mgr._should_use_local_rebase(...)``.
- ``patch("dependamerge.rebase.local_rebase_pr", ...)`` replaces
  ``patch.object(mgr, "_local_git_rebase_pr", ...)``.
- ``rebase_module.authed_clone_url(...)`` replaces
  ``AsyncMergeManager._authed_clone_url(...)``.
- All 15 tests still pass; 892/892 pass across the full suite.

Tooling:

- ruff: clean (auto-fixed import-from-collections-abc).
- mypy: clean.
- basedpyright: 0 errors, 0 warnings (the complexity warning is
  gone).

Co-authored-by: Claude <noreply@anthropic.com>
Signed-off-by: Matthew Watkins <mwatkins@linuxfoundation.org>
Three substantive fixes plus a missing test, all on the new
``rebase`` module added in 06e7801.

1. **Local-rebase path now enables auto-merge.** Previously,
   marking ``_rebased_prs`` after a local rebase made Step 5.5
   skip the PR (its ``base_should_wait`` predicate excludes
   already-rebased PRs to avoid doubling ``merge_timeout``).
   Without auto-merge enablement, Step 6's skip gate also failed
   to fire (it requires ``pr_key in _auto_merge_enabled``), so
   the flow fell through to a manual merge attempt that 405'd
   against pending checks \u2014 exactly the failure mode this
   whole feature exists to prevent.

   ``_run_local_path()`` now calls ``ctx.enable_auto_merge()``
   immediately after the local rebase returns, regardless of
   whether the rebase itself succeeded or failed. On success,
   GitHub recomputes mergeability against the new head and
   auto-merge fires when checks pass. On failure, auto-merge
   stands ready to fire when the PR is brought up to date
   through some other channel (third-party rebase, human
   update). In both cases Step 6 sees ``pr_key in
   _auto_merge_enabled`` and routes correctly to
   ``AUTO_MERGE_PENDING``.

2. **``check_pr_commit_signatures()`` failures now fail closed.**
   Previously the gate returned ``True`` (use local path) when
   the PR-head signature check raised, on the rationale that
   we couldn't see whether the signature needed preserving.
   That conflicts with the documented gate ("base requires
   signatures AND PR head is verified") and triggers network-
   touching local clones on transient API errors.

   Now returns ``False`` (use REST). When verification isn't
   established, REST update-branch can't make things any
   worse than they already are, and we avoid the unintended
   side effect of cloning + force-pushing on transient errors.

3. **``authed_clone_url()`` docstring corrected.** The previous
   wording claimed the token "never appears in command-line
   arguments", but injecting the token into the HTTPS clone URL
   means it *does* appear in the URL passed to ``git`` \u2014
   visible to ``ps`` / ``/proc/<pid>/cmdline`` for the
   duration of the git invocation. Log output is separately
   redacted by ``git_ops._redact``, but no equivalent
   protection exists for ``ps``-style introspection.

   New docstring states the constraint accurately and
   suggests SSH keys / credential helpers as alternatives for
   callers that need stronger guarantees.

4. **Added ``test_pr_signature_check_failure_uses_rest``**
   covering ``check_pr_commit_signatures()`` raising. The
   existing ``test_signature_check_failure_uses_rest`` only
   covered ``requires_commit_signatures()`` raising, which is
   a distinct branch in the gate.

Updated ``test_pre_commit_ci_pr_skips_rest_update_branch_on_failure``
to assert the new (correct) outcome: the PR ends up
``AUTO_MERGE_PENDING`` rather than ``MERGED``, and is in
``_auto_merge_enabled``. The previous expectation was that the
manual merge would proceed anyway (because the test mocked it
to succeed); the new behaviour properly defers to auto-merge.

All 893 tests pass (was 892, +1 new). ruff / mypy /
basedpyright clean.

Co-authored-by: Claude <noreply@anthropic.com>
Signed-off-by: Matthew Watkins <mwatkins@linuxfoundation.org>
Three more substantive fixes following the second review pass.

1. **Dead `try/except GitError` around `git_ops.rebase()`.**
   ``git_ops.rebase()`` invokes ``run_git`` with
   ``check=False``, so a non-zero exit code does *not* raise
   ``GitError`` \u2014 the surrounding ``try/except GitError`` was
   dead code. Removed it. The non-zero-returncode path is now
   the sole error handler, and the debug log now includes
   ``stderr`` and ``stdout`` from the failed rebase so anyone
   investigating can see the actual cause (conflicts, corrupt
   index, invalid ref, etc.) rather than always guessing
   "conflicts".

2. **Unused ``fix_out_of_date`` field on ``RebaseContext``.**
   The field was carried into the dataclass but never read in
   ``rebase.py``; the call-site gate (``mergeable_state == "behind"
   and self.fix_out_of_date``) lives in ``_merge_single_pr``,
   which only constructs a ``RebaseContext`` when the gate
   already passed. Dropped the field from both the dataclass and
   the construction site in ``merge_manager.py``.

3. **Test convention violation: passing ``mgr._github_client``
   instead of the typed ``client`` mock returned by
   ``_make_mgr()``.** ``tests/conftest.py`` documents the typed-
   mock pattern: tests should configure and reference the
   returned ``client`` AsyncMock to keep mock usage consistent
   and avoid optional-member-access warnings. Updated all eight
   gate tests to pass ``github_client=client`` instead. The one
   intentionally-None case (``test_no_github_client_returns_false``)
   now passes ``github_client=None`` directly rather than
   nulling out ``mgr._github_client`` and then dereferencing it.

All 893 tests still pass; ruff, mypy, and basedpyright remain
clean.

Co-authored-by: Claude <noreply@anthropic.com>
Signed-off-by: Matthew Watkins <mwatkins@linuxfoundation.org>
Two doc-only fixes:

- rebase.py module docstring referenced a
  rest_rebase_and_poll function that doesn't exist (the REST
  path is implemented by the private _run_rest_path +
  _poll_post_rebase helpers).  Replaced with an accurate
  description that names the actual public dispatcher
  (perform_step5_rebase) and explains how it delegates to the
  private helpers.
- tests/test_local_rebase_signing.py module docstring and
  section header referenced _should_use_local_rebase (the old
  manager method name) instead of should_use_local_rebase
  (the new module-level function).  Updated both.

No behaviour change. All 893 tests pass; ruff clean.

Co-authored-by: Claude <noreply@anthropic.com>
Signed-off-by: Matthew Watkins <mwatkins@linuxfoundation.org>
Two related Copilot comments flagged the same issue: when local
rebase succeeds but auto-merge cannot be enabled (repo doesn't
allow it, branch protection blocks it, etc.), unconditionally
marking ``_rebased_prs`` made Step 5.5 skip the PR. Step 6 then
fired a manual merge while GitHub was still recomputing
mergeability after the force-push, which 405'd transiently \u2014
exactly the failure mode this whole feature exists to prevent.

The fix:

- Capture ``enable_auto_merge``'s return value (previously
  discarded). Naming the result ``auto_merge_ok``.
- Only add the PR to ``ctx.rebased_prs`` when ``auto_merge_ok``
  is True. When auto-merge is unavailable, leave the marker
  unset so Step 5.5 still runs its bounded poll loop \u2014 GitHub
  gets time to settle and Step 6 can act against a stable
  state.

The signature-preservation invariant is unchanged: REST
``update-branch`` is still never called from the local-rebase
path, regardless of any outcome.

Updated docstring to spell out the linked decision:

> Auto-merge enablement and ``_rebased_prs`` marking are linked:
>
> - If auto-merge gets enabled, mark ``_rebased_prs`` so Step
>   5.5 skips this PR; auto-merge will handle the bounded wait
>   server-side and Step 6's skip gate routes to
>   ``AUTO_MERGE_PENDING``.
> - If auto-merge cannot be enabled (repo doesn't allow it,
>   branch protection blocks it, etc.), do not mark
>   ``_rebased_prs`` so Step 5.5 still runs its bounded poll
>   loop. Without this, GitHub may still be recomputing
>   mergeability after the force-push when Step 6 fires and a
>   manual merge attempt would 405 transiently.

Tests:

- New ``test_auto_merge_unavailable_leaves_pr_unmarked``
  pins the new branch: ``enable_auto_merge`` returns False,
  REST ``update-branch`` still doesn't fire, and the PR is NOT
  in ``_rebased_prs``.
- All 894 tests pass; ruff clean.

Co-authored-by: Claude <noreply@anthropic.com>
Signed-off-by: Matthew Watkins <mwatkins@linuxfoundation.org>
Two more substantive fixes from the latest review.

1. **Fail closed when head repo identity is unknown.**
   ``local_rebase_pr()`` previously synthesised
   ``head_clone_url`` from ``base_full`` when both
   ``head_repo_clone_url`` and ``head_repo_full_name`` were
   missing. ``PullRequestInfo`` objects from the merge workflow
   don't populate those fields (they're only set in the
   interactive fix flow), so for a fork PR this would have
   force-pushed the rebased branch to the *base* repo instead
   of the fork \u2014 a dangerous mis-target that could create or
   overwrite a branch on someone else's repository.

   Fix: when both fields are unset, return False immediately
   so the caller falls through to the auto-merge path. The
   existing tests already populate both fields, so this is a
   safety improvement against the production state, not a
   change in observed behaviour for code paths that today
   reach ``local_rebase_pr()``.

2. **Clamp wait-loop sleep duration to non-negative.**
   The Step 5.5 wait loop computed ``remaining = wait_deadline
   - time()`` after the ``while ... < wait_deadline`` check.
   The two reads aren't atomic, so a near-deadline crossing
   could produce a tiny negative ``remaining`` which would
   raise ``ValueError`` from ``asyncio.sleep`` and abort the
   merge worker.

   Fix: clamp via ``max(0.0, ...)`` so ``asyncio.sleep`` always
   gets a non-negative argument. Worst case: a no-op sleep
   right before the loop exits naturally on the next iteration.

Tests:

- New ``test_missing_head_repo_identity_returns_false``
  pins the new fail-closed branch in ``local_rebase_pr()``.
  895/895 tests pass; ruff clean.

Co-authored-by: Claude <noreply@anthropic.com>
Signed-off-by: Matthew Watkins <mwatkins@linuxfoundation.org>
Two related Copilot comments addressed:

1. **Populate head/base repo metadata in PullRequestInfo
   constructors** so the local-rebase fail-closed gate (added
   in d36d6e7) doesn't unnecessarily refuse the local path for
   real-world PRs.

   Before this change, ``PullRequestInfo`` objects from the
   merge workflow had ``head_repo_full_name``,
   ``head_repo_clone_url``, ``base_repo_full_name``,
   ``base_repo_clone_url``, and ``is_fork`` left unset (those
   fields were only populated in the interactive ``blocked
   --fix`` flow). The fail-closed gate would then refuse the
   local path even for non-fork PRs that could safely be
   rebased.

   Two constructor sites updated:

   - ``GitHubClient.get_pull_request_info`` (REST source PR):
     pulls the fields straight out of the existing ``head.repo``
     / ``base.repo`` substructures of the REST PR payload.
   - ``GitHubService.to_pull_request_info`` (GraphQL similar
     PRs): added ``headRepository`` / ``baseRepository``
     queries to ``REPO_OPEN_PRS_PAGE`` and
     ``ORG_REPOS_WITH_OPEN_PRS``, then propagated the response
     fields. The GraphQL ``url`` field doesn't include the
     ``.git`` suffix that the REST ``clone_url`` does, so we
     synthesise the canonical ``.git`` form for parity.

2. **Refactor ``_make_mgr()`` to wrap the shared
   ``tests.conftest.make_merge_manager`` helper** instead of
   reimplementing it. The repo's test guidelines centralise
   the typed-mock convention there, and reimplementing it
   diverged the ``token`` default and the
   ``mgr._github_client`` assignment style. The new wrapper
   adds only the ``rebase_local=True`` default specific to
   this test module.

All 895 tests pass; ruff clean. The fail-closed gate added in
d36d6e7 remains in place as a defensive guard against any
future code path that constructs a ``PullRequestInfo`` without
populating the head/base repo fields.

Co-authored-by: Claude <noreply@anthropic.com>
Signed-off-by: Matthew Watkins <mwatkins@linuxfoundation.org>
Previously the function caught all exceptions and returned
(True, []) on error, on the rationale that fail-open avoids
blocking merges unnecessarily. That default was wrong for the
new signature-preservation gate in ``rebase.py``, which
interprets ``all_verified=True`` as a positive confirmation
that the PR head is signed. An API failure could route a real PR
into the network-touching local-rebase path even though we
couldn't actually confirm anything about the signature state.

Two callers, two correct behaviours:

- ``_trigger_dependabot_recreate``: already wraps in
  ``try/except`` and returns ``None`` (don't recreate) on
  error \u2014 the existing handler matches a fail-safe interpretation.
- ``should_use_local_rebase``: wraps in ``try/except`` and
  returns `(False, ...)`` (fail-closed) on error \u2014 already
  added in d36d6e7 to match the documented gate invariant.

So both call sites now do the right thing for their own
semantics, and neither relies on the conflated default.

Updated ``test_api_error_returns_verified`` \u2192
``test_api_error_propagates`` to assert the new contract: the
exception surfaces to callers via ``pytest.raises``. 895/895
tests pass.

Co-authored-by: Claude <noreply@anthropic.com>
Signed-off-by: Matthew Watkins <mwatkins@linuxfoundation.org>
Two more fixes from the latest review pass.

1. **Fork detection no longer collapses head_full to base_full
   too early.** The previous logic did ``head_full = head_full or
   base_full`` *before* the fork comparison
   ``(head_full or base_full) != base_full``, which always
   evaluated False whenever head_full was missing - incorrectly
   treating fork PRs as non-forks. The base branch would then be
   fetched from origin (the fork) instead of upstream, and the
   fork-vs-non-fork remote routing got it backwards.

   New decision order, all defensive:
     1. ``pr_info.is_fork`` from the API (most reliable)
     2. Direct full_name comparison
     3. Direct clone_url comparison
     4. Default to non-fork

   The ``head_full = head_full or base_full`` collapse now
   happens *after* ``is_fork`` is computed, so the fallback
   only affects clone-URL synthesis (safe - guarded by the
   missing-head_repo identity check that fails closed earlier).

2. **Unshallow + retry on rebase failure.** The default depth=50
   shallow clone can miss the merge base for PRs whose branch
   point is older than 50 commits, causing ``git rebase`` to fail
   with a non-zero exit. Since this code path deliberately doesn't
   fall back to REST update-branch, those PRs would be stuck
   forever on the old code.

   New recovery: when the first rebase exits non-zero, abort the
   rebase, unshallow both remotes (origin always; upstream when
   is_fork), and retry once. If that also fails, the cause is
   genuine (conflicts, corrupt index, etc.) and we return False
   as before. The cost of the unshallow only hits us on the
   rare PRs where the shallow depth wasn't enough, not the
   common case.

895/895 tests pass; ruff clean.

Co-authored-by: Claude <noreply@anthropic.com>
Signed-off-by: Matthew Watkins <mwatkins@linuxfoundation.org>
Real-world failure on lfreleng-actions/lftools-uv#342 \u2014 a
dependabot PR sat blocked on a DCO check that GitHub never
completed.  DCO normally finishes in seconds; when it stalls
the only reliable recovery is to ask dependabot to recreate
the PR so the check fires again on a fresh head SHA.  This
class of failure previously required manual intervention.

Adds a stuck-check detector and routes it into the existing
`@dependabot recreate` path so the recovery is automatic.

Detection (`AsyncMergeManager._detect_stuck_dco`):

* Examines both the modern `/check-runs` API and the older
  `/commits/{sha}/status` contexts (some DCO bots use the
  status API).
* Matches DCO-shaped check names: `DCO`, `dco/dco`, `dcobot`,
  any `dco/...`, and any name containing `signoff` /
  `sign-off` / `signed-off`.
* Treats a check as stuck only when:
  - status is `queued` / `in_progress` / `pending`, AND
  - the check has been pending \u2265 60s (new module constant
    `STUCK_DCO_THRESHOLD_SECONDS`), AND
  - the PR itself was created \u2265 60s ago AND last updated
    \u2265 60s ago.
* The PR-level age + idle floors prevent false positives on
  PRs caught immediately after open or force-push, where the
  DCO check is simply running normally.
* All API failures degrade to `(False, None, 0.0)` so a
  network blip cannot fire a recreate.

Wiring (`AsyncMergeManager._merge_single_pr`):

The existing recreate-eligibility check (which fired only on
"branch protection" failure summaries) now also fires when
`_detect_stuck_dco` returns True for a dependabot PR, routing
through the same `_trigger_dependabot_recreate` flow.  That
path already posts the `@dependabot recreate` comment, polls
for the old PR to close, finds the replacement, and waits for
its checks to land \u2014 so the "park in the background while
other PRs continue" behaviour comes for free.

The new detection log line is printed with `markup=False` so
a check name containing characters that look like Rich markup
(e.g. `[required]`) is not silently swallowed by the console
parser.

Tests (`tests/test_stuck_dco_detection.py`, 10 new):

* `TestDetectStuckDcoCheckRuns` (6): positive detection,
  sub-threshold age, completed check, non-DCO check name,
  PR-level age floor, `dco/dco` name variant.
* `TestDetectStuckDcoStatusContexts` (1): detection via the
  older status API.
* `TestDetectStuckDcoRobustness` (3): PR fetch failure,
  unparseable timestamps, no client \u2014 each must return the
  safe default.

Full suite: 905/905 pass (892 baseline + 13 new).

Co-authored-by: Claude <claude@anthropic.com>
Signed-off-by: Matthew Watkins <mwatkins@linuxfoundation.org>
In repo-scoped runs the worker concurrency was forced to 1 to
avoid races between back-to-back merge_pull_request calls on
the same repo.  That made any PR sitting in Step 5.5's
auto-merge wait loop head-of-line block every other PR in the
batch with the default 5-minute merge_timeout, a single slow
PR could stall a batch of N PRs for up to N x 5 minutes.

This commit splits the two concerns:

* Workers run in parallel again (default cap of 5 for repo
  runs, capped by PR count).
* The actual merge_pull_request API call is serialised per
  repository via a new _get_merge_dispatch_lock(owner, repo)
  helper.  Workers on the same repo queue on the lock; workers
  on different repos hold distinct locks and dispatch in
  parallel.
* Approve, rebase polling, Step 5.5's auto-merge wait, and DCO
  status polling all run outside the lock, so a PR parked
  waiting for required checks no longer blocks anything else.

Wiring:

* New self._merge_dispatch_locks: dict[str, asyncio.Lock] on
  AsyncMergeManager, guarded by _merge_dispatch_locks_lock for
  safe lazy creation.
* New async helper _get_merge_dispatch_lock(owner, repo).
* Both merge_pull_request call sites in _merge_single_pr (main
  path and recreated-PR path) now wrap the call in
  async with await self._get_merge_dispatch_lock(...).
* cli.py's two repo-scoped _run_parallel_merge call sites
  raise concurrency=1 to min(5, len(prs)) or 1, with the
  rationale recorded inline.

Tests (tests/test_merge_dispatch_lock.py, 5 new):

* TestMergeDispatchLockIdentity (3): same repo returns the
  same lock, different repos return distinct locks, and
  concurrent first-acquire on the same key returns one shared
  lock (proving the outer lazy-init guard works).
* TestMergeDispatchLockSerialisation (2): same-repo workers
  serialise (peak == 1), different-repo workers run in
  parallel (peak == N).

Full suite: 910/910 pass (905 baseline + 5 new).

Co-authored-by: Claude <claude@anthropic.com>
Signed-off-by: Matthew Watkins <mwatkins@linuxfoundation.org>
@ModeSevenIndustrialSolutions ModeSevenIndustrialSolutions force-pushed the fix/preserve-signatures-on-rebase branch from f6202a0 to 7a86dea Compare May 13, 2026 12:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants