Auto-Resume-Paused-Session: 32 tasks across 2026-06-18-Auto-Resume-On-Limit/AUTO-RESUME-01, 2026-06-18-Auto-Resume-On-Limit/AUTO-RESUME-02 +2 more #1108
Add the foundation for auto-resuming agents that pause on a token/API/credit limit (no behavior changes yet):
- settingsMetadata.ts: autoResumeOnLimit (bool, default true), autoResumeCheckIntervalHours (number, default 2), autoResumeGiveUpDays (number, default 7), all in the 'updates' category next to checkForUpdatesOnStartup.
- defaults.ts: matching SETTINGS_DEFAULTS entries.
- settingsStore.ts: full wiring (interface values + setter decls, initial state, setter impls, loadAllSettings patch, getSettingsActions map).
- useSettings.ts: getter/setter pairs on UseSettingsReturn.
- GeneralTab.tsx: 'Auto-Resume on Limit' section with a toggle plus two clamped numeric inputs (interval hours, give-up days) and a helper caption.
- searchableSettings.ts: two search entries (general-auto-resume, general-auto-resume-interval) for discoverability.
- types.ts: AgentError.limitResetAt and AgentError.resumeAttemptCount for later retry/backoff phases.
- tests: defaults + searchable-query coverage for the new settings.
Phase 2 records limit pauses (no resume action yet - that's Phase 3) for all three run kinds: standard query, spec-driven Auto Run, and goal-driven Auto Run.
- shared/types.ts: isLimitError(err) single source of truth (rate_limited or token_exhaustion).
- main/agents/limitResetEstimator.ts: getLimitResetAt(agentId, claudeConfigDir?) reads the Claude usage snapshot and returns the nearest future reset; undefined for non-Claude / missing / all-past. Exposed via agents:getLimitResetAt IPC (handler + preload bridge + renderer type).
- useAgentErrorListener.ts: on a limit error, seed resumeAttemptCount=0 and best-effort patch limitResetAt after the synchronous pause (non-blocking, timestamp-guarded). Extend recoveryAction.lastUserPrompt capture to limit errors so a direct send can be re-fired.
- useQueueProcessing.ts: document the runtime-recovery effect as the standard-query auto-resume path.
- useGoalRunner.ts: detect a limit error after a failed spawn and park on the shared errorResolution promise (mirrors spec-driven), retrying the SAME iteration on resume without consuming it. errorResolutionRefs threaded through useBatchProcessor.ts.
- useBatchControlActions.ts: mark resumeAfterError as the Phase 3 resume entry point (shared by spec- and goal-driven pauses).
- tests: isLimitError truth table, getLimitResetAt cases, goal-runner pause-without-consuming-iteration, listener metadata + prompt capture.
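The two helpers named in this commit can be pictured with a minimal sketch. The error shape, the extra error types (`auth_expired`, `unknown`), and the helper `nearestFutureReset` are assumptions made for illustration; only the `rate_limited` / `token_exhaustion` rule and the "nearest future reset, else undefined" behavior come from the commit message.

```typescript
// Hypothetical, simplified error shape; the real AgentError carries more fields.
type AgentErrorType = 'rate_limited' | 'token_exhaustion' | 'auth_expired' | 'unknown';

interface AgentErrorLike {
  type: AgentErrorType;
  timestamp: number; // epoch ms
}

// One predicate shared by every run kind, so "is this a limit pause?" has a single answer.
function isLimitError(err: AgentErrorLike | null | undefined): boolean {
  return err?.type === 'rate_limited' || err?.type === 'token_exhaustion';
}

// Hypothetical helper: pick the nearest reset strictly in the future,
// or undefined when the list is empty or every reset is already past.
function nearestFutureReset(resetTimesMs: number[], nowMs: number): number | undefined {
  const future = resetTimesMs.filter((t) => t > nowMs);
  return future.length > 0 ? Math.min(...future) : undefined;
}
```

Returning `undefined` rather than a sentinel keeps the result advisory: a caller can fall back to fixed-interval probing whenever no estimate exists.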
Renderer singleton that, on the autoResumeCheckIntervalHours interval (default 2h), finds every limit-paused agent, probes whether its provider window has reopened, and resumes the clear ones - dispatching the correct resume action per run kind and firing a green "Resumed" toast.
- useAutoResumeCoordinator.ts: interval timer (early-returns + clears when autoResumeOnLimit is off; kickoff tick shortly after mount), pure predicates (isLimitPausedSession / isEligibleToProbe), probeAvailability (Claude: fresh usage snapshot below LIMIT_THRESHOLD; all other providers: resume-as-probe), and runAutoResumeTick (select -> stamp limitPausedAt -> re-sample once -> probe -> resume). In-flight Set guards against double-resume.
- Resume routes spec- AND goal-driven through resumeAutoRunAfterError (resolves the shared errorResolution promise both runners await); standard clears the error so the queue drains, re-firing a captured direct send. Bumps resumeAttemptCount before attempting.
- AgentError.limitPausedAt: Phase 4 give-up stamp, seeded from the error timestamp.
- claudeUsageStore: non-hook getClaudeUsageSnapshotForSession() for the probe.
- Mounted once in App.tsx beside useAgentListeners.
- 22 unit tests; full stores + batch + agent hook suites stay green.
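A rough sketch of the select, probe, resume flow with an in-flight guard follows. It is not the PR's `runAutoResumeTick`: the session shape, `selectProbeCandidates`, `runTick`, and the eligibility rule (probe once the estimated reset has passed, or immediately when no estimate exists) are assumptions for illustration.

```typescript
// Hypothetical, simplified session shape for illustration only.
interface PausedSession {
  id: string;
  limitResetAt?: number; // epoch ms, advisory estimate
}

// Assumed rule: eligible once the estimated reset has passed, or when no estimate exists.
function isEligibleToProbe(s: PausedSession, nowMs: number): boolean {
  return s.limitResetAt === undefined || s.limitResetAt <= nowMs;
}

// Pure selection step: eligible sessions that no earlier tick is still working on.
function selectProbeCandidates(
  sessions: PausedSession[],
  inFlight: Set<string>,
  nowMs: number
): PausedSession[] {
  return sessions.filter((s) => isEligibleToProbe(s, nowMs) && !inFlight.has(s.id));
}

// Probe every candidate in parallel and resume the ones whose window has reopened.
async function runTick(
  sessions: PausedSession[],
  inFlight: Set<string>,
  probe: (s: PausedSession) => Promise<boolean>,
  resume: (s: PausedSession) => void,
  nowMs: number = Date.now()
): Promise<string[]> {
  const resumed: string[] = [];
  const work = selectProbeCandidates(sessions, inFlight, nowMs).map(async (s) => {
    inFlight.add(s.id); // runs synchronously, before the first await
    try {
      if (await probe(s)) {
        resume(s);
        resumed.push(s.id);
      }
    } finally {
      inFlight.delete(s.id); // always release the guard, even if the probe throws
    }
  });
  await Promise.all(work);
  return resumed;
}
```

Because the guard is added before the first `await`, an overlapping tick that starts while a probe is pending skips that session instead of resuming it twice.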
…H, docs
Make the feature survive an app restart, add a time-based give-up backstop, make the probe SSH-aware, and document the whole lifecycle.
- Restart re-attachment: a limit pause is now the ONE error state that round-trips persistence. prepareSessionForPersistence (save) and restoreSession (load) preserve agentError/agentErrorPaused/agentErrorTabId/state:'error' ONLY when isLimitError + paused; every other error stays stripped. The full agentError survives, so limitResetAt/resumeAttemptCount/limitPausedAt round-trip. On cold start the coordinator re-finds the session with no new wiring (kickoff tick) and resumes the AGENT conversation via the native --resume; the persisted executionQueue drains. The in-memory Auto Run/goal loop is NOT reconstructed (batchStore is in-memory), so even a formerly Auto-Run session resumes via the standard queue-drain path. Documented at all three sites + docs.
- Give up (time-based): runAutoResumeTick gained a give-up pass off autoResumeGiveUpDays (default 7). After limitPausedAt + N days it parks the session and fires ONE distinct orange "Auto-resume stopped" toast. Purely time-based - resumeAttemptCount is telemetry only, no attempt cap, keeps retrying the whole window. The give-up anchor (in-memory, rebuilt from persisted limitPausedAt) survives a resume-then-re-hit so "N days of REPEATED limits" elapses; pruneGiveUpTracking forgets a session once it's no longer limit-paused so a later limit starts fresh. Removed the now-redundant stampLimitPausedAt (resolveGiveUpAnchor owns stamping).
- SSH awareness: probeAvailability falls back to the interval-based resume-as-probe for sessionSshRemoteConfig.enabled sessions instead of reading the LOCAL usage snapshot (maestro-p --status runs locally only, wrong account for a remote). Limitation documented in probeAvailability + claude-usage-sampler.
- Respect manual action + paused queue items: verified clearAgentError drops a session from the coordinator's selector; verified resume never un-pauses held queue items (nextRunnableQueueItem skips them).
- Docs: STATE-PATTERNS.md "Auto-Resume On Limit" section + user-facing section in autorun-playbooks.md.
- Tests: coordinator restart/give-up/manual-action/SSH (32 total), persistence + restoration limit-pause round-trip, useQueueProcessing paused-queue dispatch. 246 across the 4 touched files; broader sweep 394/42 green.
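The time-based give-up described in this commit can be sketched as below. The names `resolveAnchor`, `shouldGiveUp`, and `pruneAnchors` are simplified stand-ins for the PR's `resolveGiveUpAnchor` and `pruneGiveUpTracking`, and the inclusive comparison at exactly N days is an assumption.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Hypothetical in-memory anchor map keyed by session id.
const giveUpAnchors = new Map<string, number>();

// The first anchor seen for a session wins, so a resume followed by a fresh
// limit hit does not restart the give-up clock.
function resolveAnchor(
  sessionId: string,
  persistedLimitPausedAt: number | undefined,
  nowMs: number
): number {
  const existing = giveUpAnchors.get(sessionId);
  if (existing !== undefined) return existing;
  const anchor = persistedLimitPausedAt ?? nowMs;
  giveUpAnchors.set(sessionId, anchor);
  return anchor;
}

// Purely time-based: the number of resume attempts plays no part.
function shouldGiveUp(anchorMs: number, giveUpDays: number, nowMs: number): boolean {
  return nowMs - anchorMs >= giveUpDays * MS_PER_DAY;
}

// Forget sessions that are no longer limit-paused so a later limit starts fresh.
function pruneAnchors(limitPausedIds: Set<string>): void {
  giveUpAnchors.forEach((_anchor, id) => {
    if (!limitPausedIds.has(id)) giveUpAnchors.delete(id);
  });
}
```

Keeping the anchor separate from the per-error timestamp is what lets "N days of repeated limits" accumulate across resume-then-re-hit cycles.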
📝 Walkthrough
Adds an "Auto-Resume On Limit" feature to Maestro. When an agent pauses due to provider rate/token/credit limits, a new coordinator periodically probes availability (using Claude usage snapshots for Claude, interval-based for others), then resumes the session automatically. Limit-pause state is persisted across restarts, goal-runner iterations can pause and retry, and users can configure or disable the behavior via new settings.
Changes
Auto-Resume On Limit
Sequence Diagram(s)
sequenceDiagram
rect rgba(100, 150, 255, 0.5)
Note over Agent,useAgentErrorListener: Phase 1 — Pause
end
participant Agent
participant useAgentErrorListener
participant getLimitResetAt as getLimitResetAt (main)
participant useDebouncedPersistence
Agent->>useAgentErrorListener: rate_limited / token_exhaustion error
useAgentErrorListener->>useAgentErrorListener: reset resumeAttemptCount=0, stash lastUserPrompt
useAgentErrorListener->>getLimitResetAt: invoke('agents:getLimitResetAt', agentId)
getLimitResetAt-->>useAgentErrorListener: limitResetAt (epoch ms) | undefined
useAgentErrorListener->>useAgentErrorListener: patch agentError.limitResetAt on session
useAgentErrorListener->>useDebouncedPersistence: persist limit-pause state (state='error')
rect rgba(100, 200, 100, 0.5)
Note over useAutoResumeCoordinator,sessionStore: Phase 2 — Periodic probe and resume
end
participant useAutoResumeCoordinator
participant probeAvailability
participant claudeUsageStore
participant sessionStore
useAutoResumeCoordinator->>probeAvailability: tick (10s initial, then interval)
probeAvailability->>claudeUsageStore: refresh snapshot (Claude only)
claudeUsageStore-->>probeAvailability: available or unavailable
alt available
probeAvailability-->>useAutoResumeCoordinator: true
useAutoResumeCoordinator->>sessionStore: re-enqueue prompt / resumeAutoRunAfterError
useAutoResumeCoordinator->>useAutoResumeCoordinator: green "Resumed" toast
else give-up window exceeded
useAutoResumeCoordinator->>useAutoResumeCoordinator: orange "Auto-resume stopped" toast
end
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes
🚥 Pre-merge checks | ✅ 3 | ❌ 2
❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
Greptile Summary
This PR introduces a multi-phase "Auto-Resume on Limit" feature that automatically detects when an agent is paused due to a token, API, or credit limit, and resumes it once the provider window reopens. The implementation spans settings, error metadata, an IPC estimator for reset timing, a renderer-side coordinator that probes on a configurable interval, persistence changes to survive app restarts, and goal-runner integration.
Confidence Score: 4/5
The feature is well-architected with clear separation of concerns, good test coverage, and thoughtful handling of SSH, restart durability, and give-up semantics. One timing assumption in the goal runner's limit detection path could leave a goal-driven session stuck in an unresumable paused state if the IPC error event arrives after the spawn invoke settles. The goal runner reads
- src/renderer/hooks/batch/internal/useGoalRunner.ts: the limit-pause detection block around the spawn result check.
- src/shared/settingsMetadata.ts: auto-resume settings carry the wrong category label.
Sequence Diagram
%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
participant Agent
participant ErrorListener as useAgentErrorListener
participant Coordinator as useAutoResumeCoordinator
participant Persistence as useDebouncedPersistence
participant Store as SessionStore
Agent->>ErrorListener: IPC agent:error (rate_limited / token_exhaustion)
ErrorListener->>Store: "set agentError + agentErrorPaused=true"
ErrorListener->>ErrorListener: "seed resumeAttemptCount=0"
ErrorListener-->>Store: async patch limitResetAt via getLimitResetAt IPC
ErrorListener->>Persistence: trigger persist (state:error kept for limit pauses)
Note over Coordinator: Tick fires after INITIAL_TICK_DELAY_MS or intervalHours
Coordinator->>Store: find isLimitPausedSession + isEligibleToProbe
Coordinator->>Coordinator: resolveGiveUpAnchor - check give-up window
alt Claude local session
Coordinator->>Store: probeAvailability reads Claude usage snapshot
else non-Claude or SSH session
Coordinator->>Coordinator: probeAvailability returns true - resume as probe
end
alt credits available
Coordinator->>Store: resume - clearAgentError or resumeAutoRunAfterError
Coordinator->>Store: fireResumedToast green
else not yet available
Coordinator->>Coordinator: skip and retry next interval
end
alt giveUpDays elapsed
Coordinator->>Store: fireGaveUpToast orange sticky
end
Note over Persistence: App restart
Persistence->>Store: restore limit-pause sessions as state:error
Coordinator->>Store: INITIAL_TICK_DELAY_MS tick re-finds paused sessions
Reviews (1): Last reviewed commit: "MAESTRO: Auto-Resume on Limit Phase 4 - ..."

    }
    const elapsedTimeMs = Date.now() - iterationStart;

    // Limit pause: if this iteration's spawn hit a token/API/credit or rate
    // limit, do NOT consume the iteration. The agent-error listener has
    // already stamped the session into the paused state (agentError +
    // agentErrorPaused); here we mirror the spec-driven runner by awaiting an
    // unblock signal on the shared errorResolution promise. On resume we retry
    // the SAME iteration with the same prompt. Restart durability is out of
    // scope (Phase 4): the loop is still alive in memory, it just parks here.
    if (!result.success) {
      const paused = selectSessionById(sessionId)(useSessionStore.getState());
      const pausedError = paused?.agentError;
      // Guard on the timestamp so a stale agentError from an earlier iteration
      // can't be mistaken for this spawn's failure.
      if (pausedError && isLimitError(pausedError) && pausedError.timestamp >= iterationStart) {
        // Ensure an unblock promise exists. The agent-error listener usually
        // creates it via pauseBatchOnError (the goal batch is "running"), but
        // create it here too in case we observed the failure first.
        if (!errorResolutionRefs.current[sessionId]) {
          let resolveFn: (action: ErrorResolutionAction) => void = () => {};
          const promise = new Promise<ErrorResolutionAction>((resolve) => {
            resolveFn = resolve;
          });
          errorResolutionRefs.current[sessionId] = { promise, resolve: resolveFn };
        }

        window.maestro.logger.autorun(
          `Goal run paused on ${pausedError.type}; awaiting resume`,
          session.name,
          { iteration, errorType: pausedError.type }
        );

        const action = await errorResolutionRefs.current[sessionId].promise;
        delete errorResolutionRefs.current[sessionId];

        if (action === 'abort' || action === 'skip-document') {
          exitReason = 'stopped-by-user';
          exitDetail = 'Stopped by user.';
          break;
        }

        // Resume: retry the SAME iteration. Revert the counter so the limited
        // attempt is not counted, then loop back to re-spawn the same prompt.
        iteration--;
        continue;
      }
    }

    if (result.agentSessionId) {

IPC ordering assumption for limit-error detection
After a spawn returns !result.success the code immediately reads session.agentError from the Zustand store and expects to find the limit-error that caused the failure. This relies on the IPC agent-error event (handled by useAgentErrorListener) having been processed and committed to the store before the invoke response that carries result is settled in the goal runner. If those two IPC paths deliver in the reverse order — invoke response first, error event second — paused?.agentError will be undefined, the if block is skipped, and the goal runner falls through to regular failure handling. The session then gets stuck in errorPaused with nobody awaiting the errorResolution promise, requiring a manual resume or waiting for the next coordinator tick hours later. A short await for the next microtask flush before reading session.agentError, or checking the batch errorPaused state directly, would make the limit detection order-independent.
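One way to make the detection less order-dependent is a short bounded wait for the error to land, sketched below. This is an illustration of the idea rather than the reviewer's or the author's fix: `waitForLimitError`, `isFreshLimitError`, the timeout, and the poll interval are all hypothetical.

```typescript
// Hypothetical minimal error shape for illustration.
interface ErrorLike {
  type: string;
  timestamp: number; // epoch ms
}

function isLimitErrorType(type: string): boolean {
  return type === 'rate_limited' || type === 'token_exhaustion';
}

// True only for a limit error stamped at or after this iteration began.
function isFreshLimitError(err: ErrorLike | undefined, iterationStart: number): boolean {
  return !!err && isLimitErrorType(err.type) && err.timestamp >= iterationStart;
}

// Poll the store for a short, bounded time so a late-arriving error event is still seen.
async function waitForLimitError(
  readError: () => ErrorLike | undefined,
  iterationStart: number,
  timeoutMs = 250, // assumed budget, not taken from the PR
  pollMs = 25
): Promise<ErrorLike | undefined> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const err = readError();
    // Any error from this iteration settles the question, limit or not.
    if (err && err.timestamp >= iterationStart) {
      return isFreshLimitError(err, iterationStart) ? err : undefined;
    }
    if (Date.now() >= deadline) return undefined;
    await new Promise<void>((resolve) => setTimeout(resolve, pollMs));
  }
}
```

In the goal runner, `readError` would presumably wrap the existing `selectSessionById(sessionId)(useSessionStore.getState())?.agentError` read; the trade-off is a small added delay on ordinary, non-limit failures.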

    autoResumeOnLimit: {
      description:
        'Automatically resume agents that paused on a token, API, or credit limit once the provider window reopens.',
      type: 'boolean',
      default: true,
      category: 'updates',
    },
    autoResumeCheckIntervalHours: {
      description: 'How often to probe for credit/limit availability before resuming paused agents.',
      type: 'number',
      default: 2,
      category: 'updates',
    },
    autoResumeGiveUpDays: {
      description:
        'Stop auto-resuming a paused agent after this many days of repeated limits. Probing is cheap, so this is intentionally long.',
      type: 'number',
      default: 7,
      category: 'updates',
    },

The three auto-resume settings are placed immediately after
checkForUpdatesOnStartup and inherit its 'updates' category, but they have nothing to do with update-checking. Any CLI tooling or UI that groups settings by category (the metadata's stated purpose) will surface these under "Updates & Crash Reporting" instead of a more appropriate heading. A new category value ('auto-resume' or 'agent') or the existing 'advanced' would be a better fit, and the SettingCategory union type would need to include it.

     autoResumeOnLimit: {
       description:
         'Automatically resume agents that paused on a token, API, or credit limit once the provider window reopens.',
       type: 'boolean',
       default: true,
    -  category: 'updates',
    +  category: 'advanced',
     },
     autoResumeCheckIntervalHours: {
       description: 'How often to probe for credit/limit availability before resuming paused agents.',
       type: 'number',
       default: 2,
    -  category: 'updates',
    +  category: 'advanced',
     },
     autoResumeGiveUpDays: {
       description:
         'Stop auto-resuming a paused agent after this many days of repeated limits. Probing is cheap, so this is intentionally long.',
       type: 'number',
       default: 7,
    -  category: 'updates',
    +  category: 'advanced',
     },

Actionable comments posted: 3
🧹 Nitpick comments (1)
src/renderer/hooks/agent/useAutoResumeCoordinator.ts (1)
396-469: 💤 Low value
`runAutoResumeTick` returns before all probe/resume operations complete.
The concurrent `void (async () => {...})()` pattern at line 453 is intentional for parallel probing, but callers awaiting `runAutoResumeTick` won't know when all sessions have finished probing. This is fine for the timer-driven use case in the hook, but test assertions that call `await runAutoResumeTick(...)` and immediately check state may race against the unfinished async operations.
The test file (Context snippet 3) shows `await flush()` after `runAutoResumeTick`, which likely handles this. If tests are passing consistently, this is acceptable as-is.
Alternative: collect and await all probe promises
If you ever need callers to await completion (e.g., for deterministic testing without flush hacks), you could collect the promises:

    + const probePromises: Promise<void>[] = [];
      for (const session of retryable) {
        if (inFlight.has(session.id)) continue;
        inFlight.add(session.id);
    -   void (async () => {
    +   probePromises.push((async () => {
          try {
            // ...
          } finally {
            inFlight.delete(session.id);
          }
    -   })();
    +   })());
      }
    + await Promise.all(probePromises);

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/renderer/hooks/agent/useAutoResumeCoordinator.ts` around lines 396 - 469, The `runAutoResumeTick` function returns before all concurrent probe/resume operations complete because the async operations are started with `void (async () => {...})()` pattern. To fix this for deterministic testing, collect all the async promises from the loop that iterates over retryable sessions into an array instead of using void, then await Promise.all() on that array before the function returns to ensure all probe/resume operations within the retryable loop are complete when runAutoResumeTick resolves.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/main/ipc/handlers/agents.ts`:
- Around line 1674-1686: The issue is that `getLimitResetAt` handler is
providing local Claude reset times for SSH-backed sessions, which causes
`isEligibleToProbe` to incorrectly defer probes based on the wrong account's
limit window instead of using the intended interval-based fallback. Either add a
check in `useAgentErrorListener.ts` (or where `getLimitResetAt` is called) to
prevent calling this handler for SSH-backed sessions and pass undefined instead,
or modify `isEligibleToProbe` to detect SSH-backed sessions and always return
`true` for them, bypassing the reset-time filter entirely so the interval-based
fallback in `probeAvailability` works as designed.
In `@src/renderer/hooks/utils/useDebouncedPersistence.ts`:
- Around line 134-137: The agentError persistence logic in the
useDebouncedPersistence hook currently preserves a limit error on any tab when
isLimitPause is true, but the intention is to preserve it only on the specific
paused tab. Add an additional condition to verify that the current tab being
processed is actually the paused tab before persisting its agentError. This
prevents stale errors from non-paused tabs from being revived after restart.
In `@src/renderer/stores/settingsStore.ts`:
- Around line 1037-1050: The numeric auto-resume settings
setAutoResumeCheckIntervalHours and setAutoResumeGiveUpDays accept raw values
without validation, allowing invalid values like zero, negative numbers, or
non-finite values to be persisted and cause runtime issues. Add clamping and
validation logic in both setter methods to ensure values are positive and finite
before persisting them via window.maestro.settings.set. Additionally, apply the
same validation logic at the hydration boundaries (mentioned at lines 2341-2348)
where these settings are read/initialized from the external store to prevent
invalid values from destabilizing the coordinator or causing immediate give-up
behavior.
---
Nitpick comments:
In `@src/renderer/hooks/agent/useAutoResumeCoordinator.ts`:
- Around line 396-469: The `runAutoResumeTick` function returns before all
concurrent probe/resume operations complete because the async operations are
started with `void (async () => {...})()` pattern. To fix this for deterministic
testing, collect all the async promises from the loop that iterates over
retryable sessions into an array instead of using void, then await Promise.all()
on that array before the function returns to ensure all probe/resume operations
within the retryable loop are complete when runAutoResumeTick resolves.
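For the `settingsStore.ts` finding in the prompt above, the validation could look roughly like the sketch below. The helper `sanitizeSetting` and the min/max bounds are hypothetical; only the defaults (2 hours, 7 days) come from the PR description.

```typescript
interface Bounds {
  min: number;
  max: number;
  fallback: number;
}

// Hypothetical bounds; the clamps actually used by the UI may differ.
const INTERVAL_HOURS: Bounds = { min: 1, max: 168, fallback: 2 };
const GIVE_UP_DAYS: Bounds = { min: 1, max: 90, fallback: 7 };

// Reject non-numeric and non-finite input, then clamp into [min, max].
function sanitizeSetting(value: unknown, bounds: Bounds): number {
  if (typeof value !== 'number' || !Number.isFinite(value)) return bounds.fallback;
  return Math.min(bounds.max, Math.max(bounds.min, value));
}
```

Calling the same helper in both the setter and the hydration path would keep a zero, negative, or `NaN` value from ever reaching the coordinator's timer or the give-up check.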
📒 Files selected for processing (35)
- docs/agent-guides/STATE-PATTERNS.md
- docs/autorun-playbooks.md
- src/__tests__/main/agents/limitResetEstimator.test.ts
- src/__tests__/main/ipc/handlers/agents.test.ts
- src/__tests__/main/stores/defaults.test.ts
- src/__tests__/renderer/components/Settings/searchableSettings.test.ts
- src/__tests__/renderer/hooks/agent/internal/useAgentErrorListener.test.tsx
- src/__tests__/renderer/hooks/agent/useAutoResumeCoordinator.test.ts
- src/__tests__/renderer/hooks/batch/useGoalRunner.test.ts
- src/__tests__/renderer/hooks/useQueueProcessing.test.ts
- src/__tests__/renderer/hooks/useSessionRestoration.test.ts
- src/__tests__/renderer/hooks/utils/useDebouncedPersistence.test.ts
- src/__tests__/shared/isLimitError.test.ts
- src/main/agents/claude-usage-sampler.ts
- src/main/agents/limitResetEstimator.ts
- src/main/ipc/handlers/agents.ts
- src/main/preload/agents.ts
- src/main/stores/defaults.ts
- src/renderer/App.tsx
- src/renderer/components/Settings/searchableSettings.ts
- src/renderer/components/Settings/tabs/GeneralTab.tsx
- src/renderer/global.d.ts
- src/renderer/hooks/agent/internal/useAgentErrorListener.ts
- src/renderer/hooks/agent/useAutoResumeCoordinator.ts
- src/renderer/hooks/agent/useQueueProcessing.ts
- src/renderer/hooks/batch/internal/useBatchControlActions.ts
- src/renderer/hooks/batch/internal/useGoalRunner.ts
- src/renderer/hooks/batch/useBatchProcessor.ts
- src/renderer/hooks/session/useSessionRestoration.ts
- src/renderer/hooks/settings/useSettings.ts
- src/renderer/hooks/utils/useDebouncedPersistence.ts
- src/renderer/stores/claudeUsageStore.ts
- src/renderer/stores/settingsStore.ts
- src/shared/settingsMetadata.ts
- src/shared/types.ts

    // Best-effort estimate of when a paused agent's provider limit window reopens,
    // used by auto-resume (Phase 3) to schedule its next probe. Claude reads its
    // cached usage snapshot; other providers return undefined (fixed-interval
    // fallback). Never throws - the renderer treats the result as advisory.
    ipcMain.handle(
      'agents:getLimitResetAt',
      withIpcErrorLogging(
        handlerOpts('getLimitResetAt'),
        async (agentId: string, claudeConfigDir?: string): Promise<number | undefined> => {
          return getLimitResetAt(agentId, claudeConfigDir);
        }
      )
    );

🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# 1) Where reset estimates are requested from renderer
rg -n --type=ts --type=tsx "getLimitResetAt\(" src/renderer -C 3
# 2) SSH handling + eligibility gating logic
rg -n --type=ts "isEligibleToProbe|limitResetAt|sessionSshRemoteConfig|probeAvailability|runAutoResumeTick" src/renderer/hooks/agent/useAutoResumeCoordinator.ts -C 6
# 3) Confirm whether limitResetAt stamping is conditioned on SSH state
rg -n --type=ts --type=tsx "window\.maestro\.agents\.getLimitResetAt|limitResetAt" src/renderer/hooks/agent/internal/useAgentErrorListener.ts -C 6
Repository: RunMaestro/Maestro
Length of output: 90
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# 1) Search for getLimitResetAt calls in renderer
echo "=== getLimitResetAt usage in renderer ==="
rg -n "getLimitResetAt\(" src/renderer -A 3 -B 3
# 2) Examine useAutoResumeCoordinator for SSH/eligibility gating
echo "=== useAutoResumeCoordinator SSH and eligibility logic ==="
rg -n "isEligibleToProbe|limitResetAt|sshRemote|probeAvailability|runAutoResumeTick" src/renderer/hooks/agent/useAutoResumeCoordinator.ts -B 6 -A 6
# 3) Check useAgentErrorListener for how limitResetAt is used
echo "=== useAgentErrorListener limitResetAt usage ==="
fd "useAgentErrorListener" src/renderer --type f -exec cat -n {} \;
Repository: RunMaestro/Maestro
Length of output: 6222
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Read full probeAvailability function
echo "=== Full probeAvailability function ==="
sed -n '137,158p' src/renderer/hooks/agent/useAutoResumeCoordinator.ts
# Read full context around getLimitResetAt call in useAgentErrorListener
echo "=== Context around getLimitResetAt call ==="
sed -n '245,275p' src/renderer/hooks/agent/internal/useAgentErrorListener.ts
# Check if Session type includes sshRemoteConfig
echo "=== Session type definition ==="
rg "interface Session|type Session" src/renderer -A 15
Repository: RunMaestro/Maestro
Length of output: 50374
Remove local reset estimate from SSH-backed session probe eligibility gates.
When getLimitResetAt stamps a local Claude reset time onto SSH-backed sessions, isEligibleToProbe defers their probes based on the wrong account's limit window, bypassing the interval-based fallback that probeAvailability is meant to provide for SSH sessions.
The probeAvailability function correctly returns true (unknown availability) for SSH because the usage sampler runs locally and doesn't honor sessionSshRemoteConfig. However, this fallback is unreachable: isEligibleToProbe filters SSH-backed sessions upstream, using a local-only limitResetAt to block probes even though their actual limit state is unknown.
Either prevent calling getLimitResetAt for SSH-backed sessions in useAgentErrorListener.ts, or modify isEligibleToProbe to always return true for SSH sessions, bypassing the reset-time filter for remote probes.
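The second option can be sketched as follows. The session shape is a simplified assumption and the non-SSH rule (wait until `limitResetAt` has passed) is inferred from the review text; only `sessionSshRemoteConfig.enabled` and `limitResetAt` are names taken from the PR.

```typescript
// Hypothetical, trimmed-down session shape for illustration.
interface ProbeSession {
  limitResetAt?: number; // epoch ms, estimated from the LOCAL usage snapshot
  sessionSshRemoteConfig?: { enabled: boolean };
}

function isEligibleToProbe(session: ProbeSession, nowMs: number): boolean {
  // A locally derived reset time says nothing about the remote account,
  // so SSH-backed sessions skip the reset-time gate entirely.
  if (session.sessionSshRemoteConfig?.enabled) return true;
  // Assumed local rule: probe once the estimate has passed, or when none exists.
  return session.limitResetAt === undefined || session.limitResetAt <= nowMs;
}
```

With this gate in place, an SSH session always reaches `probeAvailability`, where the interval-based resume-as-probe fallback applies.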
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/main/ipc/handlers/agents.ts` around lines 1674 - 1686, The issue is that
`getLimitResetAt` handler is providing local Claude reset times for SSH-backed
sessions, which causes `isEligibleToProbe` to incorrectly defer probes based on
the wrong account's limit window instead of using the intended interval-based
fallback. Either add a check in `useAgentErrorListener.ts` (or where
`getLimitResetAt` is called) to prevent calling this handler for SSH-backed
sessions and pass undefined instead, or modify `isEligibleToProbe` to detect
SSH-backed sessions and always return `true` for them, bypassing the reset-time
filter entirely so the interval-based fallback in `probeAvailability` works as
designed.

    // Keep the paused tab's limit-pause error so it round-trips a restart;
    // every other tab error is transient and stripped.
    agentError:
      isLimitPause && tab.agentError && isLimitError(tab.agentError) ? tab.agentError : undefined,

Persist only the paused tab’s limit error.
This condition currently preserves a limit error on any tab when isLimitPause is true, but the contract/comment says only the paused tab should round-trip. Persisting extra tab errors can revive stale tab-level error UI/state after restart.
Suggested fix
agentError:
- isLimitPause && tab.agentError && isLimitError(tab.agentError) ? tab.agentError : undefined,
+ isLimitPause &&
+ tab.id === session.agentErrorTabId &&
+ tab.agentError &&
+ isLimitError(tab.agentError)
+ ? tab.agentError
+ : undefined,
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/renderer/hooks/utils/useDebouncedPersistence.ts` around lines 134 - 137,
The agentError persistence logic in the useDebouncedPersistence hook currently
preserves a limit error on any tab when isLimitPause is true, but the intention
is to preserve it only on the specific paused tab. Add an additional condition
to verify that the current tab being processed is actually the paused tab before
persisting its agentError. This prevents stale errors from non-paused tabs from
being revived after restart.
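The intended contract can also be expressed as a small standalone helper, which makes the rule easy to unit-test. This is a sketch only: the `AgentError` and `Tab` shapes, the `type` field, and the helper name `persistableTabError` are hypothetical; `isLimitError` is re-implemented here from the PR description (`rate_limited` or `token_exhaustion`) and may not match the real function.

```typescript
// Hypothetical minimal shapes for illustration only.
interface AgentError {
	type: string;
	message: string;
}
interface Tab {
	id: string;
	agentError?: AgentError;
}

// Assumed stand-in for the shared isLimitError helper.
function isLimitError(err: AgentError): boolean {
	return err.type === 'rate_limited' || err.type === 'token_exhaustion';
}

// Returns the error to persist for a tab, or undefined to strip it.
// Only the paused tab's limit error survives a restart.
function persistableTabError(
	tab: Tab,
	isLimitPause: boolean,
	pausedTabId: string | undefined
): AgentError | undefined {
	if (!isLimitPause) return undefined;
	if (tab.id !== pausedTabId) return undefined;
	if (!tab.agentError || !isLimitError(tab.agentError)) return undefined;
	return tab.agentError;
}
```

A limit error sitting on a different tab is dropped even while the session is limit-paused, which is the behavior the inline comment promises.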
setAutoResumeOnLimit: (value) => {
	set({ autoResumeOnLimit: value });
	window.maestro.settings.set('autoResumeOnLimit', value);
},

setAutoResumeCheckIntervalHours: (value) => {
	set({ autoResumeCheckIntervalHours: value });
	window.maestro.settings.set('autoResumeCheckIntervalHours', value);
},

setAutoResumeGiveUpDays: (value) => {
	set({ autoResumeGiveUpDays: value });
	window.maestro.settings.set('autoResumeGiveUpDays', value);
},
Clamp and validate auto-resume numeric settings at persistence and hydration boundaries.
autoResumeCheckIntervalHours and autoResumeGiveUpDays are written/read as raw numbers with no finite/min guard. Values from external writers (CLI/manual store edits) can be 0, negative, or non-finite, which can destabilize the coordinator cadence or cause immediate give-up behavior.
Suggested fix
setAutoResumeCheckIntervalHours: (value) => {
- set({ autoResumeCheckIntervalHours: value });
- window.maestro.settings.set('autoResumeCheckIntervalHours', value);
+ const normalized = Number.isFinite(value) ? Math.max(1, Math.floor(value)) : 2;
+ set({ autoResumeCheckIntervalHours: normalized });
+ window.maestro.settings.set('autoResumeCheckIntervalHours', normalized);
},
setAutoResumeGiveUpDays: (value) => {
- set({ autoResumeGiveUpDays: value });
- window.maestro.settings.set('autoResumeGiveUpDays', value);
+ const normalized = Number.isFinite(value) ? Math.max(1, Math.floor(value)) : 7;
+ set({ autoResumeGiveUpDays: normalized });
+ window.maestro.settings.set('autoResumeGiveUpDays', normalized);
},
- if (allSettings['autoResumeCheckIntervalHours'] !== undefined)
- patch.autoResumeCheckIntervalHours = allSettings['autoResumeCheckIntervalHours'] as number;
+ if (allSettings['autoResumeCheckIntervalHours'] !== undefined) {
+ const raw = allSettings['autoResumeCheckIntervalHours'];
+ if (typeof raw === 'number' && Number.isFinite(raw)) {
+ patch.autoResumeCheckIntervalHours = Math.max(1, Math.floor(raw));
+ }
+ }
- if (allSettings['autoResumeGiveUpDays'] !== undefined)
- patch.autoResumeGiveUpDays = allSettings['autoResumeGiveUpDays'] as number;
+ if (allSettings['autoResumeGiveUpDays'] !== undefined) {
+ const raw = allSettings['autoResumeGiveUpDays'];
+ if (typeof raw === 'number' && Number.isFinite(raw)) {
+ patch.autoResumeGiveUpDays = Math.max(1, Math.floor(raw));
+ }
+ }
Also applies to: 2341-2348
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/renderer/stores/settingsStore.ts` around lines 1037 - 1050, The numeric
auto-resume settings setAutoResumeCheckIntervalHours and setAutoResumeGiveUpDays
accept raw values without validation, allowing invalid values like zero,
negative numbers, or non-finite values to be persisted and cause runtime issues.
Add clamping and validation logic in both setter methods to ensure values are
positive and finite before persisting them via window.maestro.settings.set.
Additionally, apply the same validation logic at the hydration boundaries
(mentioned at lines 2341-2348) where these settings are read/initialized from
the external store to prevent invalid values from destabilizing the coordinator
or causing immediate give-up behavior.
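Because the same rule is needed in both the setters and the hydration path, one option is to factor it into a single helper. The sketch below is an assumption-laden illustration: `normalizePositiveInt` is a hypothetical name, and the floor-then-clamp-to-1 rule is taken from the suggested fix above rather than from the project's existing code.

```typescript
// Hypothetical shared helper: coerce an untrusted value to an integer >= 1,
// falling back to the given default when it is not a finite number.
function normalizePositiveInt(raw: unknown, fallback: number): number {
	if (typeof raw !== 'number' || !Number.isFinite(raw)) return fallback;
	// Drop any fractional part, then enforce the minimum of 1.
	return Math.max(1, Math.floor(raw));
}

// Example usage with the defaults named in the PR description
// (2 hours, 7 days).
const intervalHours = normalizePositiveInt(0, 2); // clamped up to 1
const giveUpDays = normalizePositiveInt(Number.NaN, 7); // falls back to 7
```

One design difference from the diff above: on an invalid stored value, the suggested hydration code skips the patch and leaves the initial state untouched, whereas this helper substitutes the fallback explicitly. Both end at the default when the initial state is the default.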
Auto Run Summary
Documents processed:
Total tasks completed: 32
Changes
CHANGES
This PR was automatically created by Maestro Auto Run.