diff --git a/docs/agent-guides/STATE-PATTERNS.md b/docs/agent-guides/STATE-PATTERNS.md
index b6cb2d324f..d85c0d2239 100644
--- a/docs/agent-guides/STATE-PATTERNS.md
+++ b/docs/agent-guides/STATE-PATTERNS.md
@@ -510,3 +510,27 @@ Each AI tab within a session:
| `agentError` | `AgentError?` | Per-tab error state |
**Model/effort resolution chain** (used at user-facing spawn time in `useInputProcessing` and `agentStore.processQueuedItem`): `tab.customModel ?? session.customModel ?? agentConfig.model`. The MainPanel model/effort pill writes to the active tab via `tabStore.setTabModel`/`setTabEffort` - only the Edit Agent modal mutates `session.customModel`/`customEffort`. Programmatic spawns (Auto Run batch, synopsis, Cue, group chat, fork/merge) intentionally read the session value only.
+
+## Auto-Resume On Limit
+
+When an agent pauses on a provider limit (a rate / token / credit limit - `isLimitError(err)` in `src/shared/types.ts`, true for `rate_limited` and `token_exhaustion`), Maestro can auto-resume it once the window reopens. The coordinator is a renderer singleton, `useAutoResumeCoordinator` (`src/renderer/hooks/agent/useAutoResumeCoordinator.ts`), mounted once in `App.tsx` beside the other agent listeners.
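A minimal TypeScript sketch of the `isLimitError` check described above, assuming the error type is a string union. Only `rate_limited` and `token_exhaustion` come from this guide. `auth_expired` appears in the coordinator tests, and `network_error` is a hypothetical stand-in for any other non-limit error. This is an illustration, not the actual code in `src/shared/types.ts`.

```typescript
// Illustrative subset of error types; 'network_error' is hypothetical.
type AgentErrorType = 'rate_limited' | 'token_exhaustion' | 'auth_expired' | 'network_error';

interface AgentErrorLike {
  type: AgentErrorType;
  message: string;
}

// Only these types count as a provider limit that auto-resume may retry.
const LIMIT_ERROR_TYPES: ReadonlySet<AgentErrorType> = new Set<AgentErrorType>([
  'rate_limited',
  'token_exhaustion',
]);

function isLimitError(err: AgentErrorLike | null | undefined): boolean {
  return err != null && LIMIT_ERROR_TYPES.has(err.type);
}

console.log(isLimitError({ type: 'rate_limited', message: 'usage limit' })); // true
console.log(isLimitError({ type: 'auth_expired', message: 'log in again' })); // false
```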
+
+**Settings** (General tab → "Auto-Resume on Limit"; metadata in `settingsMetadata.ts`, defaults in `main/stores/defaults.ts`):
+
+| Setting | Default | Meaning |
+| ------------------------------ | ------- | ------------------------------------------------------------ |
+| `autoResumeOnLimit` | `true` | Master toggle. Off = no timer, nothing scheduled. |
+| `autoResumeCheckIntervalHours` | `2` | How often the coordinator probes every limit-paused session. |
+| `autoResumeGiveUpDays` | `7` | Time-based give-up window measured from the first pause. |
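One way to read the table as timer durations, sketched under the assumption that the hour and day values are converted straight to milliseconds. `AutoResumeSettings`, `autoResumeTimings`, and the constants are hypothetical names, not the real `settingsMetadata.ts` / `defaults.ts` exports.

```typescript
// Hypothetical shape mirroring the three settings in the table.
interface AutoResumeSettings {
  autoResumeOnLimit: boolean;
  autoResumeCheckIntervalHours: number;
  autoResumeGiveUpDays: number;
}

const AUTO_RESUME_DEFAULTS: AutoResumeSettings = {
  autoResumeOnLimit: true,
  autoResumeCheckIntervalHours: 2,
  autoResumeGiveUpDays: 7,
};

const HOUR_MS = 60 * 60 * 1000;
const DAY_MS = 24 * HOUR_MS;

// Returns null when the master toggle is off: nothing should be scheduled.
function autoResumeTimings(
  s: AutoResumeSettings,
): { intervalMs: number; giveUpMs: number } | null {
  if (!s.autoResumeOnLimit) return null;
  return {
    intervalMs: s.autoResumeCheckIntervalHours * HOUR_MS,
    giveUpMs: s.autoResumeGiveUpDays * DAY_MS,
  };
}

console.log(autoResumeTimings(AUTO_RESUME_DEFAULTS));
// { intervalMs: 7200000, giveUpMs: 604800000 }
```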
+
+**Lifecycle** - limit-pause → probe → resume:
+
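+1. **Pause** (Phase 2): `useAgentErrorListener` sets `agentError` + `agentErrorPaused: true` + `state: 'error'`, seeds `resumeAttemptCount` to `0`, and stashes the last user prompt on the error log entry's `recoveryAction` so Phase 3 can re-fire it. It then best-effort stamps `limitResetAt` asynchronously (the time the provider window reopens, via `agents:getLimitResetAt`; only Claude currently returns an estimate, every other provider yields `undefined`).
+2. **Probe** (on the interval, plus a kickoff tick ~10s after mount so sessions restored after a restart are probed promptly). The coordinator selects sessions matching `isLimitPausedSession` whose `limitResetAt` has passed or is unknown (`isEligibleToProbe`), then calls `probeAvailability`: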
+ - **Claude** (local): reads a freshly re-sampled usage snapshot and returns available only when both the session and weekly windows are below `LIMIT_THRESHOLD`. Missing/unauthenticated snapshot → stay paused.
+ - **All other providers** (and **SSH-backed Claude**): no trustworthy usage signal, so availability is **unknown** and the coordinator falls back to **resume-as-probe** - the resume attempt itself is the probe; if it re-hits the limit, Phase 2 re-pauses it and the next interval retries. (The usage sampler `maestro-p --status` runs locally only and does not honor `sessionSshRemoteConfig`, so it can't describe a remote account - see `claude-usage-sampler.ts`.)
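+3. **Resume**: dispatched by run kind. Spec-/goal-driven Auto Runs resolve the shared `errorResolution` promise that both runners await (`resumeAfterError`); a standard query clears the error so the persisted `executionQueue` drains (re-firing any captured in-flight direct send, whose `recoveryAction` is consumed so it fires only once). A session already mid-resume is skipped by later ticks. Either way, a green "Resumed" toast fires; clicking it jumps to the session's tab.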
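The selection and probe rules in the lifecycle above can be sketched as pure functions. This is a simplified sketch, not the coordinator's code: the session and snapshot shapes are hypothetical stand-ins, `sshEnabled` replaces `sessionSshRemoteConfig?.enabled`, `LIMIT_THRESHOLD = 100` is an assumed value, and the IPC calls and usage re-sample are left out. `nearestFutureReset` mirrors the behavior pinned by the `limitResetEstimator` tests (earliest reset still in the future, else `undefined`).

```typescript
// Simplified, hypothetical stand-ins for the real session and usage-snapshot types.
interface UsageWindow {
  percent: number;
  resetsAt: string; // ISO timestamp
}
interface UsageSnapshot {
  authState?: 'authenticated' | 'unauthenticated';
  session: UsageWindow;
  weekAllModels: UsageWindow;
}
interface PausedSession {
  toolType: string;
  sshEnabled: boolean; // stands in for sessionSshRemoteConfig?.enabled
  limitResetAt?: number; // epoch ms; best-effort, may be unknown
}

// Assumed value for illustration only.
const LIMIT_THRESHOLD = 100;

// Earliest reset that is still in the future, or undefined if every window has passed.
function nearestFutureReset(snap: UsageSnapshot | null, now: number): number | undefined {
  if (!snap) return undefined;
  const future = [snap.session.resetsAt, snap.weekAllModels.resetsAt]
    .map((iso) => new Date(iso).getTime())
    .filter((t) => Number.isFinite(t) && t > now);
  return future.length > 0 ? Math.min(...future) : undefined;
}

// A known future reset blocks probing; a passed or unknown one allows it.
function isEligibleToProbe(s: PausedSession, now: number): boolean {
  return s.limitResetAt === undefined || s.limitResetAt <= now;
}

// true = attempt a resume now; false = stay paused until the next interval.
function probeAvailability(s: PausedSession, snap: UsageSnapshot | null): boolean {
  // Only local Claude has a trustworthy usage signal; everything else is resume-as-probe.
  if (s.toolType !== 'claude-code' || s.sshEnabled) return true;
  if (!snap || snap.authState === 'unauthenticated') return false;
  return snap.session.percent < LIMIT_THRESHOLD && snap.weekAllModels.percent < LIMIT_THRESHOLD;
}

const now = Date.now();
const snap: UsageSnapshot = {
  authState: 'authenticated',
  session: { percent: 40, resetsAt: new Date(now + 60_000).toISOString() },
  weekAllModels: { percent: 50, resetsAt: new Date(now + 3_600_000).toISOString() },
};
console.log(probeAvailability({ toolType: 'claude-code', sshEnabled: false }, snap)); // true
console.log(nearestFutureReset(snap, now) === now + 60_000); // true
```

Keeping the probe a pure function of (session, snapshot) is what lets the SSH and non-Claude cases short-circuit to `true` without ever reading local usage numbers.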
+
+**Restart behavior**: a limit pause is the ONE error state that survives an app restart. `prepareSessionForPersistence` (`useDebouncedPersistence.ts`) and `restoreSession` (`useSessionRestoration.ts`) preserve `agentError`/`agentErrorPaused`/`agentErrorTabId`/`state: 'error'` only for limit pauses (every other error stays stripped). On a cold start the coordinator re-finds the session with no extra wiring and resumes the **agent conversation** (the agent re-spawns with its native `--resume` flag and the persisted queue drains). The in-memory **Auto Run / goal-run orchestration loop is NOT reconstructed** (`batchStore` is in-memory by design), so even a formerly Auto-Run session resumes via the standard queue-drain path: the agent continues from its own transcript, but the loop controller does not resume.
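A sketch of the persistence rule, assuming a flattened session shape. `prepareForPersistence` is a hypothetical name standing in for `prepareSessionForPersistence`, and mapping a stripped non-limit error back to `state: 'idle'` is an assumption; the guide only says those errors are stripped.

```typescript
type SessionState = 'idle' | 'busy' | 'error';

// Hypothetical flattened session shape with only the fields this rule touches.
interface PersistableSession {
  id: string;
  state: SessionState;
  agentError?: { type: string; message: string };
  agentErrorPaused?: boolean;
  agentErrorTabId?: string;
}

const LIMIT_TYPES: ReadonlySet<string> = new Set(['rate_limited', 'token_exhaustion']);

function isLimitPause(s: PersistableSession): boolean {
  return (
    s.state === 'error' &&
    s.agentErrorPaused === true &&
    s.agentError !== undefined &&
    LIMIT_TYPES.has(s.agentError.type)
  );
}

// Keep the error fields for a limit pause; strip them for every other error.
function prepareForPersistence(s: PersistableSession): PersistableSession {
  if (isLimitPause(s)) return { ...s };
  const copy: PersistableSession = { ...s };
  delete copy.agentError;
  delete copy.agentErrorTabId;
  copy.agentErrorPaused = false;
  if (copy.state === 'error') copy.state = 'idle'; // assumption: stripped errors come back idle
  return copy;
}

const limited: PersistableSession = {
  id: 'a',
  state: 'error',
  agentError: { type: 'rate_limited', message: 'usage limit' },
  agentErrorPaused: true,
  agentErrorTabId: 'tab-1',
};
const authFailed: PersistableSession = {
  ...limited,
  id: 'b',
  agentError: { type: 'auth_expired', message: 'log in again' },
};
console.log(prepareForPersistence(limited).state); // error
console.log(prepareForPersistence(authFailed).state); // idle
```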
+
+**Give-up (time-based)**: the coordinator keeps probing on the normal interval for the entire window; there is NO attempt-count cap (`resumeAttemptCount` is telemetry only). Only once `limitPausedAt + autoResumeGiveUpDays` has elapsed does it stop retrying that session, leave it paused, and fire ONE distinct orange "Auto-resume stopped" toast. The anchor, `limitPausedAt`, is stamped on first observation from the original error's `timestamp`. It survives a resume attempt that re-hits the limit (so "N days of repeated limits" really elapses), and resets after a resume that succeeds or a manual clear (`clearAgentError`), so a later limit starts a fresh window. Any manual recovery action drops the session from consideration (its selector no longer matches).
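The time-based give-up rule can be sketched as follows. `withAnchor`, `shouldGiveUp`, and `giveUpDecision` are hypothetical helpers, and the `notified` set stands in for however the coordinator remembers that the one-time toast already fired (the tests pass a `giveUp` map to `runAutoResumeTick`, whose exact contents are not shown here).

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical subset of the limit-error metadata this rule reads.
interface LimitErrorMeta {
  timestamp: number; // when the limit error was raised (epoch ms)
  limitPausedAt?: number; // give-up window anchor
  resumeAttemptCount: number; // telemetry only, never a cap
}

// Stamp the anchor once, seeding it from the original error timestamp.
function withAnchor(e: LimitErrorMeta): LimitErrorMeta {
  return e.limitPausedAt !== undefined ? e : { ...e, limitPausedAt: e.timestamp };
}

// Give up purely on elapsed time since the first pause; the attempt count is ignored.
function shouldGiveUp(e: LimitErrorMeta, giveUpDays: number, now: number): boolean {
  const anchor = e.limitPausedAt ?? e.timestamp;
  return now - anchor >= giveUpDays * DAY_MS;
}

// Per-tick decision; `notified` remembers sessions that already got the one-time toast.
function giveUpDecision(
  id: string,
  e: LimitErrorMeta,
  giveUpDays: number,
  now: number,
  notified: Set<string>,
): 'probe' | 'notify-give-up' | 'skip' {
  if (!shouldGiveUp(e, giveUpDays, now)) return 'probe';
  if (notified.has(id)) return 'skip';
  notified.add(id);
  return 'notify-give-up';
}

const pausedAt = 1_000_000;
const err = withAnchor({ timestamp: pausedAt, resumeAttemptCount: 42 });
const notified = new Set<string>();
console.log(giveUpDecision('sess-oc', err, 7, pausedAt + DAY_MS, notified)); // probe
console.log(giveUpDecision('sess-oc', err, 7, pausedAt + 8 * DAY_MS, notified)); // notify-give-up
console.log(giveUpDecision('sess-oc', err, 7, pausedAt + 8 * DAY_MS, notified)); // skip
```

Note that `resumeAttemptCount: 42` has no effect on the outcome: only elapsed time since the anchor decides.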
diff --git a/docs/autorun-playbooks.md b/docs/autorun-playbooks.md
index 21f03d8ef4..c9c3ce880f 100644
--- a/docs/autorun-playbooks.md
+++ b/docs/autorun-playbooks.md
@@ -280,6 +280,18 @@ Click the **Stop** button at any time. The runner will:
- Preserve all completed work
- Allow you to resume later by clicking Run again
+## Auto-Resume on Limit
+
+If an agent pauses mid-run because it hit a provider limit (a rate, token, or credit limit), Maestro can pick the run back up on its own once the window reopens, so you can queue a batch of work, walk away, and come back to find it finished. The feature lives in **Settings → General → Auto-Resume on Limit** and is controlled by three settings:
+
+- **Auto-Resume on Limit** (on by default) - the master toggle.
+- **Check interval** (default 2 hours) - how often Maestro re-checks each paused agent.
+- **Give up after** (default 7 days) - if an agent is still stuck this long after the first pause, Maestro stops retrying it, leaves it paused, and posts a one-time notice so you can resume manually.
+
+How it decides to resume: for Claude, Maestro re-reads your actual plan usage and resumes only once both your session and weekly usage are back under the limit. For every other provider (and for Claude on an SSH remote), it simply retries on the interval: if the limit is still in force, the agent re-pauses and the next check tries again. Probing is cheap, so it keeps trying for the whole give-up window.
+
+This survives a full app restart. If you quit Maestro or reboot while an agent is limit-paused, Maestro restores the pause, resumes the **agent's conversation** (the agent continues from its own transcript), and drains any work you had queued. One caveat: the Auto Run / Goal-Driven **loop controller** does not survive a restart. The agent session and its queued messages resume, but the orchestration loop that was stepping through your document does not pick back up automatically.
+
+Manually resolving the error, or manually resuming or stopping the agent, always takes precedence and cancels auto-resume for that agent.
+
## Halt Marker (Agent Early Exit)
Sometimes the agent itself discovers that the rest of the playbook cannot meaningfully proceed - a missing dependency, a broken precondition, an ambiguous spec it cannot resolve, or a destructive change it refuses to make. In that case the agent can abort the entire run by writing a halt marker into the current document:
diff --git a/src/__tests__/main/agents/limitResetEstimator.test.ts b/src/__tests__/main/agents/limitResetEstimator.test.ts
new file mode 100644
index 0000000000..66daef164f
--- /dev/null
+++ b/src/__tests__/main/agents/limitResetEstimator.test.ts
@@ -0,0 +1,66 @@
+import { describe, it, expect, vi, beforeEach } from 'vitest';
+
+// Mock the usage store so the estimator can be exercised without electron-store.
+const { mockGetSnapshot } = vi.hoisted(() => ({ mockGetSnapshot: vi.fn() }));
+vi.mock('../../../main/stores/claudeUsageStore', () => ({
+ getSnapshot: (...args: unknown[]) => mockGetSnapshot(...args),
+ resolveConfigDirKey: (env: NodeJS.ProcessEnv) => env.CLAUDE_CONFIG_DIR ?? '/home/u/.claude',
+}));
+
+import { getLimitResetAt } from '../../../main/agents/limitResetEstimator';
+
+/** Build a minimal usage snapshot with the two reset windows the estimator reads. */
+function snap(sessionResetsAt: string, weekResetsAt: string) {
+ return {
+ sampledAt: new Date().toISOString(),
+ configDirKey: '/home/u/.claude',
+ session: { percent: 100, resetsAt: sessionResetsAt },
+ weekAllModels: { percent: 50, resetsAt: weekResetsAt },
+ weekSonnetOnly: { percent: 0, resetsAt: weekResetsAt },
+ };
+}
+
+describe('getLimitResetAt', () => {
+ beforeEach(() => {
+ mockGetSnapshot.mockReset();
+ });
+
+ it('returns the nearest FUTURE reset for Claude', () => {
+ const now = Date.now();
+ const soon = new Date(now + 60_000).toISOString();
+ const later = new Date(now + 3_600_000).toISOString();
+ mockGetSnapshot.mockReturnValue(snap(soon, later));
+
+ expect(getLimitResetAt('claude-code')).toBe(new Date(soon).getTime());
+ });
+
+ it('skips a past reset and returns the future one', () => {
+ const now = Date.now();
+ const past = new Date(now - 60_000).toISOString();
+ const future = new Date(now + 120_000).toISOString();
+ mockGetSnapshot.mockReturnValue(snap(past, future));
+
+ expect(getLimitResetAt('claude-code')).toBe(new Date(future).getTime());
+ });
+
+ it('returns undefined when every reset window is already in the past (stale/expired)', () => {
+ const now = Date.now();
+ const past1 = new Date(now - 120_000).toISOString();
+ const past2 = new Date(now - 60_000).toISOString();
+ mockGetSnapshot.mockReturnValue(snap(past1, past2));
+
+ expect(getLimitResetAt('claude-code')).toBeUndefined();
+ });
+
+ it('returns undefined when no snapshot is cached', () => {
+ mockGetSnapshot.mockReturnValue(null);
+
+ expect(getLimitResetAt('claude-code')).toBeUndefined();
+ });
+
+ it('returns undefined for non-Claude providers without touching the store', () => {
+ expect(getLimitResetAt('codex')).toBeUndefined();
+ expect(getLimitResetAt('opencode')).toBeUndefined();
+ expect(mockGetSnapshot).not.toHaveBeenCalled();
+ });
+});
diff --git a/src/__tests__/main/ipc/handlers/agents.test.ts b/src/__tests__/main/ipc/handlers/agents.test.ts
index 85d8ca415c..d0cd5871c8 100644
--- a/src/__tests__/main/ipc/handlers/agents.test.ts
+++ b/src/__tests__/main/ipc/handlers/agents.test.ts
@@ -185,6 +185,7 @@ describe('agents IPC handlers', () => {
'agents:getRemoteMaestroPAvailable',
'agents:getClaudeUsageSnapshots',
'agents:getClaudeUsageAccountKeys',
+ 'agents:getLimitResetAt',
'claude:usage:refresh-all',
'agents:getCodexUsageSnapshots',
'agents:getCodexUsageAccountKeys',
diff --git a/src/__tests__/main/stores/defaults.test.ts b/src/__tests__/main/stores/defaults.test.ts
index 72adfc7017..610fb48e22 100644
--- a/src/__tests__/main/stores/defaults.test.ts
+++ b/src/__tests__/main/stores/defaults.test.ts
@@ -149,6 +149,18 @@ describe('stores/defaults', () => {
it('should have null installationId by default', () => {
expect(SETTINGS_DEFAULTS.installationId).toBeNull();
});
+
+ it('should enable autoResumeOnLimit by default', () => {
+ expect(SETTINGS_DEFAULTS.autoResumeOnLimit).toBe(true);
+ });
+
+ it('should default autoResumeCheckIntervalHours to 2', () => {
+ expect(SETTINGS_DEFAULTS.autoResumeCheckIntervalHours).toBe(2);
+ });
+
+ it('should default autoResumeGiveUpDays to 7', () => {
+ expect(SETTINGS_DEFAULTS.autoResumeGiveUpDays).toBe(7);
+ });
});
describe('SESSIONS_DEFAULTS', () => {
diff --git a/src/__tests__/renderer/components/Settings/searchableSettings.test.ts b/src/__tests__/renderer/components/Settings/searchableSettings.test.ts
index 5e422e4c3c..288a1d5648 100644
--- a/src/__tests__/renderer/components/Settings/searchableSettings.test.ts
+++ b/src/__tests__/renderer/components/Settings/searchableSettings.test.ts
@@ -134,6 +134,10 @@ describe('searchableSettings', () => {
// General tab
['Auto Run Inactivity Timeout', 'general-autorun-inactivity-timeout'],
['refactor', 'general-autorun-inactivity-timeout'],
+ ['resume paused', 'general-auto-resume'],
+ ['rate limit', 'general-auto-resume'],
+ ['quota', 'general-auto-resume'],
+ ['give up', 'general-auto-resume-interval'],
['forced parallel execution', 'general-input-behavior'],
['shift+enter', 'general-input-behavior'],
['prompt composer', 'general-input-behavior'],
diff --git a/src/__tests__/renderer/hooks/agent/internal/useAgentErrorListener.test.tsx b/src/__tests__/renderer/hooks/agent/internal/useAgentErrorListener.test.tsx
index 8d7752ad78..471f94c318 100644
--- a/src/__tests__/renderer/hooks/agent/internal/useAgentErrorListener.test.tsx
+++ b/src/__tests__/renderer/hooks/agent/internal/useAgentErrorListener.test.tsx
@@ -1,5 +1,5 @@
import { describe, it, expect, vi, beforeEach } from 'vitest';
-import { renderHook } from '@testing-library/react';
+import { renderHook, waitFor } from '@testing-library/react';
import { useAgentErrorListener } from '../../../../../renderer/hooks/agent/internal/useAgentErrorListener';
import { useSessionStore } from '../../../../../renderer/stores/sessionStore';
import { useModalStore } from '../../../../../renderer/stores/modalStore';
@@ -191,6 +191,44 @@ describe('useAgentErrorListener', () => {
expect(deps.activeHiddenToolRef.current.has('sess-1:tab-1')).toBe(false);
});
+ it('seeds auto-resume metadata and stashes the last prompt on a limit error', async () => {
+ const getLimitResetAt = vi.fn().mockResolvedValue(1700000123456);
+ (window as any).maestro.agents = { getLimitResetAt };
+
+ const userLog = {
+ id: 'log-user',
+ timestamp: 100,
+ source: 'user' as const,
+ text: 'do the rate-limited thing',
+ };
+ const tab = createMockAITab({ id: 'tab-1', logs: [userLog] });
+ const session = createMockSession({ id: 'sess-1', aiTabs: [tab], activeTabId: 'tab-1' });
+ useSessionStore.setState({ sessions: [session] } as any);
+
+ renderHook(() => useAgentErrorListener(makeDeps()));
+ handler!('sess-1-ai-tab-1', { ...baseError, type: 'rate_limited', message: 'usage limit' });
+
+ // Synchronous pause: error stamped, paused, retry counter seeded to 0.
+ const paused = useSessionStore.getState().sessions[0];
+ expect(paused.state).toBe('error');
+ expect(paused.agentErrorPaused).toBe(true);
+ expect(paused.agentError?.type).toBe('rate_limited');
+ expect(paused.agentError?.resumeAttemptCount).toBe(0);
+
+ // The captured prompt rides along on the error log so Phase 3 can re-fire it.
+ const errorLog = paused.aiTabs[0].logs.find((l) => l.source === 'error');
+ expect(errorLog?.recoveryAction).toEqual({
+ lastUserPrompt: 'do the rate-limited thing',
+ tabId: 'tab-1',
+ });
+
+ // Best-effort reset estimate is patched in asynchronously.
+ expect(getLimitResetAt).toHaveBeenCalledWith('claude-code');
+ await waitFor(() => {
+ expect(useSessionStore.getState().sessions[0].agentError?.limitResetAt).toBe(1700000123456);
+ });
+ });
+
it('skips synopsis-process errors', () => {
const tab = createMockAITab({ id: 'tab-1' });
const session = createMockSession({ id: 'sess-1', aiTabs: [tab] });
diff --git a/src/__tests__/renderer/hooks/agent/useAutoResumeCoordinator.test.ts b/src/__tests__/renderer/hooks/agent/useAutoResumeCoordinator.test.ts
new file mode 100644
index 0000000000..198fe4ff70
--- /dev/null
+++ b/src/__tests__/renderer/hooks/agent/useAutoResumeCoordinator.test.ts
@@ -0,0 +1,649 @@
+import { describe, it, expect, beforeEach, afterEach, vi } from 'vitest';
+import { renderHook, act } from '@testing-library/react';
+import {
+ useAutoResumeCoordinator,
+ runAutoResumeTick,
+ probeAvailability,
+ isLimitPausedSession,
+ isEligibleToProbe,
+} from '../../../../renderer/hooks/agent/useAutoResumeCoordinator';
+import { useSessionStore } from '../../../../renderer/stores/sessionStore';
+import { useAgentStore } from '../../../../renderer/stores/agentStore';
+import { useBatchStore } from '../../../../renderer/stores/batchStore';
+import { useSettingsStore } from '../../../../renderer/stores/settingsStore';
+import { useNotificationStore } from '../../../../renderer/stores/notificationStore';
+import {
+ useClaudeUsageStore,
+ type ClaudeUsageSnapshot,
+} from '../../../../renderer/stores/claudeUsageStore';
+import { DEFAULT_BATCH_STATE } from '../../../../renderer/hooks/batch/batchReducer';
+import { createMockSession } from '../../../helpers/mockSession';
+import { createMockAITab } from '../../../helpers/mockTab';
+import type { AgentError, Session, BatchRunState, LogEntry } from '../../../../renderer/types';
+
+// ---------------------------------------------------------------------------
+// Test helpers
+// ---------------------------------------------------------------------------
+
+const DAY = 24 * 60 * 60 * 1000;
+
+/** Flush pending microtasks + the per-session fire-and-forget resume IIFEs. */
+async function flush(): Promise<void> {
+ await new Promise((resolve) => setTimeout(resolve, 0));
+ await Promise.resolve();
+}
+
+function makeLimitError(overrides: Partial<AgentError> = {}): AgentError {
+ return {
+ type: 'rate_limited',
+ message: 'Rate limited',
+ recoverable: true,
+ agentId: 'claude-code',
+ // Default to "just now" so the give-up window (default 7 days) is nowhere
+ // near elapsed - give-up tests pin their own timestamp + `now` explicitly.
+ timestamp: Date.now(),
+ resumeAttemptCount: 0,
+ ...overrides,
+ };
+}
+
+function makeLimitPausedSession(overrides: Partial<Session> = {}): Session {
+ const error = (overrides.agentError as AgentError | undefined) ?? makeLimitError();
+ const logs = (overrides as { __logs?: LogEntry[] }).__logs ?? [];
+ return createMockSession({
+ id: 'sess-claude',
+ toolType: 'claude-code',
+ name: 'My Agent',
+ state: 'error',
+ agentErrorPaused: true,
+ agentError: error,
+ agentErrorTabId: 'tab-1',
+ aiTabs: [createMockAITab({ id: 'tab-1', agentError: error, logs })],
+ ...overrides,
+ });
+}
+
+function makeSnapshot(sessionPercent: number, weekPercent: number): ClaudeUsageSnapshot {
+ const future = '2099-01-01T00:00:00.000Z';
+ return {
+ sampledAt: future,
+ configDirKey: '/home/.claude',
+ authState: 'authenticated',
+ session: { percent: sessionPercent, resetsAt: future },
+ weekAllModels: { percent: weekPercent, resetsAt: future },
+ weekSonnetOnly: { percent: 0, resetsAt: future },
+ };
+}
+
+/** Snapshot map returned by the mocked getClaudeUsageSnapshots IPC. */
+let claudeSnapshotMap: Record<string, ClaudeUsageSnapshot> = {};
+
+function setSessions(sessions: Session[]): void {
+ useSessionStore.getState().setSessions(sessions);
+}
+
+beforeEach(() => {
+ claudeSnapshotMap = {};
+ useSessionStore.getState().setSessions([]);
+ useBatchStore.setState({ batchRunStates: {}, customPrompts: {} });
+ useNotificationStore.getState().clearToasts();
+ useClaudeUsageStore.getState().__resetForTests();
+ useSettingsStore.setState({ autoResumeOnLimit: true, autoResumeCheckIntervalHours: 2 });
+
+ (window as unknown as { maestro: unknown }).maestro = {
+ agents: {
+ refreshClaudeUsageSnapshots: vi.fn().mockResolvedValue({ refreshed: 1 }),
+ getClaudeUsageSnapshots: vi.fn().mockImplementation(async () => claudeSnapshotMap),
+ },
+ agentError: {
+ clearError: vi.fn().mockResolvedValue(undefined),
+ },
+ logger: { toast: vi.fn(), log: vi.fn(), autorun: vi.fn() },
+ // `.show` must return a promise: notifyToast chains `.catch` on it.
+ notification: { show: vi.fn().mockResolvedValue(undefined), speak: vi.fn() },
+ };
+});
+
+afterEach(() => {
+ vi.useRealTimers();
+ vi.restoreAllMocks();
+});
+
+// ---------------------------------------------------------------------------
+// Pure predicates
+// ---------------------------------------------------------------------------
+
+describe('isLimitPausedSession / isEligibleToProbe', () => {
+ it('recognizes a rate-limited paused session', () => {
+ expect(isLimitPausedSession(makeLimitPausedSession())).toBe(true);
+ });
+
+ it('rejects a non-limit error pause', () => {
+ const s = makeLimitPausedSession({ agentError: makeLimitError({ type: 'auth_expired' }) });
+ expect(isLimitPausedSession(s)).toBe(false);
+ });
+
+ it('rejects an idle session', () => {
+ expect(isLimitPausedSession(createMockSession({ state: 'idle' }))).toBe(false);
+ });
+
+ it('skips a session whose limitResetAt is still in the future', () => {
+ const now = 10_000;
+ const s = makeLimitPausedSession({ agentError: makeLimitError({ limitResetAt: now + 5_000 }) });
+ expect(isEligibleToProbe(s, now)).toBe(false);
+ });
+
+ it('allows a session whose limitResetAt has passed', () => {
+ const now = 10_000;
+ const s = makeLimitPausedSession({ agentError: makeLimitError({ limitResetAt: now - 1 }) });
+ expect(isEligibleToProbe(s, now)).toBe(true);
+ });
+
+ it('allows a session with unknown limitResetAt', () => {
+ expect(isEligibleToProbe(makeLimitPausedSession(), 10_000)).toBe(true);
+ });
+});
+
+// ---------------------------------------------------------------------------
+// probeAvailability
+// ---------------------------------------------------------------------------
+
+describe('probeAvailability', () => {
+ it('returns true for non-Claude providers (resume-as-probe)', async () => {
+ const s = makeLimitPausedSession({ toolType: 'opencode' });
+ await expect(probeAvailability(s)).resolves.toBe(true);
+ });
+
+ it('returns true when both Claude windows are below the limit threshold', async () => {
+ useClaudeUsageStore.getState().setSnapshots({ '/home/.claude': makeSnapshot(40, 50) });
+ await expect(probeAvailability(makeLimitPausedSession())).resolves.toBe(true);
+ });
+
+ it('returns false when a Claude window is at/above the limit threshold', async () => {
+ useClaudeUsageStore.getState().setSnapshots({ '/home/.claude': makeSnapshot(100, 50) });
+ await expect(probeAvailability(makeLimitPausedSession())).resolves.toBe(false);
+ });
+
+ it('returns false when no snapshot is available', async () => {
+ await expect(probeAvailability(makeLimitPausedSession())).resolves.toBe(false);
+ });
+
+ it('returns false for an unauthenticated account', async () => {
+ const snap = { ...makeSnapshot(10, 10), authState: 'unauthenticated' as const };
+ useClaudeUsageStore.getState().setSnapshots({ '/home/.claude': snap });
+ await expect(probeAvailability(makeLimitPausedSession())).resolves.toBe(false);
+ });
+
+ it('SSH-backed Claude session ignores the (local) snapshot and falls back to the interval attempt', async () => {
+ // The local snapshot says OVER the limit, but for an SSH session it's the
+ // wrong account/machine - the probe must treat availability as unknown and
+ // return true (resume-as-probe on the remote) rather than read local numbers.
+ useClaudeUsageStore.getState().setSnapshots({ '/home/.claude': makeSnapshot(100, 100) });
+ const s = makeLimitPausedSession({
+ sessionSshRemoteConfig: { enabled: true, remoteId: 'remote-1' },
+ });
+ await expect(probeAvailability(s)).resolves.toBe(true);
+ });
+
+ it('local (SSH-disabled) Claude session still uses the snapshot', async () => {
+ useClaudeUsageStore.getState().setSnapshots({ '/home/.claude': makeSnapshot(100, 100) });
+ const s = makeLimitPausedSession({
+ sessionSshRemoteConfig: { enabled: false, remoteId: null },
+ });
+ await expect(probeAvailability(s)).resolves.toBe(false);
+ });
+});
+
+// ---------------------------------------------------------------------------
+// runAutoResumeTick — resume behavior
+// ---------------------------------------------------------------------------
+
+describe('runAutoResumeTick', () => {
+ it('(b) resumes an eligible Claude session when probe=true and fires a toast', async () => {
+ claudeSnapshotMap = { '/home/.claude': makeSnapshot(20, 30) };
+ setSessions([makeLimitPausedSession()]);
+ const resumeAutoRun = vi.fn();
+
+ await runAutoResumeTick(new Set(), resumeAutoRun);
+ await flush();
+
+ // Standard (no batch) path: error cleared, session idle.
+ const after = useSessionStore.getState().sessions[0];
+ expect(after.state).toBe('idle');
+ expect(after.agentError).toBeUndefined();
+
+ const toasts = useNotificationStore.getState().toasts;
+ expect(toasts).toHaveLength(1);
+ expect(toasts[0].title).toBe('Resumed');
+ expect(toasts[0].color).toBe('green');
+ expect(toasts[0].message).toContain('My Agent');
+ expect(toasts[0].clickAction).toEqual({
+ kind: 'jump-session',
+ sessionId: 'sess-claude',
+ tabId: 'tab-1',
+ });
+ });
+
+ it('(c) leaves a Claude session paused and fires no toast when probe=false', async () => {
+ claudeSnapshotMap = { '/home/.claude': makeSnapshot(100, 100) };
+ setSessions([makeLimitPausedSession()]);
+ const resumeAutoRun = vi.fn();
+
+ await runAutoResumeTick(new Set(), resumeAutoRun);
+ await flush();
+
+ const after = useSessionStore.getState().sessions[0];
+ expect(after.state).toBe('error');
+ expect(after.agentError).toBeDefined();
+ expect(resumeAutoRun).not.toHaveBeenCalled();
+ expect(useNotificationStore.getState().toasts).toHaveLength(0);
+ });
+
+ it('(d) attempts a resume for a non-Claude session on the interval (no probe signal)', async () => {
+ setSessions([
+ makeLimitPausedSession({ id: 'sess-oc', toolType: 'opencode', name: 'OC Agent' }),
+ ]);
+ const resumeAutoRun = vi.fn();
+
+ await runAutoResumeTick(new Set(), resumeAutoRun);
+ await flush();
+
+ // No batch run → standard path → error cleared.
+ const after = useSessionStore.getState().sessions[0];
+ expect(after.state).toBe('idle');
+ expect(after.agentError).toBeUndefined();
+ expect(useNotificationStore.getState().toasts).toHaveLength(1);
+ // Claude usage is never re-sampled when no candidate is a Claude agent.
+ const maestro = (
+ window as unknown as {
+ maestro: { agents: { refreshClaudeUsageSnapshots: ReturnType<typeof vi.fn> } };
+ }
+ ).maestro;
+ expect(maestro.agents.refreshClaudeUsageSnapshots).not.toHaveBeenCalled();
+ });
+
+ it('routes an error-paused batch run through resumeAutoRunAfterError', async () => {
+ setSessions([makeLimitPausedSession({ id: 'sess-batch', toolType: 'opencode' })]);
+ useBatchStore.setState({
+ batchRunStates: {
+ 'sess-batch': {
+ ...DEFAULT_BATCH_STATE,
+ isRunning: true,
+ errorPaused: true,
+ } as BatchRunState,
+ },
+ customPrompts: {},
+ });
+ const resumeAutoRun = vi.fn();
+
+ await runAutoResumeTick(new Set(), resumeAutoRun);
+ await flush();
+
+ expect(resumeAutoRun).toHaveBeenCalledTimes(1);
+ expect(resumeAutoRun).toHaveBeenCalledWith('sess-batch');
+ expect(useNotificationStore.getState().toasts).toHaveLength(1);
+ });
+
+ it('(e) skips a session whose limitResetAt is in the future', async () => {
+ const future = Date.now() + 60 * 60 * 1000;
+ setSessions([
+ makeLimitPausedSession({
+ toolType: 'opencode',
+ agentError: makeLimitError({ limitResetAt: future }),
+ }),
+ ]);
+ const resumeAutoRun = vi.fn();
+
+ await runAutoResumeTick(new Set(), resumeAutoRun);
+ await flush();
+
+ const after = useSessionStore.getState().sessions[0];
+ expect(after.state).toBe('error');
+ expect(resumeAutoRun).not.toHaveBeenCalled();
+ expect(useNotificationStore.getState().toasts).toHaveLength(0);
+ });
+
+ it('(f) does not start a second resume for a session already mid-resume', async () => {
+ setSessions([makeLimitPausedSession({ id: 'sess-batch', toolType: 'opencode' })]);
+ useBatchStore.setState({
+ batchRunStates: {
+ 'sess-batch': {
+ ...DEFAULT_BATCH_STATE,
+ isRunning: true,
+ errorPaused: true,
+ } as BatchRunState,
+ },
+ customPrompts: {},
+ });
+ const resumeAutoRun = vi.fn();
+
+ // Shared in-flight set across two back-to-back ticks: the second tick's
+ // synchronous loop must observe the id the first tick added and skip.
+ const inFlight = new Set<string>();
+ const p1 = runAutoResumeTick(inFlight, resumeAutoRun);
+ const p2 = runAutoResumeTick(inFlight, resumeAutoRun);
+ await Promise.all([p1, p2]);
+ await flush();
+
+ expect(resumeAutoRun).toHaveBeenCalledTimes(1);
+ });
+
+ it('stamps limitPausedAt (seeded from the error timestamp) on first observation', async () => {
+ claudeSnapshotMap = { '/home/.claude': makeSnapshot(100, 100) }; // probe=false → stays paused
+ setSessions([makeLimitPausedSession({ agentError: makeLimitError({ timestamp: 4242 }) })]);
+
+ // now just after the pause so the give-up window is nowhere near elapsed.
+ await runAutoResumeTick(new Set(), vi.fn(), { now: 5242 });
+ await flush();
+
+ const after = useSessionStore.getState().sessions[0];
+ expect(after.agentError?.limitPausedAt).toBe(4242);
+ });
+
+ it('increments resumeAttemptCount before a batch resume', async () => {
+ setSessions([
+ makeLimitPausedSession({
+ id: 'sess-batch',
+ toolType: 'opencode',
+ agentError: makeLimitError({ resumeAttemptCount: 2 }),
+ }),
+ ]);
+ useBatchStore.setState({
+ batchRunStates: {
+ 'sess-batch': {
+ ...DEFAULT_BATCH_STATE,
+ isRunning: true,
+ errorPaused: true,
+ } as BatchRunState,
+ },
+ customPrompts: {},
+ });
+ let countAtResume: number | undefined;
+ const resumeAutoRun = vi.fn((sessionId: string) => {
+ countAtResume = useSessionStore.getState().sessions.find((s) => s.id === sessionId)
+ ?.agentError?.resumeAttemptCount;
+ });
+
+ await runAutoResumeTick(new Set(), resumeAutoRun);
+ await flush();
+
+ expect(countAtResume).toBe(3);
+ });
+
+ it('re-fires a captured in-flight direct send by enqueueing it', async () => {
+ const captureLog: LogEntry = {
+ id: 'log-err',
+ timestamp: 1000,
+ source: 'error',
+ text: 'Rate limited',
+ recoveryAction: { lastUserPrompt: 'continue please', tabId: 'tab-1' },
+ };
+ claudeSnapshotMap = { '/home/.claude': makeSnapshot(10, 10) };
+ const error = makeLimitError();
+ setSessions([
+ createMockSession({
+ id: 'sess-claude',
+ toolType: 'claude-code',
+ state: 'error',
+ agentErrorPaused: true,
+ agentError: error,
+ agentErrorTabId: 'tab-1',
+ aiTabs: [createMockAITab({ id: 'tab-1', agentError: error, logs: [captureLog] })],
+ }),
+ ]);
+
+ await runAutoResumeTick(new Set(), vi.fn());
+ await flush();
+
+ const after = useSessionStore.getState().sessions[0];
+ expect(after.executionQueue).toHaveLength(1);
+ expect(after.executionQueue[0]).toMatchObject({
+ type: 'message',
+ text: 'continue please',
+ tabId: 'tab-1',
+ });
+ // recoveryAction consumed so it fires only once.
+ expect(after.aiTabs[0].logs[0].recoveryAction).toBeUndefined();
+ });
+});
+
+// ---------------------------------------------------------------------------
+// Restart re-attachment (Phase 4): a persisted limit pause survives a cold
+// start (see useDebouncedPersistence + useSessionRestoration round-trip tests)
+// and the coordinator must pick it up on its first tick with no extra wiring.
+// ---------------------------------------------------------------------------
+
+describe('restart re-attachment', () => {
+ it('considers a restored limit-paused session on the first (kickoff) tick and resumes it via the standard queue-drain path', async () => {
+ vi.useFakeTimers();
+ // Post-restart conditions: batchStore is in-memory and is NOT
+ // reconstructed on a cold start, so a session that paused mid Auto Run
+ // has no batch state here. The session itself comes back shaped exactly
+ // like restoreSession leaves a limit pause (state 'error', paused, error
+ // intact). opencode = resume-as-probe so the tick is deterministic.
+ useBatchStore.setState({ batchRunStates: {}, customPrompts: {} });
+ setSessions([
+ makeLimitPausedSession({ id: 'sess-oc', toolType: 'opencode', name: 'OC Agent' }),
+ ]);
+ const resumeAutoRun = vi.fn();
+
+ // The restored session matches the coordinator's selector with no extra wiring.
+ expect(isLimitPausedSession(useSessionStore.getState().sessions[0])).toBe(true);
+
+ renderHook(() => useAutoResumeCoordinator({ resumeAutoRunAfterError: resumeAutoRun }));
+ await act(async () => {
+ await vi.advanceTimersByTimeAsync(11_000); // past the 10s kickoff tick
+ });
+
+ const after = useSessionStore.getState().sessions[0];
+ // Standard path: error cleared, session idle, persisted queue drains and
+ // the agent resumes its own transcript via the native --resume.
+ expect(after.state).toBe('idle');
+ expect(after.agentError).toBeUndefined();
+ // The orchestration LOOP did NOT resume on a cold start - no batch state survived.
+ expect(resumeAutoRun).not.toHaveBeenCalled();
+ expect(useNotificationStore.getState().toasts).toHaveLength(1);
+ });
+});
+
+// ---------------------------------------------------------------------------
+// Respect manual user action
+// ---------------------------------------------------------------------------
+
+describe('respect manual user action', () => {
+ it('drops a session from consideration once the user clears the error (clearAgentError)', async () => {
+ setSessions([makeLimitPausedSession()]);
+ // Sanity: it WAS a candidate before the user acted.
+ expect(isLimitPausedSession(useSessionStore.getState().sessions[0])).toBe(true);
+
+ // A manual recovery action / retry / start-new / restart all funnel through
+ // clearAgentError, which resets the exact fields the coordinator selects on.
+ act(() => {
+ useAgentStore.getState().clearAgentError('sess-claude');
+ });
+
+ const cleared = useSessionStore.getState().sessions[0];
+ expect(cleared.agentError).toBeUndefined();
+ expect(cleared.agentErrorPaused).toBe(false);
+ expect(cleared.state).toBe('idle');
+ expect(isLimitPausedSession(cleared)).toBe(false);
+
+ // The coordinator no longer touches it: no probe, no resume, no toast.
+ const resumeAutoRun = vi.fn();
+ await runAutoResumeTick(new Set(), resumeAutoRun);
+ await flush();
+ expect(resumeAutoRun).not.toHaveBeenCalled();
+ expect(useNotificationStore.getState().toasts).toHaveLength(0);
+ });
+});
+
+// ---------------------------------------------------------------------------
+// Give up (time-based, off autoResumeGiveUpDays)
+// ---------------------------------------------------------------------------
+
+describe('give up (time-based)', () => {
+ it('parks the session, fires one distinct toast, and stops resuming after the cutoff', async () => {
+ const pausedAt = 1_000_000;
+ const now = pausedAt + 8 * DAY; // past the default 7-day window
+ setSessions([
+ makeLimitPausedSession({
+ id: 'sess-oc',
+ toolType: 'opencode',
+ name: 'OC Agent',
+ agentError: makeLimitError({ timestamp: pausedAt }),
+ }),
+ ]);
+ const resumeAutoRun = vi.fn();
+
+ await runAutoResumeTick(new Set(), resumeAutoRun, { giveUp: new Map(), giveUpDays: 7, now });
+ await flush();
+
+ const after = useSessionStore.getState().sessions[0];
+ expect(after.state).toBe('error'); // left paused
+ expect(after.agentError).toBeDefined();
+ expect(resumeAutoRun).not.toHaveBeenCalled(); // NOT resumed
+
+ const toasts = useNotificationStore.getState().toasts;
+ expect(toasts).toHaveLength(1);
+ expect(toasts[0].title).toBe('Auto-resume stopped');
+ expect(toasts[0].color).toBe('orange');
+ expect(toasts[0].message).toContain('7 days');
+ expect(toasts[0].message).toContain('OC Agent');
+ });
+
+ it('keeps retrying (and resumes) inside the give-up window', async () => {
+ const pausedAt = 1_000_000;
+ const now = pausedAt + 3 * DAY; // well within 7 days
+ setSessions([
+ makeLimitPausedSession({
+ id: 'sess-oc',
+ toolType: 'opencode',
+ agentError: makeLimitError({ timestamp: pausedAt }),
+ }),
+ ]);
+ const resumeAutoRun = vi.fn();
+
+ await runAutoResumeTick(new Set(), resumeAutoRun, { giveUp: new Map(), giveUpDays: 7, now });
+ await flush();
+
+ const after = useSessionStore.getState().sessions[0];
+ expect(after.state).toBe('idle'); // resumed via standard path
+ expect(after.agentError).toBeUndefined();
+ expect(useNotificationStore.getState().toasts[0]?.title).toBe('Resumed');
+ });
+
+ it('fires the give-up toast only once across repeated ticks', async () => {
+ const pausedAt = 1_000_000;
+ const now = pausedAt + 8 * DAY;
+ setSessions([
+ makeLimitPausedSession({
+ id: 'sess-oc',
+ toolType: 'opencode',
+ agentError: makeLimitError({ timestamp: pausedAt }),
+ }),
+ ]);
+ const giveUp = new Map();
+
+ await runAutoResumeTick(new Set(), vi.fn(), { giveUp, giveUpDays: 7, now });
+ await runAutoResumeTick(new Set(), vi.fn(), { giveUp, giveUpDays: 7, now: now + DAY });
+ await flush();
+
+ expect(useNotificationStore.getState().toasts).toHaveLength(1);
+ });
+
+ it('measures the window from the original pause, surviving a resume-then-re-hit', async () => {
+ const original = 1_000_000;
+ const now = original + 8 * DAY;
+ // Pre-seeded as if an earlier tick anchored the window at the first pause.
+ const giveUp = new Map([['sess-oc', { anchor: original, toastFired: false }]]);
+ // The live error is a fresh RE-HIT - on its own timestamp it'd be deep
+ // inside the window, but the preserved anchor wins and we still give up.
+ setSessions([
+ makeLimitPausedSession({
+ id: 'sess-oc',
+ toolType: 'opencode',
+ agentError: makeLimitError({ timestamp: now - 60_000 }),
+ }),
+ ]);
+ const resumeAutoRun = vi.fn();
+
+ await runAutoResumeTick(new Set(), resumeAutoRun, { giveUp, giveUpDays: 7, now });
+ await flush();
+
+ const after = useSessionStore.getState().sessions[0];
+ expect(after.state).toBe('error');
+ expect(resumeAutoRun).not.toHaveBeenCalled();
+ expect(useNotificationStore.getState().toasts[0]?.title).toBe('Auto-resume stopped');
+ });
+
+ it('forgets the window once the session is no longer limit-paused (fresh window later)', async () => {
+ const giveUp = new Map([['sess-oc', { anchor: 1_000_000, toastFired: true }]]);
+ // Session resumed successfully: idle, not limit-paused.
+ setSessions([createMockSession({ id: 'sess-oc', toolType: 'opencode', state: 'idle' })]);
+
+ await runAutoResumeTick(new Set(), vi.fn(), { giveUp, giveUpDays: 7, now: 2_000_000 });
+ await flush();
+
+ expect(giveUp.has('sess-oc')).toBe(false);
+ });
+
+ it('does not give up on attempt count alone within the window', async () => {
+ const pausedAt = 1_000_000;
+ const now = pausedAt + 2 * DAY; // within window
+ setSessions([
+ makeLimitPausedSession({
+ id: 'sess-oc',
+ toolType: 'opencode',
+ agentError: makeLimitError({ timestamp: pausedAt, resumeAttemptCount: 999 }),
+ }),
+ ]);
+ const resumeAutoRun = vi.fn();
+
+ await runAutoResumeTick(new Set(), resumeAutoRun, { giveUp: new Map(), giveUpDays: 7, now });
+ await flush();
+
+ const after = useSessionStore.getState().sessions[0];
+ expect(after.state).toBe('idle'); // resumed despite 999 attempts - probing is cheap
+ expect(useNotificationStore.getState().toasts[0]?.title).toBe('Resumed');
+ });
+});
+
+// ---------------------------------------------------------------------------
+// Hook timer wiring
+// ---------------------------------------------------------------------------
+
+describe('useAutoResumeCoordinator (timer)', () => {
+ it('(a) schedules no timer and resumes nothing when the setting is disabled', () => {
+ vi.useFakeTimers();
+ useSettingsStore.setState({ autoResumeOnLimit: false, autoResumeCheckIntervalHours: 2 });
+ setSessions([makeLimitPausedSession({ toolType: 'opencode' })]);
+ const resumeAutoRun = vi.fn();
+
+ renderHook(() => useAutoResumeCoordinator({ resumeAutoRunAfterError: resumeAutoRun }));
+
+ act(() => {
+ vi.advanceTimersByTime(3 * 60 * 60 * 1000); // 3h: past kickoff + a 2h interval
+ });
+
+ const after = useSessionStore.getState().sessions[0];
+ expect(after.state).toBe('error');
+ expect(resumeAutoRun).not.toHaveBeenCalled();
+ expect(useNotificationStore.getState().toasts).toHaveLength(0);
+ });
+
+ it('fires a kickoff tick shortly after mount when enabled', async () => {
+ vi.useFakeTimers();
+ useSettingsStore.setState({ autoResumeOnLimit: true, autoResumeCheckIntervalHours: 2 });
+ setSessions([makeLimitPausedSession({ id: 'sess-oc', toolType: 'opencode' })]);
+ const resumeAutoRun = vi.fn();
+
+ renderHook(() => useAutoResumeCoordinator({ resumeAutoRunAfterError: resumeAutoRun }));
+
+ await act(async () => {
+ await vi.advanceTimersByTimeAsync(11_000); // past the 10s kickoff
+ });
+
+ // Standard non-Claude session resumed → error cleared.
+ expect(useSessionStore.getState().sessions[0].state).toBe('idle');
+ });
+});
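The give-up tests above define a small state machine. The window is anchored at the first observed pause and survives a resume-then-re-hit. It is forgotten once the session is no longer limit-paused, and its toast fires only once. The sketch below models that bookkeeping using the `{ anchor, toastFired }` entry shape from the tests. `decideGiveUp`, `GiveUpDecision`, and anchoring on `limitPausedAt ?? timestamp` are hypothetical choices, not the coordinator's actual code.

```typescript
// Hypothetical sketch: only the { anchor, toastFired } shape comes from the tests.
const DAY_MS = 24 * 60 * 60 * 1000;

interface GiveUpEntry {
  anchor: number; // epoch ms of the first observed pause
  toastFired: boolean; // "Auto-resume stopped" fires at most once
}

interface LimitErrorLike {
  timestamp: number;
  limitPausedAt?: number; // assumption: preferred anchor when present
}

type GiveUpDecision = 'keep-probing' | 'give-up-toast' | 'give-up-silent' | 'forget';

function decideGiveUp(
  giveUp: Map<string, GiveUpEntry>,
  sessionId: string,
  limitError: LimitErrorLike | undefined, // undefined = no longer limit-paused
  giveUpDays: number,
  now: number
): GiveUpDecision {
  if (!limitError) {
    // Session recovered: drop the anchor so a later pause gets a fresh window.
    giveUp.delete(sessionId);
    return 'forget';
  }
  let entry = giveUp.get(sessionId);
  if (!entry) {
    // First sighting: anchor the window. A later re-hit keeps this anchor.
    entry = { anchor: limitError.limitPausedAt ?? limitError.timestamp, toastFired: false };
    giveUp.set(sessionId, entry);
  }
  // Only elapsed time counts. Attempt counts never trigger a give-up.
  if (now - entry.anchor < giveUpDays * DAY_MS) {
    return 'keep-probing';
  }
  if (entry.toastFired) {
    return 'give-up-silent';
  }
  entry.toastFired = true;
  return 'give-up-toast';
}
```

When the cutoff is exactly equal to the window length, this sketch gives up. Whether the real coordinator uses `<` or `<=` at that boundary is not shown in the diff.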
diff --git a/src/__tests__/renderer/hooks/batch/useGoalRunner.test.ts b/src/__tests__/renderer/hooks/batch/useGoalRunner.test.ts
index 3441d471c8..773b17c909 100644
--- a/src/__tests__/renderer/hooks/batch/useGoalRunner.test.ts
+++ b/src/__tests__/renderer/hooks/batch/useGoalRunner.test.ts
@@ -12,6 +12,7 @@ import { renderHook, act, waitFor } from '@testing-library/react';
import type { Session, Group, BatchRunConfig } from '../../../../renderer/types';
import { useBatchProcessor } from '../../../../renderer/hooks';
import { useSettingsStore } from '../../../../renderer/stores/settingsStore';
+import { useSessionStore } from '../../../../renderer/stores/sessionStore';
import { createMockSession as baseCreateMockSession } from '../../../helpers/mockSession';
import { GOAL_RUN_HARD_ITERATION_CAP } from '../../../../shared/goalDriven/types';
@@ -669,6 +670,87 @@ describe('useGoalRunner (Goal-Driven Auto Run engine)', () => {
expect(prompt).toContain('Iteration: 00001');
});
+ it('pauses on a limit error without consuming an iteration, then resumes the same iteration', async () => {
+ // Start each run from a clean session store so the limit re-read is deterministic.
+ useSessionStore.setState({ sessions: [] } as any);
+
+ let spawnCalls = 0;
+ mockOnSpawnAgent.mockImplementation(async () => {
+ spawnCalls++;
+ if (spawnCalls === 1) {
+ // Mimic the agent-error listener stamping the session into the
+ // limit-paused state that the goal runner re-reads from the store.
+ useSessionStore.setState({
+ sessions: [
+ {
+ ...createMockSession(),
+ agentError: {
+ type: 'rate_limited',
+ message: 'Usage limit reached',
+ recoverable: true,
+ agentId: 'claude-code',
+ timestamp: Date.now(),
+ },
+ },
+ ],
+ } as any);
+ return { success: false, error: 'Usage limit reached' };
+ }
+ // The retried (same) iteration succeeds and completes the goal.
+ return {
+ success: true,
+ agentSessionId: 'goal-agent',
+ response: progressResponse(100, 'done'),
+ };
+ });
+
+ const { result } = renderProcessor([createMockSession()], [createMockGroup()]);
+
+ let finished = false;
+ act(() => {
+ void result.current
+ .startBatchRun(SESSION_ID, goalConfig('Ship it', 'Done when X', null), '/test/folder')
+ .then(() => {
+ finished = true;
+ });
+ });
+
+ // First spawn hit the limit; the loop parks awaiting an unblock signal.
+ await waitFor(() => {
+ expect(mockOnSpawnAgent).toHaveBeenCalledTimes(1);
+ });
+ expect(finished).toBe(false);
+ expect(result.current.getBatchState(SESSION_ID).isRunning).toBe(true);
+
+ // The coordinator (or the user's Resume button) unblocks the run.
+ act(() => {
+ result.current.resumeAfterError(SESSION_ID);
+ });
+
+ await waitFor(() => {
+ expect(finished).toBe(true);
+ });
+
+ // Two spawns total: the limited attempt + the retried successful attempt.
+ expect(mockOnSpawnAgent).toHaveBeenCalledTimes(2);
+
+ // Exactly ONE per-iteration progress entry: the limited attempt did not
+ // consume an iteration (no failed-iteration entry was recorded for it).
+ const progressEntries = mockOnAddHistoryEntry.mock.calls
+ .map((c) => c[0])
+ .filter((e) => typeof e?.summary === 'string' && e.summary.startsWith('Goal progress:'));
+ expect(progressEntries).toHaveLength(1);
+ expect(progressEntries[0].summary).toContain('Goal progress: 100%');
+ expect(progressEntries[0].summary).toContain('done');
+
+ // Run completed normally after resume.
+ expect(mockOnComplete).toHaveBeenCalledWith(
+ expect.objectContaining({ sessionId: SESSION_ID, completedTasks: 100, wasStopped: false })
+ );
+
+ useSessionStore.setState({ sessions: [] } as any);
+ });
+
it('does not start when Auto Run is globally disabled', async () => {
useSettingsStore.setState({ autoRunDisabled: true });
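The goal-runner test parks the loop on a limit error and retries the same iteration after `resumeAfterError` fires. A per-session promise gate is one way to express that park/unblock contract. This is a sketch under stated assumptions. `UnblockGate`, `runGoalLoop`, and `AttemptResult` are illustrative names, not the `useGoalRunner` implementation.

```typescript
// Hypothetical sketch of "park on limit, retry the same iteration on unblock".
class UnblockGate {
  private readonly waiters = new Map<string, () => void>();

  /** Park until release(sessionId) is called. */
  wait(sessionId: string): Promise<void> {
    return new Promise<void>((resolve) => {
      this.waiters.set(sessionId, () => resolve());
    });
  }

  /** Unblock a parked run. Returns false if nothing was waiting. */
  release(sessionId: string): boolean {
    const resolve = this.waiters.get(sessionId);
    if (!resolve) return false;
    this.waiters.delete(sessionId);
    resolve();
    return true;
  }
}

type AttemptResult = 'done' | 'progress' | 'limit';

async function runGoalLoop(
  sessionId: string,
  gate: UnblockGate,
  attempt: (iteration: number) => Promise<AttemptResult>,
  maxIterations: number
): Promise<number> {
  let iteration = 0;
  while (iteration < maxIterations) {
    const result = await attempt(iteration);
    if (result === 'limit') {
      // Limit hit: park WITHOUT advancing, then retry the same iteration.
      await gate.wait(sessionId);
      continue;
    }
    iteration++; // only a non-limit attempt consumes an iteration
    if (result === 'done') break;
  }
  return iteration;
}
```

In this model, the coordinator and the user's Resume button would both call `gate.release(sessionId)`. That mirrors the shared `resumeAutoRunAfterError` entry point.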
diff --git a/src/__tests__/renderer/hooks/useQueueProcessing.test.ts b/src/__tests__/renderer/hooks/useQueueProcessing.test.ts
index d51457fa9d..f89af21bf7 100644
--- a/src/__tests__/renderer/hooks/useQueueProcessing.test.ts
+++ b/src/__tests__/renderer/hooks/useQueueProcessing.test.ts
@@ -1287,6 +1287,100 @@ describe('runtime queue recovery', () => {
expect(mockSetSessions).not.toHaveBeenCalled();
});
+
+ it('after auto-resume, dispatches only the runnable item and leaves paused items held', async () => {
+ // Auto-Resume On Limit (Phase 4): when the coordinator clears a limit pause
+ // the session goes idle and this effect drains the persisted queue. Items the
+ // user individually held (paused: true) must stay skipped - the resume must
+ // not un-pause them.
+ vi.useFakeTimers();
+
+ mockSessionStoreState.sessionsLoaded = true;
+ mockSessionStoreState.sessions = [createSession({ state: 'idle', executionQueue: [] })];
+
+ const { rerender } = renderHook(() => useQueueProcessing(createDeps()));
+
+ // Complete startup recovery so runtime recovery is allowed to fire.
+ act(() => {
+ vi.advanceTimersByTime(600);
+ });
+
+ mockSetSessions.mockClear();
+ mockAgentStoreProcessQueuedItem.mockClear();
+
+ const tab = createTab({ id: 'tab-1', state: 'idle' });
+ const heldItem = createQueuedItem({ id: 'held', tabId: 'tab-1', paused: true });
+ const runnableItem = createQueuedItem({ id: 'runnable', tabId: 'tab-1' });
+ const session = createSession({
+ id: 'session-1',
+ state: 'idle',
+ aiTabs: [tab],
+ activeTabId: 'tab-1',
+ // Held item is ahead of the runnable one in the queue.
+ executionQueue: [heldItem, runnableItem],
+ });
+
+ mockSessionStoreState.sessions = [session];
+ mockGetActiveTab.mockReturnValue(tab);
+
+ let capturedUpdater: ((prev: Session[]) => Session[]) | null = null;
+ mockSetSessions.mockImplementation((updater: any) => {
+ capturedUpdater = updater;
+ });
+
+ await act(async () => {
+ rerender();
+ await Promise.resolve();
+ });
+
+ // Only the runnable item was dispatched; the held one was skipped.
+ expect(mockAgentStoreProcessQueuedItem).toHaveBeenCalledOnce();
+ expect(mockAgentStoreProcessQueuedItem.mock.calls[0][1].id).toBe('runnable');
+
+ // The held item stays in the queue (still paused); the runnable one is removed.
+ const updated = capturedUpdater!([session]);
+ expect(updated[0].executionQueue.map((i) => i.id)).toEqual(['held']);
+ expect(updated[0].executionQueue[0].paused).toBe(true);
+ });
+
+ it('does not dispatch at all when every queued item is paused', () => {
+ vi.useFakeTimers();
+
+ mockSessionStoreState.sessionsLoaded = true;
+ mockSessionStoreState.sessions = [createSession({ state: 'idle', executionQueue: [] })];
+
+ const { rerender } = renderHook(() => useQueueProcessing(createDeps()));
+
+ act(() => {
+ vi.advanceTimersByTime(600);
+ });
+
+ mockSetSessions.mockClear();
+ mockAgentStoreProcessQueuedItem.mockClear();
+
+ const tab = createTab({ id: 'tab-1', state: 'idle' });
+ mockSessionStoreState.sessions = [
+ createSession({
+ id: 'session-1',
+ state: 'idle',
+ aiTabs: [tab],
+ activeTabId: 'tab-1',
+ executionQueue: [
+ createQueuedItem({ id: 'held-a', tabId: 'tab-1', paused: true }),
+ createQueuedItem({ id: 'held-b', tabId: 'tab-1', paused: true }),
+ ],
+ }),
+ ];
+ mockGetActiveTab.mockReturnValue(tab);
+
+ act(() => {
+ rerender();
+ });
+
+ // All held → queue reads as drained → nothing dispatched, session stays put.
+ expect(mockSetSessions).not.toHaveBeenCalled();
+ expect(mockAgentStoreProcessQueuedItem).not.toHaveBeenCalled();
+ });
});
// ============================================================================
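The queue-recovery tests require that, after a resume, only the first non-held item is dispatched and `paused: true` items stay where they are. The sketch below shows that selection for a queue of `{ id, tabId, paused? }` items. `takeNextRunnable` is a hypothetical helper, not the `useQueueProcessing` code.

```typescript
// Hypothetical helper. The real drain lives in useQueueProcessing.
interface QueuedItemLike {
  id: string;
  tabId: string;
  paused?: boolean; // user-held items are skipped, never auto-unpaused
}

function takeNextRunnable(queue: readonly QueuedItemLike[]): {
  next: QueuedItemLike | null;
  remaining: QueuedItemLike[];
} {
  const index = queue.findIndex((item) => !item.paused);
  if (index === -1) {
    // Every item is held: treat the queue as drained and dispatch nothing.
    return { next: null, remaining: [...queue] };
  }
  return {
    next: queue[index],
    // Remove only the dispatched item. Held items keep their position and flag.
    remaining: [...queue.slice(0, index), ...queue.slice(index + 1)],
  };
}
```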
diff --git a/src/__tests__/renderer/hooks/useSessionRestoration.test.ts b/src/__tests__/renderer/hooks/useSessionRestoration.test.ts
index b91d24df77..da7074cdc0 100644
--- a/src/__tests__/renderer/hooks/useSessionRestoration.test.ts
+++ b/src/__tests__/renderer/hooks/useSessionRestoration.test.ts
@@ -665,6 +665,40 @@ describe('restoreSession — Runtime state reset', () => {
expect(restored!.agentErrorPaused).toBe(false);
});
+ it('preserves a limit pause so auto-resume re-attaches after restart', async () => {
+ // Auto-Resume On Limit: a persisted limit pause must come back live (state
+ // 'error', paused, error intact) so the Phase 3 coordinator's startup tick
+ // re-finds it. The give-up/backoff fields (limitPausedAt, resumeAttemptCount,
+ // limitResetAt) must survive the round-trip too.
+ const limitError = {
+ type: 'rate_limited',
+ message: 'Rate limited',
+ recoverable: true,
+ agentId: 'claude-code',
+ timestamp: 1000,
+ resumeAttemptCount: 2,
+ limitResetAt: 5000,
+ limitPausedAt: 1000,
+ };
+ const session = createMockSession({
+ state: 'error' as any,
+ agentError: limitError as any,
+ agentErrorPaused: true,
+ agentErrorTabId: 'tab-1',
+ });
+ const { result } = renderHook(() => useSessionRestoration());
+
+ let restored: Session;
+ await act(async () => {
+ restored = await result.current.restoreSession(session);
+ });
+
+ expect(restored!.state).toBe('error');
+ expect(restored!.agentErrorPaused).toBe(true);
+ expect(restored!.agentError).toEqual(limitError);
+ expect(restored!.agentErrorTabId).toBe('tab-1');
+ });
+
it('resets isLive and liveUrl', async () => {
const session = createMockSession({ isLive: true, liveUrl: 'http://localhost:3000' });
const { result } = renderHook(() => useSessionRestoration());
diff --git a/src/__tests__/renderer/hooks/utils/useDebouncedPersistence.test.ts b/src/__tests__/renderer/hooks/utils/useDebouncedPersistence.test.ts
index 1536d56c23..313128fdb1 100644
--- a/src/__tests__/renderer/hooks/utils/useDebouncedPersistence.test.ts
+++ b/src/__tests__/renderer/hooks/utils/useDebouncedPersistence.test.ts
@@ -735,6 +735,100 @@ describe('useDebouncedPersistence', () => {
});
});
+ describe('limit-pause persistence (Auto-Resume On Limit)', () => {
+ // A limit pause is the one error state we deliberately KEEP so auto-resume
+ // can re-find the paused session after an app restart. Every other error
+ // stays stripped (covered by the tests above).
+ const makeLimitError = () => ({
+ type: 'rate_limited' as const,
+ message: 'Rate limited',
+ recoverable: true,
+ agentId: 'claude-code',
+ timestamp: 1000,
+ resumeAttemptCount: 1,
+ limitResetAt: 5000,
+ limitPausedAt: 1000,
+ });
+
+ it('persists session-level limit-pause state and keeps state error', () => {
+ const error = makeLimitError();
+ const tab = makeTab({ id: 'paused', agentError: error as any });
+ const session = makeSession({
+ state: 'error',
+ agentError: error as any,
+ agentErrorPaused: true,
+ agentErrorTabId: 'paused',
+ aiTabs: [tab],
+ activeTabId: 'paused',
+ });
+
+ const initialLoadRef = makeInitialLoadRef(true);
+ const { result } = renderHook(() => useDebouncedPersistence([session], initialLoadRef));
+
+ act(() => {
+ result.current.flushNow();
+ });
+
+ const persisted = vi.mocked(window.maestro.sessions.setAll).mock.calls[0][0] as Session[];
+ expect(persisted[0].state).toBe('error');
+ expect(persisted[0].agentErrorPaused).toBe(true);
+ expect(persisted[0].agentErrorTabId).toBe('paused');
+ // Give-up/backoff fields survive the round-trip.
+ expect(persisted[0].agentError).toEqual(error);
+ });
+
+ it('keeps the paused tab agentError so the coordinator can re-attach', () => {
+ const error = makeLimitError();
+ const tab = makeTab({ id: 'paused', agentError: error as any });
+ const session = makeSession({
+ state: 'error',
+ agentError: error as any,
+ agentErrorPaused: true,
+ agentErrorTabId: 'paused',
+ aiTabs: [tab],
+ activeTabId: 'paused',
+ });
+
+ const initialLoadRef = makeInitialLoadRef(true);
+ const { result } = renderHook(() => useDebouncedPersistence([session], initialLoadRef));
+
+ act(() => {
+ result.current.flushNow();
+ });
+
+ const persisted = vi.mocked(window.maestro.sessions.setAll).mock.calls[0][0] as Session[];
+ expect(persisted[0].aiTabs[0].agentError).toEqual(error);
+ });
+
+ it('still strips a non-limit error pause (auth/crash must not survive restart)', () => {
+ const session = makeSession({
+ state: 'error',
+ agentError: {
+ type: 'auth_expired',
+ message: 'Auth expired',
+ recoverable: true,
+ agentId: 'claude-code',
+ timestamp: 1000,
+ } as any,
+ agentErrorPaused: true,
+ agentErrorTabId: 'default-tab',
+ });
+
+ const initialLoadRef = makeInitialLoadRef(true);
+ const { result } = renderHook(() => useDebouncedPersistence([session], initialLoadRef));
+
+ act(() => {
+ result.current.flushNow();
+ });
+
+ const persisted = vi.mocked(window.maestro.sessions.setAll).mock.calls[0][0] as Session[];
+ expect(persisted[0].state).toBe('idle');
+ expect(persisted[0].agentError).toBeUndefined();
+ expect(persisted[0].agentErrorPaused).toBeUndefined();
+ expect(persisted[0].agentErrorTabId).toBeUndefined();
+ });
+ });
+
describe('session runtime state reset', () => {
it('should reset session state to idle', () => {
const session = makeSession({ state: 'busy' });
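The persistence tests encode one rule. A paused `rate_limited` or `token_exhaustion` error survives serialization intact. Every other error state is reset to idle, and its error fields are removed. The sketch below applies that filter to a reduced session shape. `toPersistedSession` and `PersistableSession` are hypothetical. The real `useDebouncedPersistence` also handles per-tab errors and other runtime fields that this sketch does not model.

```typescript
// Hypothetical sketch. Field names mirror the Session fields used in these tests.
const LIMIT_ERROR_TYPES = new Set<string>(['rate_limited', 'token_exhaustion']);

interface PersistableSession {
  id: string;
  state: string;
  agentError?: { type: string; timestamp: number };
  agentErrorPaused?: boolean;
  agentErrorTabId?: string;
}

function toPersistedSession(session: PersistableSession): PersistableSession {
  const isLimitPause =
    session.agentErrorPaused === true &&
    session.agentError !== undefined &&
    LIMIT_ERROR_TYPES.has(session.agentError.type);
  if (isLimitPause) {
    // Keep the pause live (state 'error' + error + tab) so auto-resume re-finds it.
    return { ...session };
  }
  // Any other runtime/error state resets to idle and drops the error fields.
  const stripped: PersistableSession = { ...session, state: 'idle' };
  delete stripped.agentError;
  delete stripped.agentErrorPaused;
  delete stripped.agentErrorTabId;
  return stripped;
}
```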
diff --git a/src/__tests__/shared/isLimitError.test.ts b/src/__tests__/shared/isLimitError.test.ts
new file mode 100644
index 0000000000..84f249880a
--- /dev/null
+++ b/src/__tests__/shared/isLimitError.test.ts
@@ -0,0 +1,35 @@
+import { describe, it, expect } from 'vitest';
+import { isLimitError } from '../../shared/types';
+import type { AgentError, AgentErrorType } from '../../shared/types';
+
+function makeError(type: AgentErrorType): AgentError {
+ return {
+ type,
+ message: 'boom',
+ recoverable: true,
+ agentId: 'claude-code',
+ timestamp: 0,
+ };
+}
+
+describe('isLimitError', () => {
+ it('returns true for the two limit-pause error types', () => {
+ expect(isLimitError(makeError('rate_limited'))).toBe(true);
+ expect(isLimitError(makeError('token_exhaustion'))).toBe(true);
+ });
+
+ it('returns false for every non-limit error type', () => {
+ const nonLimit: AgentErrorType[] = [
+ 'auth_expired',
+ 'network_error',
+ 'agent_crashed',
+ 'permission_denied',
+ 'session_not_found',
+ 'hitl_gate',
+ 'unknown',
+ ];
+ for (const type of nonLimit) {
+ expect(isLimitError(makeError(type))).toBe(false);
+ }
+ });
+});
diff --git a/src/main/agents/claude-usage-sampler.ts b/src/main/agents/claude-usage-sampler.ts
index b9b4999c27..32ade4c906 100644
--- a/src/main/agents/claude-usage-sampler.ts
+++ b/src/main/agents/claude-usage-sampler.ts
@@ -14,6 +14,13 @@
* the binary's node-script-with-shebang packaging stays valid on Windows
* where shebangs aren't honored.
*
+ * - SSH limitation: this spawn is always LOCAL - it does NOT honor a session's
+ * `sshRemoteConfig` / `wrapSpawnWithSsh`, so the snapshot reflects the local
+ * account keyed by `CLAUDE_CONFIG_DIR`, never a remote host's Claude account.
+ * Consumers that act per remote session (notably Auto-Resume On Limit's
+ * `probeAvailability`) must NOT trust this snapshot for an SSH-backed session;
+ * they fall back to a resume-as-probe interval attempt instead.
+ *
* - Env precedence: `process.env` < `customEnvVars` < explicit `configDir`.
* Explicit `configDir` wins so a caller cannot accidentally smuggle a
* `CLAUDE_CONFIG_DIR` through `customEnvVars` that contradicts the path the
diff --git a/src/main/agents/limitResetEstimator.ts b/src/main/agents/limitResetEstimator.ts
new file mode 100644
index 0000000000..db7d24037d
--- /dev/null
+++ b/src/main/agents/limitResetEstimator.ts
@@ -0,0 +1,53 @@
+/**
+ * Limit Reset Estimator
+ *
+ * Best-effort: estimate when a paused agent's provider limit window is expected
+ * to reopen, so the auto-resume coordinator (Phase 3) can schedule its next
+ * probe instead of polling blindly on the fixed interval.
+ *
+ * Claude is the only provider with a reliable signal today - the
+ * `maestro-p --status` snapshot cached in `claudeUsageStore` carries the
+ * session (5-hour) and weekly reset times. Every other provider returns
+ * `undefined`; the coordinator falls back to its fixed probe interval there.
+ *
+ * Never throws - callers treat the result as advisory.
+ */
+
+import { getSnapshot, resolveConfigDirKey } from '../stores/claudeUsageStore';
+
+/**
+ * Epoch ms when `agentId`'s limit window is expected to reopen, or `undefined`
+ * when there's no reliable signal.
+ *
+ * For Claude: reads the cached usage snapshot for the account and returns the
+ * nearest FUTURE reset across the 5-hour session and 7-day all-models windows
+ * (the two windows the mode selector treats as limits). A snapshot whose resets
+ * are all in the past - or that is missing/expired - yields `undefined`.
+ *
+ * `claudeConfigDir` selects the account; when omitted, the process
+ * `CLAUDE_CONFIG_DIR` (falling back to `~/.claude`) is used.
+ */
+export function getLimitResetAt(agentId: string, claudeConfigDir?: string): number | undefined {
+ // Claude Code is the only provider with a reliable reset signal today.
+ if (agentId !== 'claude-code') {
+ return undefined;
+ }
+
+ const key = resolveConfigDirKey(
+ claudeConfigDir ? { CLAUDE_CONFIG_DIR: claudeConfigDir } : process.env
+ );
+ const snapshot = getSnapshot(key);
+ if (!snapshot) {
+ return undefined;
+ }
+
+ const now = Date.now();
+ const futureResets = [snapshot.session.resetsAt, snapshot.weekAllModels.resetsAt]
+ .map((iso) => new Date(iso).getTime())
+ .filter((ms) => Number.isFinite(ms) && ms > now);
+
+ if (futureResets.length === 0) {
+ return undefined;
+ }
+ return Math.min(...futureResets);
+}
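At its core, the estimator performs a pure selection. It parses the reset timestamps, drops any that are past or unparseable, and takes the minimum. Passing `now` in as a parameter makes that logic easy to test. The sketch below also shows one way a caller could turn the advisory result into a probe delay. `nearestFutureReset`, `nextProbeDelayMs`, and the "reset or interval, whichever is sooner" policy are assumptions that do not appear in this diff.

```typescript
// Hypothetical helpers. The real estimator reads claudeUsageStore directly.
const HOUR_MS = 60 * 60 * 1000;

/** Nearest future reset among ISO timestamps, or undefined if none is usable. */
function nearestFutureReset(isoResets: readonly string[], now: number): number | undefined {
  const future = isoResets
    .map((iso) => new Date(iso).getTime())
    .filter((ms) => Number.isFinite(ms) && ms > now); // drops NaN and past resets
  return future.length === 0 ? undefined : Math.min(...future);
}

/**
 * Assumed policy: probe at the reset if it lands before the next fixed
 * interval tick, otherwise fall back to the interval.
 */
function nextProbeDelayMs(resetAt: number | undefined, now: number, intervalHours: number): number {
  const intervalMs = intervalHours * HOUR_MS;
  if (resetAt === undefined) return intervalMs;
  return Math.min(Math.max(resetAt - now, 0), intervalMs);
}
```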
diff --git a/src/main/ipc/handlers/agents.ts b/src/main/ipc/handlers/agents.ts
index 70e9d601ae..c0333723f7 100644
--- a/src/main/ipc/handlers/agents.ts
+++ b/src/main/ipc/handlers/agents.ts
@@ -33,6 +33,7 @@ import {
getAllSnapshots as getAllClaudeUsageSnapshots,
resolveConfigDirKey,
} from '../../stores/claudeUsageStore';
+import { getLimitResetAt } from '../../agents/limitResetEstimator';
import { getAllCodexUsageSnapshots, resolveCodexHomeKey } from '../../stores/codexUsageStore';
import type { UsageSnapshot } from '../../agents/claude-mode-selector';
import type { CodexUsageSnapshot } from '../../stores/codexUsageStore';
@@ -1670,6 +1671,20 @@ export function registerAgentsHandlers(deps: AgentsHandlerDependencies): void {
})
);
+ // Best-effort estimate of when a paused agent's provider limit window reopens,
+ // used by auto-resume (Phase 3) to schedule its next probe. Claude reads its
+ // cached usage snapshot; other providers return undefined (fixed-interval
+ // fallback). Never throws - the renderer treats the result as advisory.
+ ipcMain.handle(
+ 'agents:getLimitResetAt',
+ withIpcErrorLogging(
+ handlerOpts('getLimitResetAt'),
+ async (agentId: string, claudeConfigDir?: string): Promise<number | undefined> => {
+ return getLimitResetAt(agentId, claudeConfigDir);
+ }
+ )
+ );
+
// On-demand re-sampler. Delegates to the same `runStartupUsageSampling()`
// the boot path calls, so the dashboard / settings refresh button takes the
// exact same code path that populated the store on launch. Returns a count
diff --git a/src/main/preload/agents.ts b/src/main/preload/agents.ts
index ac407b9abc..73467610ff 100644
--- a/src/main/preload/agents.ts
+++ b/src/main/preload/agents.ts
@@ -248,6 +248,14 @@ export function createAgentsApi() {
getClaudeUsageAccountKeys: (): Promise =>
ipcRenderer.invoke('agents:getClaudeUsageAccountKeys'),
+ /**
+ * Best-effort epoch-ms estimate of when a paused agent's provider limit
+ * window reopens (Claude only - undefined for other providers). Used by
+ * auto-resume to schedule the next probe; callers treat it as advisory.
+ */
+ getLimitResetAt: (agentId: string, claudeConfigDir?: string): Promise<number | undefined> =>
+ ipcRenderer.invoke('agents:getLimitResetAt', agentId, claudeConfigDir),
+
/**
* Fetch sanitized Codex quota snapshots keyed by canonical CODEX_HOME.
* Main owns auth.json reads and quota endpoint calls.
diff --git a/src/main/stores/defaults.ts b/src/main/stores/defaults.ts
index a4cf920541..0b173ebe61 100644
--- a/src/main/stores/defaults.ts
+++ b/src/main/stores/defaults.ts
@@ -88,6 +88,10 @@ export const SETTINGS_DEFAULTS: MaestroSettings = {
annotatorTextFont: 'sans-serif',
annotatorTextBgColor: '',
globalShowHotkey: [],
+ // Auto-resume agents that paused on a token/API/credit limit
+ autoResumeOnLimit: true,
+ autoResumeCheckIntervalHours: 2,
+ autoResumeGiveUpDays: 7,
};
export const SESSIONS_DEFAULTS: SessionsData = {
diff --git a/src/renderer/App.tsx b/src/renderer/App.tsx
index 04b04e8004..602c413f82 100644
--- a/src/renderer/App.tsx
+++ b/src/renderer/App.tsx
@@ -142,6 +142,7 @@ import { useChatFileDropZone } from './hooks/ui/useChatFileDropZone';
import { useMainPanelProps, useSessionListProps, useRightPanelProps } from './hooks/props';
import { useAgentListeners } from './hooks/agent/useAgentListeners';
import { useSessionRecovery } from './hooks/agent/useSessionRecovery';
+import { useAutoResumeCoordinator } from './hooks/agent/useAutoResumeCoordinator';
import { useSymphonyContribution } from './hooks/symphony/useSymphonyContribution';
import { useCueAutoDiscovery } from './hooks/useCueAutoDiscovery';
import { useCueVisibilityWiring } from './hooks/cue/useCueVisibilityWiring';
@@ -1505,6 +1506,14 @@ function MaestroConsoleInner() {
contextWarningYellowThreshold: contextManagementSettings.contextWarningYellowThreshold,
});
+ // --- AUTO-RESUME ON LIMIT (Phase 3) ---
+ // Renderer singleton: on the autoResumeCheckIntervalHours interval, probe
+ // every limit-paused agent and resume the ones whose provider window has
+ // reopened. Reads its own settings from the store; early-returns + clears
+ // the timer when autoResumeOnLimit is off. `resumeAutoRunAfterError` is the
+ // shared entry point that unblocks both spec- and goal-driven Auto Runs.
+ useAutoResumeCoordinator({ resumeAutoRunAfterError });
+
const handleRemoveQueuedItem = useCallback((itemId: string) => {
updateSessionWith(activeSessionIdRef.current, (s) => ({
...s,
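The hook tests exercise this timer contract for the coordinator. Nothing is scheduled when `autoResumeOnLimit` is off. When it is on, a kickoff tick fires about 10 seconds after mount, followed by a tick every `autoResumeCheckIntervalHours`. The sketch below reproduces that shape and returns a cleanup function suitable for a React effect. `startAutoResumeTimers`, its options, and starting the interval only after the kickoff are assumptions.

```typescript
// Hypothetical sketch of the kickoff + interval shape the hook tests exercise.
interface AutoResumeTimerOptions {
  enabled: boolean;
  intervalHours: number;
  kickoffMs?: number; // assumption: defaults to ~10s, matching the tests' 11s advance
  tick: () => void;
}

function startAutoResumeTimers(opts: AutoResumeTimerOptions): () => void {
  if (!opts.enabled) {
    // Disabled: schedule nothing at all.
    return () => {};
  }
  let interval: ReturnType<typeof setInterval> | undefined;
  const kickoff = setTimeout(() => {
    opts.tick(); // first pass picks up limit pauses restored from disk
    interval = setInterval(opts.tick, opts.intervalHours * 60 * 60 * 1000);
  }, opts.kickoffMs ?? 10_000);
  // Cleanup for a React effect: cancel whichever timer is pending.
  return () => {
    clearTimeout(kickoff);
    if (interval !== undefined) clearInterval(interval);
  };
}
```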
diff --git a/src/renderer/components/Settings/searchableSettings.ts b/src/renderer/components/Settings/searchableSettings.ts
index 3725fa6bda..ab1e1cae00 100644
--- a/src/renderer/components/Settings/searchableSettings.ts
+++ b/src/renderer/components/Settings/searchableSettings.ts
@@ -284,6 +284,46 @@ export const GENERAL_SETTINGS: SearchableSetting[] = [
description: 'GPU acceleration and confetti animations',
keywords: ['gpu', 'rendering', 'acceleration', 'confetti', 'animation', 'hardware'],
},
+ {
+ id: 'general-auto-resume',
+ tab: 'general',
+ tabLabel: 'General',
+ label: 'Resume Paused Sessions on Limit',
+ description:
+ 'Automatically resume agents that paused on a token, API, or credit limit once the provider window reopens',
+ keywords: [
+ 'resume',
+ 'limit',
+ 'credit',
+ 'token',
+ 'rate limit',
+ 'quota',
+ 'auto resume',
+ 'exhaustion',
+ 'paused',
+ ],
+ },
+ {
+ id: 'general-auto-resume-interval',
+ tab: 'general',
+ tabLabel: 'General',
+ label: 'Auto-Resume Check Interval',
+ description:
+ 'How often to probe for credit/limit availability before resuming paused agents, and how long to keep trying before giving up',
+ keywords: [
+ 'resume',
+ 'limit',
+ 'credit',
+ 'token',
+ 'rate limit',
+ 'quota',
+ 'auto resume',
+ 'exhaustion',
+ 'paused',
+ 'interval',
+ 'give up',
+ ],
+ },
{
id: 'general-updates',
tab: 'general',
diff --git a/src/renderer/components/Settings/tabs/GeneralTab.tsx b/src/renderer/components/Settings/tabs/GeneralTab.tsx
index 5f64a3ad9c..c0a8bf1e48 100644
--- a/src/renderer/components/Settings/tabs/GeneralTab.tsx
+++ b/src/renderer/components/Settings/tabs/GeneralTab.tsx
@@ -123,6 +123,13 @@ export function GeneralTab({ theme, isOpen }: GeneralTabProps) {
// Updates
checkForUpdatesOnStartup,
setCheckForUpdatesOnStartup,
+ // Auto-resume on limit
+ autoResumeOnLimit,
+ setAutoResumeOnLimit,
+ autoResumeCheckIntervalHours,
+ setAutoResumeCheckIntervalHours,
+ autoResumeGiveUpDays,
+ setAutoResumeGiveUpDays,
enableBetaUpdates,
setEnableBetaUpdates,
crashReportingEnabled,
@@ -897,6 +904,89 @@ export function GeneralTab({ theme, isOpen }: GeneralTabProps) {
+ {/* Auto-Resume on Limit */}
+
+
+
+ Auto-Resume on Limit
+
+
+ {/* Resume Paused Sessions Toggle */}
+
setAutoResumeOnLimit(!autoResumeOnLimit)}
+ role="button"
+ tabIndex={0}
+ onKeyDown={(e) => {
+ if (e.key === 'Enter' || e.key === ' ') {
+ e.preventDefault();
+ setAutoResumeOnLimit(!autoResumeOnLimit);
+ }
+ }}
+ >
+
+
+ Resume paused sessions when token/API credits are available
+
+
+ Maestro probes each limit-paused agent on a fixed interval and automatically resumes its
+ queued work once the provider's limit window reopens. Probing is cheap, so the give-up
+ window is intentionally long.
+
+
+
+
+
+ {autoResumeOnLimit && (
+
+ )}
+
+
+
{/* Default History Toggle */}
>;
getClaudeUsageAccountKeys: () => Promise;
+ getLimitResetAt: (agentId: string, claudeConfigDir?: string) => Promise<number | undefined>;
getCodexUsageSnapshots: () => Promise<
Record<
string,
diff --git a/src/renderer/hooks/agent/internal/useAgentErrorListener.ts b/src/renderer/hooks/agent/internal/useAgentErrorListener.ts
index f6f06618b8..05c51b270a 100644
--- a/src/renderer/hooks/agent/internal/useAgentErrorListener.ts
+++ b/src/renderer/hooks/agent/internal/useAgentErrorListener.ts
@@ -32,6 +32,7 @@ import { generateId } from '../../../utils/ids';
import { logger } from '../../../utils/logger';
import { removeHiddenProgressLog } from './helpers/exitTabCleanup';
import { getErrorTitleForType } from './helpers/errorTitles';
+import { isLimitError } from '../../../../shared/types';
import type { AgentError, GroupChatMessage, LogEntry, SessionState } from '../../../types';
import type { UseAgentListenersDeps, ToolProgressState } from './types';
@@ -62,6 +63,14 @@ export function useAgentErrorListener(deps: UseAgentErrorListenerDeps): void {
parsedJson: error.parsedJson,
};
+ // Limit pauses (rate-limit / token-or-credit exhaustion) get auto-resume
+ // bookkeeping seeded here: a zeroed retry counter now, and a best-effort
+ // `limitResetAt` patched in below (asynchronously, never blocking the pause).
+ const isLimit = isLimitError(agentError);
+ if (isLimit) {
+ agentError.resumeAttemptCount = 0;
+ }
+
const groupChatParsed = parseGroupChatSessionId(sessionId);
if (groupChatParsed.isGroupChat) {
const groupChatId = groupChatParsed.groupChatId!;
@@ -175,8 +184,11 @@ export function useAgentErrorListener(deps: UseAgentErrorListenerDeps): void {
// For session_not_found, find the most recent user message on the
// target tab so the recovery modal can re-send it after grooming.
// Without this, the prompt that triggered the dead session is lost.
+ // Limit pauses reuse the same capture: when the prompt that hit the
+ // limit was a direct send (not a queued item the drainer would replay),
+ // stashing it as `recoveryAction.lastUserPrompt` lets Phase 3 re-fire it.
const lastUserPrompt =
- isSessionNotFound && targetTab
+ (isSessionNotFound || isLimit) && targetTab
? [...targetTab.logs].reverse().find((l) => l.source === 'user')?.text
: undefined;
@@ -189,6 +201,11 @@ export function useAgentErrorListener(deps: UseAgentErrorListenerDeps): void {
// recovery) stay untagged — they aren't real Claude turns.
const isInteractive = s.claudeInteractive?.mode === 'interactive';
const canOfferRecovery = isSessionNotFound && !!lastUserPrompt && !!targetTab;
+ // Limit pauses keep the normal error log (message + agentError), but
+ // also carry the captured prompt so the auto-resume coordinator can
+ // re-fire a direct send. The `canOfferRecovery` session_not_found flow
+ // owns the special "recover raw or compressed" copy; this only adds data.
+ const stashLimitPrompt = isLimit && !!lastUserPrompt && !!targetTab;
const errorLogEntry: LogEntry = {
id: generateId(),
timestamp: agentError.timestamp,
@@ -198,7 +215,7 @@ export function useAgentErrorListener(deps: UseAgentErrorListenerDeps): void {
: agentError.message,
agentError: isSessionNotFound ? undefined : agentError,
...(isInteractive && !isSessionNotFound ? { renderStyle: 'text-stream' as const } : {}),
- ...(canOfferRecovery
+ ...(canOfferRecovery || stashLimitPrompt
? { recoveryAction: { lastUserPrompt: lastUserPrompt!, tabId: targetTab!.id } }
: {}),
};
@@ -230,6 +247,41 @@ export function useAgentErrorListener(deps: UseAgentErrorListenerDeps): void {
})
);
+ // Best-effort: estimate when the provider limit window reopens and stamp
+ // it onto the paused error so the auto-resume coordinator (Phase 3) can
+ // schedule its probe. Fired AFTER the synchronous pause above so it never
+ // blocks it; a missing bridge / non-Claude provider just leaves it unset.
+ if (isLimit && window.maestro.agents?.getLimitResetAt) {
+ void window.maestro.agents
+ .getLimitResetAt(agentError.agentId)
+ .then((resetAt) => {
+ if (typeof resetAt !== 'number') return;
+ setSessions((prev) =>
+ prev.map((s) => {
+ // Only patch if THIS error is still the active one (a newer
+ // error would carry a different timestamp).
+ if (s.id !== actualSessionId || s.agentError?.timestamp !== agentError.timestamp) {
+ return s;
+ }
+ const patchedError: AgentError = { ...s.agentError, limitResetAt: resetAt };
+ return {
+ ...s,
+ agentError: patchedError,
+ aiTabs: s.aiTabs.map((tab) =>
+ tab.agentError?.timestamp === agentError.timestamp
+ ? { ...tab, agentError: patchedError }
+ : tab
+ ),
+ };
+ })
+ );
+ })
+ .catch(() => {
+ // Reset estimate is advisory - swallow so a probe failure never
+ // disrupts the pause/notification flow.
+ });
+ }
+
// Pause active Auto Run batch and record history when applicable.
if (deps.getBatchStateRef.current && deps.pauseBatchOnErrorRef.current) {
const batchState = deps.getBatchStateRef.current(actualSessionId);
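The asynchronous `limitResetAt` patch in `useAgentErrorListener` is applied only if the error it was fetched for is still the active one. Below is a minimal, self-contained sketch of that stale-result guard. It assumes simplified stand-in types, and `LimitError`, `TabLike`, `SessionLike` and `applyResetAt` are hypothetical names that do not exist in Maestro.

```typescript
// Simplified, hypothetical stand-ins for Maestro's session/error shapes.
interface LimitError {
  timestamp: number;
  limitResetAt?: number;
}
interface TabLike {
  id: string;
  agentError?: LimitError;
}
interface SessionLike {
  id: string;
  agentError?: LimitError;
  aiTabs: TabLike[];
}

// Stamp `limitResetAt` onto the matching error. Sessions whose active error is
// a different one are returned as-is, and the input array itself is returned
// when the value is not a number.
function applyResetAt(
  sessions: SessionLike[],
  sessionId: string,
  errorTimestamp: number,
  resetAt: unknown
): SessionLike[] {
  // The estimate is advisory: ignore anything that isn't a number.
  if (typeof resetAt !== 'number') return sessions;
  const at = resetAt; // narrowed to number for use inside the callback
  return sessions.map((s) => {
    const current = s.agentError;
    // A newer error carries a different timestamp, so a late result is dropped.
    if (s.id !== sessionId || !current || current.timestamp !== errorTimestamp) return s;
    const patched: LimitError = { ...current, limitResetAt: at };
    return {
      ...s,
      agentError: patched,
      // Keep every tab copy of the same error in sync.
      aiTabs: s.aiTabs.map((tab) =>
        tab.agentError?.timestamp === errorTimestamp ? { ...tab, agentError: patched } : tab
      ),
    };
  });
}
```

Matching on the error timestamp, not just the session id, is what makes a slow reset-time lookup safe. If the session has already hit a different error by the time the lookup resolves, the late value is discarded.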
diff --git a/src/renderer/hooks/agent/useAutoResumeCoordinator.ts b/src/renderer/hooks/agent/useAutoResumeCoordinator.ts
new file mode 100644
index 0000000000..1b9836193f
--- /dev/null
+++ b/src/renderer/hooks/agent/useAutoResumeCoordinator.ts
@@ -0,0 +1,517 @@
+/**
+ * useAutoResumeCoordinator - the brain of Auto-Resume On Limit (Phase 3).
+ *
+ * A renderer-side singleton (mounted once in App.tsx, mirroring the other
+ * `useAgent*Listener` hooks) that, on a fixed interval (the
+ * `autoResumeCheckIntervalHours` setting, default 2h), finds every
+ * limit-paused session, probes whether its provider window has reopened, and
+ * resumes the ones that are clear - dispatching the correct resume action per
+ * run kind and firing a toast.
+ *
+ * Why renderer-side: every resume action (clearing error state,
+ * `resumeAfterError`, re-entering the goal loop, draining the queue) lives in
+ * the renderer, and the app must be open to resume regardless, so a renderer
+ * singleton avoids new cross-process dispatch plumbing.
+ *
+ * Run-kind dispatch on resume:
+ * - Spec- or goal-driven (an error-paused batch run exists in batchStore):
+ * `resumeAutoRunAfterError(sessionId)` resolves the shared in-memory
+ * `errorResolution` promise that BOTH the document runner and the goal
+ * runner await, and clears the session/batch error. The goal runner retries
+ * the same iteration; the document runner re-runs the paused task.
+ * - Standard query: clear the paused error so the session falls back to idle
+ * and the persisted execution queue drains automatically (the
+ * runtime-recovery effect in useQueueProcessing), re-firing any captured
+ * in-flight direct send.
+ */
+
+import { useEffect, useRef } from 'react';
+import type { Session, QueuedItem } from '../../types';
+import { isLimitError } from '../../../shared/types';
+import { getAgentDisplayName } from '../../../shared/agentMetadata';
+import { useSettingsStore } from '../../stores/settingsStore';
+import { useSessionStore, updateSessionWith } from '../../stores/sessionStore';
+import { useBatchStore } from '../../stores/batchStore';
+import { useAgentStore } from '../../stores/agentStore';
+import {
+ useClaudeUsageStore,
+ getClaudeUsageSnapshotForSession,
+} from '../../stores/claudeUsageStore';
+import { notifyToast } from '../../stores/notificationStore';
+import { LIMIT_THRESHOLD } from '../../components/UsageDashboard/quota/quotaFormatting';
+import { generateId } from '../../utils/ids';
+import { logger } from '../../utils/logger';
+
+const LOG_CONTEXT = '[AutoResume]';
+
+/**
+ * Run one tick promptly after mount (not a full interval later) so a day-later
+ * restart probes limit-paused agents within seconds instead of hours. Kept
+ * short but non-zero so the rest of the app's startup wiring settles first.
+ */
+const INITIAL_TICK_DELAY_MS = 10_000;
+
+const MS_PER_DAY = 24 * 60 * 60 * 1000;
+
+/** Fallback give-up window when the setting is missing (matches the default). */
+const DEFAULT_GIVE_UP_DAYS = 7;
+
+/**
+ * Per-session give-up bookkeeping, held in memory across ticks (rebuilt from the
+ * persisted `limitPausedAt` after a restart). `anchor` is when the give-up
+ * window started - it deliberately survives a probe that resumes-then-re-hits
+ * the limit (resume-as-probe for non-Claude), so "N days of REPEATED limits"
+ * actually elapses instead of resetting every interval. `toastFired` guards the
+ * single give-up toast.
+ */
+interface GiveUpTracking {
+ anchor: number;
+ toastFired: boolean;
+}
+
+/** Options bag for {@link runAutoResumeTick}. All optional for ease of testing. */
+export interface RunAutoResumeTickOptions {
+ /** Cross-tick give-up bookkeeping (the hook passes a stable ref'd map). */
+ giveUp?: Map<string, GiveUpTracking>;
+ /** `autoResumeGiveUpDays` setting (default 7). */
+ giveUpDays?: number;
+ /** Injectable clock for tests. */
+ now?: number;
+}
+
+export interface UseAutoResumeCoordinatorDeps {
+ /**
+ * The batch resume entry point (`resumeAfterError` wrapped to also clear the
+ * session's agent error). Resolves the shared `errorResolution` promise that
+ * both the document and goal runners await, unblocking either kind of paused
+ * Auto Run with its in-memory loop state intact.
+ */
+ resumeAutoRunAfterError: (sessionId: string) => void;
+}
+
+// ============================================================================
+// Pure predicates / selectors
+// ============================================================================
+
+/** A session paused specifically on a provider limit (rate / token / credit). */
+export function isLimitPausedSession(session: Session): boolean {
+ return (
+ session.state === 'error' &&
+ session.agentErrorPaused === true &&
+ !!session.agentError &&
+ isLimitError(session.agentError)
+ );
+}
+
+/**
+ * Whether a limit-paused session is eligible to probe on this tick. When the
+ * provider told us when the window reopens (`limitResetAt`) and it's still in
+ * the future, skip - it's not time yet. Sessions with an unknown reset time
+ * (non-Claude / unparseable) are always eligible: the interval itself is the
+ * backoff.
+ */
+export function isEligibleToProbe(session: Session, now: number): boolean {
+ const resetAt = session.agentError?.limitResetAt;
+ if (typeof resetAt === 'number' && resetAt > now) return false;
+ return true;
+}
+
+// ============================================================================
+// Probe
+// ============================================================================
+
+/**
+ * Probe whether a paused session's provider window has reopened.
+ *
+ * Claude: reads the (freshly re-sampled, see `refreshClaudeUsageForTick`) usage
+ * snapshot for that session's account and returns true only when both relevant
+ * windows are below `LIMIT_THRESHOLD` - i.e. credits are actually available
+ * again. A missing snapshot or an unauthenticated account returns false (we
+ * can't confirm availability, so stay paused and retry next interval).
+ *
+ * All other providers: return true. No usage signal exists for them, so the
+ * resume attempt itself is the probe - if it re-hits the limit, Phase 2's pause
+ * path re-pauses it and the next interval retries. This is what lets
+ * auto-resume cover every provider, not just Claude.
+ */
+export async function probeAvailability(session: Session): Promise<boolean> {
+ if (session.toolType !== 'claude-code') return true;
+
+ // SSH-backed sessions: the usage sampler (`maestro-p --status` in
+ // claude-usage-sampler.ts) runs LOCALLY and is keyed by the LOCAL account's
+ // CLAUDE_CONFIG_DIR - it does NOT honor `sessionSshRemoteConfig` /
+ // `wrapSpawnWithSsh`, so the cached snapshot describes the wrong machine's
+ // account. Rather than make a remote-account decision from local numbers,
+ // treat availability as UNKNOWN and fall back to the interval-based attempt
+ // (resume-as-probe, like non-Claude providers): the resume runs on the remote,
+ // and if it re-hits the limit Phase 2 re-pauses it for the next interval. If a
+ // real remote `--status` probe is added later, run it here via wrapSpawnWithSsh.
+ if (session.sessionSshRemoteConfig?.enabled) return true;
+
+ const snapshot = getClaudeUsageSnapshotForSession(session);
+ if (!snapshot) return false;
+ if (snapshot.authState === 'unauthenticated') return false;
+
+ return (
+ snapshot.session.percent < LIMIT_THRESHOLD && snapshot.weekAllModels.percent < LIMIT_THRESHOLD
+ );
+}
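The decision that `probeAvailability` makes can be isolated as a pure function, which makes the three outcomes (resume-as-probe, stay paused, windows clear) easy to test. The sketch below assumes a simplified snapshot shape. `UsageSnapshot`, `ProbeInput` and `isWindowOpen` are hypothetical names, and the threshold is passed in as a parameter because the value of `LIMIT_THRESHOLD` is not shown here.

```typescript
// Hypothetical, simplified view of a Claude usage snapshot.
interface UsageSnapshot {
  authState: string;
  session: { percent: number };
  weekAllModels: { percent: number };
}

interface ProbeInput {
  toolType: string;
  sshRemote: boolean;
  snapshot: UsageSnapshot | null;
}

// Decide whether a limit-paused session should be resumed on this tick.
function isWindowOpen(input: ProbeInput, threshold: number): boolean {
  // No trustworthy local usage signal: let the resume attempt act as the probe.
  if (input.toolType !== 'claude-code' || input.sshRemote) return true;
  const snap = input.snapshot;
  // Availability can't be confirmed, so stay paused until the next interval.
  if (!snap || snap.authState === 'unauthenticated') return false;
  // Both the session window and the weekly all-models window must be clear.
  return snap.session.percent < threshold && snap.weekAllModels.percent < threshold;
}
```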
+
+/**
+ * Re-sample Claude plan usage once per tick (best-effort), then pull the
+ * refreshed map into the renderer mirror so `probeAvailability` reads current
+ * numbers. Re-sampling spawns `maestro-p --status` per account on main, so we
+ * do it once per tick covering every account rather than once per paused
+ * session.
+ */
+async function refreshClaudeUsageForTick(): Promise<void> {
+ try {
+ await window.maestro.agents.refreshClaudeUsageSnapshots();
+ } catch (err) {
+ // Best-effort: fall back to whatever's already cached in the mirror.
+ logger.warn('Claude usage re-sample failed; using cached snapshot', LOG_CONTEXT, {
+ error: err instanceof Error ? err.message : String(err),
+ });
+ }
+ try {
+ await useClaudeUsageStore.getState().refresh();
+ } catch {
+ // refresh() already swallows its own IPC errors; guard anyway.
+ }
+}
+
+// ============================================================================
+// Session mutations
+// ============================================================================
+
+/**
+ * Immutably update a session's `agentError` and every tab copy that shares its
+ * timestamp (tabs holding a different error are left untouched). No-op if the
+ * session has no agent error or the updater returns the same object.
+ */
+function patchAgentError(
+ sessionId: string,
+ updater: (err: NonNullable<Session['agentError']>) => NonNullable<Session['agentError']>
+): void {
+ updateSessionWith(sessionId, (s) => {
+ if (!s.agentError) return s;
+ const patched = updater(s.agentError);
+ if (patched === s.agentError) return s;
+ const ts = s.agentError.timestamp;
+ return {
+ ...s,
+ agentError: patched,
+ aiTabs: s.aiTabs.map((tab) =>
+ tab.agentError?.timestamp === ts ? { ...tab, agentError: patched } : tab
+ ),
+ };
+ });
+}
+
+/**
+ * Find a captured in-flight direct send (a prompt that hit the limit but was
+ * never queued, so the queue drainer won't replay it). Phase 2 stashes it as
+ * `recoveryAction.lastUserPrompt` on the error log entry of the paused tab.
+ */
+function findCapturedPrompt(
+ session: Session
+): { tabId: string; text: string; logId: string } | null {
+ const tab =
+ session.aiTabs.find((t) => t.id === session.agentErrorTabId) ??
+ session.aiTabs.find((t) => t.agentError?.timestamp === session.agentError?.timestamp);
+ if (!tab) return null;
+ for (let i = tab.logs.length - 1; i >= 0; i--) {
+ const log = tab.logs[i];
+ const prompt = log.recoveryAction?.lastUserPrompt;
+ if (prompt) return { tabId: tab.id, text: prompt, logId: log.id };
+ }
+ return null;
+}
+
+/**
+ * Enqueue a captured direct send at the front of the session's execution queue
+ * and consume its `recoveryAction` so it only fires once. Done WHILE the session
+ * is still in the error state (the drainer ignores non-idle sessions); the
+ * subsequent `clearAgentError` flips it to idle and the runtime-recovery effect
+ * dispatches the queue front-to-back, preserving order.
+ */
+function enqueueCapturedPrompt(
+ sessionId: string,
+ captured: { tabId: string; text: string; logId: string }
+): void {
+ const item: QueuedItem = {
+ id: generateId(),
+ timestamp: Date.now(),
+ tabId: captured.tabId,
+ type: 'message',
+ text: captured.text,
+ };
+ updateSessionWith(sessionId, (s) => ({
+ ...s,
+ executionQueue: [item, ...s.executionQueue],
+ aiTabs: s.aiTabs.map((tab) =>
+ tab.id === captured.tabId
+ ? {
+ ...tab,
+ logs: tab.logs.map((log) =>
+ log.id === captured.logId ? { ...log, recoveryAction: undefined } : log
+ ),
+ }
+ : tab
+ ),
+ }));
+}
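Together, `findCapturedPrompt` and `enqueueCapturedPrompt` implement a "replay once, at the front" rule. The sketch below folds both steps into one pure function over simplified, hypothetical shapes (`QueuedMessage`, `LogLike` and `requeueCapturedPrompt` are illustrative names, not Maestro's). It shows that the interrupted send lands ahead of later queued work and cannot be replayed a second time.

```typescript
// Simplified, hypothetical shapes: the real QueuedItem and LogEntry carry more fields.
interface QueuedMessage {
  id: string;
  tabId: string;
  text: string;
}
interface LogLike {
  id: string;
  recoveryAction?: { lastUserPrompt: string };
}

// Find the newest log carrying a captured prompt, prepend it to the queue, and
// clear its recoveryAction so a second call finds nothing to replay.
function requeueCapturedPrompt(
  queue: QueuedMessage[],
  logs: LogLike[],
  tabId: string,
  newId: string
): { queue: QueuedMessage[]; logs: LogLike[] } {
  for (let i = logs.length - 1; i >= 0; i--) {
    const prompt = logs[i].recoveryAction?.lastUserPrompt;
    if (!prompt) continue;
    const logId = logs[i].id;
    return {
      // Front of the queue: the interrupted send runs before anything queued later.
      queue: [{ id: newId, tabId, text: prompt }, ...queue],
      logs: logs.map((log) => (log.id === logId ? { ...log, recoveryAction: undefined } : log)),
    };
  }
  // Nothing captured: leave both unchanged.
  return { queue, logs };
}
```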
+
+// ============================================================================
+// Resume
+// ============================================================================
+
+/**
+ * Resume a single limit-paused session, dispatching by run kind. Increments
+ * `resumeAttemptCount` BEFORE attempting so telemetry observes it (retries are
+ * never throttled by this count; the give-up cutoff is purely time-based).
+ */
+function resume(session: Session, resumeAutoRunAfterError: (sessionId: string) => void): void {
+ patchAgentError(session.id, (err) => ({
+ ...err,
+ resumeAttemptCount: (err.resumeAttemptCount ?? 0) + 1,
+ }));
+
+ // After an app restart this batch state is gone: batchStore is in-memory and
+ // is intentionally NOT reconstructed on cold start. So a session that paused
+ // mid Auto Run before the restart has no `batch` here and falls through to the
+ // standard path below - it clears the error, the persisted executionQueue
+ // drains, and the agent continues from its own transcript via the native
+ // `--resume`. The orchestration LOOP does not resume; the agent session and
+ // its queued work do. (Within a single app run, the loop state is still live
+ // and this takes the Auto Run branch.)
+ const batch = useBatchStore.getState().batchRunStates[session.id];
+ const isAutoRunPaused = !!batch && batch.errorPaused === true;
+
+ if (isAutoRunPaused) {
+ // Spec- or goal-driven: one entry point unblocks both (it resolves the
+ // shared errorResolution promise the runners await and clears the error).
+ logger.info('Resuming Auto Run after limit', LOG_CONTEXT, {
+ sessionId: session.id,
+ goalMode: batch.goalMode === true,
+ });
+ resumeAutoRunAfterError(session.id);
+ return;
+ }
+
+ // Standard query: re-fire a captured direct send (prepended while still
+ // paused), then clear the error so the session goes idle and the queue
+ // drains.
+ const captured = findCapturedPrompt(session);
+ if (captured) {
+ enqueueCapturedPrompt(session.id, captured);
+ }
+ logger.info('Resuming standard session after limit', LOG_CONTEXT, {
+ sessionId: session.id,
+ refiredCapturedPrompt: !!captured,
+ });
+ useAgentStore.getState().clearAgentError(session.id);
+}
+
+/**
+ * Fire one green "Resumed" toast for a resumed session. Uses the pre-resume
+ * session snapshot so the tab id (cleared by the resume) is still available for
+ * click-to-jump. The session's own display name is preferred for the agent name
+ * (falling back to the provider display name) so a user with several agents can
+ * tell which one came back.
+ */
+function fireResumedToast(session: Session): void {
+ const agentName = session.name?.trim() || getAgentDisplayName(session.toolType);
+ notifyToast({
+ color: 'green',
+ title: 'Resumed',
+ message: `${agentName} resumed - credits available`,
+ project: session.name,
+ clickAction: { kind: 'jump-session', sessionId: session.id, tabId: session.agentErrorTabId },
+ });
+}
+
+// ============================================================================
+// Give up (time-based)
+// ============================================================================
+
+/**
+ * Resolve the give-up anchor for a limit-paused session and remember it in the
+ * cross-tick map. Order of preference: an anchor already tracked this app
+ * session (survives a resume-then-re-hit so the window doesn't reset) > the
+ * persisted `limitPausedAt` (rebuilds the window after a restart) > the error
+ * timestamp (first pause) > now. Also keeps the persisted `limitPausedAt` in
+ * sync with the live anchor so a restart measures the same window.
+ */
+function resolveGiveUpAnchor(
+ giveUp: Map<string, GiveUpTracking>,
+ session: Session,
+ now: number
+): number {
+ const existing = giveUp.get(session.id);
+ const err = session.agentError;
+ const anchor = existing?.anchor ?? err?.limitPausedAt ?? err?.timestamp ?? now;
+ giveUp.set(session.id, { anchor, toastFired: existing?.toastFired ?? false });
+ if (err && err.limitPausedAt !== anchor) {
+ patchAgentError(session.id, (e) =>
+ e.limitPausedAt === anchor ? e : { ...e, limitPausedAt: anchor }
+ );
+ }
+ return anchor;
+}
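The give-up window depends on two rules. The anchor must not move when a resume re-hits the limit, and the cutoff must be purely time-based. A minimal sketch of both follows, under the assumption that the persisted `limitPausedAt` and the error timestamp are passed in directly. The names `Tracking`, `resolveAnchor` and `isPastGiveUp` are hypothetical, and the sketch omits the step that writes the anchor back to `limitPausedAt`.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

interface Tracking {
  anchor: number;
  toastFired: boolean;
}

// Preference order: anchor already tracked this run > persisted pause start >
// error timestamp > now.
function resolveAnchor(
  tracked: Map<string, Tracking>,
  id: string,
  persistedPausedAt: number | undefined,
  errorTimestamp: number | undefined,
  now: number
): number {
  const existing = tracked.get(id);
  const anchor = existing?.anchor ?? persistedPausedAt ?? errorTimestamp ?? now;
  tracked.set(id, { anchor, toastFired: existing?.toastFired ?? false });
  return anchor;
}

// Purely time-based cutoff; a window shorter than one day is clamped to one day.
function isPastGiveUp(anchor: number, now: number, giveUpDays: number): boolean {
  return now >= anchor + Math.max(1, giveUpDays) * MS_PER_DAY;
}
```

Because an anchor that is already tracked wins over a fresh error timestamp, a re-hit three days in still measures from the first pause. After a restart, an empty map falls back to the persisted value and recovers the same window.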
+
+/**
+ * Drop give-up tracking for any session that is no longer limit-paused - i.e. it
+ * resumed successfully (or the user cleared it). This is what makes a LATER limit
+ * start a fresh give-up window. A resume that immediately re-hits the limit does
+ * NOT prune here: ticks are hours apart while a re-hit re-pauses within seconds,
+ * so by the next tick the session is limit-paused again and its anchor survives.
+ */
+function pruneGiveUpTracking(
+ giveUp: Map<string, GiveUpTracking>,
+ limitPausedIds: Set<string>
+): void {
+ for (const id of [...giveUp.keys()]) {
+ if (!limitPausedIds.has(id)) giveUp.delete(id);
+ }
+}
+
+/**
+ * Fire one distinct give-up toast (orange, sticky) telling the user auto-resume
+ * has stopped for this session and they must resume manually. Trigger is purely
+ * time-based (the give-up window), never an attempt count.
+ */
+function fireGaveUpToast(session: Session, giveUpDays: number): void {
+ const agentName = session.name?.trim() || getAgentDisplayName(session.toolType);
+ const dayWord = giveUpDays === 1 ? 'day' : 'days';
+ notifyToast({
+ color: 'orange',
+ title: 'Auto-resume stopped',
+ message: `${agentName}: auto-resume gave up after ${giveUpDays} ${dayWord} of repeated limits - resume manually`,
+ project: session.name,
+ dismissible: true,
+ clickAction: { kind: 'jump-session', sessionId: session.id, tabId: session.agentErrorTabId },
+ });
+}
+
+// ============================================================================
+// Tick
+// ============================================================================
+
+/**
+ * One coordinator pass: select limit-paused + eligible sessions, stamp their
+ * pause-start, give up on any past their time-based cutoff, then probe
+ * availability and resume the clear ones. `inFlight` guards against a later tick
+ * starting a second probe/resume for a session still mid-resume from this one.
+ */
+export async function runAutoResumeTick(
+ inFlight: Set<string>,
+ resumeAutoRunAfterError: (sessionId: string) => void,
+ options: RunAutoResumeTickOptions = {}
+): Promise<void> {
+ const { giveUp = new Map(), giveUpDays = DEFAULT_GIVE_UP_DAYS, now = Date.now() } = options;
+
+ const sessions = useSessionStore.getState().sessions;
+ const limitPaused = sessions.filter(isLimitPausedSession);
+
+ // A session that left the limit-paused state resumed successfully (or was
+ // cleared by the user); forget its give-up window so a future limit is fresh.
+ pruneGiveUpTracking(giveUp, new Set(limitPaused.map((s) => s.id)));
+
+ const candidates = limitPaused.filter((s) => isEligibleToProbe(s, now));
+ if (candidates.length === 0) return;
+
+ // Give-up pass: resolve each candidate's give-up anchor (which also stamps the
+ // persisted `limitPausedAt` for the give-up window), then park any candidate
+ // past its time-based cutoff and fire one toast. Probing is cheap, so
+ // everything inside the window keeps retrying on the normal interval - the
+ // cutoff is the ONLY thing that ends retries, never an attempt count.
+ const retryable: Session[] = [];
+ for (const session of candidates) {
+ const anchor = resolveGiveUpAnchor(giveUp, session, now);
+ if (now >= anchor + Math.max(1, giveUpDays) * MS_PER_DAY) {
+ const tracking = giveUp.get(session.id);
+ if (tracking && !tracking.toastFired) {
+ tracking.toastFired = true;
+ fireGaveUpToast(session, giveUpDays);
+ }
+ continue; // leave it paused; stop auto-retrying this one
+ }
+ retryable.push(session);
+ }
+ if (retryable.length === 0) return;
+
+ // Re-sample Claude usage once per tick if any retryable candidate is Claude.
+ if (retryable.some((s) => s.toolType === 'claude-code')) {
+ await refreshClaudeUsageForTick();
+ }
+
+ for (const session of retryable) {
+ // Don't start a second probe/resume for a session already mid-resume.
+ if (inFlight.has(session.id)) continue;
+ inFlight.add(session.id);
+
+ void (async () => {
+ try {
+ const available = await probeAvailability(session);
+ if (!available) return; // stay paused; retried next interval
+ resume(session, resumeAutoRunAfterError);
+ fireResumedToast(session);
+ } catch (err) {
+ logger.warn('Auto-resume probe/resume failed', LOG_CONTEXT, {
+ sessionId: session.id,
+ error: err instanceof Error ? err.message : String(err),
+ });
+ } finally {
+ inFlight.delete(session.id);
+ }
+ })();
+ }
+}
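The per-session loop in `runAutoResumeTick` follows a "one in-flight job per key" pattern: claim the id synchronously, run the async work detached, and release the id when it settles. The sketch below extracts that pattern. `startGuarded` is a hypothetical helper, not a Maestro API, and it swallows failures silently where the coordinator logs a warning.

```typescript
// Start `work` for `id` unless a previous run for the same id is still pending.
// Returns whether work was started. The id is claimed synchronously, so a second
// call in the same tick sees it immediately; it is released once work settles.
function startGuarded(
  inFlight: Set<string>,
  id: string,
  work: () => Promise<void>
): boolean {
  if (inFlight.has(id)) return false;
  inFlight.add(id);
  void (async () => {
    try {
      await work();
    } catch {
      // Swallowed so one session's failure can't affect the others.
    } finally {
      inFlight.delete(id);
    }
  })();
  return true;
}
```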
+
+// ============================================================================
+// Hook
+// ============================================================================
+
+export function useAutoResumeCoordinator(deps: UseAutoResumeCoordinatorDeps): void {
+ const enabled = useSettingsStore((s) => s.autoResumeOnLimit);
+ const intervalHours = useSettingsStore((s) => s.autoResumeCheckIntervalHours);
+ const giveUpDays = useSettingsStore((s) => s.autoResumeGiveUpDays);
+
+ // Always call the latest resume fn / give-up window from the interval without
+ // recreating the timer on every render.
+ const resumeRef = useRef(deps.resumeAutoRunAfterError);
+ resumeRef.current = deps.resumeAutoRunAfterError;
+ const giveUpDaysRef = useRef(giveUpDays);
+ giveUpDaysRef.current = giveUpDays;
+
+ // Session ids with an in-flight probe/resume - persists across ticks.
+ const inFlightRef = useRef<Set<string>>(new Set());
+
+ // Per-session give-up bookkeeping - persists across ticks (rebuilt from
+ // persisted limitPausedAt after a restart). See GiveUpTracking.
+ const giveUpRef = useRef