feat(platform): add nightly copilot automation flow #12407
Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
Walkthrough

This PR implements a comprehensive Autopilot/Nightly Copilot feature for the AutoGPT platform. It introduces callback-token-based session resumption, email notification workflows for Copilot completions, admin APIs for user and session management, database schema extensions to support session start types and lifecycle tracking, and frontend components for admin Copilot management and callback token consumption flows.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Admin
    participant AdminAPI as Admin API<br/>(user_admin_routes)
    participant DB as Database
    participant Copilot as Copilot Core<br/>(autopilot_dispatch)
    participant Session as Session Manager<br/>(model.py)
    participant Stream as Stream Registry
    Admin->>AdminAPI: POST /admin/copilot/trigger<br/>TriggerCopilotSessionRequest
    AdminAPI->>DB: fetch User
    AdminAPI->>Copilot: trigger_autopilot_session_for_user
    Copilot->>DB: check recent sessions & messages
    Copilot->>DB: create ChatSession (with start_type, execution_tag)
    Copilot->>Session: new ChatSession instance
    Session->>DB: enqueue initial turn
    Copilot-->>AdminAPI: ChatSession created
    AdminAPI-->>Admin: TriggerCopilotSessionResponse
```

```mermaid
sequenceDiagram
    participant User
    participant Frontend as Chat Frontend
    participant API as Chat API<br/>(routes.py)
    participant TokenMgr as Token Manager<br/>(autopilot.py)
    participant DB as Database
    participant Copilot as Copilot Engine<br/>(session lifecycle)
    User->>Frontend: Click callback token link<br/>with ?callbackToken=xyz
    Frontend->>Frontend: useCallbackToken hook<br/>detects token
    Frontend->>API: POST /callback-token/consume<br/>ConsumeCallbackTokenRequest
    API->>TokenMgr: consume_callback_token(token_id, user_id)
    TokenMgr->>DB: get_chat_session_callback_token(token_id)
    alt Token already consumed
        TokenMgr-->>API: return session_id
    else New consumption
        TokenMgr->>DB: create_chat_session (MANUAL type)
        TokenMgr->>DB: mark_chat_session_callback_token_consumed
        TokenMgr-->>API: CallbackTokenConsumeResult
    end
    API-->>Frontend: ConsumeCallbackTokenResponse
    Frontend->>Frontend: onConsumed(session_id)
    Frontend->>Frontend: Invalidate session list
    Frontend->>Frontend: Load Copilot chat
    Frontend-->>User: Display conversation
```

```mermaid
sequenceDiagram
    participant Scheduler
    participant Dispatch as Autopilot Dispatch<br/>(autopilot_dispatch.py)
    participant DB as Database
    participant Prompt as Prompts Engine<br/>(autopilot_prompts.py)
    participant Session as Session Manager<br/>(model.py)
    participant Email as Email Service<br/>(autopilot_email.py)
    Scheduler->>Dispatch: dispatch_nightly_copilot()
    Dispatch->>DB: list_users (batched)
    loop For each user
        Dispatch->>DB: check nightly feature flag<br/>resolve timezone
        Dispatch->>DB: _crosses_local_midnight<br/>(compute target date)
        Dispatch->>DB: session_exists_for_execution_tag
        Dispatch->>Prompt: _build_autopilot_system_prompt<br/>(with context)
        Dispatch->>Session: _create_autopilot_session<br/>(NIGHTLY start_type)
        Dispatch->>DB: enqueue_session_turn
    end
    Scheduler->>Email: send_nightly_copilot_emails()
    Email->>DB: get_pending_notification_chat_sessions
    loop For each pending session
        Email->>DB: handle_non_manual_session_completion<br/>(if not manual)
        Email->>Email: _send_completion_email
    end
```

Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs
Suggested reviewers
Poem
✨ Finishing Touches

🧪 Generate unit tests (beta)
🔍 PR Overlap Detection

This check compares your PR against all other open PRs targeting the same branch to detect potential merge conflicts early.

🔴 Merge Conflicts Detected

The following PRs have been tested and will have merge conflicts if merged after this PR. Consider coordinating with the authors.
🟡 Medium Risk, Some Line Overlap

These PRs have some overlapping changes:
🟢 Low Risk, File Overlap Only

These PRs touch the same files but different sections.
Summary: 14 conflict(s), 1 medium risk, 28 low risk (out of 43 PRs with file overlap)

Auto-generated on push.
```
async def get_user_session_count(user_id: str) -> int:
async def get_pending_notification_chat_sessions(
```
🤖 🟠 Should Fix: get_pending_notification_chat_sessions does a global scan — it filters on startType != MANUAL AND completedAt IS NOT NULL AND notificationEmailSentAt IS NULL AND notificationEmailSkippedAt IS NULL, but the only new index is (userId, startType, updatedAt). Without a userId predicate, that index can't be used here. At scale this is a full table scan every 30 minutes.
Consider a partial index:

```sql
CREATE INDEX "ChatSession_pending_notification_idx"
ON "ChatSession" ("updatedAt")
WHERE "startType" != 'MANUAL'
  AND "completedAt" IS NOT NULL
  AND "notificationEmailSentAt" IS NULL
  AND "notificationEmailSkippedAt" IS NULL;
```

Or add `completedAt`, `notificationEmailSentAt`, and `notificationEmailSkippedAt` to the existing index and always include a date-range predicate in the query.
```
async def get_recent_completion_report_chat_sessions(
```
🤖 🟠 Should Fix: Over-fetches max(limit * 5, 10) rows with only startType != MANUAL as the filter, then discards rows without completion_report in Python. For a user with 50+ autopilot sessions that lack a completion report, the effective yield could be well under limit.
Either add a DB-level filter that checks if completionReport is not NULL (e.g., Prisma completionReport: { not: DbNull }), or document why this can't be done and make the multiplier configurable. As-is, callers can silently receive fewer results than requested.
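If the DB-level filter truly cannot be expressed, another option is to page until the post-filter yield reaches the limit rather than over-fetching once with a fixed multiplier. This is a minimal sketch, not the PR's code; `collect_with_filter`, `fetch_page`, and `keep` are hypothetical names:

```python
from typing import Callable, List

def collect_with_filter(
    fetch_page: Callable[[int, int], list],  # (offset, page_size) -> rows
    keep: Callable[[object], bool],          # e.g. "row has a completion report"
    limit: int,
    page_size: int = 50,
) -> List:
    """Keep fetching pages until `limit` rows survive the filter or the
    source is exhausted, so callers never silently receive fewer results."""
    results: list = []
    offset = 0
    while len(results) < limit:
        page = fetch_page(offset, page_size)
        if not page:
            break  # source exhausted
        results.extend(row for row in page if keep(row))
        offset += page_size
    return results[:limit]
```

Unlike the `max(limit * 5, 10)` heuristic, this only under-delivers when the table genuinely has fewer matching rows.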
```sql
"callbackSessionMessage" TEXT NOT NULL,
"expiresAt" TIMESTAMP(3) NOT NULL,
"consumedAt" TIMESTAMP(3),
"consumedSessionId" TEXT,
```
🤖 🟡 Nice to Have: consumedSessionId has an index but no foreign key constraint, while the sibling column sourceSessionId has both. This means a ChatSessionCallbackToken can have a consumedSessionId pointing to a non-existent or deleted session with no referential integrity enforcement. Consider adding:
```sql
ALTER TABLE "ChatSessionCallbackToken"
  ADD CONSTRAINT "ChatSessionCallbackToken_consumedSessionId_fkey"
  FOREIGN KEY ("consumedSessionId") REFERENCES "ChatSession"("id")
  ON DELETE SET NULL ON UPDATE CASCADE;
```

```
failed_count: int = 0
```
```
async def _ensure_session_title_for_completed_session(session: ChatSession) -> None:
```
🤖 🟡 Nice to Have: If _generate_session_title raises (LLM timeout, API error, etc.), the exception propagates out of this function uncaught, which bubbles up through _process_pending_copilot_email_candidates and gets caught by _NON_FATAL_EMAIL_SWEEP_ERRORS — but only if the exception type matches. A generic Exception would escape that handler and abort the entire sweep iteration.
Even in the best case (exception is caught), the email goes out with subject "Autopilot update" instead of a meaningful title and there's no log entry explaining why. Suggest wrapping the _generate_session_title call in a try/except with a warning log:
```python
try:
    generated_title = await _generate_session_title(...)
    title = generated_title.strip() if generated_title else ""
except Exception:
    logger.warning(
        "Failed to generate title for session %s",
        session.session_id,
        exc_info=True,
    )
    title = ""
```

```
try:
    from backend.copilot.autopilot import handle_non_manual_session_completion

    await handle_non_manual_session_completion(session_id)
```
🤖 🟡 Nice to Have: handle_non_manual_session_completion runs in the critical path of mark_session_completed with no timeout guard. If a DB call inside hangs (e.g., Prisma connection pool exhausted during a spike), this blocks the entire completion handler indefinitely.
The try/except Exception does protect against failures, but only once an exception is raised — a hung coroutine won't raise. Consider wrapping with asyncio.wait_for(..., timeout=30) and catching asyncio.TimeoutError alongside the generic handler, or running it via asyncio.create_task to decouple it from the hot path.
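A sketch of the suggested timeout guard, assuming a 30-second budget; `guarded_completion_hook` and `handle_completion` are stand-in names for illustration, not the platform's API:

```python
import asyncio
from typing import Awaitable, Callable

async def guarded_completion_hook(
    handle_completion: Callable[[str], Awaitable[None]],
    session_id: str,
    timeout: float = 30.0,  # assumed budget; tune to real DB latency
) -> bool:
    """Run the completion hook with a hard deadline so a hung coroutine
    (e.g. an exhausted connection pool) cannot block mark_session_completed."""
    try:
        await asyncio.wait_for(handle_completion(session_id), timeout=timeout)
        return True
    except asyncio.TimeoutError:
        # A hang now surfaces as a bounded failure instead of an infinite wait.
        return False
    except Exception:
        # Preserve the existing catch-all behaviour for real failures.
        return False
```

Alternatively, `asyncio.create_task` fully decouples the hook from the hot path, at the cost of losing in-band error reporting.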
```
async def _send_nightly_copilot_emails() -> int:
```
🤖 🔵 Nit: _send_nightly_copilot_emails returns processed_count (= sent + skipped + failed), which excludes running_count and repair_queued_count. This means the scheduler log shows e.g. 150 when 200 candidates were picked up and 50 were deferred — the 50 running sessions look like they were never seen.
Consider returning the full PendingCopilotEmailSweepResult struct from the inner function and logging a richer summary at the outer layer (or at minimum returning candidate_count for the scheduler).
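The richer return value could be shaped roughly like this; `SweepSummary` and its property names are assumptions modeled on the counters mentioned above, not the actual `PendingCopilotEmailSweepResult`:

```python
from dataclasses import dataclass

@dataclass
class SweepSummary:
    sent: int = 0
    skipped: int = 0
    failed: int = 0
    running: int = 0        # deferred: session still in progress
    repair_queued: int = 0  # deferred: sent to the completion_report repair flow

    @property
    def processed(self) -> int:
        """What the current int return value reports."""
        return self.sent + self.skipped + self.failed

    @property
    def candidates(self) -> int:
        """Everything picked up this run, including deferred sessions."""
        return self.processed + self.running + self.repair_queued
```

Logging `candidates` alongside `processed` makes the "150 of 200" situation visible instead of the 50 deferred sessions looking like they were never seen.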
```
for user in batch:
    if not await is_feature_enabled(
```
🤖 🟠 Should Fix: The dispatch loop processes all 500 users in a batch sequentially, with is_feature_enabled (which hits Supabase to build the LaunchDarkly context) called for every user — even users who will immediately fail the midnight check. This is the dominant cost:
```python
for user in batch:
    if not await is_feature_enabled(Flag.NIGHTLY_COPILOT, user.id, ...):  # Supabase DB call
        continue  # most users exit here, but AFTER the network round-trip
```

At 500 users × ~10 ms per flag call, that is 5+ seconds per batch, and 10k users means 50+ seconds just for flag evaluation. The 30-minute cron window will hold, but this leaves little headroom and will silently degrade as user count grows.
Two fixes, in order of impact:

1. Bulk flag evaluation: LaunchDarkly supports bulk context evaluation, so batch all user contexts and evaluate them in one call, or add a TTL cache (e.g., `functools.lru_cache` plus expiry, or Redis) so the same user is not re-fetched on every 30-min run.
2. Parallelize eligible users: move the per-user session-creation work into `asyncio.gather` after the flag/midnight checks have filtered the cohort down:

```python
eligible = [u for u in batch if passes_fast_checks(u)]
await asyncio.gather(*[_dispatch_for_user(u, ...) for u in eligible])
```
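The TTL-cache variant of the first fix could be sketched as follows; `FlagCache`, `evaluate_flag`, and the 30-minute TTL are assumptions for illustration, not the platform's flag API:

```python
import time
from typing import Callable, Dict, Tuple

class FlagCache:
    """Cache per-user flag results so repeated 30-min sweeps skip the
    Supabase/LaunchDarkly round-trip for recently evaluated users."""

    def __init__(self, ttl_seconds: float = 1800.0):  # assumed TTL = one sweep
        self.ttl = ttl_seconds
        self._store: Dict[str, Tuple[float, bool]] = {}
        self.misses = 0  # exposed for observability

    def is_enabled(self, user_id: str, evaluate_flag: Callable[[str], bool]) -> bool:
        now = time.monotonic()
        hit = self._store.get(user_id)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]  # fresh cached value, no network call
        self.misses += 1
        value = bool(evaluate_flag(user_id))
        self._store[user_id] = (now, value)
        return value
```

The trade-off is staleness: a flag flipped off mid-TTL still dispatches one extra session, which is likely acceptable for a nightly feature.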
```
if invited_user.created_at > now_utc - invite_cta_delay:
    return False
if await _session_exists_for_execution_tag(user.id, get_invite_cta_execution_tag()):
```
🤖 🟡 Nice to Have: _try_create_invite_cta_session (line 210) and _try_create_callback_session (line 254) each call _session_exists_for_execution_tag explicitly, then immediately call _create_autopilot_session — which calls it again for the same execution tag. This is a redundant double round-trip to DB for every eligible user.
Remove the pre-check from _try_create_* helpers and rely solely on the guard inside _create_autopilot_session, or vice versa. As-is the responsibility is split confusingly across two layers.
```
def get_callback_execution_tag() -> str:
    return AUTOPILOT_CALLBACK_TAG
```
🤖 🟠 Should Fix: AUTOPILOT_CALLBACK_TAG = "autopilot-callback:v1" is a static string — every user can only ever receive ONE callback session via the dispatch loop, for their entire account lifetime. Once a callback session exists for a user, the dispatch loop silently skips them forever.
This is likely intentional for the v1 rollout, but it means:
- A user who got a callback in Feb 2026 and ignored it will never get another one, no matter how much they use the product later.
- There is no "re-engagement" path through dispatch — only admin override.
If this is intentional, add a comment explaining the one-shot design. If not, the tag should incorporate a date or cohort period (like nightly does with autopilot-nightly:{date}) so users can re-enter the callback funnel.
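If re-engagement is desired, a period-scoped tag could look like this sketch, patterned after the nightly `autopilot-nightly:{date}` tag; the quarterly cohort window and function shape are assumptions, not the PR's design:

```python
from datetime import date

AUTOPILOT_CALLBACK_TAG_PREFIX = "autopilot-callback"

def get_callback_execution_tag(today: date) -> str:
    """One callback session per user per quarter: the tag changes each
    cohort period, so session_exists_for_execution_tag stops matching and
    the dispatch loop can re-enter users into the callback funnel."""
    quarter = (today.month - 1) // 3 + 1
    return f"{AUTOPILOT_CALLBACK_TAG_PREFIX}:{today.year}-Q{quarter}"
```

With a static `v1` tag the existence check matches forever; any deterministic period suffix (month, quarter, cohort id) restores a bounded re-engagement cadence.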
|
This pull request has conflicts with the base branch; please resolve them so we can evaluate the pull request.
Changes 🏗️
- `ChatSession` metadata, execution tags, callback tokens, and the `completion_report` repair flow

Checklist 📋
For code changes:
- `mise exec node@22 -- make format`
- `poetry run pytest autogpt_platform/backend/backend/copilot/autopilot_test.py autogpt_platform/backend/backend/api/features/chat/routes_test.py autogpt_platform/backend/backend/copilot/sdk/tool_adapter_test.py autogpt_platform/backend/backend/copilot/sdk/service_test.py autogpt_platform/backend/backend/copilot/tools/completion_report_test.py autogpt_platform/backend/backend/notifications/email_test.py autogpt_platform/backend/backend/copilot/model_test.py::test_chatsession_serialization_deserialization`

For configuration changes:
- `.env.default` is updated or already compatible with my changes
- `docker-compose.yml` is updated or already compatible with my changes