feat(gmail): add include_analysis flag to get_gmail_thread_content#702
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (4)
📝 WalkthroughWalkthroughAdds an optional include_analysis flag to get_gmail_thread_content (now returns str or {"content","analysis"}) and introduces helpers to normalize emails, parse timestamps with UTC coercion, and compute thread-ownership metadata (last non-draft sender, participants, per-sender counts, excluded drafts, ball-in-court). Changes
Sequence DiagramsequenceDiagram
participant Client as Client
participant Tool as get_gmail_thread_content
participant Gmail as Gmail API
participant Formatter as Content Formatter
participant Analyzer as Ownership Analyzer
Client->>Tool: call(thread_id, include_analysis=True)
Tool->>Gmail: threads.get(format="full")
Gmail-->>Tool: raw thread payload
Tool->>Formatter: format thread -> content (str)
Formatter-->>Tool: content
Tool->>Analyzer: _analyze_thread_ownership_impl(raw thread, user_email)
Analyzer->>Analyzer: normalize emails\nparse Date/internalDate (coerce naive→UTC)\nexclude DRAFTs\ncount messages by sender\ndetermine ball_in_court_of
Analyzer-->>Tool: analysis dict
Tool-->>Client: {"content": content, "analysis": analysis}
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
gmail/gmail_tools.py (1)
2442-2502: Optional: hoistemail.utils/datetime/collectionsimports to module level.
_normalize_email,_parse_date_header, and_analyze_thread_ownership_impleach importparseaddr/parsedate_to_datetime/getaddresses/Counter/datetime/timezoneinside the function body. The module already imports fromemail.utilsat the top; consolidating these at module scope avoids repeated import lookups on each call and matches the rest of the file's style. Purely stylistic; safe to defer.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@gmail/gmail_tools.py` around lines 2442 - 2502, The functions _normalize_email, _parse_date_header, and _analyze_thread_ownership_impl perform repeated local imports (parseaddr, parsedate_to_datetime, getaddresses, Counter, datetime, timezone); hoist these imports to the module level instead: add the required imports (from email.utils import parseaddr, parsedate_to_datetime, getaddresses; from collections import Counter; from datetime import datetime, timezone) near the top of the file and remove the in-function import statements so the functions reference the module-level symbols directly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@gmail/gmail_tools.py`:
- Around line 2544-2567: The participant and sender counters are currently
adding malformed non-email tokens because _normalize_email can return strings
without "@"; update both places to only accept normalized values that contain
"@" before adding: in the getaddresses loop (the norm variable) only call
participants.add(norm) when norm and "@" in norm, and for the From header use
the from_norm check to require "@" in from_norm before incrementing
sender_counter[from_norm]; keep using _normalize_email and getaddresses but add
the explicit "@"-presence guard to both the participant loop and the
sender_counter increment to prevent free-text leakage.
---
Nitpick comments:
In `@gmail/gmail_tools.py`:
- Around line 2442-2502: The functions _normalize_email, _parse_date_header, and
_analyze_thread_ownership_impl perform repeated local imports (parseaddr,
parsedate_to_datetime, getaddresses, Counter, datetime, timezone); hoist these
imports to the module level instead: add the required imports (from email.utils
import parseaddr, parsedate_to_datetime, getaddresses; from collections import
Counter; from datetime import datetime, timezone) near the top of the file and
remove the in-function import statements so the functions reference the
module-level symbols directly.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 174bc372-158b-47c5-b7fb-34020a11918e
📒 Files selected for processing (3)
gmail/gmail_tools.pytests/gmail/test_get_gmail_thread_content_analysis.pytests/gmail/test_thread_ownership_helpers.py
eb8fd67 to
c78b6ac
Compare
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (1)
gmail/gmail_tools.py (1)
2467-2487: Good UTC coercion; one small thought on observability.Coercing naive
parsedate_to_datetimeresults to UTC before comparing withinternalDate-derived datetimes avoids theTypeErroron mixed-aware comparisons — nice catch in the docstring. Consider alogger.debugon theexceptbranches so a thread with an entirely malformed Date header (and nointernalDate) leaves a breadcrumb; currently both fallbacks swallow silently, which can make "why islast_timestampNone?" hard to diagnose in production.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@gmail/gmail_tools.py` around lines 2467 - 2487, Add debug-level logging inside both except blocks so failures parsing the Date header or converting internal_date_ms leave breadcrumbs; specifically, in the block handling parsedate_to_datetime failures (around the parsedate_to_datetime call and dt variable) log the caught exception and the raw date_str, and in the block converting internal_date_ms to a timestamp (around the ms conversion and datetime.fromtimestamp call) log the exception and the raw internal_date_ms value so callers of the surrounding function (which computes the email timestamp used as the internalDate/last_timestamp fallback) can trace why None was returned.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@gmail/gmail_tools.py`:
- Around line 2551-2567: The participants set is being populated before the
draft check so draft recipients leak into participants and skew
ball_in_court_of; either move the participants.add(norm) block to after the
is_draft guard (so addresses are only added for non-draft messages) or change
the external participant calculation to derive from sender_counter (e.g.
external_participants = set(sender_counter.keys()) - {normalized_user}) which is
already draft-filtered; update occurrences around the participants loop, the
is_draft check, and the external_participants/last_sender_norm logic to use the
chosen approach (refer to participants, is_draft, sender_counter,
external_participants, last_sender_norm, normalized_user).
- Line 2453: Add the missing return type annotation to the _parse_date_header
function: update its signature def _parse_date_header(date_str: str,
internal_date_ms: str | int | None) -> tuple[Optional[str], Optional[datetime]]
and ensure Optional and datetime are imported (from typing import Optional or
use PEP 604 equivalents and from datetime import datetime) so the function
explicitly returns (Optional[str], Optional[datetime]) as documented; keep
parameter names date_str and internal_date_ms unchanged.
- Line 2355: The _parse_date_header helper currently lacks a return type hint;
update its signature to return tuple[str | None, datetime | None] (i.e.,
annotate _parse_date_header as -> tuple[str | None, datetime | None]) so it
satisfies the project's mandatory type hints; ensure the function body still
returns values matching that tuple shape and that datetime is available in the
module namespace.
---
Nitpick comments:
In `@gmail/gmail_tools.py`:
- Around line 2467-2487: Add debug-level logging inside both except blocks so
failures parsing the Date header or converting internal_date_ms leave
breadcrumbs; specifically, in the block handling parsedate_to_datetime failures
(around the parsedate_to_datetime call and dt variable) log the caught exception
and the raw date_str, and in the block converting internal_date_ms to a
timestamp (around the ms conversion and datetime.fromtimestamp call) log the
exception and the raw internal_date_ms value so callers of the surrounding
function (which computes the email timestamp used as the
internalDate/last_timestamp fallback) can trace why None was returned.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 19ce1b5f-425e-4fb2-b3dd-0260becdd1ec
📒 Files selected for processing (3)
gmail/gmail_tools.pytests/gmail/test_get_gmail_thread_content_analysis.pytests/gmail/test_thread_ownership_helpers.py
✅ Files skipped from review due to trivial changes (2)
- tests/gmail/test_get_gmail_thread_content_analysis.py
- tests/gmail/test_thread_ownership_helpers.py
c78b6ac to
24db65e
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (2)
gmail/gmail_tools.py (2)
2343-2434: Consider ananalysis_onlymode to keep payloads concise.When a "who-owes-whom" agent sets
include_analysis=True, the response still embeds the full formatted threadcontent. For long threads (many messages, HTML-converted bodies), that inflates the LLM context with data the caller may not need. A follow-up could let callers opt out of thecontentpayload (e.g., an additionalresponse_mode: Literal["content", "analysis", "both"] = "content"parameter, or returning just the analysis dict wheninclude_analysis="only").As per coding guidelines: "Tool response payloads should be concise; redact verbose HTML/email bodies by default and offer links or IDs to fetch full content."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@gmail/gmail_tools.py` around lines 2343 - 2434, Add an option to return only the ownership analysis to avoid sending the full formatted thread: extend get_gmail_thread_content signature (e.g., add response_mode: Literal["content","analysis","both"] = "content" or a boolean analysis_only flag) and update its docstring/return type accordingly; inside get_gmail_thread_content, call _analyze_thread_ownership_impl(thread_response, user_google_email) before or instead of building content when response_mode=="analysis" (or analysis_only True) and skip calling _format_thread_content/_fetch_raw_message_contents to avoid heavy payloads; preserve current default behavior ("content"/both) for backwards compatibility and ensure callers still receive {"content", "analysis"} when response_mode=="both"; update logging and any callers/tests that expect the old return shape.
2533-2533: ParameterizeCounterfor strict typing.Per coding guidelines requiring mypy-strict type hints, annotate the element type. All keys are normalized email strings, so
Counter[str]is precise.♻️ Proposed diff
- sender_counter: Counter = Counter() + sender_counter: Counter[str] = Counter()As per coding guidelines: "Use mandatory type hints (mypy strict)".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@gmail/gmail_tools.py` at line 2533, The variable sender_counter is currently unparameterized; update its annotation from "sender_counter: Counter = Counter()" to a generic form like "sender_counter: Counter[str] = Counter()" to satisfy mypy-strict, and ensure the appropriate generic Counter is available (either use typing.Counter or collections.Counter with generics depending on project Python typing conventions) so the key type (normalized email strings) is enforced.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@gmail/gmail_tools.py`:
- Around line 2343-2434: Add an option to return only the ownership analysis to
avoid sending the full formatted thread: extend get_gmail_thread_content
signature (e.g., add response_mode: Literal["content","analysis","both"] =
"content" or a boolean analysis_only flag) and update its docstring/return type
accordingly; inside get_gmail_thread_content, call
_analyze_thread_ownership_impl(thread_response, user_google_email) before or
instead of building content when response_mode=="analysis" (or analysis_only
True) and skip calling _format_thread_content/_fetch_raw_message_contents to
avoid heavy payloads; preserve current default behavior ("content"/both) for
backwards compatibility and ensure callers still receive {"content", "analysis"}
when response_mode=="both"; update logging and any callers/tests that expect the
old return shape.
- Line 2533: The variable sender_counter is currently unparameterized; update
its annotation from "sender_counter: Counter = Counter()" to a generic form like
"sender_counter: Counter[str] = Counter()" to satisfy mypy-strict, and ensure
the appropriate generic Counter is available (either use typing.Counter or
collections.Counter with generics depending on project Python typing
conventions) so the key type (normalized email strings) is enforced.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 174b4d83-5f42-4e4e-b974-da99d9c1bdf2
📒 Files selected for processing (3)
gmail/gmail_tools.pytests/gmail/test_get_gmail_thread_content_analysis.pytests/gmail/test_thread_ownership_helpers.py
✅ Files skipped from review due to trivial changes (2)
- tests/gmail/test_get_gmail_thread_content_analysis.py
- tests/gmail/test_thread_ownership_helpers.py
24db65e to
75d5c35
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@gmail/gmail_tools.py`:
- Around line 2469-2477: The parsed date can be aware-with-offset so the
function must normalize all datetimes to UTC before returning; update the code
that uses parsedate_to_datetime to call astimezone(timezone.utc) for both naive
and aware datetimes (so the returned dt and dt.isoformat() are UTC), ensuring
the value assigned to analysis.last_timestamp (and any callers expecting
"ISO-8601, UTC-aware") always has a UTC tzinfo.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: e3ec2613-91c5-441c-8a75-30f428d6dc75
📒 Files selected for processing (3)
gmail/gmail_tools.pytests/gmail/test_get_gmail_thread_content_analysis.pytests/gmail/test_thread_ownership_helpers.py
✅ Files skipped from review due to trivial changes (1)
- tests/gmail/test_thread_ownership_helpers.py
75d5c35 to
26a7ce1
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@gmail/gmail_tools.py`:
- Around line 2610-2650: The payload currently returns last_sender as the raw
From header (last_sender_raw) while callers and other fields use normalized
addresses; change the return to include both the raw header and a normalized
email field (e.g., keep "last_sender" as last_sender_raw or rename to
"last_sender_raw" and add "last_sender_email" set to last_sender_norm/normalized
email) so programmatic consumers can reliably index message_count_by_sender and
ball-in-court logic remains unchanged; update the tool/module docstring to
document the two fields and their formats and ensure the returned dict from the
function that builds the thread analysis includes the new key name you choose
(refer to last_sender_raw, last_sender_norm, normalized_user, and the returned
dict).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: dd934f8e-f62b-4494-a667-53a25abb0b2d
📒 Files selected for processing (3)
gmail/gmail_tools.pytests/gmail/test_get_gmail_thread_content_analysis.pytests/gmail/test_thread_ownership_helpers.py
✅ Files skipped from review due to trivial changes (1)
- tests/gmail/test_thread_ownership_helpers.py
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/gmail/test_get_gmail_thread_content_analysis.py
…is flag Bundles thread-ownership analysis (last sender, ball-in-court verdict, per-sender message counts, participants) into the existing thread-read tool as an opt-in flag, rather than adding a standalone tool. Per maintainer guidance on PR taylorwilsdon#701: adding a new @server.tool would push against the OpenAI native tool-calling payload limit, and the analysis is only useful when reading a thread anyway, so a separate call would be wasteful. Design: - `get_gmail_thread_content` gains `include_analysis: bool = False`. - When False (default), returns the same formatted string — behavior unchanged for every existing caller. - When True, returns a dict: {"content": <existing formatted string>, "analysis": <ownership metadata dict>}. Structured so downstream agents don't have to parse prose. Analysis helpers (`_analyze_thread_ownership_impl`, `_normalize_email`, `_parse_date_header`) are pure functions added to `gmail/gmail_tools.py`. They handle: - Plus-addressing normalization (alex+foo@domain → alex@domain) - Drafts (label DRAFT) excluded from last-message determination; counted separately in `excluded_drafts` - Malformed Date headers fall back to Gmail's `internalDate` (Unix ms) - All timestamps coerced to UTC-aware at parse time so mixed naive/aware comparisons don't raise TypeError - Quoted display names with embedded commas (e.g., `"Doe, John" <john@ex.com>, vendor@ex.com`) parsed via `email.utils.getaddresses` — not naive `split(",")` - Self-only threads return `ball_in_court_of=None` (no external party) - Unparseable From headers return `ball_in_court_of=None` `ball_in_court_of` names whose court the ball is in: "user" → authenticated user owes the next reply (external sent last) "them" → other party owes the next reply (user sent last) None → unresolvable Tests: - `tests/gmail/test_thread_ownership_helpers.py` — 21 pytest cases covering the pure helpers (normalization, date parsing, ball-in-court logic, drafts, multi-party threads, empty/malformed thread data, quoted-comma recipients, naive/aware datetime regression). - `tests/gmail/test_get_gmail_thread_content_analysis.py` — 5 integration cases verifying the flag toggles the return shape correctly: default returns str, True returns dict with both keys, analysis shape matches helper contract, content parity across modes. Full gmail suite: 95 passed. Known limitation: `user_google_email` is a single address. A user with multiple aliases (e.g., personal + work) won't be recognized as the same identity across a multi-account thread. Documented in the docstring. Supersedes closed PR taylorwilsdon#701, which proposed the same functionality as a standalone `analyze_thread_ownership` tool.
26a7ce1 to
772c6fc
Compare
|
Thanks! |
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
|
@coderabbitai re-review please |
|
🧠 Learnings used✅ Actions performedFull review triggered. |
Description
Supersedes #701 per your feedback. Instead of adding a new
@server.tool, this PR bundles the thread-ownership analysis into the existingget_gmail_thread_contentas an opt-ininclude_analysis: bool = Falseparameter.Existing behavior preserved. When
include_analysis=False(the default),get_gmail_thread_contentreturns the same formatted string it always has. Wheninclude_analysis=True, it returns a dict{"content": <existing string>, "analysis": <ownership metadata>}. No existing caller is affected.Why the analysis exists. A common MCP-client pattern is synthesizing who-owes-whom from a thread. Clients often infer this from a date-range search on sent/received messages, which is fragile — same-day follow-ups get missed, thread context is lost, and the "last message" determination ends up wrong. The helper reads the actual thread chronology returned by
threads().get(format='full')and extracts structured metadata.Analysis shape
{ "thread_id": str, "thread_subject": str | None, "last_sender": str | None, # raw "Name <addr>" "last_timestamp": str | None, # ISO-8601, UTC-aware "ball_in_court_of": "user" | "them" | None, "message_count_by_sender": {normalized_email: int}, "participants": [normalized_email, ...], # sorted, union of From/To/Cc "excluded_drafts": int, "message_count": int, }ball_in_court_ofnames whose court the ball is in (who owes the next reply):"user"— authenticated user owes the reply (external party sent last)"them"— other party owes the reply (user sent last)None— unresolvable (all drafts, self-only thread, or malformed headers)Edge cases handled
alex+foo@domain.comnormalizes toalex@domain.comfor identity comparison.DRAFTlabel are excluded from the last-message determination and counted separately inexcluded_drafts. A newer draft does NOT flip ball-in-court.internalDate(Unix ms) for chronological ordering.email.utils.parsedate_to_datetimereturns a naive datetime when the Date header has no timezone. TheinternalDatefallback is aware UTC. Mixing them in>/<comparisons raisesTypeErrorat runtime — the helper coerces naive → UTC at parse time so downstream comparisons are safe. Regression test included.email.utils.getaddressesacross From/To/Cc in one pass, not naivesplit(","). Handles quoted display names with embedded commas like"Doe, John" <john@example.com>, vendor@example.comcorrectly.ball_in_court_of=Nonerather than silently defaulting.Type of Change
Testing
Added two test files:
tests/gmail/test_thread_ownership_helpers.py— 21 pytest cases exercising the pure helpers (_analyze_thread_ownership_impl,_normalize_email,_parse_date_header) against fabricatedthreads.get(format='full')responses. No network, no credentials.tests/gmail/test_get_gmail_thread_content_analysis.py— 5 integration cases covering the wire-up: default returnsstr, flag=True returns dict with both keys, analysis shape matches the documented contract, ball-in-court verdict is correct, content is byte-identical across both modes.Full
tests/gmail/suite passes: 95 passed.Checklist
Known limitation
user_google_emailis a single address. A user with multiple aliases (e.g., personal + work) won't be recognized as the same identity across a multi-account thread. Documented in the docstring.Related
Closes #701 (replaced by this PR per maintainer guidance to bundle rather than add a new tool).
Summary by CodeRabbit
New Features
Tests