Claude Optimize: Use structured outputs to guarantee valid JSON from Claude, +3 more#1283
Conversation
|
@saharmor is attempting to deploy a commit to the Cua Team on Vercel. A member of the Team first needs to authorize it. |
📝 WalkthroughWalkthroughThese changes update Anthropic model configurations across multiple components, transitioning to newer model versions (claude-sonnet-4-6), enhance caching behavior in the agent loop with configurable breakpoints, and refactor snapshot analysis from JSON parsing to schema-backed tool-based response processing. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@libs/python/cua-cli/cua_cli/commands/do.py`:
- Around line 861-865: The code uses next(b for b in response.content if b.type
== "tool_use") which will raise StopIteration if no tool_use block exists;
change this to use next(..., None) and then explicitly handle a missing
tool_block by logging or raising a clear, specific error (e.g., "missing
tool_use block in Claude response") before proceeding to access
parsed/summary/elements; update the block around response, tool_block, parsed,
summary, and elements in do.py to check for None and bail with a helpful message
instead of relying on the outer except.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: b8cf2ade-4de0-4198-b35b-e50776ce7da2
📒 Files selected for processing (4)
libs/cua-bench/cua_bench/agents/cua_agent.pylibs/python/agent/agent/loops/anthropic.pylibs/python/agent/agent/ui/gradio/app.pylibs/python/cua-cli/cua_cli/commands/do.py
| "default": "anthropic/claude-sonnet-4-6", | ||
| "Anthropic: Claude Opus 4.6": "anthropic/claude-opus-4-6", | ||
| "Anthropic: Claude Sonnet 4.6": "anthropic/claude-sonnet-4-6", | ||
| "Anthropic: Claude Sonnet 4.5": "anthropic/claude-sonnet-4-5-20250929", |
There was a problem hiding this comment.
Model label mismatch causes silent fallback to the wrong Anthropic model.
At Lines 118-121, the old keys were removed, but libs/python/agent/agent/ui/gradio/ui_components.py (Lines 41-52) still emits the old labels. Because get_model_string() falls back at Line 150, user selection can be ignored without warning.
Proposed backward-compatible fix
"anthropic": {
"default": "anthropic/claude-sonnet-4-6",
+ # Backward-compatible labels still used by ui_components.py
+ "Anthropic: Claude 4 Opus (20250514)": "anthropic/claude-opus-4-6",
+ "Anthropic: Claude 4 Sonnet (20250514)": "anthropic/claude-sonnet-4-6",
+ "Anthropic: Claude 3.7 Sonnet (20250219)": "anthropic/claude-sonnet-4-5-20250929",
"Anthropic: Claude Opus 4.6": "anthropic/claude-opus-4-6",
"Anthropic: Claude Sonnet 4.6": "anthropic/claude-sonnet-4-6",
"Anthropic: Claude Sonnet 4.5": "anthropic/claude-sonnet-4-5-20250929",
},📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| "default": "anthropic/claude-sonnet-4-6", | |
| "Anthropic: Claude Opus 4.6": "anthropic/claude-opus-4-6", | |
| "Anthropic: Claude Sonnet 4.6": "anthropic/claude-sonnet-4-6", | |
| "Anthropic: Claude Sonnet 4.5": "anthropic/claude-sonnet-4-5-20250929", | |
| "default": "anthropic/claude-sonnet-4-6", | |
| # Backward-compatible labels still used by ui_components.py | |
| "Anthropic: Claude 4 Opus (20250514)": "anthropic/claude-opus-4-6", | |
| "Anthropic: Claude 4 Sonnet (20250514)": "anthropic/claude-sonnet-4-6", | |
| "Anthropic: Claude 3.7 Sonnet (20250219)": "anthropic/claude-sonnet-4-5-20250929", | |
| "Anthropic: Claude Opus 4.6": "anthropic/claude-opus-4-6", | |
| "Anthropic: Claude Sonnet 4.6": "anthropic/claude-sonnet-4-6", | |
| "Anthropic: Claude Sonnet 4.5": "anthropic/claude-sonnet-4-5-20250929", |
| # Extract the tool use result — guaranteed valid JSON matching the schema | ||
| tool_block = next(b for b in response.content if b.type == "tool_use") | ||
| parsed = tool_block.input | ||
| summary = parsed.get("summary", "") | ||
| elements = parsed.get("elements", []) |
There was a problem hiding this comment.
Handle missing tool_use block gracefully.
Using next() without a default raises StopIteration if Claude's response unexpectedly lacks a tool_use block. While caught by the outer except Exception, the error message would be unhelpful.
🛡️ Proposed fix
# Extract the tool use result — guaranteed valid JSON matching the schema
- tool_block = next(b for b in response.content if b.type == "tool_use")
+ tool_block = next((b for b in response.content if b.type == "tool_use"), None)
+ if tool_block is None:
+ await _print_context(state["provider"], state.get("name", ""), state)
+ return _fail("AI response missing tool_use block")
parsed = tool_block.input🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@libs/python/cua-cli/cua_cli/commands/do.py` around lines 861 - 865, The code
uses next(b for b in response.content if b.type == "tool_use") which will raise
StopIteration if no tool_use block exists; change this to use next(..., None)
and then explicitly handle a missing tool_block by logging or raising a clear,
specific error (e.g., "missing tool_use block in Claude response") before
proceeding to access parsed/summary/elements; update the block around response,
tool_block, parsed, summary, and elements in do.py to check for None and bail
with a helpful message instead of relying on the outer except.
Summary
This PR applies 4 optimization(s) identified by Claude Optimize, an automated tool that scans your codebase for Claude API and agentic SDK usage patterns and suggests improvements to reduce cost, latency, and improve reliability.
Optimizations applied
Use structured outputs to guarantee valid JSON from Claude
libs/python/cua-cli/cua_cli/commands/do.pyClaude Haiku 4.5 supports native structured outputs via the
tool_usepattern. Instead of asking Claude to respond with JSON in a prompt and hoping it parses correctly, define a tool whoseinput_schemamatches the desired output shape. Claude will be forced to return valid JSON matching the schema — no parsing failures, no fallback to raw text, and you can remove the format instructions from the prompt entirely. This eliminates the JSONDecodeError fallback path and guarantees you always get the structured element list.Expected impact: High reliability improvement
Read more in the docs
Upgrade Sonnet 4.0 defaults to Sonnet 4.6 across cua-bench
libs/cua-bench/cua_bench/agents/cua_agent.pyclaude-sonnet-4-6 is the latest Sonnet model and a direct upgrade from claude-sonnet-4-20250514 at the same price ($3/$15 per MTok). Key benefits for this computer-use agent: 5x larger context window (1M vs 200K tokens) allows much longer agent trajectories before hitting context limits, improved tool-use reliability reduces wasted steps, and the model is already supported by the existing MODEL_TOOL_MAPPING in anthropic.py (which correctly routes it to the computer-use-2025-11-24 beta flag). Apply the same change in cloud.py:112 and manifest.json:71. Note: Sonnet 4.6 defaults to 'high' effort, which may increase latency per step. If step latency is a concern for benchmarking, consider setting effort to 'medium' via output_config.
Expected impact: Medium latency reduction, Medium reliability improvement
Read more in the docs
Update Gradio UI model menu to include Sonnet 4.6 and Opus 4.6
libs/python/agent/agent/ui/gradio/app.pyThe Gradio UI model selector is stuck on models from early 2025. claude-sonnet-4-6 should be the new default — it's the same price as Sonnet 3.7 ($3/$15) but with dramatically better capabilities: 1M context window, improved computer-use performance, and better tool-use reliability. Add Opus 4.6 as the premium option (same $5/$25 price as Opus 4.5, but 3x cheaper than the currently-listed Opus 4.0 at $15/$75). Keep one older model as a fallback option if needed.
Expected impact: Medium latency reduction, Medium reliability improvement
Read more in the docs
Add cache_control to tool definitions in Anthropic agent loop
libs/python/agent/agent/loops/anthropic.pyAnthropic's prompt caching uses prefix matching in the order: tools → system → messages. The tools array is identical across every turn of a computer-use session, making it the ideal candidate for caching. By adding cache_control to the last tool in the array, the entire tools prefix is cached after the first request. Subsequent requests read the tool definitions from cache at ~90% lower cost. This is especially impactful for computer-use agents that run many turns (10-50+) per session with the same tool set. Since tools form the first prefix segment, caching them also ensures the tools prefix remains stable for downstream system/message caching.
Expected impact: High cost reduction, Medium latency reduction
Read more in the docs
Generated by Claude Optimize. Claude Optimize is an open-source tool that analyzes how your project uses the Claude API and Anthropic's agentic SDKs, identifies optimization opportunities (prompt caching, batching, model selection, and more), and can automatically apply the recommended changes.
Summary by CodeRabbit
Release Notes
New Features
Updates