Skip to content

Claude Optimize: Use structured outputs to guarantee valid JSON from Claude, +3 more#1283

Open
saharmor wants to merge 1 commit intotrycua:mainfrom
saharmor:claude-optimize/00715f26-575e-4b96-87f1-26e365833235
Open

Claude Optimize: Use structured outputs to guarantee valid JSON from Claude, +3 more#1283
saharmor wants to merge 1 commit intotrycua:mainfrom
saharmor:claude-optimize/00715f26-575e-4b96-87f1-26e365833235

Conversation

@saharmor
Copy link
Copy Markdown

@saharmor saharmor commented Apr 8, 2026

Summary

This PR applies 4 optimization(s) identified by Claude Optimize, an automated tool that scans your codebase for Claude API and agentic SDK usage patterns and suggests improvements to reduce cost, latency, and improve reliability.

Optimizations applied

Use structured outputs to guarantee valid JSON from Claude

libs/python/cua-cli/cua_cli/commands/do.py

Claude Haiku 4.5 supports native structured outputs via the tool_use pattern. Instead of asking Claude to respond with JSON in a prompt and hoping it parses correctly, define a tool whose input_schema matches the desired output shape. Claude will be forced to return valid JSON matching the schema — no parsing failures, no fallback to raw text, and you can remove the format instructions from the prompt entirely. This eliminates the JSONDecodeError fallback path and guarantees you always get the structured element list.

Expected impact: High reliability improvement

Read more in the docs

Upgrade Sonnet 4.0 defaults to Sonnet 4.6 across cua-bench

libs/cua-bench/cua_bench/agents/cua_agent.py

claude-sonnet-4-6 is the latest Sonnet model and a direct upgrade from claude-sonnet-4-20250514 at the same price ($3/$15 per MTok). Key benefits for this computer-use agent: 5x larger context window (1M vs 200K tokens) allows much longer agent trajectories before hitting context limits, improved tool-use reliability reduces wasted steps, and the model is already supported by the existing MODEL_TOOL_MAPPING in anthropic.py (which correctly routes it to the computer-use-2025-11-24 beta flag). Apply the same change in cloud.py:112 and manifest.json:71. Note: Sonnet 4.6 defaults to 'high' effort, which may increase latency per step. If step latency is a concern for benchmarking, consider setting effort to 'medium' via output_config.

Expected impact: Medium latency reduction, Medium reliability improvement

Read more in the docs

Update Gradio UI model menu to include Sonnet 4.6 and Opus 4.6

libs/python/agent/agent/ui/gradio/app.py

The Gradio UI model selector is stuck on models from early 2025. claude-sonnet-4-6 should be the new default — it's the same price as Sonnet 3.7 ($3/$15) but with dramatically better capabilities: 1M context window, improved computer-use performance, and better tool-use reliability. Add Opus 4.6 as the premium option (same $5/$25 price as Opus 4.5, but 3x cheaper than the currently-listed Opus 4.0 at $15/$75). Keep one older model as a fallback option if needed.

Expected impact: Medium latency reduction, Medium reliability improvement

Read more in the docs

Add cache_control to tool definitions in Anthropic agent loop

libs/python/agent/agent/loops/anthropic.py

Anthropic's prompt caching uses prefix matching in the order: tools → system → messages. The tools array is identical across every turn of a computer-use session, making it the ideal candidate for caching. By adding cache_control to the last tool in the array, the entire tools prefix is cached after the first request. Subsequent requests read the tool definitions from cache at ~90% lower cost. This is especially impactful for computer-use agents that run many turns (10-50+) per session with the same tool set. Since tools form the first prefix segment, caching them also ensures the tools prefix remains stable for downstream system/message caching.

Expected impact: High cost reduction, Medium latency reduction

Read more in the docs


Generated by Claude Optimize. Claude Optimize is an open-source tool that analyzes how your project uses the Claude API and Anthropic's agentic SDKs, identifies optimization opportunities (prompt caching, batching, model selection, and more), and can automatically apply the recommended changes.

Summary by CodeRabbit

Release Notes

  • New Features

    • Added Claude Sonnet 4.5 and Claude Opus 4.6 model options.
    • Enhanced screenshot analysis with improved response handling.
  • Updates

    • Changed default model to Claude Sonnet 4.6.
    • Optimized prompt caching with refined cache control strategy.

@vercel
Copy link
Copy Markdown
Contributor

vercel bot commented Apr 8, 2026

@saharmor is attempting to deploy a commit to the Cua Team on Vercel.

A member of the Team first needs to authorize it.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Apr 8, 2026

📝 Walkthrough

Walkthrough

These changes update Anthropic model configurations across multiple components, transitioning to newer model versions (claude-sonnet-4-6), enhance caching behavior in the agent loop with configurable breakpoints, and refactor snapshot analysis from JSON parsing to schema-backed tool-based response processing.

Changes

Cohort / File(s) Summary
Model Configuration Updates
libs/cua-bench/cua_bench/agents/cua_agent.py, libs/python/agent/agent/ui/gradio/app.py
Updated default Anthropic models from older versions (claude-sonnet-4-20250514, claude-3-7-sonnet-20250219) to claude-sonnet-4-6; added claude-opus-4-6 and claude-sonnet-4-5-20250929 mappings in UI; removed deprecated model versions.
Cache Control Enhancement
libs/python/agent/agent/loops/anthropic.py
Added configurable max_breakpoints parameter to _add_cache_control() with default value of 4; updated cache control logic in predict_step to apply ephemeral caching to tools list and adjusted message caching with max_breakpoints=3.
Tool-Based Response Processing
libs/python/cua-cli/cua_cli/commands/do.py
Refactored snapshot analysis from requiring JSON text response to tool-based response with defined schema; replaced JSON parsing with tool-use extraction via tool_choice enforcement and direct schema field access.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 Hoppity-hop, the models ascend,
From Sonnet to Opus, on updates we depend!
Cache breakpoints now dance with precision so fine,
Tools parse the snapshots—a design so divine!
Configuration updated, control flow refined,
A meadow of changes, so clever designed! 🌱

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title identifies the primary optimization (structured outputs for guaranteed JSON) and references additional changes. It directly aligns with the main theme of applying Claude Optimize improvements across multiple files.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@libs/python/cua-cli/cua_cli/commands/do.py`:
- Around line 861-865: The code uses next(b for b in response.content if b.type
== "tool_use") which will raise StopIteration if no tool_use block exists;
change this to use next(..., None) and then explicitly handle a missing
tool_block by logging or raising a clear, specific error (e.g., "missing
tool_use block in Claude response") before proceeding to access
parsed/summary/elements; update the block around response, tool_block, parsed,
summary, and elements in do.py to check for None and bail with a helpful message
instead of relying on the outer except.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b8cf2ade-4de0-4198-b35b-e50776ce7da2

📥 Commits

Reviewing files that changed from the base of the PR and between 1e62756 and 0e14600.

📒 Files selected for processing (4)
  • libs/cua-bench/cua_bench/agents/cua_agent.py
  • libs/python/agent/agent/loops/anthropic.py
  • libs/python/agent/agent/ui/gradio/app.py
  • libs/python/cua-cli/cua_cli/commands/do.py

Comment on lines +118 to +121
"default": "anthropic/claude-sonnet-4-6",
"Anthropic: Claude Opus 4.6": "anthropic/claude-opus-4-6",
"Anthropic: Claude Sonnet 4.6": "anthropic/claude-sonnet-4-6",
"Anthropic: Claude Sonnet 4.5": "anthropic/claude-sonnet-4-5-20250929",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Model label mismatch causes silent fallback to the wrong Anthropic model.

At Lines 118-121, the old keys were removed, but libs/python/agent/agent/ui/gradio/ui_components.py (Lines 41-52) still emits the old labels. Because get_model_string() falls back at Line 150, user selection can be ignored without warning.

Proposed backward-compatible fix
 "anthropic": {
     "default": "anthropic/claude-sonnet-4-6",
+    # Backward-compatible labels still used by ui_components.py
+    "Anthropic: Claude 4 Opus (20250514)": "anthropic/claude-opus-4-6",
+    "Anthropic: Claude 4 Sonnet (20250514)": "anthropic/claude-sonnet-4-6",
+    "Anthropic: Claude 3.7 Sonnet (20250219)": "anthropic/claude-sonnet-4-5-20250929",
     "Anthropic: Claude Opus 4.6": "anthropic/claude-opus-4-6",
     "Anthropic: Claude Sonnet 4.6": "anthropic/claude-sonnet-4-6",
     "Anthropic: Claude Sonnet 4.5": "anthropic/claude-sonnet-4-5-20250929",
 },
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
"default": "anthropic/claude-sonnet-4-6",
"Anthropic: Claude Opus 4.6": "anthropic/claude-opus-4-6",
"Anthropic: Claude Sonnet 4.6": "anthropic/claude-sonnet-4-6",
"Anthropic: Claude Sonnet 4.5": "anthropic/claude-sonnet-4-5-20250929",
"default": "anthropic/claude-sonnet-4-6",
# Backward-compatible labels still used by ui_components.py
"Anthropic: Claude 4 Opus (20250514)": "anthropic/claude-opus-4-6",
"Anthropic: Claude 4 Sonnet (20250514)": "anthropic/claude-sonnet-4-6",
"Anthropic: Claude 3.7 Sonnet (20250219)": "anthropic/claude-sonnet-4-5-20250929",
"Anthropic: Claude Opus 4.6": "anthropic/claude-opus-4-6",
"Anthropic: Claude Sonnet 4.6": "anthropic/claude-sonnet-4-6",
"Anthropic: Claude Sonnet 4.5": "anthropic/claude-sonnet-4-5-20250929",

Comment on lines +861 to +865
# Extract the tool use result — guaranteed valid JSON matching the schema
tool_block = next(b for b in response.content if b.type == "tool_use")
parsed = tool_block.input
summary = parsed.get("summary", "")
elements = parsed.get("elements", [])
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Handle missing tool_use block gracefully.

Using next() without a default raises StopIteration if Claude's response unexpectedly lacks a tool_use block. While caught by the outer except Exception, the error message would be unhelpful.

🛡️ Proposed fix
             # Extract the tool use result — guaranteed valid JSON matching the schema
-            tool_block = next(b for b in response.content if b.type == "tool_use")
+            tool_block = next((b for b in response.content if b.type == "tool_use"), None)
+            if tool_block is None:
+                await _print_context(state["provider"], state.get("name", ""), state)
+                return _fail("AI response missing tool_use block")
             parsed = tool_block.input
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@libs/python/cua-cli/cua_cli/commands/do.py` around lines 861 - 865, The code
uses next(b for b in response.content if b.type == "tool_use") which will raise
StopIteration if no tool_use block exists; change this to use next(..., None)
and then explicitly handle a missing tool_block by logging or raising a clear,
specific error (e.g., "missing tool_use block in Claude response") before
proceeding to access parsed/summary/elements; update the block around response,
tool_block, parsed, summary, and elements in do.py to check for None and bail
with a helpful message instead of relying on the outer except.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant