Claude Optimize: Use structured outputs to guarantee valid JSON from Claude, +3 more by saharmor · Pull Request #1283 · trycua/cua

saharmor · 2026-04-08T19:22:22Z

Summary

This PR applies 4 optimization(s) identified by Claude Optimize, an automated tool that scans your codebase for Claude API and agentic SDK usage patterns and suggests improvements to reduce cost, latency, and improve reliability.

Optimizations applied

Use structured outputs to guarantee valid JSON from Claude

libs/python/cua-cli/cua_cli/commands/do.py

Claude Haiku 4.5 supports native structured outputs via the tool_use pattern. Instead of asking Claude to respond with JSON in a prompt and hoping it parses correctly, define a tool whose input_schema matches the desired output shape. Claude will be forced to return valid JSON matching the schema — no parsing failures, no fallback to raw text, and you can remove the format instructions from the prompt entirely. This eliminates the JSONDecodeError fallback path and guarantees you always get the structured element list.

Expected impact: High reliability improvement

Read more in the docs

Upgrade Sonnet 4.0 defaults to Sonnet 4.6 across cua-bench

libs/cua-bench/cua_bench/agents/cua_agent.py

claude-sonnet-4-6 is the latest Sonnet model and a direct upgrade from claude-sonnet-4-20250514 at the same price ($3/$15 per MTok). Key benefits for this computer-use agent: 5x larger context window (1M vs 200K tokens) allows much longer agent trajectories before hitting context limits, improved tool-use reliability reduces wasted steps, and the model is already supported by the existing MODEL_TOOL_MAPPING in anthropic.py (which correctly routes it to the computer-use-2025-11-24 beta flag). Apply the same change in cloud.py:112 and manifest.json:71. Note: Sonnet 4.6 defaults to 'high' effort, which may increase latency per step. If step latency is a concern for benchmarking, consider setting effort to 'medium' via output_config.

Expected impact: Medium latency reduction, Medium reliability improvement

Read more in the docs

Update Gradio UI model menu to include Sonnet 4.6 and Opus 4.6

libs/python/agent/agent/ui/gradio/app.py

The Gradio UI model selector is stuck on models from early 2025. claude-sonnet-4-6 should be the new default — it's the same price as Sonnet 3.7 ($3/$15) but with dramatically better capabilities: 1M context window, improved computer-use performance, and better tool-use reliability. Add Opus 4.6 as the premium option (same $5/$25 price as Opus 4.5, but 3x cheaper than the currently-listed Opus 4.0 at $15/$75). Keep one older model as a fallback option if needed.

Expected impact: Medium latency reduction, Medium reliability improvement

Read more in the docs

Add cache_control to tool definitions in Anthropic agent loop

libs/python/agent/agent/loops/anthropic.py

Anthropic's prompt caching uses prefix matching in the order: tools → system → messages. The tools array is identical across every turn of a computer-use session, making it the ideal candidate for caching. By adding cache_control to the last tool in the array, the entire tools prefix is cached after the first request. Subsequent requests read the tool definitions from cache at ~90% lower cost. This is especially impactful for computer-use agents that run many turns (10-50+) per session with the same tool set. Since tools form the first prefix segment, caching them also ensures the tools prefix remains stable for downstream system/message caching.

Expected impact: High cost reduction, Medium latency reduction

Read more in the docs

Generated by Claude Optimize. Claude Optimize is an open-source tool that analyzes how your project uses the Claude API and Anthropic's agentic SDKs, identifies optimization opportunities (prompt caching, batching, model selection, and more), and can automatically apply the recommended changes.

Summary by CodeRabbit

Release Notes

New Features
- Added Claude Sonnet 4.5 and Claude Opus 4.6 model options.
- Enhanced screenshot analysis with improved response handling.
Updates
- Changed default model to Claude Sonnet 4.6.
- Optimized prompt caching with refined cache control strategy.

…Claude, +3 more

vercel · 2026-04-08T19:22:44Z

@saharmor is attempting to deploy a commit to the Cua Team on Vercel.

A member of the Team first needs to authorize it.

coderabbitai · 2026-04-08T19:24:02Z

📝 Walkthrough

Walkthrough

These changes update Anthropic model configurations across multiple components, transitioning to newer model versions (claude-sonnet-4-6), enhance caching behavior in the agent loop with configurable breakpoints, and refactor snapshot analysis from JSON parsing to schema-backed tool-based response processing.

Changes

Cohort / File(s)	Summary
Model Configuration Updates `libs/cua-bench/cua_bench/agents/cua_agent.py`, `libs/python/agent/agent/ui/gradio/app.py`	Updated default Anthropic models from older versions (claude-sonnet-4-20250514, claude-3-7-sonnet-20250219) to claude-sonnet-4-6; added claude-opus-4-6 and claude-sonnet-4-5-20250929 mappings in UI; removed deprecated model versions.
Cache Control Enhancement `libs/python/agent/agent/loops/anthropic.py`	Added configurable `max_breakpoints` parameter to `_add_cache_control()` with default value of 4; updated cache control logic in `predict_step` to apply ephemeral caching to tools list and adjusted message caching with max_breakpoints=3.
Tool-Based Response Processing `libs/python/cua-cli/cua_cli/commands/do.py`	Refactored snapshot analysis from requiring JSON text response to tool-based response with defined schema; replaced JSON parsing with tool-use extraction via `tool_choice` enforcement and direct schema field access.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 Hoppity-hop, the models ascend,
From Sonnet to Opus, on updates we depend!
Cache breakpoints now dance with precision so fine,
Tools parse the snapshots—a design so divine!
Configuration updated, control flow refined,
A meadow of changes, so clever designed! 🌱

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title identifies the primary optimization (structured outputs for guaranteed JSON) and references additional changes. It directly aligns with the main theme of applying Claude Optimize improvements across multiple files.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@libs/python/cua-cli/cua_cli/commands/do.py`:
- Around line 861-865: The code uses next(b for b in response.content if b.type
== "tool_use") which will raise StopIteration if no tool_use block exists;
change this to use next(..., None) and then explicitly handle a missing
tool_block by logging or raising a clear, specific error (e.g., "missing
tool_use block in Claude response") before proceeding to access
parsed/summary/elements; update the block around response, tool_block, parsed,
summary, and elements in do.py to check for None and bail with a helpful message
instead of relying on the outer except.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b8cf2ade-4de0-4198-b35b-e50776ce7da2

📥 Commits

Reviewing files that changed from the base of the PR and between 1e62756 and 0e14600.

📒 Files selected for processing (4)

libs/cua-bench/cua_bench/agents/cua_agent.py
libs/python/agent/agent/loops/anthropic.py
libs/python/agent/agent/ui/gradio/app.py
libs/python/cua-cli/cua_cli/commands/do.py

coderabbitai · 2026-04-08T19:29:33Z

libs/python/agent/agent/ui/gradio/app.py

+        "default": "anthropic/claude-sonnet-4-6",
+        "Anthropic: Claude Opus 4.6": "anthropic/claude-opus-4-6",
+        "Anthropic: Claude Sonnet 4.6": "anthropic/claude-sonnet-4-6",
+        "Anthropic: Claude Sonnet 4.5": "anthropic/claude-sonnet-4-5-20250929",


⚠️ Potential issue | 🟠 Major

Model label mismatch causes silent fallback to the wrong Anthropic model.

At Lines 118-121, the old keys were removed, but libs/python/agent/agent/ui/gradio/ui_components.py (Lines 41-52) still emits the old labels. Because get_model_string() falls back at Line 150, user selection can be ignored without warning.

Proposed backward-compatible fix

"anthropic": { "default": "anthropic/claude-sonnet-4-6", + # Backward-compatible labels still used by ui_components.py + "Anthropic: Claude 4 Opus (20250514)": "anthropic/claude-opus-4-6", + "Anthropic: Claude 4 Sonnet (20250514)": "anthropic/claude-sonnet-4-6", + "Anthropic: Claude 3.7 Sonnet (20250219)": "anthropic/claude-sonnet-4-5-20250929", "Anthropic: Claude Opus 4.6": "anthropic/claude-opus-4-6", "Anthropic: Claude Sonnet 4.6": "anthropic/claude-sonnet-4-6", "Anthropic: Claude Sonnet 4.5": "anthropic/claude-sonnet-4-5-20250929", },

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

"default": "anthropic/claude-sonnet-4-6",

"Anthropic: Claude Opus 4.6": "anthropic/claude-opus-4-6",

"Anthropic: Claude Sonnet 4.6": "anthropic/claude-sonnet-4-6",

"Anthropic: Claude Sonnet 4.5": "anthropic/claude-sonnet-4-5-20250929",

"default": "anthropic/claude-sonnet-4-6",

# Backward-compatible labels still used by ui_components.py

"Anthropic: Claude 4 Opus (20250514)": "anthropic/claude-opus-4-6",

"Anthropic: Claude 4 Sonnet (20250514)": "anthropic/claude-sonnet-4-6",

"Anthropic: Claude 3.7 Sonnet (20250219)": "anthropic/claude-sonnet-4-5-20250929",

"Anthropic: Claude Opus 4.6": "anthropic/claude-opus-4-6",

"Anthropic: Claude Sonnet 4.6": "anthropic/claude-sonnet-4-6",

"Anthropic: Claude Sonnet 4.5": "anthropic/claude-sonnet-4-5-20250929",

coderabbitai · 2026-04-08T19:29:34Z

libs/python/cua-cli/cua_cli/commands/do.py

+            # Extract the tool use result — guaranteed valid JSON matching the schema
+            tool_block = next(b for b in response.content if b.type == "tool_use")
+            parsed = tool_block.input
+            summary = parsed.get("summary", "")
+            elements = parsed.get("elements", [])


⚠️ Potential issue | 🟡 Minor

Handle missing tool_use block gracefully.

Using next() without a default raises StopIteration if Claude's response unexpectedly lacks a tool_use block. While caught by the outer except Exception, the error message would be unhelpful.

🛡️ Proposed fix

# Extract the tool use result — guaranteed valid JSON matching the schema - tool_block = next(b for b in response.content if b.type == "tool_use") + tool_block = next((b for b in response.content if b.type == "tool_use"), None) + if tool_block is None: + await _print_context(state["provider"], state.get("name", ""), state) + return _fail("AI response missing tool_use block") parsed = tool_block.input

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@libs/python/cua-cli/cua_cli/commands/do.py` around lines 861 - 865, The code uses next(b for b in response.content if b.type == "tool_use") which will raise StopIteration if no tool_use block exists; change this to use next(..., None) and then explicitly handle a missing tool_block by logging or raising a clear, specific error (e.g., "missing tool_use block in Claude response") before proceeding to access parsed/summary/elements; update the block around response, tool_block, parsed, summary, and elements in do.py to check for None and bail with a helpful message instead of relying on the outer except.

Claude Optimize: Use structured outputs to guarantee valid JSON from …

0e14600

…Claude, +3 more

coderabbitai bot reviewed Apr 8, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Claude Optimize: Use structured outputs to guarantee valid JSON from Claude, +3 more#1283

Claude Optimize: Use structured outputs to guarantee valid JSON from Claude, +3 more#1283
saharmor wants to merge 1 commit intotrycua:mainfrom
saharmor:claude-optimize/00715f26-575e-4b96-87f1-26e365833235

saharmor commented Apr 8, 2026 •

edited by coderabbitai bot

Loading

Uh oh!

vercel bot commented Apr 8, 2026

Uh oh!

coderabbitai bot commented Apr 8, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Apr 8, 2026

Uh oh!

coderabbitai bot Apr 8, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

saharmor commented Apr 8, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Optimizations applied

Use structured outputs to guarantee valid JSON from Claude

Upgrade Sonnet 4.0 defaults to Sonnet 4.6 across cua-bench

Update Gradio UI model menu to include Sonnet 4.6 and Opus 4.6

Add cache_control to tool definitions in Anthropic agent loop

Summary by CodeRabbit

Release Notes

Uh oh!

vercel bot commented Apr 8, 2026

Uh oh!

coderabbitai bot commented Apr 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Apr 8, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Apr 8, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

saharmor commented Apr 8, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Apr 8, 2026 •

edited

Loading