feat(agent): add OpenAI Responses API with auto endpoint detection #604

Merged
tlongwell-block merged 3 commits into main from feat/openai-responses-api
May 17, 2026
@tlongwell-block
Collaborator

Summary

OpenAI's GPT-5 and o-series models on api.openai.com require the Responses API (/v1/responses) for tool calling; the legacy Chat Completions endpoint rejects them with unsupported_parameter. Meanwhile, almost every OpenAI-compatible server in the wild (vLLM, Ollama, llama.cpp, OpenRouter, Block Gateway, Databricks) still speaks only Chat Completions. This change teaches sprout-agent both dialects and routes between them automatically.

New configuration

OPENAI_COMPAT_API={auto,chat,responses}   # default: auto

auto picks Responses when OPENAI_COMPAT_BASE_URL points at an *.openai.com host, and Chat Completions everywhere else. Operators can pin the choice explicitly for providers that diverge from the default (e.g. a Responses-compatible self-hosted gateway, or a *.openai.com host that for some reason needs the legacy endpoint).
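
A minimal sketch of how the three values might be parsed, based on the behavior the parser tests describe (unset defaults to auto, explicit values are case-insensitive and whitespace-tolerant, anything else is rejected). The names `ApiChoice` and `parse_api_choice` are hypothetical, and rejecting an empty string is an assumption; the real parser in config.rs may differ.

```rust
/// Operator-facing choice for OPENAI_COMPAT_API (illustrative name).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ApiChoice {
    Auto,
    Chat,
    Responses,
}

/// Parse the raw env value. `None` models an unset variable.
/// Taking the value as an argument keeps the function testable
/// without touching the process environment.
pub fn parse_api_choice(raw: Option<&str>) -> Result<ApiChoice, String> {
    let Some(raw) = raw else {
        return Ok(ApiChoice::Auto); // unset defaults to auto
    };
    match raw.trim().to_ascii_lowercase().as_str() {
        "auto" => Ok(ApiChoice::Auto),
        "chat" => Ok(ApiChoice::Chat),
        "responses" => Ok(ApiChoice::Responses),
        // Assumption: anything else, including an empty string, is an error.
        other => Err(format!(
            "OPENAI_COMPAT_API must be auto, chat, or responses (got {other:?})"
        )),
    }
}
```

A caller would pass `std::env::var("OPENAI_COMPAT_API").ok().as_deref()` at startup.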

Implementation

config.rs

  • New OpenAiApi enum (ChatCompletions / Responses) on Config.
  • parse_openai_api_env() parses OPENAI_COMPAT_API; auto_openai_api() picks per base_url.
  • Zero-dep host extractor with a lookalike-safety test (api.openai.com.evil.example → Chat Completions, not Responses).
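
The lookalike-safe host check can be sketched as below. This is an illustration under stated assumptions, not the code in config.rs: `host_of` is a hypothetical helper, and the handling of userinfo, ports, and case is a guess at what a dependency-free extractor would need. The key point is the suffix match on the extracted host rather than a substring match on the whole URL.

```rust
/// Extract the host from a base URL without a URL-parsing dependency.
/// Returns None when no usable host can be found. (Hypothetical helper.)
fn host_of(base_url: &str) -> Option<&str> {
    // Drop the scheme ("https://"), if present.
    let rest = base_url.split_once("://").map_or(base_url, |(_, r)| r);
    // The authority ends at the first '/', '?' or '#'.
    let authority = rest
        .split(|c: char| matches!(c, '/' | '?' | '#'))
        .next()?;
    // Drop any "user:pass@" prefix, then any ":port" suffix.
    let host_port = authority.rsplit('@').next()?;
    let host = host_port.split(':').next()?;
    if host.is_empty() {
        None
    } else {
        Some(host)
    }
}

/// True only when the host itself ends in ".openai.com".
/// A substring test would wrongly accept api.openai.com.evil.example.
pub fn is_openai_host(base_url: &str) -> bool {
    match host_of(base_url) {
        Some(host) => host.to_ascii_lowercase().ends_with(".openai.com"),
        None => false,
    }
}
```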

llm.rs

  • Provider::OpenAi now dispatches on cfg.openai_api for both complete() and summarize().
  • New responses_body + parse_responses (with responses_image_user_content, responses_stop helpers) handle the Responses wire shape:
    • Flat tool schema ({type, name, description, parameters} — no nested function: {…}).
    • input[] of typed items: typed user/assistant messages, function_call, function_call_output.
    • max_output_tokens (not max_tokens / max_completion_tokens).
    • Serializer emits each prior assistant function_call before its matching function_call_output — the API rejects with "No tool call found for call_id ..." otherwise (caught in live testing).
    • Parser walks output[], collects message content as text and function_call items as ToolCall. Skips reasoning items (stateless across turns; carrying them requires the encrypted-passthrough flow we don't need).
    • Stop mapping: incomplete + reason=max_output_tokens → MaxTokens; completed with function_call → ToolUse; completed otherwise → EndTurn.
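
The stop mapping in the last bullet can be written as a small pure function. The sketch below is illustrative only: the `StopReason` variants follow the names used above, but the function name, its signature, and the `None` fallback for statuses the description does not cover are assumptions.

```rust
/// Stop reasons named in the description above.
#[derive(Debug, PartialEq, Eq)]
pub enum StopReason {
    EndTurn,
    ToolUse,
    MaxTokens,
}

/// Map a Responses result to a stop reason.
/// `status` is the top-level response status, `incomplete_reason` the reason
/// reported for an incomplete response, and `has_function_call` whether
/// output[] contained at least one function_call item.
pub fn map_stop(
    status: &str,
    incomplete_reason: Option<&str>,
    has_function_call: bool,
) -> Option<StopReason> {
    match (status, incomplete_reason) {
        ("incomplete", Some("max_output_tokens")) => Some(StopReason::MaxTokens),
        ("completed", _) if has_function_call => Some(StopReason::ToolUse),
        ("completed", _) => Some(StopReason::EndTurn),
        // Assumption: any other combination is left for the caller to treat as an error.
        _ => None,
    }
}
```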

README.md

  • Provider table updated to show per-provider endpoint under auto.
  • New env documented.

Tests

11 new unit tests (in llm.rs::tests and config.rs::tests), all passing:

  • responses_body_top_level_shape: instructions/max_output_tokens/input; tools are flat; no stray messages/max_tokens/max_completion_tokens fields
  • responses_body_replay_emits_function_call_before_output — pins the replay-ordering invariant
  • responses_body_skips_empty_assistant_text — mirrors #559/#560 behavior; tool_calls still serialized
  • responses_body_image_tool_result_attaches_input_image — image attachment via trailing input_image user message
  • parse_responses_completed_with_text_is_end_turn
  • parse_responses_completed_with_function_call_is_tool_use — also verifies reasoning items are skipped
  • parse_responses_incomplete_max_output_tokens
  • parse_responses_rejects_malformed_function_arguments
  • auto_openai_api_picks_responses_for_official_openai
  • auto_openai_api_picks_chat_for_third_parties — vLLM/Ollama/OpenRouter/Block Gateway/self-hosted vLLM/malformed input
  • auto_openai_api_does_not_match_lookalike_hosts

Full suite: 58 pass, 0 fail. cargo fmt --all -- --check clean. cargo clippy -p sprout-agent --all-targets -- -D warnings clean.

Live smoke against api.openai.com with gpt-5-mini (key referenced only by filename, never read by the script):

=== auto + plain prompt (OPENAI_COMPAT_API=auto) ===
  stopReason=end_turn  tool_call=False  text='ready'
=== auto + tool prompt (OPENAI_COMPAT_API=auto) ===
  stopReason=end_turn  tool_call=True  text='It printed: hello'
=== chat + plain prompt (OPENAI_COMPAT_API=chat) ===
  stopReason=end_turn  tool_call=False  text='ready'

3/3 PASS — including a tool roundtrip through dev__shell that exercises the function_call → function_call_output replay ordering.
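
The replay-ordering invariant exercised by the tool roundtrip can be stated as a check over the serialized input[] items: every function_call_output must be preceded by a function_call with the same call_id. The sketch below is a hypothetical validator, not code from llm.rs; the `InputItem` type and the helper constructors exist only for illustration.

```rust
use std::collections::HashSet;

/// Simplified stand-in for the typed items in input[].
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum InputItem {
    UserText(String),
    AssistantText(String),
    FunctionCall { call_id: String, name: String, arguments: String },
    FunctionCallOutput { call_id: String, output: String },
}

/// Illustrative constructor for a function_call item.
pub fn call(id: &str) -> InputItem {
    InputItem::FunctionCall {
        call_id: id.to_string(),
        name: "dev__shell".to_string(),
        arguments: "{}".to_string(),
    }
}

/// Illustrative constructor for a function_call_output item.
pub fn output(id: &str) -> InputItem {
    InputItem::FunctionCallOutput {
        call_id: id.to_string(),
        output: "hello".to_string(),
    }
}

/// Returns the call_id of the first output that has no earlier matching
/// function_call, or None when the ordering invariant holds.
pub fn first_orphan_output(items: &[InputItem]) -> Option<&str> {
    let mut seen: HashSet<&str> = HashSet::new();
    for item in items {
        match item {
            InputItem::FunctionCall { call_id, .. } => {
                seen.insert(call_id.as_str());
            }
            InputItem::FunctionCallOutput { call_id, .. }
                if !seen.contains(call_id.as_str()) =>
            {
                return Some(call_id.as_str());
            }
            _ => {}
        }
    }
    None
}
```

A serializer that passes this check for every history it emits avoids the "No tool call found for call_id ..." rejection described above.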

Not in this PR

  • Reasoning passthrough. Reasoning items are dropped, not carried forward across turns. Sprout's flow is already stateless across turns; carrying encrypted reasoning state would be a separate change with its own privacy/storage considerations.
  • Auto-fallback on Chat Completions errors. Suggested but rejected as too magical — operators with non-default endpoints set OPENAI_COMPAT_API explicitly, which keeps failure modes debuggable.
  • Env rename. Kept the existing OPENAI_COMPAT_* env names for zero migration cost; "compat" reads a little awkwardly now that we natively call Responses, but every operator already configures them that way.

Blast radius

  • Anthropic path: unchanged.
  • provider=openai with non-*.openai.com base_url: same Chat Completions wire as before.
  • provider=openai with official OpenAI base_url: now uses Responses by default; operators can pin to chat if they need the old behavior.

tlongwell-block added a commit that referenced this pull request May 17, 2026
…nses auto-upgrade

Two follow-ups from review on #604.

1. Anthropic startup hardening (Max #1)
   `OPENAI_COMPAT_API` was parsed unconditionally, so a stray bad value
   in an Anthropic-only env broke startup. Parse it only inside the
   `Provider::OpenAi` arm of `Config::from_env`. Anthropic gets a
   placeholder `OpenAiApi::ChatCompletions` it never reads. New tests
   pin the parser behavior without touching process env.

2. One-shot chat→responses auto-upgrade (Max #2, Tyler "automatic
   detection/fallthrough")
   When `OPENAI_COMPAT_API=auto` and the provider replies to a Chat
   Completions request with a body that explicitly names `/v1/responses`
   (or the prose "use the Responses API"), latch a process-wide
   sticky-cached upgrade and re-issue the same request on `/v1/responses`.
   Subsequent calls skip the chat attempt entirely. Pinned values
   (`OPENAI_COMPAT_API=chat`|`responses`) never auto-upgrade.

   Signal matcher (`is_responses_required_error`) is intentionally
   narrow — only matches the literal path `/v1/responses` or specific
   prose phrases, so we don't get fooled by unrelated 4xx bodies.

   New `Config.openai_api_auto: bool` records whether the operator
   resolved-by-auto vs. pinned, so we know when to enable the upgrade.

   `Llm` gains an `AtomicBool` for the sticky upgrade, plus three
   small helpers (`effective_openai_api`, `should_try_auto_upgrade`,
   `latch_responses_upgrade`) so the dispatch reads straight through.

   Logged at WARN once per process: `"openai chat-completions endpoint
   reported that this model requires the Responses API; auto-upgrading
   subsequent OpenAI calls to /v1/responses for the rest of this
   process"`, with the provider error body attached.

Tests:
- 4 new unit tests for `is_responses_required_error` covering the
  Databricks GPT-5.5 signal, OpenAI prose phrasing, and explicit
  non-matches for `invalid_api_key`, generic `unsupported_parameter`,
  and empty body.
- 3 new unit tests for `parse_openai_api` covering unset-defaults-to-auto,
  case-insensitive explicit values with whitespace, and rejected garbage.
- New integration test `tests/openai_auto_upgrade.rs` spawns a fake
  provider that 400s on `/chat/completions` with the Databricks signal
  and 200s on `/responses`. Drives sprout-agent through ACP and asserts
  `stopReason=end_turn` plus chat-hit-once / responses-hit-once.

65 tests pass, 0 fail. clippy `-D warnings` clean. cargo fmt clean.
Live smoke against api.openai.com with gpt-5-mini still 3/3 PASS.

Signed-off-by: Tyler Longwell <109685178+tlongwell-block@users.noreply.github.com>
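
For readers following the commit message above, the one-shot upgrade can be sketched with a narrow matcher and an `AtomicBool` latch. This is a rough illustration under assumptions: the function names mirror those in the commit message, but the signatures, the `UpgradeLatch` wrapper, the exact phrases matched, and the case-insensitive comparison are guesses rather than the code in llm.rs.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Narrow matcher: only the literal path or one specific prose phrase counts.
/// Assumption: matching is case-insensitive.
pub fn is_responses_required_error(body: &str) -> bool {
    let lower = body.to_ascii_lowercase();
    lower.contains("/v1/responses") || lower.contains("use the responses api")
}

/// Process-wide sticky flag (hypothetical wrapper around the AtomicBool).
pub struct UpgradeLatch {
    upgraded: AtomicBool,
}

impl UpgradeLatch {
    pub const fn new() -> Self {
        Self { upgraded: AtomicBool::new(false) }
    }

    /// Flip the latch. Returns true only for the caller that flipped it,
    /// which is where a log-once WARN would go.
    pub fn latch(&self) -> bool {
        !self.upgraded.swap(true, Ordering::Relaxed)
    }

    pub fn is_upgraded(&self) -> bool {
        self.upgraded.load(Ordering::Relaxed)
    }
}

/// Retry on /v1/responses only in auto mode, only before the latch is set,
/// and only when the error body carries the explicit signal.
pub fn should_try_auto_upgrade(is_auto: bool, latch: &UpgradeLatch, error_body: &str) -> bool {
    is_auto && !latch.is_upgraded() && is_responses_required_error(error_body)
}
```

Once the latch is set, later calls can consult `is_upgraded()` and skip the chat attempt entirely, which matches the "subsequent calls skip the chat attempt" behavior described above.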
OpenAI's GPT-5 / o-series models on api.openai.com require the
Responses API (/v1/responses) for tool calling; the legacy Chat
Completions endpoint rejects them. OpenAI-compatible servers (vLLM,
Ollama, llama.cpp, OpenRouter, Block Gateway, Databricks) almost all
still speak only Chat Completions. This change teaches sprout-agent
both dialects and routes between them automatically.

New env: OPENAI_COMPAT_API={auto,chat,responses}, default auto.
  auto picks Responses for *.openai.com hosts, Chat Completions for
  everything else. Operators can pin the choice explicitly.

Implementation:
- config.rs: OpenAiApi enum + parse_openai_api_env() + auto_openai_api()
  with a small zero-dep host extractor. Lookalike-safe (`.openai.com`
  suffix match, not substring).
- llm.rs: Provider::OpenAi now dispatches on cfg.openai_api. New
  responses_body / parse_responses pair handles the Responses wire
  shape (flat tool schema, input[] of typed items, max_output_tokens,
  output[] walk with reasoning-item skip). Serializer emits each
  prior assistant function_call before its function_call_output —
  the API rejects with "No tool call found for call_id ..." otherwise.
- README.md: provider table updated, new env documented.

Tests (11 new, all passing):
- responses_body shape: instructions/max_output_tokens/flat tools
- replay ordering invariant (function_call before function_call_output)
- empty-assistant text skipped, tool_calls still serialized
- image tool result → trailing input_image user message
- parse: end_turn / tool_use / max_output_tokens branches
- parse: rejects malformed function_call.arguments JSON
- auto-detection: official OpenAI → Responses; vLLM/Ollama/OpenRouter/
  Block Gateway/malformed → Chat Completions; lookalike host
  (api.openai.com.evil.example) → Chat Completions

cargo fmt + cargo clippy --all-targets -D warnings clean.
Live smoke against api.openai.com with gpt-5-mini: plain prompt,
tool-roundtrip (dev__shell), and explicit chat-mode fallback all
return stopReason=end_turn. See
scripts at ~/scratch/sprout-agent-demos/test_openai_responses_smoke.py
(out-of-tree).

Signed-off-by: Tyler Longwell <109685178+tlongwell-block@users.noreply.github.com>
Review pass per Max's PR comments. Same behavior, same core safety
tests, fewer production lines.

Cuts:
- Collapsed `openai_api: OpenAiApi` + `openai_api_auto: bool` into a
  single tri-state enum `OpenAiApi::{Chat,Responses,Auto}`. The auto-
  upgrade-on-error path now keys on `cfg.openai_api == Auto` directly
  instead of a parallel flag.
- Replaced the duplicated chat/responses dispatch across `complete`
  and `summarize` with a single `openai_request` helper. Callers pass a
  `FnMut(bool) -> (Value, OpenAiParse)` so the body is only built for
  the endpoint actually selected.
- Pre-resolved `OpenAiApi::Auto`'s host check is gone — the endpoint
  is computed at call time inside `openai_request`. Drops the
  `auto_openai_api` helper; the test surface is now a single pure
  `is_openai_host` function in `config.rs`.
- Inlined `responses_image_user_content` (single caller).
- Inlined `responses_stop` into `parse_responses` (single caller).
- Removed wall-of-text protocol comments; kept only the non-obvious
  replay-ordering invariant and a spec link.
- Collapsed test fan-out: `auto_openai_api_*` (3 fns) → `is_openai_host_matrix`
  (1 table); `parse_openai_api_*` (3 fns) → `parse_openai_api_values`
  (1 table); `is_responses_required_error_*` (3 fns) → `_matrix` (1 table).

What stayed:
- Replay-ordering integration test (`responses_body_replay_emits_function_call_before_output`).
- Live-fake Databricks integration test (`tests/openai_auto_upgrade.rs`).
- Lookalike host safety case (`api.openai.com.evil.example` → Chat).
- Narrow `is_responses_required_error` matcher.

Net production diff (excluding `#[cfg(test)] mod tests` blocks):
  config.rs  +97 → +48
  llm.rs    +358 → +265
  total     +455 → +313

59 tests pass (was 65 — 4 collapsed into 2 tables, 2 helpers removed).
cargo fmt + cargo clippy -p sprout-agent --all-targets -D warnings clean.
Live smoke against api.openai.com with gpt-5-mini: 3/3 PASS unchanged
(plain, tool roundtrip, explicit chat-mode override).

Signed-off-by: Tyler Longwell <109685178+tlongwell-block@users.noreply.github.com>
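
The shape of the refactor described in this commit message can be sketched as follows. The tri-state enum follows the variant names given above; everything else is an assumption for illustration: `use_responses` and `plan_request` are hypothetical names, the body is a plain `String` rather than the `(Value, OpenAiParse)` pair the commit mentions, and the endpoint paths are relative placeholders.

```rust
/// Tri-state setting, as named in the commit message.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OpenAiApi {
    Chat,
    Responses,
    Auto,
}

/// Resolve the endpoint at call time. `host_is_openai` stands in for the
/// host check and `upgraded` for the sticky auto-upgrade flag.
pub fn use_responses(api: OpenAiApi, host_is_openai: bool, upgraded: bool) -> bool {
    match api {
        OpenAiApi::Chat => false,
        OpenAiApi::Responses => true,
        OpenAiApi::Auto => host_is_openai || upgraded,
    }
}

/// Pick the endpoint, then ask the caller to build the body for that
/// endpoint only. The closure receives `true` for Responses.
pub fn plan_request<B>(
    api: OpenAiApi,
    host_is_openai: bool,
    upgraded: bool,
    mut build: B,
) -> (&'static str, String)
where
    B: FnMut(bool) -> String,
{
    let responses = use_responses(api, host_is_openai, upgraded);
    let path = if responses { "/responses" } else { "/chat/completions" };
    (path, build(responses))
}
```

One plausible reason for `FnMut` rather than `FnOnce` is that the auto-upgrade retry needs to call the builder a second time with the other endpoint; the sketch above omits the retry itself.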
tlongwell-block force-pushed the feat/openai-responses-api branch from 9c57a2b to d2d06ce May 17, 2026 01:04
tlongwell-block merged commit e4e9923 into main May 17, 2026
15 checks passed
tlongwell-block deleted the feat/openai-responses-api branch May 17, 2026 01:23