fix(bedrock): inline mid-conversation system messages to preserve prompt cache #4534

Open
mickgvirtu wants to merge 1 commit into maximhq:dev from mickgvirtu:pr-bedrock-midconv-system-cache

Conversation

@mickgvirtu


Summary

Fixes #4068. On the Bedrock provider, ConvertBifrostMessagesToBedrockMessages hoists every role:system/role:developer message into Bedrock's top-level system block, regardless of position. Because Bedrock's Converse prompt cache is prefix-based, hoisting a role:system message injected mid-conversation (e.g. the reminders Claude Code emits) changes the system prefix that precedes the cached conversation. Cache reads then collapse to the system+tools floor, and this recurs on every such turn.
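To make the prefix argument concrete, here is a minimal Go sketch. It models a request as an ordered list of segments (system blocks first, then messages) and counts how many leading segments two consecutive requests share. The `turn` type and the `render` and `sharedPrefix` helpers are hypothetical and are not part of Bifrost or Bedrock. Real cache matching works on tokens and cache checkpoints rather than strings, so this only illustrates the ordering effect.

```go
package main

import "fmt"

// turn is a simplified message: just a role and text (hypothetical type).
type turn struct {
	role, text string
}

// render flattens a conversation into the order a prefix cache would see:
// system blocks first, then conversation messages. When hoistAll is true,
// every system turn goes into the system block regardless of position;
// otherwise only the leading run does and later ones stay in place.
func render(turns []turn, hoistAll bool) []string {
	var system, messages []string
	seenNonSystem := false
	for _, t := range turns {
		if t.role == "system" && (hoistAll || !seenNonSystem) {
			system = append(system, "system:"+t.text)
			continue
		}
		if t.role != "system" {
			seenNonSystem = true
		}
		messages = append(messages, t.role+":"+t.text)
	}
	return append(system, messages...)
}

// sharedPrefix counts the leading segments that two renderings have in common.
func sharedPrefix(a, b []string) int {
	n := 0
	for n < len(a) && n < len(b) && a[n] == b[n] {
		n++
	}
	return n
}

func main() {
	prev := []turn{
		{"system", "base prompt"},
		{"user", "question 1"},
		{"assistant", "answer 1"},
	}
	// The next request repeats the history, then adds a reminder and a new question.
	next := append(append([]turn{}, prev...),
		turn{"system", "reminder"},
		turn{"user", "question 2"},
	)
	for _, hoistAll := range []bool{true, false} {
		shared := sharedPrefix(render(prev, hoistAll), render(next, hoistAll))
		fmt.Printf("hoistAll=%v shared prefix segments=%d\n", hoistAll, shared)
	}
}
```

In this toy model, hoisting leaves only the base system segment shared between the two requests (1 segment), while keeping the reminder in place preserves all three earlier segments.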

Changes

  • When the model is in the Anthropic family, keep only the leading run of system/developer messages in system; messages appearing after the conversation starts are inlined in place. Non-Anthropic models keep the historical hoist-everything behavior.
  • This mirrors the native Anthropic provider's existing SupportsMidConversationSystem handling. Bedrock has no message-level system role, so an inlined message is rendered as a user turn (wrapped in <system-reminder>…</system-reminder>, matching the convention clients already use for pre-wrapped reminders).
  • Gating is an inlineSystemReminders bool computed by the caller via IsAnthropicModelFamily(ctx, model) (alias-aware, consistent with the other Anthropic gates in the file).
  • cache_control on tool calls/results is preserved as a CachePoint carrying the requested TTL.
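The split described in the bullets above can be sketched roughly as follows. This is a simplified illustration, not the PR's code: the `message` type and the `convert` and `isSystemRole` helpers are hypothetical stand-ins, and the real converter also handles tool calls, content blocks, and the single-message early return.

```go
package main

import "fmt"

// message is a simplified stand-in for a Bifrost message (hypothetical type).
type message struct {
	Role string
	Text string
}

// isSystemRole reports whether the role is eligible for the system block.
func isSystemRole(role string) bool {
	return role == "system" || role == "developer"
}

// convert splits msgs into system-block texts and conversation messages.
// With inline=false every system/developer message is hoisted. With
// inline=true only the leading run is hoisted; later ones become user
// turns wrapped in <system-reminder> tags, and empty ones are dropped.
func convert(msgs []message, inline bool) (system []string, out []message) {
	seenNonSystem := false
	for _, m := range msgs {
		if !isSystemRole(m.Role) {
			seenNonSystem = true
			out = append(out, m)
			continue
		}
		if inline && seenNonSystem {
			if m.Text == "" {
				continue // an empty reminder produces no user turn
			}
			out = append(out, message{
				Role: "user",
				Text: "<system-reminder>" + m.Text + "</system-reminder>",
			})
			continue
		}
		system = append(system, m.Text)
	}
	return system, out
}

func main() {
	msgs := []message{
		{"system", "You are a coding assistant."},
		{"user", "Fix the failing test."},
		{"developer", "Remember to run the linter."},
	}
	system, out := convert(msgs, true)
	fmt.Println("system:", system)
	for _, m := range out {
		fmt.Printf("%s: %s\n", m.Role, m.Text)
	}
}
```

In the PR the `inline` flag corresponds to inlineSystemReminders, which the caller derives from the model family; the sketch takes it as a plain argument to stay self-contained.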

Type of change

  • Bug fix

Affected areas

  • Core (Go)
  • Providers/Integrations

How to test

go test ./core/providers/bedrock/

Adds TestMidConversationSystemReminderStaysInline, …HoistedForNonAnthropic, TestToolCacheControlBecomesCachePointWithTTL (positive TTL assertion), a lone-system early-return test, and a no-leading-system-block gate test.

Issue #4068 has the full root-cause plus a real cache-read trace (cached tokens dropping to the system/tools floor and recovering after the prefix re-warms). Related native-side work: #4276, #3879.


Notes for reviewers: this re-adds a parameter the converter previously dropped — happy to thread it differently (e.g. derive from a typed context) if you prefer. The <system-reminder> wrapping follows the client convention; open to gating it if you'd rather it not be implicit.

@CLAassistant

CLAassistant commented Jun 18, 2026


CLA assistant check
All committers have signed the CLA.

@coderabbitai

coderabbitai Bot commented Jun 18, 2026

Contributor

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 0290a262-ebfa-428b-92f1-b0134d63d418

📥 Commits

Reviewing files that changed from the base of the PR and between d0aa598 and c8cc345.

📒 Files selected for processing (2)
  • core/providers/bedrock/bedrock_test.go
  • core/providers/bedrock/responses.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • core/providers/bedrock/bedrock_test.go
  • core/providers/bedrock/responses.go

📝 Walkthrough

Summary by CodeRabbit

  • Tests

    • Added coverage for mid-conversation system/developer reminders, including model-specific hoisting vs inlining, correct ordering around tool calls/results, reminder empty-content handling, and cache-point behavior.
  • Improvements

    • Updated message conversion to better align system/developer reminder handling for Anthropic-family models during mid-conversation.
    • Preserved tool-call and tool-result cache control by emitting appropriate cache breakpoints with TTL.

Walkthrough

ConvertBifrostMessagesToBedrockMessages gains an inlineSystemReminders bool parameter. For Anthropic models, mid-conversation system/developer messages after the leading run are converted to user turns wrapped in <system-reminder> tags instead of being hoisted. Tool call and tool result CacheControl entries now emit adjacent CachePoint blocks. Twelve new tests validate all branching paths.

Changes

Bedrock inline system reminders and tool CachePoint

| Layer | File(s) | Summary |
|---|---|---|
| Function signature, state, and call-site wiring | core/providers/bedrock/responses.go, core/providers/bedrock/bedrock_test.go | ConvertBifrostMessagesToBedrockMessages gains inlineSystemReminders bool and seenNonSystemMessage state. ToBedrockResponsesRequest passes an Anthropic-derived boolean; ToBedrockConverseResponse passes false. Four existing test call sites updated to pass false. |
| Inline branching logic and reminder-wrapping helper | core/providers/bedrock/responses.go | When inlineSystemReminders is true and seenNonSystemMessage is set, mid-conversation system/developer messages are routed to convertBifrostSystemReminderToBedrockUserMessage, which wraps each text block in <system-reminder>...</system-reminder> and returns nil for empty content. Otherwise the existing hoist path is used. |
| CachePoint emission for tool call/result CacheControl | core/providers/bedrock/responses.go | During pending tool call and tool result emission, a Bedrock CachePoint block is appended when CacheControl is present on the tool call or result, preserving the configured TTL. |
| New test suite for reminder inlining and CachePoint behavior | core/providers/bedrock/bedrock_test.go | Adds 12 new test functions and helper builders covering Anthropic inline vs non-Anthropic hoist, hoist boundary at first non-system message, tool result pairing preservation, developer role, ContentStr inlining, empty content drop, reminder between tool call and result, CachePoint suppression on reminders, CachePoint with TTL on tool cache control, lone system message, and no-leading-system-block inlining. |
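The CachePoint emission described above could look roughly like the following sketch. The `cacheControl`, `cachePoint`, and `block` types and the `withCachePoint` helper are hypothetical simplifications, not the Bifrost or AWS SDK structs; the "default" type value and the "1h" TTL string are assumptions used for illustration.

```go
package main

import "fmt"

// cacheControl is a hypothetical stand-in for the cache_control hint
// attached to a tool call or tool result.
type cacheControl struct {
	TTL string // for example "1h"; empty means no TTL was requested
}

// cachePoint is a hypothetical stand-in for a Bedrock cache point block.
type cachePoint struct {
	Type string
	TTL  string
}

// block is a simplified content block: exactly one field is set.
type block struct {
	ToolUse    string
	ToolResult string
	CachePoint *cachePoint
}

// withCachePoint returns b followed by a cache point block when cc is
// non-nil, copying the requested TTL through; otherwise it returns just b.
func withCachePoint(b block, cc *cacheControl) []block {
	blocks := []block{b}
	if cc != nil {
		blocks = append(blocks, block{
			CachePoint: &cachePoint{Type: "default", TTL: cc.TTL},
		})
	}
	return blocks
}

func main() {
	blocks := withCachePoint(block{ToolResult: "ok"}, &cacheControl{TTL: "1h"})
	for _, b := range blocks {
		if b.CachePoint != nil {
			fmt.Printf("cachePoint type=%s ttl=%s\n", b.CachePoint.Type, b.CachePoint.TTL)
			continue
		}
		fmt.Printf("toolResult=%s\n", b.ToolResult)
	}
}
```

Placing the cache point immediately after the block it annotates keeps the breakpoint at the position the client requested.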

Sequence Diagrams

sequenceDiagram
  participant Client
  participant BedrockConverter
  participant SystemMessage
  participant ToolCall
  participant ToolResult

  Client->>BedrockConverter: ConvertBifrostMessagesToBedrockMessages(inlineSystemReminders=true)
  BedrockConverter->>SystemMessage: Leading system messages → hoist to system block
  BedrockConverter->>SystemMessage: Mid-conversation system messages → wrap as <system-reminder> user turn
  BedrockConverter->>ToolCall: Emit tool call, append CachePoint if CacheControl present
  BedrockConverter->>ToolResult: Emit tool result, append CachePoint if CacheControl present
  BedrockConverter-->>Client: messages[], systemBlocks[], error

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • maximhq/bifrost#3754: Touches CachePoint block handling during Bedrock conversion, directly intersecting with the new CachePoint emission for tool call/result CacheControl.

Suggested reviewers

  • danpiths
  • akshaydeo

Poem

🐇 Hop, hop through the Bedrock stream,
Mid-conversation roles now gleam—
<system-reminder> tags wrap tight,
CachePoints blink at just the right site.
Leading blocks are hoisted up with care,
And twelve new tests confirm it's fair! ✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately describes the main change: inlining mid-conversation system messages to preserve prompt cache in the Bedrock provider. |
| Description check | ✅ Passed | The PR description includes all required template sections: Summary (with issue reference), Changes with design decisions, Type of change (Bug fix), Affected areas (Core and Providers), How to test with commands, and Breaking changes (marked No). |
| Docstring Coverage | ✅ Passed | Docstring coverage is 95.83% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.12.2)

level=error msg="[linters_context] typechecking error: pattern ./...: directory prefix . does not contain main module or its selected dependencies"



@coderabbitai coderabbitai Bot requested review from akshaydeo and danpiths June 18, 2026 15:09

@coderabbitai coderabbitai Bot left a comment

Contributor

🧹 Nitpick comments (1)
core/providers/bedrock/bedrock_test.go (1)

6471-6477: ⚡ Quick win

Cover the lone developer branch too.

The test comment and converter predicate both cover system and developer, but this test only exercises systemReminderTextMsg; add a developer case so the single-message early return cannot regress for role=developer.

Suggested test expansion
 func TestLoneSystemMessageReturnsUserMessage(t *testing.T) {
-	for _, inline := range []bool{true, false} {
-		input := []schemas.ResponsesMessage{systemReminderTextMsg("You are Claude Code.")}
-		messages, systemMessages, err := bedrock.ConvertBifrostMessagesToBedrockMessages(context.Background(), input, inline)
-		require.NoError(t, err)
-		assert.Empty(t, systemMessages, "lone system message must not populate the system block (inline=%v)", inline)
-		require.Len(t, messages, 1, "lone system message must yield exactly one message (inline=%v)", inline)
-		assert.Equal(t, bedrock.BedrockMessageRoleUser, messages[0].Role)
-	}
+	cases := []struct {
+		name string
+		msg  schemas.ResponsesMessage
+	}{
+		{name: "system", msg: systemReminderTextMsg("You are Claude Code.")},
+		{name: "developer", msg: developerReminderTextMsg("Developer instructions.")},
+	}
+
+	for _, tc := range cases {
+		t.Run(tc.name, func(t *testing.T) {
+			for _, inline := range []bool{true, false} {
+				input := []schemas.ResponsesMessage{tc.msg}
+				messages, systemMessages, err := bedrock.ConvertBifrostMessagesToBedrockMessages(context.Background(), input, inline)
+				require.NoError(t, err)
+				assert.Empty(t, systemMessages, "lone %s message must not populate the system block (inline=%v)", tc.name, inline)
+				require.Len(t, messages, 1, "lone %s message must yield exactly one message (inline=%v)", tc.name, inline)
+				assert.Equal(t, bedrock.BedrockMessageRoleUser, messages[0].Role)
+			}
+		})
+	}
 }

As per coding guidelines, Go changes should include deterministic tests and table-driven coverage for behavior changes.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@core/providers/bedrock/bedrock_test.go` around lines 6471 - 6477, The test
TestLoneSystemMessageReturnsUserMessage only covers the system message role by
using systemReminderTextMsg, but the converter predicate and test comment
indicate both system and developer roles should be handled. Expand the test to
also cover the developer message role by adding a developer message case
alongside the existing system message case. Use a table-driven approach or add a
separate developer message input to ensure the single-message early return path
in ConvertBifrostMessagesToBedrockMessages is exercised for both role types
without regression.

Source: Coding guidelines


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 16db7350-4432-40ec-b7cf-7f4942b4ecba

📥 Commits

Reviewing files that changed from the base of the PR and between 96bb2bd and d0aa598.

📒 Files selected for processing (2)
  • core/providers/bedrock/bedrock_test.go
  • core/providers/bedrock/responses.go

coderabbitai[bot]
coderabbitai Bot previously approved these changes Jun 18, 2026
@greptile-apps

greptile-apps Bot commented Jun 18, 2026

Contributor

Confidence Score: 5/5

Safe to merge — the fix is correctly scoped to Anthropic-family Bedrock models, the historical hoist-everything path for other families is untouched, and the seenNonSystemMessage gate logic is sound across leading, mid-conversation, and tool-exchange message sequences.

The core behavioral change (gating on seenNonSystemMessage + inlineSystemReminders) is straightforward and well-tested across the main scenarios. The two flagged items are non-blocking: one is a test comment that overstates coverage of a defensive code path never reached in valid Claude Code conversations, and the other is an acknowledged design trade-off around unconditional system-reminder wrapping. Neither represents a functional regression or data correctness issue on the happy path.

Both changed files are in core/providers/bedrock/; the test comment mismatch in bedrock_test.go and the double-wrapping guard in responses.go are worth a second look before closing the PR.

Important Files Changed

| Filename | Overview |
|---|---|
| core/providers/bedrock/responses.go | Adds inlineSystemReminders parameter to ConvertBifrostMessagesToBedrockMessages; implements seenNonSystemMessage gate to keep only the leading system-prompt run in Bedrock's system block; adds convertBifrostSystemReminderToBedrockUserMessage to render mid-conversation reminders as wrapped user turns; backfills cache-point propagation in the ResponsesMessageTypeMessage inline flush path for tool calls and results. |
| core/providers/bedrock/bedrock_test.go | Updates existing call sites to pass false for the new parameter; adds 10 new tests covering leading-vs-mid system message handling, developer role, ContentStr branch, empty content, tool-result ordering, cache-point absence on reminders, and TTL preservation. The 'followed by a message' subtest exercises the pre-existing isLastResultInSequence flush rather than the new ResponsesMessageTypeMessage inline-flush lines. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[ConvertBifrostMessagesToBedrockMessages\ninlineSystemReminders bool] --> B{len==1 && role==system?}
    B -- yes --> C[Return as user message\nearly return]
    B -- no --> D[Loop over messages]
    D --> E{msgType==Message\n&& role==system/dev?}
    E -- no --> F[seenNonSystemMessage = true]
    F --> G{msgType switch}
    E -- yes --> G
    G -- FunctionCall --> H[stateManager.RegisterToolCall]
    G -- FunctionCallOutput --> I[stateManager.RegisterToolResult]
    I --> J{isLastResultInSequence?}
    J -- yes --> K[Flush tool calls + results\nwith CachePoints]
    J -- no --> D
    G -- Message --> L{HasPendingToolCalls\nor HasPendingResults?}
    L -- yes --> M[Inline flush\nwith CachePoints NEW]
    L -- no --> N{role==system/dev?}
    M --> N
    N -- hoist path --> O[Hoist into system block]
    N -- inline path --> P[Wrap in system-reminder tags]
    N -- user/assistant --> Q[convertBifrostMessageToBedrockMessage]
    D -- done --> R[flushPendingToolCalls / flushPendingToolResults]
    R --> S[Merge consecutive same-role messages]
    S --> T[Return bedrockMessages, systemMessages]
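The final "Merge consecutive same-role messages" step in the flowchart matters here because an inlined reminder becomes a user turn and can end up next to another user turn. A minimal sketch of such a merge follows; the `message` type and `mergeConsecutive` helper are hypothetical, and the actual Bifrost implementation may differ.

```go
package main

import "fmt"

// message is a simplified conversation message with ordered content
// blocks (hypothetical type, not the Bifrost struct).
type message struct {
	Role    string
	Content []string
}

// mergeConsecutive folds each message into the previous one when both
// have the same role, preserving the order of content blocks.
func mergeConsecutive(msgs []message) []message {
	var out []message
	for _, m := range msgs {
		if n := len(out); n > 0 && out[n-1].Role == m.Role {
			out[n-1].Content = append(out[n-1].Content, m.Content...)
			continue
		}
		// Copy the content slice so merging never mutates the caller's input.
		out = append(out, message{
			Role:    m.Role,
			Content: append([]string(nil), m.Content...),
		})
	}
	return out
}

func main() {
	msgs := []message{
		{Role: "assistant", Content: []string{"toolUse: run tests"}},
		{Role: "user", Content: []string{"toolResult: 2 failures"}},
		{Role: "user", Content: []string{"<system-reminder>Run the linter.</system-reminder>"}},
	}
	for _, m := range mergeConsecutive(msgs) {
		fmt.Printf("%s: %q\n", m.Role, m.Content)
	}
}
```

In this example the tool result and the wrapped reminder collapse into a single user message with two content blocks, so the output alternates between assistant and user roles.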

Reviews (2): Last reviewed commit: "Bedrock: inline mid-conversation system ..." | Re-trigger Greptile

Comment thread core/providers/bedrock/bedrock_test.go
…cache

Bedrock's prompt cache is prefix-based: a mid-conversation role=system message (e.g. the
reminders Claude Code injects) hoisted into the top-level system block grows that prefix every
turn and collapses the cached conversation to the tools/system floor. This is the Bedrock
counterpart of the native Anthropic provider's mid-conversation system support
(SupportsMidConversationSystem) — Bedrock has no message-level system role, so the inlined
message is rendered as a user turn. Gated by an inlineSystemReminders bool the caller computes
via IsAnthropicModelFamily(ctx, model) (alias-aware), so non-Anthropic families keep the
historical hoist-everything behavior. Tool-call/result cache_control breakpoints are preserved
as CachePoint blocks carrying the requested TTL. Adds regression tests including a positive
cache_control->CachePoint+TTL assertion, the lone-system early return, and the no-leading-system
gate.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@mickgvirtu

Author

Thanks — both addressed:

  • TTL test exercised the pre-existing end-of-sequence flush, not the new flush-before-message path (greptile): TestToolCacheControlBecomesCachePointWithTTL is now table-driven over two shapes — end of sequence (no following message) and followed by a message ([user, FunctionCall(+cache), FunctionCallOutput(+cache), user]), which reaches the new CachePoint code inside case ResponsesMessageTypeMessage. Both assert the 1h TTL survives.
  • Lone developer branch (coderabbit): TestLoneSystemMessageReturnsUserMessage now runs over both system and developer roles.


Development

Successfully merging this pull request may close these issues.

Bedrock: mid-conversation system messages hoisted into top-level system block break prompt caching

2 participants