Add prompt cache support and implement for Anthropic #716
arunkumarry wants to merge 5 commits into crmne:main
After the initial Anthropic implementation, I extended the same `cache_point:` API to AWS Bedrock as well. The usage is identical from the caller's side; Bedrock uses a different wire format than direct Anthropic, so the cache marker is emitted in Bedrock's own format instead of merging `cache_control` into the content block.
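For context, the two wire formats differ roughly as follows. This is a hedged sketch: the Anthropic shape follows Anthropic's documented prompt-caching format, and the `cachePoint` block name comes from AWS's Converse API docs, not from this PR's code.

```ruby
# Direct Anthropic merges the cache marker into the content block itself:
anthropic_content = [
  { type: 'text', text: 'Large static prompt...',
    cache_control: { type: 'ephemeral' } }
]

# Bedrock's Converse API instead uses a separate cachePoint block placed
# after the content it should cache (block name per AWS docs):
bedrock_content = [
  { text: 'Large static prompt...' },
  { cachePoint: { type: 'default' } }
]
```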
What this does
Adds prompt caching support for Anthropic Claude models via a `cache_point:` keyword on `Chat#with_instructions` and `Chat#ask`. When a message is marked as a cache point, the gem injects Anthropic's `cache_control: { type: 'ephemeral' }` marker on the last content block of that message and automatically adds the required `anthropic-beta: prompt-caching-2024-07-31` request header. The static portion of the prompt is cached server-side by Anthropic for 5 minutes, reducing input token costs on repeated calls.

Fixes #706
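The resulting request looks roughly like this. This is a sketch based on Anthropic's documented prompt-caching wire format, not this PR's exact output; the model name is illustrative.

```ruby
# Header added automatically when any message is a cache point:
headers = {
  'anthropic-beta' => 'prompt-caching-2024-07-31'
}

# cache_control is injected on the last content block of the marked message:
payload = {
  model: 'claude-3-5-sonnet',  # illustrative model name
  system: [
    { type: 'text',
      text: 'Large static instructions...',
      cache_control: { type: 'ephemeral' } }
  ],
  messages: [
    { role: 'user', content: 'Dynamic question' }
  ]
}
```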
Usage
Multiple cache points are supported (up to Anthropic's limit of 4 per request):
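A sketch of the call pattern, using a minimal stand-in `Chat` class so the snippet runs standalone; the real `RubyLLM::Chat` sends provider requests, and only the `cache_point:` keyword itself comes from this PR.

```ruby
# Minimal stand-in for RubyLLM::Chat, just to illustrate the API surface.
class Chat
  attr_reader :messages

  def initialize
    @messages = []
  end

  # cache_point: marks this message so the provider layer injects
  # the cache marker on its last content block.
  def with_instructions(text, cache_point: false)
    @messages << { role: :system, content: text, cache_point: cache_point }
    self
  end

  def ask(text, cache_point: false)
    @messages << { role: :user, content: text, cache_point: cache_point }
    self
  end
end

chat = Chat.new
# Up to four cache points per request (Anthropic's limit):
chat.with_instructions('Tool definitions...',    cache_point: true)
chat.with_instructions('Style guide...',         cache_point: true)
chat.with_instructions('Reference documents...', cache_point: true)
chat.ask('Dynamic question about the documents above')
```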
Cache points on ask are also supported for caching user messages:
chat.ask(large_static_user_context, cache_point: true)

Future extensibility
The `cache_point` attribute on Message is provider agnostic. Adding support for other providers requires only provider-specific formatting logic.

For providers with inline cache markers (like Anthropic):
- Override the `complete` method to add any required headers/beta flags when `messages.any?(&:cache_point?)`
- Add an `inject_cache_*` helper in the provider's `Chat` module that modifies content blocks
- Call the helper in the message formatting methods when `msg.cache_point?`

For providers with separate cache APIs (like Gemini's Context Caching):
- Override `complete` to manage the cache lifecycle (create → reuse → retry on expiry)
- Modify `render_payload` to accept a `cached_content_name:` parameter
- When the name is present, split messages at the last cache point and only send the dynamic suffix inline

For providers without caching support:
- No changes needed; `cache_point` flags are silently ignored, preserving existing behavior

The core Message and Chat changes are already in place. Future PRs for Gemini, OpenAI (when they add caching), or other providers only need to touch their respective provider modules.
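As a concrete sketch of the inline-marker path, an `inject_cache_*` helper only needs to merge the marker into the last content block of the formatted message. The helper name and block structure below are illustrative, not this PR's actual code.

```ruby
# Illustrative helper: merge Anthropic's ephemeral cache marker into the
# last content block of a formatted message, without mutating the input.
def inject_cache_control(content_blocks)
  return content_blocks if content_blocks.empty?

  blocks = content_blocks.dup
  blocks[-1] = blocks[-1].merge(cache_control: { type: 'ephemeral' })
  blocks
end

formatted = [
  { type: 'text', text: 'Static reference material...' },
  { type: 'text', text: 'More static context...' }
]

injected = inject_cache_control(formatted)
injected.last[:cache_control]  # => { type: 'ephemeral' }
```

Only the final block carries the marker; everything up to and including it becomes the cached prefix on Anthropic's side.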
Type of change
Scope check
Required for new features
Quality check
I ran `overcommit --install` and all hooks pass. There are existing RuboCop offenses in spec/ruby_llm/generators/chat_ui_generator_spec.rb.
There are existing Flay offenses in the following files: spec/ruby_llm/generators/chat_ui_generator_spec.rb and lib/ruby_llm/error.rb.
New VCR cassettes recorded via `bundle exec rake vcr:record[provider_name]`; `bundle exec rspec` passes; auto-generated files (`models.json`, `aliases.json`) were not modified.
AI-generated code
API changes