diff --git a/assets/agw-docs/snippets/llm-comparison.md b/assets/agw-docs/snippets/llm-comparison.md
index ebc3257f4..e01c60dcc 100644
--- a/assets/agw-docs/snippets/llm-comparison.md
+++ b/assets/agw-docs/snippets/llm-comparison.md
@@ -1,18 +1,36 @@
Review the following table to compare agentgateway's support of different LLM provider APIs.
-| API | OpenAI | Anthropic | Amazon Bedrock | Azure | Google Gemini | Google Vertex AI | GitHub Copilot |
-|-----|:------:|:---------:|:--------------:|:------------:|:-------------:|:----------------:|:---------------:|
-| Completions `/v1/chat/completions` | ✅ Native | ✅ Translation | ✅ Translation| ✅ Native | ✅ Native`*`| ✅ Native`†` | ✅ Native |
-| Responses `/v1/responses` | ✅ Native | ❌ No | ✅ Translation| ✅ Native| ❌ No | ❌ No | ❌ No |
-| Messages `/v1/messages` | ✅ Translation | ✅ Native | ✅ Translation | ✅ Translation | ✅ Translation | ✅ Native`†` | ✅ Translation |
-| Embeddings `/v1/embeddings` | ✅ Native | ❌ No | ✅ Translation | ✅ Native | ❌ No | ✅ Translation | ❌ No |
-| Realtime `/v1/realtime` | ✅ Native | ❌ No | ❌ No | ❌ No | ❌ No | ❌ No | ❌ No |
-| Token Count `/v1/messages/count_tokens` | ❌ No | ✅ Native| ✅ Translation | ❌ No| ❌ No | ✅ Translation | ❌ No |
+| Provider | Chat Completions | Responses | Messages | Embeddings | Realtime | Count Tokens | Rerank |
+|---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
+| OpenAI | ✅ | ✅ | ✅¹ | ✅ | ✅ | ✅² | - |
+| Anthropic | ✅¹ | ◇ | ✅ | - | - | ✅ | - |
+| Bedrock | ✅¹ | ✅¹ | ✅¹ | ✅¹ | - | ✅⁴ | ✅¹ |
+| Azure | ✅ | ✅ | ✅¹ | ✅ | - | ✅² | ⚠️³ |
+| Gemini | ✅ | ✅¹ | ✅¹ | ✅ | - | ✅² | - |
+| Vertex AI | ✅⁴ | ◇ | ✅⁴ | ✅¹ | - | ✅⁴ | ✅¹ |
+| Copilot | ✅ | ✅ | ✅¹ | ◇ | - | ✅² | ⚠️³ |
+| Cohere | ✅ | ✅¹ | ✅¹ | ✅ | - | ✅² | ✅ |
+| Ollama | ✅ | ✅ | ✅¹ | ✅ | - | ✅² | - |
+| Baseten | ✅ | ✅¹ | ✅ | - | - | ✅² | - |
+| Cerebras | ✅ | ✅¹ | ✅¹ | - | - | ✅² | - |
+| Deepinfra | ✅ | ✅¹ | ✅ | ✅ | - | ✅² | - |
+| Deepseek | ✅ | ✅¹ | ✅ | - | - | ✅² | - |
+| Groq | ✅ | ✅ | ✅¹ | - | - | ✅² | - |
+| Hugging Face | ✅ | ✅ | ✅¹ | - | - | ✅² | - |
+| Mistral | ✅ | ✅¹ | ✅¹ | ✅ | - | ✅² | - |
+| OpenRouter | ✅ | ✅ | ✅ | ✅ | - | ✅² | ✅ |
+| Together AI | ✅ | ✅¹ | ✅¹ | ✅ | - | ✅² | ✅ |
+| xAI | ✅ | ✅ | ✅¹ | - | ✅ | ✅² | - |
+| Fireworks | ✅ | ✅ | ✅ | ✅ | - | ✅² | ✅ |
-**Notes**:
-- **✅ Native**: Agentgateway has complete support for the API, and the provider supports the API natively. This allows Agentgateway to passthrough unknown fields without change. As such, even if you use extra fields or new models, the proxying likely works.
-- **✅ Translation**: Agentgateway translates from one API to another. As such, agentgateway only supports fields that it is aware of. New models or LLM APIs might require code changes before they are fully supported.
-- **❌ No**: Agentgateway does not currently support the API for this provider.
-- `*`: Agentgateway supports the API natively via a compatibility endpoint. Note that Google Gemini does a translation for their Completions API support.
-- `†`: Agentgateway supports the API natively via translation to Anthropic. Support in Vertex AI differs depending on the model type.
-- Both streaming and non-streaming options for the Completions, Responses, and Messages APIs are supported.
+Legend:
+
+| Symbol | Meaning |
+|--------|--------------------------------------------------------------------------------|
+| ✅ | Supported natively |
+| ✅¹     | Supported through translation by agentgateway                                   |
+| ✅²     | Supported through a local token estimate by agentgateway                        |
+| ⚠️³ | Passthrough/provider-dependent; works only with a compatible upstream endpoint |
+| ✅⁴ | Supported, but behavior depends on model family or provider route |
+| ◇      | Not currently implemented in agentgateway                                       |
+| - | Provider does not offer this capability |
diff --git a/assets/agw-docs/standalone/deployment/binary.md b/assets/agw-docs/standalone/deployment/binary.md
index 15ab40fd6..f3e7a77bc 100644
--- a/assets/agw-docs/standalone/deployment/binary.md
+++ b/assets/agw-docs/standalone/deployment/binary.md
@@ -4,7 +4,7 @@ To run agentgateway as a standalone binary, follow the steps to download, instal
{{% steps %}}
-### Step 1: Download and install
+### Download and install
Download and install the agentgateway binary. Alternatively, you can manually download the binary from the [agentgateway releases page](https://github.com/agentgateway/agentgateway/releases/latest).
@@ -79,7 +79,7 @@ Password:
agentgateway installed into /usr/local/bin/agentgateway
```
-### Step 2: Verify the installation
+### Verify the installation
Verify that the `agentgateway` binary is installed.
@@ -99,26 +99,22 @@ Example output with the latest version, {{< reuse "agw-docs/versions/n-patch.md"
}
```
-### Step 3: Create a configuration file
+### Run agentgateway
-Create a [configuration file]({{< link-hextra path="/configuration/" >}}) for agentgateway. In this example, `config.yaml` is used. You might start with [this simple example configuration file](https://agentgateway.dev/examples/basic/config.yaml).
+To run agentgateway, execute the binary. When you run it without arguments, agentgateway stores its configuration in `~/.config/agentgateway`.
-```yaml
-{{< github url="https://agentgateway.dev/examples/basic/config.yaml" >}}
+```sh
+agentgateway
```
-### Step 4: Run agentgateway
+To specify an explicit configuration file, use `-f`:
```sh
agentgateway -f config.yaml
```
-Example output:
+You might start with [this simple example configuration file](https://agentgateway.dev/examples/basic/config.yaml).
-```
-info state_manager loaded config from File("config.yaml")
-info app serving UI at http://localhost:15000/ui
-info proxy::gateway started bind bind="bind/3000"
-```
+Open the admin UI at [http://localhost:15000/ui](http://localhost:15000/ui) to get started.
{{% /steps %}}
diff --git a/assets/agw-docs/standalone/virtual-keys.md b/assets/agw-docs/standalone/virtual-keys.md
index 2b585bee7..bfdff2603 100644
--- a/assets/agw-docs/standalone/virtual-keys.md
+++ b/assets/agw-docs/standalone/virtual-keys.md
@@ -198,6 +198,10 @@ EOF
LLMs typically charge per input and output token. Without spending control, users can quickly generate large bills by submitting long prompts, streaming or retrying requests, or running recursive agent loops. To protect against unexpected bills, scaling surprises, and abuse, use token-based rate limits to cap the number of tokens that can be used.
+{{< callout type="warning" >}}
+`localRateLimit` is a **gateway-wide** limit, not a per-key limit. It enforces a single shared token budget across **all** requests and API keys.
+{{< /callout >}}
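To make the shared budget concrete, the following Python sketch models a single token bucket that every API key draws from. It is a simplified illustration, not agentgateway's implementation: the class and method names are hypothetical, the parameters only mirror the `maxTokens`, `tokensPerFill`, and `fillInterval` settings of `localRateLimit`, and the refill rule (add `tokensPerFill` once per elapsed interval, capped at `maxTokens`) is an assumption.

```python
import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SharedTokenBucket:
    """Hypothetical model of one gateway-wide token budget (not agentgateway code).

    max_tokens, tokens_per_fill, and fill_interval_s loosely mirror the
    localRateLimit settings maxTokens, tokensPerFill, and fillInterval.
    The refill rule below is an assumption for illustration.
    """

    max_tokens: int
    tokens_per_fill: int
    fill_interval_s: float
    clock: Callable[[], float] = time.monotonic
    available: int = field(init=False, default=0)
    last_fill: float = field(init=False, default=0.0)

    def __post_init__(self) -> None:
        # Start with a full budget.
        self.available = self.max_tokens
        self.last_fill = self.clock()

    def _refill(self) -> None:
        # Add tokens_per_fill once per fully elapsed interval, capped at max_tokens.
        fills = int((self.clock() - self.last_fill) // self.fill_interval_s)
        if fills > 0:
            self.available = min(
                self.max_tokens, self.available + fills * self.tokens_per_fill
            )
            self.last_fill += fills * self.fill_interval_s

    def try_consume(self, api_key: str, tokens: int) -> bool:
        # api_key is intentionally unused: every key shares the same budget.
        self._refill()
        if tokens > self.available:
            return False
        self.available -= tokens
        return True


if __name__ == "__main__":
    # Fake clock so that the example is deterministic.
    now = [0.0]
    bucket = SharedTokenBucket(100_000, 100_000, 86_400, clock=lambda: now[0])
    print(bucket.try_consume("sk-alice-abc123def456", 60_000))  # True
    print(bucket.try_consume("sk-bob-xyz789uvw012", 50_000))    # False: only 40,000 left
```

Because `try_consume` ignores the key, Bob is denied once Alice has used most of the budget, which is the behavior the warning describes.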
+
### How rate limiting works
Agentgateway checks token-based rate limits in two phases:
@@ -352,61 +356,6 @@ EOF
With this setting, requests are denied immediately if the estimated prompt token count exceeds the available budget.
-## Add a global token budget
-
-{{< callout type="warning" >}}
-`localRateLimit` is a **gateway-wide** limit, not a per-key limit. It enforces a single shared token budget across **all** requests and API keys.
-{{< /callout >}}
-
-To add a token budget that limits total token usage across all requests using more advanced routing options, use the routing-based configuration format with `localRateLimit`.
-
-{{< callout type="info" >}}
-Rate limiting requires the `binds/listeners/routes` configuration format because `localRateLimit` is an HTTP-level policy. For more information, see the [Routing-based configuration guide]({{< link-hextra path="/llm/configuration-modes/" >}}).
-{{< /callout >}}
-
-```yaml
-cat <<'EOF' > config.yaml
-# yaml-language-server: $schema=https://agentgateway.dev/schema/config
-
-binds:
-- port: 4000
- listeners:
- - routes:
- - backends:
- - ai:
- name: openai
- provider:
- openAI:
- model: gpt-3.5-turbo
- policies:
- apiKey:
- mode: strict
- keys:
- - key: sk-alice-abc123def456
- metadata:
- user: alice
- - key: sk-bob-xyz789uvw012
- metadata:
- user: bob
- backendAuth:
- key: "$OPENAI_API_KEY"
- localRateLimit:
- - maxTokens: 100000
- tokensPerFill: 100000
- fillInterval: 86400s
- type: tokens
-EOF
-```
-
-| Setting | Description |
-| -- | -- |
-| `backendAuth` | The API key used to authenticate with the LLM provider backend. For configuration options, see [Manage API keys]({{< link-hextra path="/llm/api-keys/" >}}). |
-| `localRateLimit` | Token-based rate limiting applied globally to **all** requests through this route, regardless of which API key is used. |
-| `maxTokens` | The maximum number of tokens available in the shared budget. |
-| `tokensPerFill` | The number of tokens added during each refill. |
-| `fillInterval` | The interval between refills. Use `86400s` for a daily budget. |
-| `type` | Set to `tokens` for token-based limits. Use `requests` for request-based limits. |
-
For more information about rate limiting configuration options, see [Rate limits]({{< link-hextra path="/configuration/resiliency/rate-limits/" >}}).
## Monitor per-key spending
diff --git a/content/docs/standalone/main/deployment/docker/_index.md b/content/docs/standalone/main/deployment/docker/_index.md
index 07c55f8f6..76ee64e1c 100644
--- a/content/docs/standalone/main/deployment/docker/_index.md
+++ b/content/docs/standalone/main/deployment/docker/_index.md
@@ -6,32 +6,47 @@ description: Overview of how to deploy agentgateway with Docker.
To run agentgateway as a Docker container, agentgateway publishes official Docker images at `cr.agentgateway.dev/agentgateway`.
-Before you begin, create a [configuration file]({{< link-hextra path="/configuration/" >}}) for agentgateway. In this example, `config.yaml` is used.
-You might start with [this simple example configuration file](https://agentgateway.dev/examples/basic/config.yaml).
## Docker
-To run agentgateway with Docker, mount your configuration file into the container and expose any necessary ports.
+To run agentgateway with Docker, you can either mount your [configuration file]({{< link-hextra path="/configuration/" >}}) directly or mount a directory
+and create the configuration in the UI. The following example mounts a directory:
```sh
-docker run -v ./config.yaml:/config.yaml -p 3000:3000 \
- cr.agentgateway.dev/agentgateway:v{{< reuse "agw-docs/versions/n-patch.md" >}} \
- -f /config.yaml
+mkdir agentgateway-config
+docker run \
+ --user "$(id -u):$(id -g)" \
+ -v ./agentgateway-config:/config \
+ -p 3000:3000 -p 4000:4000 -p 127.0.0.1:15000:15000 \
+ cr.agentgateway.dev/agentgateway:v{{< reuse "agw-docs/versions/n-patch.md" >}}
```
-By default, the agentgateway admin UI listens on localhost, which is not exposed outside of the container.
-To access the UI, you can change the bind address and expose the port.
+In this mode, agentgateway automatically creates a configuration file that sets up logging and exposes the admin UI.
+The `--user` flag runs the container as your current user so that the container can read and write the mounted configuration directory.
+
+To provide an explicit configuration file instead, mount the file and pass it with `-f`. The admin UI listens on localhost by default, which is not reachable from outside the container;
+the following example sets the optional `ADMIN_ADDR` environment variable to expose it.
```sh
-docker run -v ./config.yaml:/config.yaml -p 3000:3000 \
- -p 127.0.0.1:15000:15000 -e ADMIN_ADDR=0.0.0.0:15000 \
+docker run \
+ --user "$(id -u):$(id -g)" \
+ -v ./config.yaml:/config.yaml \
+ -p 3000:3000 -p 4000:4000 -p 127.0.0.1:15000:15000 \
+ -e ADMIN_ADDR=0.0.0.0:15000 \
cr.agentgateway.dev/agentgateway:v{{< reuse "agw-docs/versions/n-patch.md" >}} \
-f /config.yaml
```
+Open the admin UI at [http://localhost:15000/ui](http://localhost:15000/ui) to get started.
+
## Docker Compose
-To run agentgateway in Docker Compose, follow a similar approach to mount the configuration file and expose the ports.
+To run agentgateway in Docker Compose, follow the same approach as above: create a directory for the configuration, save the following service definition as `compose.yaml`, and start the service.
+
+```sh
+mkdir agentgateway-config
+docker compose up
+```
```yaml
services:
@@ -39,12 +54,14 @@ services:
container_name: agentgateway
restart: unless-stopped
image: cr.agentgateway.dev/agentgateway:v{{< reuse "agw-docs/versions/n-patch.md" >}}
+ # Replace with your user and group IDs, such as the output of: id -u && id -g
+ user: "1000:1000"
ports:
- "3000:3000"
+ - "4000:4000"
- "127.0.0.1:15000:15000"
volumes:
- - ./config.yaml:/config.yaml
- environment:
- - ADMIN_ADDR=0.0.0.0:15000
- command: ["-f", "/config.yaml"]
+ - ./agentgateway-config:/config
```
+
+Open the admin UI at [http://localhost:15000/ui](http://localhost:15000/ui) to get started.
diff --git a/content/docs/standalone/main/llm/_index.md b/content/docs/standalone/main/llm/_index.md
index 43344887f..86d044a73 100644
--- a/content/docs/standalone/main/llm/_index.md
+++ b/content/docs/standalone/main/llm/_index.md
@@ -8,4 +8,5 @@ next: /reference/observability
test: skip
---
-Consume LLM services by setting up AI backends for your LLM providers.
+Agentgateway can act as a feature-rich AI/LLM gateway that proxies traffic between your applications and LLM providers.
+This lets you connect to thousands of LLM models through a unified interface that provides governance, observability, and reliability controls.
diff --git a/content/docs/standalone/main/llm/about.md b/content/docs/standalone/main/llm/about.md
index 25c303ded..8c20d9ff8 100644
--- a/content/docs/standalone/main/llm/about.md
+++ b/content/docs/standalone/main/llm/about.md
@@ -35,10 +35,6 @@ Many providers now have dedicated integrations with preconfigured base URLs and
- [OpenRouter]({{< link-hextra path="/llm/providers/openrouter/" >}})
- [Fireworks AI]({{< link-hextra path="/llm/providers/fireworks/" >}})
-### OpenAI-compatible fallback
-
-Use [OpenAI-compatible]({{< link-hextra path="/llm/providers/openai-compatible/" >}}) for Perplexity, vLLM, LM Studio, or another provider without built-in support.
-
### Self-hosted solutions
Run models locally or in your own infrastructure:
@@ -46,14 +42,19 @@ Run models locally or in your own infrastructure:
- [vLLM]({{< link-hextra path="/llm/providers/openai-compatible/#vllm" >}})
- [LM Studio]({{< link-hextra path="/llm/providers/openai-compatible/#lm-studio" >}})
+### Custom providers
+
+Use a [custom provider]({{< link-hextra path="/llm/providers/openai-compatible/" >}}) for providers without built-in support, such as Perplexity, vLLM, or LM Studio.
+Agentgateway supports the common LLM API formats and can generally integrate with any provider. If a provider is missing, [file an issue](https://github.com/agentgateway/agentgateway/issues/new).
+
## Using the API
-By default, requests to agentgateway use the [OpenAI Chat Completions](https://developers.openai.com/api/reference/chat-completions/overview) API.
-These requests are translated to the upstream provider's API.
+Agentgateway exposes multiple API endpoints, including [OpenAI Chat Completions](https://developers.openai.com/api/reference/chat-completions/overview), [Anthropic Messages](https://platform.claude.com/docs/en/api/messages), and more.
+Depending on the API in the request and the selected provider, agentgateway either passes the request through or translates it as needed.
-Using the Chat Completions API works exactly the same as consuming OpenAI, with a change to the base URL.
-This allows you to continue using existing code and SDKs.
+This provides a unified API across providers, so clients can generally connect to any provider regardless of which API they use.
+The following examples use the Chat Completions API.
{{< callout type="info" >}}
For detailed configuration of specific API endpoint types, including Chat Completions and the OpenAI Realtime API, see [API types]({{< link-hextra path="/llm/api-types/" >}}).
{{< /callout >}}
@@ -62,7 +63,7 @@ For detailed configuration of specific API endpoint types, including Chat Comple
{{% tab %}}
```shell
-curl 'http://localhost:4000/' \
+curl 'http://localhost:4000/v1/chat/completions' \
--header 'Content-Type: application/json' \
--data ' {
"model": "gpt-3.5-turbo",
@@ -89,7 +90,7 @@ import openai
client = openai.OpenAI(
api_key="anything",
- base_url="http://localhost:4000"
+ base_url="http://localhost:4000/v1"
)
response = client.chat.completions.create(model="gpt-4o-mini", messages = [
@@ -110,7 +111,7 @@ import OpenAI from "openai";
const openai = new OpenAI({
apiKey: "anything",
- baseURL: "http://localhost:4000",
+ baseURL: "http://localhost:4000/v1",
});
const response = await openai.chat.completions.create({
model: "gpt-4o-mini",
@@ -125,16 +126,25 @@ console.log(response);
## Model routing and aliases
-Model routing is configured within the `llm` section of your agentgateway configuration file. The top-level configuration file is organized into sections such as `config`, `binds`, `llm`, `mcp`, `services`, and `workloads`; for a complete overview, see [Configuration overview]({{< link-hextra path="/configuration/overview/" >}}). The `llm` section offers a simplified, model-centric approach compared to the traditional `binds/listeners/routes` model; for more details on the two approaches, see [LLM configuration modes]({{< link-hextra path="/llm/configuration-modes/" >}}). The model configurations shown in this section live under the `llm.models` key.
+Model routing is configured within the `llm` section of your agentgateway configuration file.
+The `llm` section offers a simplified, model-centric approach compared to the traditional `binds/listeners/routes` model; for more details on the two approaches, see [LLM configuration modes]({{< link-hextra path="/llm/configuration-modes/" >}}).
+The model configurations shown in this section live under the `llm.models` key.
+
+Agentgateway routes a request by matching the incoming model name against the configured model names, and then sending the request to the matching model.
+The outgoing model can be passed through unchanged from the incoming request, transformed, or set to a static value.
-When you configure a model in the `llm` section, two fields control how requests are routed, as shown in the following table.
+Some examples:
+
+- Match `fast` and send the request to `gpt-mini`.
+- Match `*` and forward the model as-is.
+- Match `openai/*`, strip the `openai/` prefix, and forward the remaining model name as-is.
+
| Field | Purpose |
|-------|---------|
| `models.name` | The model name to match in incoming client requests. Agentgateway compares this value against the `model` field in the request body. Use a wildcard `*` to match any model name. |
| `params.model` | The model name sent to the upstream provider. If set, this overrides the model from the request. If not set, the model from the request is passed through. |
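The following Python sketch illustrates the matching idea in the preceding table: pick the most specific model entry whose `name` matches the request, then send `params.model` if it is set, or else the requested model. It is hypothetical and simplified: the class and function names are invented for illustration, the specificity ranking (exact names first, then longer wildcard prefixes, then `*`) is an assumption, and it ignores header matchers and other criteria that agentgateway can also apply.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelEntry:
    """Hypothetical stand-in for one llm.models entry."""

    name: str                    # pattern compared with the request's "model" field
    provider: str
    model: Optional[str] = None  # like params.model; None means pass through


def name_matches(pattern: str, requested: str) -> bool:
    # "*" matches anything; "prefix/*" matches names that start with "prefix/".
    if pattern.endswith("*"):
        return requested.startswith(pattern[:-1])
    return pattern == requested


def specificity(pattern: str) -> tuple[bool, int]:
    # Assumed ranking: exact names beat wildcards; longer prefixes beat shorter ones.
    return (not pattern.endswith("*"), len(pattern.rstrip("*")))


def route(entries: list[ModelEntry], requested: str) -> Optional[tuple[str, str]]:
    candidates = [e for e in entries if name_matches(e.name, requested)]
    if not candidates:
        return None
    # max() returns the first of several equally specific entries.
    best = max(candidates, key=lambda e: specificity(e.name))
    # Static model if configured; otherwise forward the requested name unchanged.
    return best.provider, best.model or requested


if __name__ == "__main__":
    entries = [
        ModelEntry("*", "openai"),
        ModelEntry("fast", "openai", model="gpt-mini"),
        ModelEntry("accounts/fireworks/*", "fireworks"),
    ]
    print(route(entries, "fast"))         # ('openai', 'gpt-mini')
    print(route(entries, "gpt-4o-mini"))  # ('openai', 'gpt-4o-mini')
```

Note that a wildcard entry alone does not strip a prefix in this sketch; stripping requires a transformation, as described in the prefixed passthrough section.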
-### Pass-through mode
+### Passthrough
Use `name: "*"` without setting `params.model` to accept any model name and pass it directly to the provider. This is the simplest configuration for single-provider setups.
@@ -142,16 +152,35 @@ Use `name: "*"` without setting `params.model` to accept any model name and pass
llm:
models:
- name: "*"
- provider: openAI
+ provider: openai
params:
apiKey: "$OPENAI_API_KEY"
```
Clients specify the actual model in their requests, such as `"model": "gpt-4o-mini"`, and agentgateway forwards it to the provider as-is.
+### Prefixed passthrough
+
+Use `name: "openai/*"` without setting `params.model` to accept model requests such as `openai/gpt-4o-mini`, and add a `transformation` that strips the prefix so that agentgateway forwards the request to OpenAI as `gpt-4o-mini`.
+This is the recommended approach when you want to expose all models from multiple providers.
+
+```yaml
+llm:
+ models:
+  - name: "openai/*"
+ provider: openai
+ params:
+ apiKey: "$OPENAI_API_KEY"
+ transformation:
+ model: llmRequest.model.stripPrefix("openai/")
+```
+
+Clients specify the provider and model in their requests, such as `"model": "openai/gpt-4o-mini"`, and agentgateway forwards the request to OpenAI with the model `gpt-4o-mini`.
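To illustrate the effect of the `transformation`, the following Python sketch rewrites a request body the same way: only the `model` field changes, and all other fields pass through. The function name is hypothetical; it uses Python's `str.removeprefix` (Python 3.9 and later) as a stand-in for the CEL `stripPrefix` call, and it assumes that a model name without the prefix is left unchanged.

```python
import copy


def forward_body(client_body: dict, prefix: str = "openai/") -> dict:
    """Hypothetical sketch: return the body sent upstream after stripping the prefix."""
    upstream = copy.deepcopy(client_body)  # leave the client's body untouched
    # str.removeprefix returns the string unchanged if it lacks the prefix.
    upstream["model"] = upstream["model"].removeprefix(prefix)
    return upstream


if __name__ == "__main__":
    body = {
        "model": "openai/gpt-4o-mini",
        "messages": [{"role": "user", "content": "Hello"}],
    }
    print(forward_body(body)["model"])  # gpt-4o-mini
```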
+
### Model aliases
-Set `name` to a user-friendly alias and `params.model` to the actual provider model. This lets you decouple client-facing model names from provider-specific identifiers, making it easier to swap models without updating client code.
+Set `name` to a user-friendly alias and `params.model` to the actual provider model.
+This lets you decouple client-facing model names from provider-specific identifiers, making it easier to swap models without updating client code.
```yaml
llm:
@@ -172,17 +201,13 @@ Clients send `"model": "fast"` or `"model": "smart"`, and agentgateway translate
### Route priority
-When multiple models match a request, agentgateway selects the best match by using the following priority order:
-
-1. **Match specificity**: Routes with more match criteria take priority. For example, a route with two header matchers ranks higher than a route with one.
-2. **Config order**: When two routes have equal specificity, the route listed first in the configuration file takes priority.
-
-This means you can control tie-breaking behavior by ordering your models in the config. Place more specific routes before generic or wildcard routes to ensure they match first.
+When multiple models match a request, the more specific match takes precedence.
+For example, with the following configuration, requests that match the `accounts/fireworks/*` route are sent to the `fireworks` provider instead of the catch-all `*` route:
```yaml
llm:
models:
- # Specific route — listed first, wins ties against the wildcard
+    # Specific route: takes precedence over the wildcard
- name: "accounts/fireworks/*"
provider: fireworks
matches:
@@ -192,9 +217,7 @@ llm:
exact: "eng"
params:
apiKey: "$FIREWORKS_API_KEY"
- # Optional. Override the default Fireworks endpoint:
- # baseUrl: "https://api.fireworks.ai/inference/v1"
- # Catch-all route — matches anything, but lower priority
+ # Catch-all route: matches anything, but lower priority
- name: "*"
provider: openAI
matches:
diff --git a/content/docs/standalone/main/llm/api-keys.md b/content/docs/standalone/main/llm/api-keys.md
deleted file mode 100644
index 95e6da66a..000000000
--- a/content/docs/standalone/main/llm/api-keys.md
+++ /dev/null
@@ -1,125 +0,0 @@
----
-title: Manage API keys
-weight: 40
-description: Manage API keys for LLM provider authentication.
-prev: /llm/providers
----
-
-Managing API keys is an important security mechanism to prevent unauthorized access to your LLM provider. If API keys are compromised, attackers can deliberately run expensive queries, such as large and recursive prompts, at your expense.
-
-You can choose between the following options to provide an API key to agentgateway:
-* Inline
-* Environment variable
-* File
-* Kubernetes secret or passthrough token
-
-Follow the instructions in this guide to learn how to use these different methods.
-
-## Before you begin
-
-{{< reuse "agw-docs/snippets/prereq-agentgateway.md" >}}
-
-## Configure your agentgateway proxy
-
-Browse through the tabs to learn about different ways for how to provide your API key to agentgateway.
-
-{{< tabs items="Inline,Environment variable,File,Kubernetes secret or passthrough token" >}}
-
-{{% tab %}}
-
-You can provide your API key directly in the agentgateway configuration. This option is the least secure. Only use this option for quick tests.
-
-1. Configure the agentgateway proxy and enter your key in the `params.apiKey` field directly.
- ```yaml
- cat < config.yaml
- # yaml-language-server: $schema=https://agentgateway.dev/schema/config
- llm:
- models:
- - name: "*"
- provider: openAI
- params:
- apiKey: "sk-proj...."
- EOF
- ```
-
-{{% /tab %}}
-{{% tab %}}
-
-1. Get the token from your LLM provider, such as an API key to OpenAI and save it as an environment variable.
- ```sh
- export OPENAI_API_KEY=
- ```
-
-2. Configure the agentgateway proxy to refer to that environment variable. Agentgateway automatically replaces the value of the variable with the value that is stored in the environment.
- ```yaml
- cat <<'EOF' > config.yaml
- # yaml-language-server: $schema=https://agentgateway.dev/schema/config
- llm:
- models:
- - name: "*"
- provider: openAI
- params:
- apiKey: "$OPENAI_API_KEY"
- EOF
- ```
-
-{{% /tab %}}
-{{% tab %}}
-
-You can store your API key in a file and load the file into agentgateway during startup.
-
-1. Save your API key in a file, such as `key.txt`.
- ```sh
- echo "" >> key.txt
- ```
-
-2. Load the key from the file into an environment variable and configure the agentgateway proxy.
- ```sh
- export OPENAI_API_KEY=$(cat key.txt)
- ```
-
- ```yaml
- cat <<'EOF' > config.yaml
- # yaml-language-server: $schema=https://agentgateway.dev/schema/config
- llm:
- models:
- - name: "*"
- provider: openAI
- params:
- apiKey: "$OPENAI_API_KEY"
- EOF
- ```
-{{% /tab %}}
-{{% tab %}}
-
-When deploying agentgateway on Kubernetes, you can leverage Kubernetes secrets to store your API key or pass through a token by using an `Authorization` or other custom header.
-
-For more information, see the [agentgateway on Kubernetes docs](https://agentgateway.dev/docs/kubernetes/latest/llm/api-keys/).
-
-{{% /tab %}}
-{{< /tabs >}}
-
-## Authenticate incoming LLM API calls
-
-In addition to sending provider API keys upstream, you can authenticate incoming requests on the local LLM listener with `llm.policies.apiKey`.
-
-Set `llm.policies.apiKey.mode: permissive` when you want to populate API key metadata for later policies (for example, authorization or logging), without rejecting requests based on authentication.
-
-```yaml
-# yaml-language-server: $schema=https://agentgateway.dev/schema/config
-llm:
- policies:
- apiKey:
- mode: permissive
- keys:
- - key: sk-team-engineering
- metadata:
- team: engineering
- models:
- - name: "*"
- provider: openAI
- params:
- apiKey: "$OPENAI_API_KEY"
-```
-
-For the authentication mode semantics (`strict`, `optional`, and `permissive`), see [API Key authentication]({{< link-hextra path="/configuration/security/apikey-authn/" >}}).
diff --git a/content/docs/standalone/main/llm/api-types/_index.md b/content/docs/standalone/main/llm/api-types/_index.md
index b39d4af96..6ff6eccb4 100644
--- a/content/docs/standalone/main/llm/api-types/_index.md
+++ b/content/docs/standalone/main/llm/api-types/_index.md
@@ -5,14 +5,17 @@ description: Supported LLM API endpoint types and route configurations
test: skip
---
-Agentgateway supports multiple LLM API endpoint types, called *route types*, that determine how clients interact with the gateway and how requests are routed to backends. In the simplified `llm` configuration, agentgateway maps standard endpoint paths to these route types automatically. In the `binds/listeners/routes` configuration, you set the route type explicitly in the `policies.ai.routes` map.
+Agentgateway supports multiple LLM API endpoint types.
+The gateway exposes these endpoints automatically and translates requests as needed for the configured provider.
The following API types have dedicated guides:
-- **[Chat completions]({{< link-hextra path="/llm/api-types/completions/" >}})** — The OpenAI `/v1/chat/completions` endpoint. This is the most widely used API type for text generation and chat applications.
-- **[Responses]({{< link-hextra path="/llm/api-types/responses/" >}})** — The OpenAI `/v1/responses` endpoint for stateful, multi-step model interactions.
-- **[Messages]({{< link-hextra path="/llm/api-types/messages/" >}})** — The Anthropic `/v1/messages` endpoint for Claude models.
-- **[Realtime]({{< link-hextra path="/llm/api-types/realtime/" >}})** — The OpenAI Realtime API for low-latency, streaming voice and text interactions over WebSockets.
-- **[Passthrough]({{< link-hextra path="/llm/api-types/passthrough/" >}})** — Forwards requests directly to the backend provider without transformation.
-
-Agentgateway also recognizes additional route types for specific endpoints, including `embeddings` (`/v1/embeddings`), `models` (`/v1/models`), and `anthropicTokenCount` (`/v1/messages/count_tokens`).
+- **[Chat completions]({{< link-hextra path="/llm/api-types/completions/" >}})**: The OpenAI `/v1/chat/completions` endpoint. This is the most widely used API type for text generation and chat applications.
+- **[Responses]({{< link-hextra path="/llm/api-types/responses/" >}})**: The OpenAI `/v1/responses` endpoint for stateful, multi-step model interactions.
+- **[Messages]({{< link-hextra path="/llm/api-types/messages/" >}})**: The Anthropic `/v1/messages` endpoint for Claude models.
+- **[Embeddings]({{< link-hextra path="/llm/api-types/embeddings/" >}})**: The OpenAI-compatible `/v1/embeddings` endpoint for creating vector representations of text.
+- **[Realtime]({{< link-hextra path="/llm/api-types/realtime/" >}})**: The OpenAI Realtime API for low-latency, streaming voice and text interactions over WebSockets.
+- **[Rerank]({{< link-hextra path="/llm/api-types/rerank/" >}})**: The Cohere-compatible `/v2/rerank` endpoint for ranking documents by relevance to a query.
+- **[Models]({{< link-hextra path="/llm/api-types/models/" >}})**: The OpenAI-compatible `/v1/models` endpoint for listing available models.
+- **[Token count]({{< link-hextra path="/llm/api-types/token-count/" >}})**: The Anthropic `/v1/messages/count_tokens` endpoint for estimating input tokens.
+- **[Passthrough]({{< link-hextra path="/llm/api-types/passthrough/" >}})**: Forwards requests directly to the backend provider without transformation.
diff --git a/content/docs/standalone/main/llm/api-types/completions.md b/content/docs/standalone/main/llm/api-types/completions.md
index d75c75a13..660b63bc3 100644
--- a/content/docs/standalone/main/llm/api-types/completions.md
+++ b/content/docs/standalone/main/llm/api-types/completions.md
@@ -11,8 +11,6 @@ The OpenAI Chat Completions API (`/v1/chat/completions`) is the primary interfac
The [OpenAI Chat Completions API](https://developers.openai.com/api/docs/guides/text) is the most widely used LLM endpoint. Agentgateway proxies these requests to your configured providers while providing token usage tracking, observability metrics, and policy enforcement.
-By default, requests to agentgateway use the Chat Completions API. These requests are translated to the upstream provider's native API format when necessary.
-
## Route type configuration
In the simplified `llm` configuration, agentgateway automatically maps `/v1/chat/completions` requests to the `completions` route type, so no explicit route configuration is required.
@@ -56,7 +54,7 @@ For detailed information about model routing and configuration modes, see [Model
Using the Chat Completions API works exactly the same as consuming OpenAI directly, with only a change to the base URL. This allows you to continue using existing code and SDKs.
-{{< tabs items="Curl,Python,JavaScript" >}}
+{{< tabs items="Curl,Python,JavaScript,Other" >}}
{{% tab %}}
```shell
@@ -85,7 +83,7 @@ import openai
client = openai.OpenAI(
api_key="anything",
- base_url="http://localhost:4000"
+ base_url="http://localhost:4000/v1"
)
response = client.chat.completions.create(
@@ -109,7 +107,7 @@ import OpenAI from "openai";
const openai = new OpenAI({
apiKey: "anything",
- baseURL: "http://localhost:4000",
+ baseURL: "http://localhost:4000/v1",
});
const response = await openai.chat.completions.create({
@@ -121,13 +119,9 @@ console.log(response);
```
{{% /tab %}}
-{{< /tabs >}}
-
-## Token usage tracking
-
-After sending Chat Completions requests, verify that agentgateway recorded token usage metrics.
+{{% tab %}}
-1. Open the agentgateway [metrics endpoint](http://localhost:15020/metrics).
-2. Look for the `agentgateway_gen_ai_client_token_usage` metric. The metric includes labels for the token type (`input` or `output`) and the model used.
+[View other LLM client integrations]({{< link-hextra path="/integrations/llm-clients/" >}}).
-For more information about LLM metrics and observability, see [Observe traffic]({{< link-hextra path="/llm/observability/" >}}).
+{{% /tab %}}
+{{< /tabs >}}
diff --git a/content/docs/standalone/main/llm/api-types/embeddings.md b/content/docs/standalone/main/llm/api-types/embeddings.md
new file mode 100644
index 000000000..3352851de
--- /dev/null
+++ b/content/docs/standalone/main/llm/api-types/embeddings.md
@@ -0,0 +1,78 @@
+---
+title: Embeddings
+weight: 35
+description: Send embedding requests through agentgateway using the OpenAI-compatible Embeddings API.
+test: skip
+---
+
+The Embeddings API (`/v1/embeddings`) creates vector representations of text that you can use for search, retrieval, clustering, and other semantic workflows.
+
+## About
+
+Agentgateway supports the OpenAI-compatible Embeddings API. Requests to `/v1/embeddings` are routed to your configured provider while agentgateway applies the same routing, authentication, observability, and policy framework that you use for other LLM traffic.
+
+## Route type configuration
+
+In the simplified `llm` configuration, agentgateway automatically maps `/v1/embeddings` requests to the `embeddings` route type, so no explicit route configuration is required.
+
+```yaml
+# yaml-language-server: $schema=https://agentgateway.dev/schema/config
+llm:
+ models:
+ - name: "*"
+ provider: openAI
+ params:
+ apiKey: "$OPENAI_API_KEY"
+```
+
+To configure the route type explicitly, use the `binds/listeners/routes` format and set the `embeddings` route type in the `policies.ai.routes` map.
+
+```yaml
+# yaml-language-server: $schema=https://agentgateway.dev/schema/config
+binds:
+- port: 4000
+ listeners:
+ - routes:
+ - backends:
+ - ai:
+ name: openai
+ provider:
+ openAI: {}
+ policies:
+ ai:
+ routes:
+ "/v1/embeddings": "embeddings"
+ backendAuth:
+ key: "$OPENAI_API_KEY"
+```
+
+{{< callout type="info" >}}
+For detailed information about model routing and configuration modes, see [Model routing and aliases]({{< link-hextra path="/llm/about/" >}}).
+{{< /callout >}}
+
+## Using the API
+
+Send a request to the `/v1/embeddings` endpoint. The response includes an embedding vector for each input item.
+
+{{< tabs items="Curl,Other" >}}
+{{% tab %}}
+
+```shell
+curl 'http://localhost:4000/v1/embeddings' \
+--header 'Content-Type: application/json' \
+--data '{
+ "model": "text-embedding-3-small",
+ "input": [
+ "agentgateway routes LLM traffic",
+ "embeddings turn text into vectors"
+ ]
+}'
+```
+
+{{% /tab %}}
+{{% tab %}}
+
+[View other LLM client integrations](/docs/standalone/main/integrations/llm-clients/).
+
+{{% /tab %}}
+{{< /tabs >}}
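To use the returned vectors, for example to compare two inputs for semantic similarity, read each item's `embedding` from the `data` array. The following Python sketch assumes an OpenAI-format response body (a `data` list whose items carry `index` and `embedding`). The sample vectors are made up and far shorter than real embeddings, and the helper names are hypothetical.

```python
import math


def embeddings_by_index(response: dict) -> list[list[float]]:
    """Return the vectors from an OpenAI-format /v1/embeddings response, ordered by
    each item's `index` so they line up with the request's `input` list."""
    items = sorted(response["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Dot product divided by the product of the two vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Illustrative response with made-up 3-dimensional vectors, deliberately out of order.
sample = {
    "object": "list",
    "data": [
        {"object": "embedding", "index": 1, "embedding": [0.0, 1.0, 0.0]},
        {"object": "embedding", "index": 0, "embedding": [1.0, 1.0, 0.0]},
    ],
}
first, second = embeddings_by_index(sample)
print(round(cosine_similarity(first, second), 4))
```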
diff --git a/content/docs/standalone/main/llm/api-types/messages.md b/content/docs/standalone/main/llm/api-types/messages.md
index a5c9923af..690f50e43 100644
--- a/content/docs/standalone/main/llm/api-types/messages.md
+++ b/content/docs/standalone/main/llm/api-types/messages.md
@@ -9,9 +9,12 @@ The Anthropic Messages API (`/v1/messages`) is the native interface for Anthropi
## About
-The [Anthropic Messages API](https://platform.claude.com/docs/en/api/messages) is the primary endpoint for Claude models. Agentgateway proxies these requests to your configured providers while providing token usage tracking, observability metrics, and policy enforcement. Agentgateway automatically adds the `x-api-key` and `anthropic-version` headers that the Anthropic API requires.
+The [Anthropic Messages API](https://platform.claude.com/docs/en/api/messages) is the primary endpoint for Claude models.
+Agentgateway proxies these requests to your configured providers while providing token usage tracking, observability metrics, and policy enforcement.
-The related `/v1/messages/count_tokens` endpoint, which estimates token usage before sending a request, is handled by the `anthropicTokenCount` route type.
+When you use the Anthropic provider, agentgateway automatically handles provider-specific requirements, such as adding the `x-api-key` and `anthropic-version` headers that the Anthropic API requires.
+
+The related [`/v1/messages/count_tokens`]({{< link-hextra path="/llm/api-types/token-count/" >}}) endpoint estimates token usage before sending a request and is handled by the `anthropicTokenCount` route type.
## Route type configuration
@@ -57,6 +60,9 @@ For detailed information about model routing and configuration modes, see [Model
Send a request to the `/v1/messages` endpoint. The request is forwarded to the Anthropic API and the response is returned to the client.
+{{< tabs items="Curl,Other" >}}
+{{% tab %}}
+
```shell
curl -X POST http://localhost:4000/v1/messages \
-H "Content-Type: application/json" \
@@ -67,13 +73,12 @@ curl -X POST http://localhost:4000/v1/messages \
}'
```
-For Anthropic-specific features such as token counting, extended thinking, and structured outputs, see the [Anthropic provider]({{< link-hextra path="/llm/providers/anthropic/" >}}) guide.
+{{% /tab %}}
+{{% tab %}}
-## Token usage tracking
+[View other LLM client integrations](/docs/standalone/main/integrations/llm-clients/).
-After sending Messages requests, verify that agentgateway recorded token usage metrics.
+{{% /tab %}}
+{{< /tabs >}}
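With curl, the response is returned as a JSON body. The following Python sketch extracts the assistant text and token counts from that body, assuming the standard Anthropic Messages shape (a `content` list of typed blocks and a `usage` object with `input_tokens` and `output_tokens`). The sample response and helper names are illustrative.

```python
def message_text(response: dict) -> str:
    """Join the `text` blocks of a Messages response; other block types are skipped."""
    return "".join(
        block["text"]
        for block in response.get("content", [])
        if block.get("type") == "text"
    )


def token_usage(response: dict) -> tuple[int, int]:
    """Return (input_tokens, output_tokens) from the response's usage object."""
    usage = response.get("usage", {})
    return usage.get("input_tokens", 0), usage.get("output_tokens", 0)


# Illustrative response body; real responses include more fields.
sample = {
    "type": "message",
    "role": "assistant",
    "content": [{"type": "text", "text": "Hello! How can I help?"}],
    "usage": {"input_tokens": 12, "output_tokens": 8},
}
print(message_text(sample))
print(token_usage(sample))
```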
-1. Open the agentgateway [metrics endpoint](http://localhost:15020/metrics).
-2. Look for the `agentgateway_gen_ai_client_token_usage` metric. The metric includes labels for the token type (`input` or `output`) and the model used.
-
-For more information about LLM metrics and observability, see [Observe traffic]({{< link-hextra path="/llm/observability/" >}}).
+For Anthropic-specific features such as token counting, extended thinking, and structured outputs, see the [Anthropic provider]({{< link-hextra path="/llm/providers/anthropic/" >}}) guide.
diff --git a/content/docs/standalone/main/llm/api-types/models.md b/content/docs/standalone/main/llm/api-types/models.md
new file mode 100644
index 000000000..377050e69
--- /dev/null
+++ b/content/docs/standalone/main/llm/api-types/models.md
@@ -0,0 +1,49 @@
+---
+title: Models
+weight: 55
+description: List available models through agentgateway using the OpenAI-compatible Models API.
+test: skip
+---
+
+The Models API (`/v1/models`) lists the models that are available through the configured LLM provider.
+
+## About
+
+Agentgateway supports the OpenAI-compatible Models API. Use this endpoint when clients need to discover available model IDs, such as web UIs, SDKs, or developer tools that populate model selectors from `/v1/models`.
+
+## Route type configuration
+
+In the simplified `llm` configuration, agentgateway automatically maps `/v1/models` requests to the `models` route type, so no explicit route configuration is required.
+
+```yaml
+# yaml-language-server: $schema=https://agentgateway.dev/schema/config
+llm:
+ models:
+ - name: "*"
+ provider: openAI
+ params:
+ apiKey: "$OPENAI_API_KEY"
+```
+
+{{< callout type="info" >}}
+For detailed information about model routing and configuration modes, see [Model routing and aliases]({{< link-hextra path="/llm/about/" >}}).
+{{< /callout >}}
+
+## Using the API
+
+Send a request to the `/v1/models` endpoint to list models from the upstream provider.
+
+{{< tabs items="Curl,Other" >}}
+{{% tab %}}
+
+```shell
+curl 'http://localhost:4000/v1/models'
+```
+
+{{% /tab %}}
+{{% tab %}}
+
+[View other LLM client integrations](/docs/standalone/main/integrations/llm-clients/).
+
+{{% /tab %}}
+{{< /tabs >}}
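Clients such as model selectors typically read the `id` field of each entry. The following Python sketch assumes the upstream provider returns the OpenAI list format (`{"object": "list", "data": [...]}`). Other providers might return a different shape, and the sample model IDs are illustrative only.

```python
def model_ids(response: dict) -> list[str]:
    """Collect model IDs from an OpenAI-format /v1/models response, sorted for display."""
    return sorted(item["id"] for item in response.get("data", []))


# Illustrative response body.
sample = {
    "object": "list",
    "data": [
        {"id": "gpt-4o-mini", "object": "model", "owned_by": "openai"},
        {"id": "gpt-4o", "object": "model", "owned_by": "openai"},
    ],
}
print(model_ids(sample))
```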
diff --git a/content/docs/standalone/main/llm/api-types/rerank.md b/content/docs/standalone/main/llm/api-types/rerank.md
new file mode 100644
index 000000000..a1471cde0
--- /dev/null
+++ b/content/docs/standalone/main/llm/api-types/rerank.md
@@ -0,0 +1,85 @@
+---
+title: Rerank
+weight: 45
+description: Send rerank requests through agentgateway using the Cohere-compatible Rerank API.
+test: skip
+---
+
+The Rerank API (`/v2/rerank`) scores a list of documents against a query and returns the most relevant results in ranked order.
+
+## About
+
+Agentgateway supports the Cohere-compatible Rerank API. Use rerank when you already have a candidate set of documents, such as from keyword search or vector search, and want a model to reorder those documents by relevance to a query.
+
+Agentgateway also recognizes `/v1/rerank` as a rerank route, but `/v2/rerank` is the Cohere-compatible endpoint.
+
+## Route type configuration
+
+In the simplified `llm` configuration, agentgateway automatically maps `/v2/rerank` requests to the `rerank` route type, so no explicit route configuration is required.
+
+```yaml
+# yaml-language-server: $schema=https://agentgateway.dev/schema/config
+llm:
+ models:
+ - name: "*"
+ provider: cohere
+ params:
+ apiKey: "$COHERE_API_KEY"
+```
+
+To configure the route type explicitly, use the `binds/listeners/routes` format and set the `rerank` route type in the `policies.ai.routes` map.
+
+```yaml
+# yaml-language-server: $schema=https://agentgateway.dev/schema/config
+binds:
+- port: 4000
+ listeners:
+ - routes:
+ - backends:
+ - ai:
+ name: cohere
+ provider:
+ cohere: {}
+ policies:
+ ai:
+ routes:
+ "/v2/rerank": "rerank"
+ backendAuth:
+ key: "$COHERE_API_KEY"
+```
+
+{{< callout type="info" >}}
+For detailed information about model routing and configuration modes, see [Model routing and aliases]({{< link-hextra path="/llm/about/" >}}).
+{{< /callout >}}
+
+## Using the API
+
+Send a request to the `/v2/rerank` endpoint with a query and candidate documents. The response ranks the documents by relevance.
+
+{{< tabs items="Curl,Other" >}}
+{{% tab %}}
+
+```shell
+curl 'http://localhost:4000/v2/rerank' \
+--header 'Content-Type: application/json' \
+--data '{
+ "model": "rerank-v3.5",
+ "query": "What does agentgateway do?",
+ "documents": [
+ "agentgateway routes, secures, and observes agent and LLM traffic.",
+ "A bicycle drivetrain transfers power from pedals to wheels.",
+ "Vector databases store embeddings for semantic search."
+ ],
+ "top_n": 2
+}'
+```
+
+{{% /tab %}}
+{{% tab %}}
+
+[View other LLM client integrations](/docs/standalone/main/integrations/llm-clients/).
+
+{{% /tab %}}
+{{< /tabs >}}
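Each rerank result references a document by its position in the request's `documents` list rather than repeating the text. The following Python sketch maps results back to the original documents, assuming a Cohere v2-style response whose `results` entries contain `index` and `relevance_score`. The scores in the sample are invented for illustration, and `ranked_documents` is a hypothetical helper.

```python
def ranked_documents(documents: list[str], response: dict) -> list[tuple[str, float]]:
    """Pair each rerank result with its original document, highest score first.

    Each result carries the `index` of a document in the request's `documents`
    list and a `relevance_score`.
    """
    results = sorted(response["results"], key=lambda r: r["relevance_score"], reverse=True)
    return [(documents[r["index"]], r["relevance_score"]) for r in results]


documents = [
    "agentgateway routes, secures, and observes agent and LLM traffic.",
    "A bicycle drivetrain transfers power from pedals to wheels.",
    "Vector databases store embeddings for semantic search.",
]
# Made-up scores for illustration only; real scores come from the rerank model.
sample = {
    "results": [
        {"index": 2, "relevance_score": 0.12},
        {"index": 0, "relevance_score": 0.91},
    ]
}
for text, score in ranked_documents(documents, sample):
    print(f"{score:.2f}  {text}")
```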
+
+For more information about configuring Cohere, see the [Cohere provider]({{< link-hextra path="/llm/providers/cohere/" >}}) guide.
diff --git a/content/docs/standalone/main/llm/api-types/responses.md b/content/docs/standalone/main/llm/api-types/responses.md
index db5bf0cdb..30a1c8c90 100644
--- a/content/docs/standalone/main/llm/api-types/responses.md
+++ b/content/docs/standalone/main/llm/api-types/responses.md
@@ -54,7 +54,7 @@ For detailed information about model routing and configuration modes, see [Model
Using the Responses API works exactly the same as consuming OpenAI directly, with only a change to the base URL. This allows you to continue using existing code and SDKs.
-{{< tabs items="Curl,Python,JavaScript" >}}
+{{< tabs items="Curl,Python,JavaScript,Other" >}}
{{% tab %}}
```shell
@@ -78,7 +78,7 @@ import openai
client = openai.OpenAI(
api_key="anything",
- base_url="http://localhost:4000"
+ base_url="http://localhost:4000/v1"
)
response = client.responses.create(
@@ -97,7 +97,7 @@ import OpenAI from "openai";
const openai = new OpenAI({
apiKey: "anything",
- baseURL: "http://localhost:4000",
+ baseURL: "http://localhost:4000/v1",
});
const response = await openai.responses.create({
@@ -109,13 +109,9 @@ console.log(response);
```
{{% /tab %}}
-{{< /tabs >}}
-
-## Token usage tracking
-
-After sending Responses requests, verify that agentgateway recorded token usage metrics.
+{{% tab %}}
-1. Open the agentgateway [metrics endpoint](http://localhost:15020/metrics).
-2. Look for the `agentgateway_gen_ai_client_token_usage` metric. The metric includes labels for the token type (`input` or `output`) and the model used.
+[View other LLM client integrations](/docs/standalone/main/integrations/llm-clients/).
-For more information about LLM metrics and observability, see [Observe traffic]({{< link-hextra path="/llm/observability/" >}}).
+{{% /tab %}}
+{{< /tabs >}}
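When you call the endpoint without an SDK, such as with curl, the generated text is nested inside the `output` array. The following Python sketch collects the `output_text` parts from `message` items, assuming the standard OpenAI Responses shape. The OpenAI Python SDK exposes a similar aggregate as `response.output_text`. The sample body is illustrative.

```python
def output_text(response: dict) -> str:
    """Concatenate the output_text parts of all message items in a Responses result."""
    parts = []
    for item in response.get("output", []):
        if item.get("type") != "message":
            continue  # skip non-message items, such as reasoning or tool calls
        for part in item.get("content", []):
            if part.get("type") == "output_text":
                parts.append(part["text"])
    return "".join(parts)


# Illustrative response body with one reasoning item and one message item.
sample = {
    "object": "response",
    "output": [
        {"type": "reasoning", "summary": []},
        {
            "type": "message",
            "role": "assistant",
            "content": [{"type": "output_text", "text": "Hello there.", "annotations": []}],
        },
    ],
}
print(output_text(sample))
```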
diff --git a/content/docs/standalone/main/llm/api-types/token-count.md b/content/docs/standalone/main/llm/api-types/token-count.md
new file mode 100644
index 000000000..2602f261f
--- /dev/null
+++ b/content/docs/standalone/main/llm/api-types/token-count.md
@@ -0,0 +1,83 @@
+---
+title: Token count
+weight: 60
+description: Count tokens through agentgateway using the Anthropic Messages token-count API.
+test: skip
+---
+
+The Anthropic token-count API (`/v1/messages/count_tokens`) estimates the number of input tokens in an Anthropic Messages request before sending it to a model.
+
+## About
+
+Agentgateway supports the Anthropic Messages token-count endpoint with the `anthropicTokenCount` route type. Use this endpoint when clients need to estimate request size before calling `/v1/messages`, such as to enforce budgets, avoid context-window limits, or show usage estimates.
+
+## Route type configuration
+
+In the simplified `llm` configuration, agentgateway automatically maps `/v1/messages/count_tokens` requests to the `anthropicTokenCount` route type, so no explicit route configuration is required.
+
+```yaml
+# yaml-language-server: $schema=https://agentgateway.dev/schema/config
+llm:
+ models:
+ - name: "*"
+ provider: anthropic
+ params:
+ apiKey: "$ANTHROPIC_API_KEY"
+```
+
+To configure the route type explicitly, use the `binds/listeners/routes` format and set the `anthropicTokenCount` route type in the `policies.ai.routes` map. Most configurations also map `/v1/messages` to the `messages` route type for the actual model request.
+
+```yaml
+# yaml-language-server: $schema=https://agentgateway.dev/schema/config
+binds:
+- port: 4000
+ listeners:
+ - routes:
+ - backends:
+ - ai:
+ name: anthropic
+ provider:
+ anthropic: {}
+ policies:
+ ai:
+ routes:
+ "/v1/messages": "messages"
+ "/v1/messages/count_tokens": "anthropicTokenCount"
+ backendAuth:
+ key: "$ANTHROPIC_API_KEY"
+```
+
+{{< callout type="info" >}}
+For detailed information about model routing and configuration modes, see [Model routing and aliases]({{< link-hextra path="/llm/about/" >}}).
+{{< /callout >}}
+
+## Using the API
+
+Send a request to the `/v1/messages/count_tokens` endpoint with the same message shape that you would send to `/v1/messages`.
+
+{{< tabs items="Curl,Other" >}}
+{{% tab %}}
+
+```shell
+curl 'http://localhost:4000/v1/messages/count_tokens' \
+--header 'Content-Type: application/json' \
+--data '{
+ "model": "claude-opus-4-6",
+ "messages": [
+ {
+ "role": "user",
+ "content": "How many tokens are in this request?"
+ }
+ ]
+}'
+```
+
+{{% /tab %}}
+{{% tab %}}
+
+[View other LLM client integrations](/docs/standalone/main/integrations/llm-clients/).
+
+{{% /tab %}}
+{{< /tabs >}}
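The response reports the estimate in an `input_tokens` field. As one way to use it for budgeting, the following Python sketch checks whether the counted input plus a planned output allowance fits within a context window. The 200,000-token window and the `fits_budget` helper are assumptions for illustration; check your model's documented limits.

```python
def fits_budget(count_response: dict, max_output_tokens: int, context_window: int) -> bool:
    """Return True if counted input tokens plus the planned output allowance fit the window."""
    return count_response["input_tokens"] + max_output_tokens <= context_window


# Illustrative count_tokens response body and an assumed 200,000-token context window.
count = {"input_tokens": 14}
print(fits_budget(count, max_output_tokens=1024, context_window=200_000))
```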
+
+For Anthropic-specific features such as Messages, token counting, extended thinking, and structured outputs, see the [Anthropic provider]({{< link-hextra path="/llm/providers/anthropic/" >}}) guide.
diff --git a/content/docs/standalone/main/llm/content-routing.md b/content/docs/standalone/main/llm/content-routing.md
deleted file mode 100644
index 2583f1476..000000000
--- a/content/docs/standalone/main/llm/content-routing.md
+++ /dev/null
@@ -1,258 +0,0 @@
----
-title: Content-based routing
-weight: 45
-description: Route requests to different LLM backends based on request body content, such as the requested model name.
----
-
-Route requests to different LLM backends based on the content of the request body, not just headers or path (also known as body-based routing or intelligent routing).
-
-## About content-based routing
-
-Content-based routing allows you to route requests to different backends based on fields in the request body, such as the `model` field in an LLM API request. This is useful when you want to:
-
-- Route different models to different providers (e.g., `gpt-4` to OpenAI, `claude-3` to Anthropic)
-- Direct certain models to specific backends based on cost or performance
-- Route based on custom fields like user tier or priority level
-
-Agentgateway implements content-based routing by using transformations to extract values from the request body into headers, then using header-based routing rules to select the appropriate backend.
-
-### How it works
-
-Content-based routing works in two steps:
-
-1. **Extract body field to header**: Use a transformation policy to extract a field from the JSON request body (like `model`) into a custom header
-2. **Match on header**: Use header matching in the route to route based on that header value
-
-This pattern lets you route based on any field in the request body while using standard routing capabilities.
-
-## Before you begin
-
-{{< reuse "agw-docs/snippets/prereq-agentgateway.md" >}}
-
-## Route by model name
-
-This example shows how to route requests to different backends based on the `model` field in the request body.
-
-1. Create a configuration file with multiple routes that extract the `model` field from the request body and match on it. Each route uses a transformation to extract the model name into the `x-model` header, then matches on that header value.
-
- ```yaml
- cat < config.yaml
- # yaml-language-server: $schema=https://agentgateway.dev/schema/config
- binds:
- - port: 3000
- listeners:
- - routes:
- # Route GPT models to OpenAI
- - matches:
- - path:
- pathPrefix: "/"
- headers:
- - name: "x-model"
- value:
- regex: "^gpt-.*"
- backends:
- - ai:
- name: openai
- provider:
- openAI:
- model: gpt-4o
- policies:
- backendAuth:
- key: "$OPENAI_API_KEY"
- transformations:
- request:
- set:
- x-model: 'json(request.body).model'
- cors:
- allowOrigins:
- - "*"
- allowHeaders:
- - "*"
- # Route Claude models to Anthropic
- - matches:
- - path:
- pathPrefix: "/"
- headers:
- - name: "x-model"
- value:
- regex: "^claude-.*"
- backends:
- - ai:
- name: anthropic
- provider:
- anthropic:
- model: claude-3-5-sonnet-latest
- policies:
- backendAuth:
- key: "$ANTHROPIC_API_KEY"
- transformations:
- request:
- set:
- x-model: 'json(request.body).model'
- cors:
- allowOrigins:
- - "*"
- allowHeaders:
- - "*"
- EOF
- ```
-
- {{< reuse "agw-docs/snippets/review-table.md" >}}
-
- | Setting | Description |
- | --- | --- |
- | `matches.headers.name` | The name of the header to match on. In this example, `x-model` is the custom header that contains the extracted model name. |
- | `matches.headers.value.regex` | A regular expression to match the header value. Routes with `^gpt-.*` match any model starting with "gpt", while `^claude-.*` matches any model starting with "claude". |
- | `transformations.request.set` | A CEL expression that extracts the `model` field from the JSON request body using `json(request.body).model` and sets it as the `x-model` header. |
-
-2. Run the agentgateway.
- ```sh
- agentgateway -f config.yaml
- ```
-
-3. Send a request with `gpt-4o` in the model field. Verify that the request routes to the OpenAI backend.
-
- ```sh
- curl 'http://0.0.0.0:3000/' \
- --header 'Content-Type: application/json' \
- --data '{
- "model": "gpt-4o",
- "messages": [
- {
- "role": "user",
- "content": "Say hello"
- }
- ]
- }' | jq -r '.model'
- ```
-
- Example output:
- ```
- gpt-4o-2024-08-06
- ```
-
-4. Send a request with `claude-3-5-sonnet-latest` in the model field. Verify that the request routes to the Anthropic backend.
-
- ```sh
- curl 'http://0.0.0.0:3000/' \
- --header 'Content-Type: application/json' \
- --data '{
- "model": "claude-3-5-sonnet-latest",
- "messages": [
- {
- "role": "user",
- "content": "Say hello"
- }
- ]
- }' | jq -r '.model'
- ```
-
- Example output:
- ```
- claude-3-5-sonnet-20241022
- ```
-
-## Route by custom field
-
-You can extract any field from the request body for routing decisions, not just the `model` field.
-
-This example shows routing based on a custom `priority` field in the request body to route high-priority requests to a more powerful model.
-
-```yaml
-cat < config.yaml
-# yaml-language-server: $schema=https://agentgateway.dev/schema/config
-binds:
-- port: 3000
- listeners:
- - routes:
- # High priority route
- - matches:
- - path:
- pathPrefix: "/"
- headers:
- - name: "x-priority"
- value:
- exact: "high"
- backends:
- - ai:
- name: openai-premium
- provider:
- openAI:
- model: gpt-4o
- policies:
- backendAuth:
- key: "$OPENAI_API_KEY"
- transformations:
- request:
- set:
- x-priority: 'coalesce(json(request.body).priority, "standard")'
- cors:
- allowOrigins:
- - "*"
- allowHeaders:
- - "*"
- # Standard priority route (default)
- - matches:
- - path:
- pathPrefix: "/"
- backends:
- - ai:
- name: openai-standard
- provider:
- openAI:
- model: gpt-4o-mini
- policies:
- backendAuth:
- key: "$OPENAI_API_KEY"
- transformations:
- request:
- set:
- x-priority: 'coalesce(json(request.body).priority, "standard")'
- cors:
- allowOrigins:
- - "*"
- allowHeaders:
- - "*"
-EOF
-```
-
-{{< callout type="info" >}}
-The `coalesce()` function returns the first non-null value from its arguments. This provides a default value if the field is missing, preventing errors when the custom field is not included in requests.
-{{< /callout >}}
-
-Test the routing by sending requests with different priority values:
-
-```sh
-# High priority request - routes to gpt-4o
-curl 'http://0.0.0.0:3000/' \
---header 'Content-Type: application/json' \
---data '{
- "model": "gpt-4o",
- "priority": "high",
- "messages": [{"role": "user", "content": "Urgent request"}]
-}' | jq -r '.model'
-```
-
-```sh
-# Standard priority request - routes to gpt-4o-mini
-curl 'http://0.0.0.0:3000/' \
---header 'Content-Type: application/json' \
---data '{
- "model": "gpt-4o",
- "messages": [{"role": "user", "content": "Normal request"}]
-}' | jq -r '.model'
-```
-
-## Known limitations
-
-When implementing content-based routing, be aware of these limitations:
-
-- **Route order matters**: Routes are evaluated in the order they appear in the configuration. Place more specific routes (with header matches) before generic routes (without matches) to ensure proper routing.
-- **Performance impact**: Extracting fields from the request body adds processing overhead. For high-throughput scenarios, consider using header-based routing when possible.
-- **JSON parsing**: The `json()` CEL function requires valid JSON. Malformed JSON in the request body will cause routing failures.
-
-## Next steps
-
-- Learn about [transformations](../../configuration/traffic-management/transformations/) for more advanced request manipulation
-- Set up [backend routing](../../configuration/routes/) for multiple backends
-- Configure [rate limiting]({{< link-hextra path="/llm/virtual-keys/" >}}) to control costs per route
diff --git a/content/docs/standalone/main/llm/costs.md b/content/docs/standalone/main/llm/costs.md
index d3edbe797..0f3a80508 100644
--- a/content/docs/standalone/main/llm/costs.md
+++ b/content/docs/standalone/main/llm/costs.md
@@ -6,146 +6,142 @@ test:
costs:
- file: content/docs/standalone/main/llm/costs.md
path: costs
+aliases:
+ - /llm/spending/
---
-Agentgateway can compute the realized USD cost of each LLM request when you provide a model cost catalog. With a catalog in place, agentgateway attributes cost per request in access logs, traces, and metrics, and exposes the values to CEL expressions as `llm.cost` and `llm.costRates`.
+Agentgateway can track LLM spend by mapping each request's provider, model, and token counts to per-token pricing.
-Agentgateway does not ship a built-in catalog. Costs are computed only when you configure one (for example, a catalog that you generate with [`agctl costs import`](#generate-a-catalog-with-agctl)).
+Agentgateway extracts token usage from supported LLM APIs automatically. To convert those token counts into cost, configure a model cost catalog. The catalog maps provider and model names to pricing data so agentgateway can attach realized USD cost to logs, traces, metrics, and CEL expressions.
-## Before you begin
+{{< callout type="info" >}}
+Cost analysis is best-effort and may not exactly match your provider bill in scenarios such as price changes, custom pricing, failed requests, or provider-specific billing rules.
+{{< /callout >}}
-{{< reuse "agw-docs/snippets/prereq-agentgateway.md" >}}
+## Configure a model catalog
+Use `config.modelCatalog` to load one or more model cost catalog files. Catalog entries are merged in order, and later entries take precedence. This lets you start with an imported public catalog and then layer local overrides for contracted pricing, internal models, or provider-specific aliases.
+
+```yaml
+# yaml-language-server: $schema=https://agentgateway.dev/schema/config
-## Step 1: Prepare a catalog
+config:
+ modelCatalog:
+ - file: ./costs/catalog.json
-Prepare a catalog by creating your own JSON file or using the `agctl costs import` command.
+llm:
+ models:
+ - name: "*"
+ provider: openAI
+ params:
+ apiKey: "$OPENAI_API_KEY"
+```
-### Catalog JSON format
+Run agentgateway with the config file.
-{{< reuse "agw-docs/snippets/model-catalog-json-format.md" >}}
+```sh
+agentgateway -f config.yaml
+```
-### Generate a catalog with agctl
+After the catalog is loaded, priced requests include cost data. The access log includes `agw.ai.usage.cost.total`, and CEL exposes cost data as `llm.cost` and `llm.costRates`.
-Use `agctl costs import` to generate a catalog JSON file, then reference that file from `config.modelCatalog` or `MODEL_CATALOG_PATHS`.
+For general LLM telemetry setup, see [Observe traffic]({{< link-hextra path="/llm/observability/" >}}).
-1. Generate a catalog from a supported source. By default, `agctl costs import` imports every provider that the proxy supports from [models.dev](https://models.dev).
+## Import costs with agctl
- ```sh
- agctl costs import --pretty --out ./catalog.json
- ```
+Use `agctl costs import` to generate a catalog file from a supported pricing source. The default source is `models.dev`.
-2. To import only a subset of providers, pass a comma-separated list to `--providers`.
+```sh
+mkdir -p costs
+agctl costs import --out ./costs/catalog.json
+```
- ```sh
- agctl costs import --pretty --providers openai,anthropic --out ./catalog.json
- ```
+To keep the catalog smaller, import only the providers that you use.
-3. Reference the generated file from your configuration with `config.modelCatalog[].file` or `MODEL_CATALOG_PATHS`, then run agentgateway.
+```sh
+agctl costs import \
+ --source models.dev \
+ --providers anthropic,google,openai \
+ --out ./costs/catalog.json
+```
-For all options, see the [`agctl costs import`]({{< link-hextra path="/reference/agctl/agctl-costs-import/" >}}) reference.
+For all flags, see the [`agctl costs import`]({{< link-hextra path="/reference/agctl/agctl-costs-import/" >}}) reference.
-## Step 2: Configure catalog sources
+## Import costs in the UI
-Configure one or more catalog sources for agentgateway with the `config.modelCatalog` config section. Sources are merged in order, with later sources taking precedence at the model level.
+You can also import model costs from the Admin UI.
-### Load a catalog from a file
+1. Open the [Admin UI cost page](http://localhost:15000/ui/llm/costs).
+2. Press **Refresh base costs**.
-The `file` field is a path to a catalog JSON file. Agentgateway watches the file and reloads it when it changes.
+The UI fetches the latest base costs and configures `modelCatalog`. You can refresh again later to pull updated pricing and model data.
-```yaml
-# yaml-language-server: $schema=https://agentgateway.dev/schema/config
-config:
- modelCatalog:
- - file: ./catalog.json
-```
+When you set up a fresh configuration for the first time, the UI automatically performs this step.
-### Embed a catalog inline
+## Override catalog entries
-The `inline` field is a string that contains the catalog JSON.
+If your provider pricing differs from the imported public catalog, add another catalog file after the imported one. Later catalog sources override earlier sources.
```yaml
-# yaml-language-server: $schema=https://agentgateway.dev/schema/config
config:
modelCatalog:
- - inline: |
- {
- "providers": {
- "openai": {
- "models": {
- "gpt-4o-mini": {
- "rates": { "input": "0.15", "output": "0.6", "cacheRead": "0.075" }
- }
- }
- }
- }
- }
+ - file: ./costs/catalog.json
+ - file: ./costs/overrides.json
```
-### Load catalog files with an environment variable
+Use overrides for contracted pricing, internally hosted models, or models that do not appear in the imported catalog.
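The following Python sketch illustrates the layering. It assumes that precedence applies per model (an override entry replaces the base entry for the same provider and model) and uses a simplified catalog shape of `providers.<name>.models.<model>.rates`; the full schema is described in the Advanced: Catalog format section. The override prices, the `internal` provider, and `my-model` are hypothetical.

```python
import copy


def merge_catalogs(*catalogs: dict) -> dict:
    """Merge catalogs in order. For each provider and model, the entry from the
    latest catalog that defines it wins (assumed model-level precedence)."""
    merged: dict = {"providers": {}}
    for catalog in catalogs:
        for provider, provider_data in catalog.get("providers", {}).items():
            target = merged["providers"].setdefault(provider, {"models": {}})
            for model, entry in provider_data.get("models", {}).items():
                # Replace the whole model entry rather than merging individual rates.
                target["models"][model] = copy.deepcopy(entry)
    return merged


base = {
    "providers": {
        "openai": {"models": {"gpt-4o-mini": {"rates": {"input": "0.15", "output": "0.6"}}}}
    }
}
# Hypothetical overrides: contracted pricing plus an internally hosted model.
overrides = {
    "providers": {
        "openai": {"models": {"gpt-4o-mini": {"rates": {"input": "0.12", "output": "0.48"}}}},
        "internal": {"models": {"my-model": {"rates": {"input": "0.05", "output": "0.1"}}}},
    }
}
merged = merge_catalogs(base, overrides)
print(merged["providers"]["openai"]["models"]["gpt-4o-mini"]["rates"])
```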
-You can also load one or more catalog files with the `MODEL_CATALOG_PATHS` environment variable, set to a comma-separated list of file paths. The environment variable is useful for container deployments where you mount a catalog file and enable it without editing the main configuration file.
+You can also load one or more catalog files with the `MODEL_CATALOG_PATHS` environment variable. Set it to a comma-separated list of file paths.
```sh
-MODEL_CATALOG_PATHS=./catalog.json,./overrides.json agentgateway -f config.yaml
+MODEL_CATALOG_PATHS=./costs/catalog.json,./costs/overrides.json agentgateway -f config.yaml
```
{{< callout type="warning" >}}
-When `MODEL_CATALOG_PATHS` is set, it **replaces** any `config.modelCatalog` sources; the two are not merged. Use one mechanism or the other.
+When `MODEL_CATALOG_PATHS` is set, it replaces any `config.modelCatalog` sources. Use one mechanism or the other.
{{< /callout >}}
-## Step 3: Configure cost policies
+## Use cost data
-Use cost data in CEL, logs, traces, and metrics policies.
+When a request matches an entry in the catalog, agentgateway populates these CEL fields:
-When a request matches an entry in the catalog, agentgateway populates the following CEL fields:
+- `llm.cost`: The realized USD cost of the request. Includes `total` plus per-token-type components such as `input`, `output`, `cacheRead`, `cacheWrite`, `reasoning`, `inputAudio`, and `outputAudio`. Unset when the model cannot be priced.
+- `llm.costRates`: The effective USD-per-1,000,000-token rates that were applied. Includes the same per-token-type fields when available. Unset when the model cannot be priced.
-- `llm.cost`: The realized USD cost of the request. Includes `total` plus per-token-type components: `input`, `output`, `cacheRead`, `cacheWrite`, `reasoning`, `inputAudio`, and `outputAudio`. Unset when the model cannot be priced.
-- `llm.costRates`: The effective USD-per-1,000,000-token rates that were applied, after tier selection. Includes the same per-token-type fields when available. Unset when the model cannot be priced.
+The request access log includes `agw.ai.usage.cost.total` for LLM requests whenever a cost is available.
+
+Traces always include the full breakdown:
+
+- `agw.ai.usage.cost.total`
+- `agw.ai.usage.cost.input`
+- `agw.ai.usage.cost.output`
+- `agw.ai.usage.cost.cache_read`
+- `agw.ai.usage.cost.cache_write`
+- `agw.ai.usage.cost.reasoning`
+- `agw.ai.usage.cost.input_audio`
+- `agw.ai.usage.cost.output_audio`
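+Because these values are loaded into the CEL context, you can also emit them explicitly. For example, the following access log configuration adds the input cost and the full cost breakdown: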
```yaml
# yaml-language-server: $schema=https://agentgateway.dev/schema/config
frontendPolicies:
accessLog:
add:
- llm.cost.total: 'llm.cost.total'
- llm.cost.input: 'llm.cost.input'
- llm.cost.output: 'llm.cost.output'
- llm.cost.cacheRead: 'llm.cost.cacheRead'
- tracing:
- attributes:
- llm.cost.total: 'llm.cost.total'
- llm.costRates.input: 'llm.costRates.input'
- llm.costRates.output: 'llm.costRates.output'
-
-config:
- metrics:
- fields:
- add:
- llm.cost.total: 'llm.cost.total'
- llm.costRates.input: 'llm.costRates.input'
+ # Add the input cost
+ input_cost: llm.cost.input
+ # Add ALL cost variables, as `cost.input`, `cost.output`, etc.
+ cost: flatten(llm.cost)
```
-A priced request produces an access log line that includes the cost fields:
+A priced request produces an access log entry that includes cost data.
-```
+```console
... protocol=llm gen_ai.provider.name=openai gen_ai.request.model=gpt-4o-mini
gen_ai.usage.input_tokens=14 gen_ai.usage.output_tokens=6 agw.ai.usage.cost.total=0.0000057 ...
```
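The total in this log line follows from per-token-type rates expressed in USD per 1,000,000 tokens: 14 × 0.15 + 6 × 0.6 = 5.7 USD per million tokens, or 0.0000057 USD. The following Python sketch reproduces that arithmetic, assuming gpt-4o-mini rates of 0.15 USD (input) and 0.6 USD (output) per million tokens, which may not match current pricing. It is a simplification that ignores tiered pricing and cache, reasoning, and audio token types, not agentgateway's implementation.

```python
from decimal import Decimal

PER_MILLION = Decimal(1_000_000)


def request_cost(tokens: dict[str, int], rates: dict[str, str]) -> Decimal:
    """Sum count * rate / 1,000,000 over the token types that have a rate.
    Rates are USD per 1,000,000 tokens; Decimal avoids float rounding noise."""
    return sum(
        (
            Decimal(count) * Decimal(rates[kind]) / PER_MILLION
            for kind, count in tokens.items()
            if kind in rates
        ),
        Decimal(0),
    )


# Assumed gpt-4o-mini rates in USD per 1M tokens; check your catalog for current values.
rates = {"input": "0.15", "output": "0.6"}
total = request_cost({"input": 14, "output": 6}, rates)
print(total)
```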
-For more examples, see [Observe traffic]({{< link-hextra path="/llm/observability/" >}}) and the [CEL reference]({{< link-hextra path="/reference/cel/cel-context" >}}).
+## Monitor catalog lookups
-## Step 4: Generate traffic
-
-Generate traffic through agentgateway that matches a model entry from the catalog. For example steps, try the [LLM getting started]({{< link-hextra path="/quickstart/llm/" >}}).
-
-## Step 5: Monitor catalog lookups
-
-Every cost lookup increments the `agentgateway_cost_catalog_lookups_total` counter, labeled with the lookup `status` and the request's `gen_ai_system` (provider), `gen_ai_request_model`, and `gen_ai_response_model`. Use the lookup to confirm that your catalog prices your traffic.
-
-The `status` label is one of the following values:
+Every cost lookup increments the `agentgateway_cost_catalog_lookups_total` counter. The metric is labeled with the lookup `status`, provider, request model, and response model. The `status` label takes one of the following values:
| Status | Meaning |
|--------|---------|
@@ -154,18 +150,54 @@ The `status` label is one of the following values:
| `Missing` | The provider or model was not found in the catalog. |
| `NoCatalog` | No catalog is configured. |
-For example, the metrics endpoint at `http://localhost:15020/metrics` shows lines such as the following:
-
-agentgateway_cost_catalog_lookups_total{status="Exact",gen_ai_system="openai",gen_ai_request_model="gpt-4o-mini",...} 1
-agentgateway_cost_catalog_lookups_total{status="Missing",gen_ai_system="openai",gen_ai_request_model="gpt-3.5-turbo",...} 1
-```
-
A rising `Missing` or `Unpriced` count means requests are flowing through models that your catalog does not price. Add the missing providers or models to your catalog and reload.
{{< callout type="info" >}}
In traces, the corresponding cost-resolution `status` attribute uses lowercase values: `exact`, `unpriced`, `missing`, and `noCatalog`.
{{< /callout >}}
+## Enforce budgets
+
+The model catalog provides pricing data for spend visibility. To block or throttle traffic, combine cost visibility with rate limiting or virtual key management.
+
+- Use [Rate limiting]({{< link-hextra path="/configuration/resiliency/rate-limits/" >}}) to cap request or token usage per route, user, or API key.
+- Use [Virtual keys]({{< link-hextra path="/llm/virtual-keys/" >}}) to issue keys with per-key controls and attribution.
+
+## Advanced: Catalog format
+
+Usually, you do not need to write catalog JSON by hand. Use `agctl costs import` or the Admin UI to generate the base catalog, then add overrides only when needed.
+
+{{< reuse "agw-docs/snippets/model-catalog-json-format.md" >}}
+
+The following minimal example prices one OpenAI model and one tiered Gemini model.
+
+```json
+{
+ "providers": {
+ "openai": {
+ "models": {
+ "gpt-4o-mini": {
+ "rates": { "input": "0.15", "output": "0.6", "cacheRead": "0.075" }
+ }
+ }
+ },
+ "gcp.gemini": {
+ "models": {
+ "gemini-2.5-pro": {
+ "rates": { "input": "1.25", "output": "10", "cacheRead": "0.125" },
+ "tiers": [
+ {
+ "contextOver": 200000,
+ "rates": { "input": "2.5", "output": "15", "cacheRead": "0.25" }
+ }
+ ]
+ }
+ }
+ }
+ }
+}
+```
+
{{< doc-test paths="costs" >}}
# Verify that agentgateway loads a catalog from a file source.
cat > /tmp/costs-catalog.json <<'EOF'
diff --git a/content/docs/standalone/main/llm/providers/_index.md b/content/docs/standalone/main/llm/providers/_index.md
index 46f0e4544..c72510262 100644
--- a/content/docs/standalone/main/llm/providers/_index.md
+++ b/content/docs/standalone/main/llm/providers/_index.md
@@ -9,34 +9,15 @@ Learn how to configure agentgateway for a particular LLM {{< gloss "Provider" >}
## First-class providers
-Use the dedicated provider pages when agentgateway already knows the upstream base URL and request format. This list includes Anthropic, OpenAI, and many OpenAI-compatible providers.
+Use the dedicated provider pages when agentgateway already knows the upstream base URL and request format. This list includes Anthropic, OpenAI, and many others.
-## OpenAI-compatible fallback
+## Custom providers
-Use [OpenAI-compatible]({{< link-hextra path="/llm/providers/openai-compatible/" >}}) only for providers that do not have a first-class shortcut, such as Perplexity, vLLM, LM Studio, or another service that exposes the OpenAI API format.
-
-### Override the upstream base URL
-
-When you need a custom upstream endpoint, set `params.baseUrl` on the model instead of older host or path override fields.
-
-```yaml
-# yaml-language-server: $schema=https://agentgateway.dev/schema/config
-
-llm:
- models:
- - name: "*"
- provider: openAI
- auth:
- key:
- value: "$PERPLEXITY_API_KEY"
- params:
- baseUrl: "https://api.perplexity.ai"
- tls: {}
-```
+Use [Custom providers]({{< link-hextra path="/llm/providers/custom/" >}}) only for providers that do not have a first-class shortcut, such as Perplexity, vLLM, LM Studio, or another service that exposes a compatible [API format](../api-types).
## Authentication
-For simplified `llm` configuration, upstream provider authentication is configured per model via `llm.models[].auth`. In routing-based configurations, use `policies.backendAuth` on a route instead.
+For simplified `llm` configuration, upstream provider authentication is configured per model via `llm.models[]` (typically `params.apiKey` for API-key providers, and `auth` for cloud-native flows). In routing-based configurations, use `policies.backendAuth` on a route instead.
### API key
@@ -47,9 +28,8 @@ llm:
models:
- name: "*"
provider: openAI
- auth:
- key:
- value: "$OPENAI_API_KEY"
+ params:
+ apiKey: "$OPENAI_API_KEY"
```
Use `auth.key.location` only when a provider needs the credential somewhere other than its default location. For example, Azure often uses `api-key`:
@@ -58,10 +38,10 @@ Use `auth.key.location` only when a provider needs the credential somewhere othe
llm:
models:
- name: "*"
- provider: azure
+ provider: custom
auth:
key:
- value: "$AZURE_API_KEY"
+ value: "$API_KEY"
location:
header:
name: api-key
@@ -109,4 +89,8 @@ llm:
## Standalone upstream TLS
-Use `llm.models[].tls` to configure TLS when connecting to an upstream provider. You might use this configuration to trust a private CA when using a self-hosted HTTPS endpoint. Common fields include `root` for a trusted CA bundle, `hostname` and `subjectAltNames` for upstream identity checks, `cert` and `key` for client certificates, and `keyExchangeGroups` for TLS negotiation. In agentgateway versions prior to 1.3, this model-level setting was called `backendTLS`.
+Use `llm.models[].tls` to configure advanced TLS settings when connecting to an upstream provider.
+Built-in providers use default TLS settings.
+When you set a custom `baseUrl`, an `https://` scheme automatically enables TLS.
+
+If you need advanced settings, such as client certificates or custom verification, set fields such as `root` for a trusted CA bundle, `hostname` and `subjectAltNames` for upstream identity checks, and `cert` and `key` for client certificates.
diff --git a/content/docs/standalone/main/llm/providers/anthropic.md b/content/docs/standalone/main/llm/providers/anthropic.md
index 906970632..6b5e38445 100644
--- a/content/docs/standalone/main/llm/providers/anthropic.md
+++ b/content/docs/standalone/main/llm/providers/anthropic.md
@@ -1,6 +1,6 @@
---
title: Anthropic
-weight: 50
+weight: 15
description: Configuration and setup for Anthropic Claude provider
---
@@ -36,7 +36,7 @@ llm:
After running agentgateway with the configuration from the previous section, you can send a request to the `v1/messages` endpoint. Agentgateway automatically adds the `x-api-key` authorization and `anthropic-version` headers to the request. The request is forwarded to the Anthropic API and the response is returned to the client.
-```json
+```sh
curl -X POST http://localhost:4000/v1/messages \
-H "Content-Type: application/json" \
-d '{
@@ -98,7 +98,7 @@ Example response:
}
```
-## Extended thinking and reasoing
+## Extended thinking and reasoning
Extended thinking and reasoning lets Claude reason through complex problems before generating a response. You can opt in to extended thinking and reasoning by adding specific parameters to your request.
@@ -130,7 +130,7 @@ The following values are supported:
The following example request uses adaptive extended thinking. Note that this setting requires the `output_config.effort` field to be set too.
```sh
-curl "localhost:3000/v1/messages" -H content-type:application/json -d '{
+curl "localhost:4000/v1/messages" -H content-type:application/json -d '{
"model": "",
"max_tokens": 1024,
"thinking": {
@@ -181,7 +181,7 @@ Structured outputs constrain the model to respond with a specific JSON schema. Y
Provide the JSON schema definition in the `output_config.format` field.
```sh
-curl "localhost:3000/v1/messages" -H content-type:application/json -d '{
+curl "localhost:4000/v1/messages" -H content-type:application/json -d '{
"model": "",
"max_tokens": 256,
"output_config": {
@@ -234,15 +234,11 @@ Example output:
[Claude Platform on AWS](https://docs.aws.amazon.com/claude-platform/latest/userguide/welcome.html) hosts Anthropic's native Messages API on AWS infrastructure at `aws-external-anthropic.{region}.api.aws`. Because the API is the same Anthropic Messages API, you point the `anthropic` provider at the AWS endpoint and choose either API-key or AWS SigV4 authentication.
-
-{{< callout type="info" >}}
-Before you begin, [install agentgateway with the nightly build]({{< link-hextra path="/quickstart/">}}).
-{{< /callout >}}
-
{{< tabs tabTotal="2" items="API key, AWS SigV4" >}}
{{% tab tabName="API key" %}}
-Store your Anthropic-on-AWS API key in a file and reference it from the provider configuration. Override the upstream host to point at the Claude Platform endpoint.
+Store your Claude Platform on AWS API key in an environment variable and reference it from the provider configuration.
+Set `params.baseUrl` to point at the Claude Platform endpoint for your region.
```yaml
# yaml-language-server: $schema=https://agentgateway.dev/schema/config
@@ -253,25 +249,20 @@ llm:
provider: anthropic
requestHeaders:
set:
+ # Replace with your workspace ID
anthropic-workspace-id: wrkspc_XXXXX
params:
- awsRegion: us-west-2
- hostOverride: aws-external-anthropic.us-west-2.api.aws:443
- pathPrefix: /v1
- auth:
- key:
- value:
- file: $HOME/.secrets/anthropic-aws
- tls: {}
+ apiKey: $ANTHROPIC_AWS_API_KEY
+ # Replace with your region
+ baseUrl: https://aws-external-anthropic.us-west-2.api.aws/v1
```
-| Setting | Description |
-|---------|-------------|
+| Setting | Description |
+|---------------------------------------------|-------------|
| `requestHeaders.set.anthropic-workspace-id` | The Anthropic workspace ID that scopes the request. Replace `wrkspc_XXXXX` with your workspace ID. |
-| `params.hostOverride` | The Claude Platform endpoint host and port. Use the form `aws-external-anthropic.{region}.api.aws:443`. |
-| `params.pathPrefix` | The Anthropic API path prefix on Claude Platform, set to `/v1`. |
-| `auth.key.value.file` | A path to a file that contains the API key. |
-| `tls: {}` | Enables TLS to the upstream host. Required because Claude Platform is served over HTTPS. |
+| `params.baseUrl` | The Claude Platform endpoint URL. Use the form `https://aws-external-anthropic.{region}.api.aws/v1`. |
+| `params.apiKey` | The Claude Platform on AWS API key. You can reference environment variables using the `$VAR_NAME` syntax. |
+| `provider` | The LLM provider, set to `anthropic` because Claude Platform on AWS serves the native Anthropic Messages API. |
{{% /tab %}}
{{% tab tabName="AWS SigV4" %}}
diff --git a/content/docs/standalone/main/llm/providers/azure.md b/content/docs/standalone/main/llm/providers/azure.md
index 5d56e4b45..7961ec69d 100644
--- a/content/docs/standalone/main/llm/providers/azure.md
+++ b/content/docs/standalone/main/llm/providers/azure.md
@@ -1,6 +1,6 @@
---
title: Azure
-weight: 60
+weight: 15
description: Configuration and setup for Azure AI services provider
---
@@ -8,7 +8,7 @@ Configure Microsoft Azure AI as an LLM provider in agentgateway.
## Authentication
-Before you can use Azure as an LLM provider, you must authenticate by using one of the standard [Azure authentication methods](https://learn.microsoft.com/en-us/azure/ai-services/authentication). In standalone mode, this authentication is configured via `llm.models[].auth` (for example, `auth.azure.implicit` or `auth.key`). In routing-based configurations, use `policies.backendAuth.azure`.
+Before you can use Azure as an LLM provider, you must authenticate by using one of the standard [Azure authentication methods](https://learn.microsoft.com/en-us/azure/ai-services/authentication). In standalone mode, this authentication is configured with `llm.models[]` fields (for example, `params.apiKey` or `auth.azure`). In routing-based configurations, use `policies.backendAuth.azure`.
## Configuration
@@ -29,9 +29,6 @@ llm:
models:
- name: "*"
provider: azure
- auth:
- azure:
- implicit: {}
params:
azureResourceName: "your-resource-name"
azureResourceType: foundry
@@ -68,9 +65,6 @@ llm:
models:
- name: "gpt-4.1"
provider: azure
- auth:
- azure:
- implicit: {}
params:
azureResourceName: "your-resource-name"
azureResourceType: openAI
@@ -90,7 +84,7 @@ llm:
| `params.azureProjectName` | The Foundry project name. Required for `foundry` type. If omitted, defaults to `azureResourceName`. |
| `params.azureApiVersion` | Optional API version override. Defaults to `v1`. For legacy deployments, use a dated version like `2024-04-01-preview`. |
| `params.model` | The specific Azure model to use. If set, this model is used for all requests. If not set, the request must include the model to use. |
-| `auth` | Authentication for the upstream Azure endpoint. Use `auth.azure` for Entra ID auth, or `auth.key.value` for API key auth. Set `auth.key.location.header.name: api-key` if needed. |
+| `params.apiKey` | The Azure API key for authentication. If unset, implicit Entra ID authentication is used. You can reference environment variables using the `$VAR_NAME` syntax. |
## Advanced configuration
@@ -110,13 +104,6 @@ binds:
- matches:
- path:
pathPrefix: /azure
- policies:
- urlRewrite:
- authority: auto
- backendAuth:
- azure:
- implicit: {}
- backendTLS: {}
backends:
- ai:
name: azure
@@ -147,8 +134,6 @@ binds:
- path:
pathPrefix: /azure
policies:
- urlRewrite:
- authority: auto
backendAuth:
azure:
explicitConfig:
@@ -156,7 +141,6 @@ binds:
tenantId: ""
clientId: ""
clientSecret: ""
- backendTLS: {}
backends:
- ai:
name: azure
@@ -198,7 +182,6 @@ binds:
tenantId: ""
clientId: ""
clientSecret: ""
- backendTLS: {}
```
{{< reuse "agw-docs/snippets/review-configuration.md" >}}
@@ -235,7 +218,6 @@ binds:
azure:
explicitConfig:
managedIdentity: {}
- backendTLS: {}
```
{{< reuse "agw-docs/snippets/review-configuration.md" >}}
@@ -278,7 +260,6 @@ binds:
# OR use objectId or resourceId instead
# objectId: "your-managed-identity-object-id"
# resourceId: "/subscriptions/.../resourceGroups/.../providers/Microsoft.ManagedIdentity/userAssignedIdentities/..."
- backendTLS: {}
```
{{< reuse "agw-docs/snippets/review-configuration.md" >}}
diff --git a/content/docs/standalone/main/llm/providers/baseten.md b/content/docs/standalone/main/llm/providers/baseten.md
index a70a5b000..1258efcf6 100644
--- a/content/docs/standalone/main/llm/providers/baseten.md
+++ b/content/docs/standalone/main/llm/providers/baseten.md
@@ -1,6 +1,6 @@
---
title: Baseten
-weight: 61
+weight: 20
description: Configuration and setup for Baseten LLM provider
---
@@ -19,8 +19,6 @@ llm:
provider: baseten
params:
apiKey: "$BASETEN_API_KEY"
- # Optional. If omitted, agentgateway uses the default:
- # baseUrl: "https://inference.baseten.co/v1"
```
{{< reuse "agw-docs/snippets/review-configuration.md" >}}
@@ -35,7 +33,7 @@ llm:
## Example request
-After running agentgateway with the configuration from the previous section, you can send an OpenAI-compatible request to the `v1/chat/completions` endpoint by replacing `` with your Baseten model or deployment ID:
+After running agentgateway with the configuration from the previous section, you can send an OpenAI-compatible request to the `v1/chat/completions` endpoint by replacing `` with your Baseten model or deployment ID:
```bash
curl -X POST http://localhost:4000/v1/chat/completions \
diff --git a/content/docs/standalone/main/llm/providers/bedrock.md b/content/docs/standalone/main/llm/providers/bedrock.md
index e1116e8fc..1e7ff2ec8 100644
--- a/content/docs/standalone/main/llm/providers/bedrock.md
+++ b/content/docs/standalone/main/llm/providers/bedrock.md
@@ -1,20 +1,21 @@
---
title: Amazon Bedrock
-weight: 40
+weight: 15
description: Configuration and setup for Amazon Bedrock provider
---
Configure Amazon Bedrock as an LLM provider in agentgateway.
{{< callout type="info" >}}
-Agentgateway accepts only OpenAI-formatted requests (such as the `/v1/chat/completions` request body shape) and returns OpenAI-formatted responses, regardless of the route path that you configure. Agentgateway translates between OpenAI and Bedrock formats internally. Bedrock's native [Converse API](https://docs.aws.amazon.com/bedrock/latest/userguide/conversation-inference-call.html) request and response shapes are not supported. Usage fields in responses follow the OpenAI shape (`prompt_tokens`, `completion_tokens`, `total_tokens`), not the Bedrock shape (`inputTokens`, `outputTokens`, `totalTokens`).
+Agentgateway accepts requests in any of the supported [API formats](../api-types) (such as the `/v1/chat/completions` request body shape) and returns responses in the same format.
+Internally, agentgateway translates between these formats and Bedrock's [Converse API](https://docs.aws.amazon.com/bedrock/latest/userguide/conversation-inference-call.html).
+Native `Converse` and `Invoke` request shapes are not supported for translation. If you need these APIs, see [Passthrough](#passthrough).
{{< /callout >}}
## Authentication
Before you can use Bedrock as an LLM provider, you must authenticate by using the standard [AWS authentication sources](https://docs.aws.amazon.com/sdkref/latest/guide/creds-config-files.html).
-
-The default SigV4 service name for Bedrock is handled automatically, so you do not need to set `auth.aws.serviceName`.
+Agentgateway automatically detects ambient AWS credentials from the local environment. To configure credentials explicitly, use `auth.aws`.
## Configuration
@@ -40,9 +41,107 @@ llm:
| `params.model` | The specific Bedrock model to use. If set, this model is used for all requests. If not set, the request must include the model to use. |
| `params.awsRegion` | The AWS region where the Bedrock model is hosted. |
+## Passthrough
+
+If your applications call the AWS `Converse` or `Invoke` APIs directly, agentgateway cannot translate these requests to other providers.
+However, agentgateway can pass the requests through to Bedrock by using the [passthrough](../api-types/passthrough) approach.
+
+With passthrough, agentgateway can still provide telemetry for these requests.
+
+First, set up passthrough mode:
+
+```yaml
+# yaml-language-server: $schema=https://agentgateway.dev/schema/config
+llm:
+ models:
+ - name: us.anthropic*
+ provider: bedrock
+ params:
+ awsRegion: us-west-2
+ passthrough: detect
+```
+
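+Then, send native `Converse` and `Invoke` requests through agentgateway. The following examples use the AWS SDK for Python (boto3) with `endpoint_url` pointing at agentgateway: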
+
+{{< tabs items="Converse,Invoke" >}}
+{{% tab %}}
+
+```python
+import json
+
+import boto3
+
+client = boto3.client(
+ 'bedrock-runtime',
+ region_name='us-west-2',
+ endpoint_url='http://localhost:4000',
+)
+response = client.converse(
+ modelId='us.anthropic.claude-sonnet-4-6',
+ messages=[
+ {
+ 'role': 'user',
+ 'content': [{'text': 'give 1 word answer'}]
+ }
+ ]
+)
+print('converse response:')
+print(response)
+```
+
+{{% /tab %}}
+{{% tab %}}
+
+```python
+import json
+
+import boto3
+
+client = boto3.client(
+ 'bedrock-runtime',
+ region_name='us-west-2',
+ endpoint_url='http://localhost:4000',
+)
+response = client.invoke_model(
+ modelId='us.anthropic.claude-sonnet-4-6',
+ body=json.dumps({
+ 'anthropic_version': 'bedrock-2023-05-31',
+ 'max_tokens': 10,
+ 'messages': [
+ {
+ 'role': 'user',
+ 'content': [{'type': 'text', 'text': 'give 1 word answer'}],
+ }
+ ],
+ }),
+)
+body = json.loads(response['body'].read())
+
+print('invoke response:')
+print(body)
+```
+
+{{% /tab %}}
+{{< /tabs >}}
+
+
+{{< callout type="info" >}}
+Passthrough does not support model name translation, so avoid model matches such as `aws/*` that would require agentgateway to transform the model name.
+{{< /callout >}}
+
+## Claude Platform on AWS
+
+To connect to [Claude Platform on AWS](https://docs.aws.amazon.com/claude-platform/latest/userguide/welcome.html), see [Use Claude Platform on AWS](../anthropic/#use-claude-platform-on-aws).
+
+## Bedrock Mantle
+
+The [Bedrock Mantle](https://docs.aws.amazon.com/bedrock/latest/userguide/bedrock-mantle.html) endpoint is not currently supported.
+To track progress, follow the [GitHub issue](https://github.com/agentgateway/agentgateway/issues/2041).
+
## Token counting
-Bedrock supports token counting for Anthropic models via the `count_tokens` endpoint. Agentgateway automatically handles the required formatting for Bedrock's count-tokens endpoint, including adding the `max_tokens: 1` parameter and Base64 encoding the request body.
+Bedrock supports token counting for Anthropic models via the `count_tokens` endpoint.
+Agentgateway automatically handles the required formatting for Bedrock's count-tokens endpoint.
```bash
curl -X POST http://localhost:4000/v1/messages/count_tokens \
@@ -81,7 +180,7 @@ Use the `reasoning_effort` field to control how much reasoning the model applies
Note that `max_tokens` must be greater than the thinking budget, and the minimum thinking budget is 1,024 tokens.
```sh
-curl "localhost:3000/v1/chat/completions" -H content-type:application/json -d '{
+curl "localhost:4000/v1/chat/completions" -H content-type:application/json -d '{
"model": "",
"max_tokens": 6000,
"reasoning_effort": "high",
@@ -99,7 +198,7 @@ curl "localhost:3000/v1/chat/completions" -H content-type:application/json -d '{
Structured outputs constrain the model to respond with a specific JSON schema. Provide the schema definition in the OpenAI `response_format` field of your request. Agentgateway translates this to Bedrock's native format automatically.
```sh
-curl "localhost:3000/v1/chat/completions" -H content-type:application/json -d '{
+curl "localhost:4000/v1/chat/completions" -H content-type:application/json -d '{
"model": "",
"max_tokens": 256,
"response_format": {
diff --git a/content/docs/standalone/main/llm/providers/cerebras.md b/content/docs/standalone/main/llm/providers/cerebras.md
index aac497e27..da0566da6 100644
--- a/content/docs/standalone/main/llm/providers/cerebras.md
+++ b/content/docs/standalone/main/llm/providers/cerebras.md
@@ -1,6 +1,6 @@
---
title: Cerebras
-weight: 61
+weight: 20
description: Configuration and setup for Cerebras LLM provider
---
@@ -19,8 +19,6 @@ llm:
provider: cerebras
params:
apiKey: "$CEREBRAS_API_KEY"
- # Optional. If omitted, agentgateway uses the default:
- # baseUrl: "https://api.cerebras.ai/v1"
```
{{< reuse "agw-docs/snippets/review-configuration.md" >}}
diff --git a/content/docs/standalone/main/llm/providers/cohere.md b/content/docs/standalone/main/llm/providers/cohere.md
index ea1f641ab..25b91207a 100644
--- a/content/docs/standalone/main/llm/providers/cohere.md
+++ b/content/docs/standalone/main/llm/providers/cohere.md
@@ -1,6 +1,6 @@
---
title: Cohere
-weight: 61
+weight: 20
description: Configuration and setup for Cohere LLM provider
---
@@ -19,8 +19,6 @@ llm:
provider: cohere
params:
apiKey: "$COHERE_API_KEY"
- # Optional. If omitted, agentgateway uses the default:
- # baseUrl: "https://api.cohere.ai"
```
{{< reuse "agw-docs/snippets/review-configuration.md" >}}
diff --git a/content/docs/standalone/main/llm/providers/custom.md b/content/docs/standalone/main/llm/providers/custom.md
new file mode 100644
index 000000000..27c44d982
--- /dev/null
+++ b/content/docs/standalone/main/llm/providers/custom.md
@@ -0,0 +1,74 @@
+---
+title: Custom
+weight: 99
+description: Configure agentgateway for providers without built-in support that implement the OpenAI API format.
+aliases: /llm/providers/openai-compatible
+test:
+ openai-compatible-validate:
+ - file: content/docs/standalone/main/llm/providers/openai-compatible.md
+ path: openai-compat-validate
+---
+
+Use this page for providers that implement the OpenAI API format but do not yet have first-class `provider:` support. For built-in providers such as [Baseten]({{< link-hextra path="/llm/providers/baseten/" >}}), [Cerebras]({{< link-hextra path="/llm/providers/cerebras/" >}}), [Cohere]({{< link-hextra path="/llm/providers/cohere/" >}}), [DeepInfra]({{< link-hextra path="/llm/providers/deepinfra/" >}}), [DeepSeek]({{< link-hextra path="/llm/providers/deepseek/" >}}), [Fireworks AI]({{< link-hextra path="/llm/providers/fireworks/" >}}), [Groq]({{< link-hextra path="/llm/providers/groq/" >}}), [Hugging Face]({{< link-hextra path="/llm/providers/huggingface/" >}}), [Mistral]({{< link-hextra path="/llm/providers/mistral/" >}}), [OpenRouter]({{< link-hextra path="/llm/providers/openrouter/" >}}), [Together AI]({{< link-hextra path="/llm/providers/togetherai/" >}}), [xAI]({{< link-hextra path="/llm/providers/xai/" >}}), and [Ollama]({{< link-hextra path="/llm/providers/ollama/" >}}), use the dedicated provider pages instead.
+
+{{< callout type="info" >}}
+Many providers offer OpenAI-compatible or Anthropic-compatible endpoints.
+Although you _can_ use these endpoints with `provider: openAI` or `provider: anthropic` and a custom `baseUrl`, prefer `provider: custom`.
+
+A vendor-specific provider might apply behavior that is specific to that vendor's API.
+{{< /callout >}}
+
+## Before you begin
+
+{{< reuse "agw-docs/snippets/prereq-agentgateway.md" >}}
+
+You also need the following prerequisites.
+
+- An API key for your chosen provider, unless you are pointing to a local endpoint such as vLLM or LM Studio.
+
+{{< doc-test paths="openai-compat-validate" >}}
+# Install agentgateway binary for testing
+mkdir -p "$HOME/.local/bin"
+export PATH="$HOME/.local/bin:$PATH"
+VERSION="v{{< reuse "agw-docs/versions/n-patch.md" >}}"
+BINARY_URL="https://github.com/agentgateway/agentgateway/releases/download/${VERSION}/agentgateway-$(uname -s | tr '[:upper:]' '[:lower:]')-$(uname -m | sed 's/x86_64/amd64/')"
+curl -sL "$BINARY_URL" -o "$HOME/.local/bin/agentgateway"
+chmod +x "$HOME/.local/bin/agentgateway"
+
+# Set placeholder API keys for validation (--validate-only still resolves env vars)
+export PERPLEXITY_API_KEY="${PERPLEXITY_API_KEY:-test}"
+{{< /doc-test >}}
+
+## Configuring a custom provider
+
+With a custom provider, you specify the API endpoint and the list of API formats that it supports.
+Agentgateway automatically maps between the incoming request format and the supported formats.
+
+The following example connects to [Perplexity](https://www.perplexity.ai/), which exposes an OpenAI-compatible API for search-augmented models and does not currently have a first-class provider.
+
+```yaml {paths="openai-compat-validate"}
+cat > /tmp/test-perplexity.yaml << 'EOF'
+# yaml-language-server: $schema=https://agentgateway.dev/schema/config
+llm:
+ models:
+ - name: "*"
+ provider:
+ custom:
+ formats:
+ # Indicate this provider supports the completions API. With no `path` specified, this defaults to /chat/completions
+ - type: completions
+ # Indicate this provider supports the messages API, on a custom path /messages-api
+ # - type: messages
+ # path: /messages-api
+ # Other supported format types:
+ # - type: embeddings
+ # - type: responses
+ # - type: realtime
+ # - type: anthropicTokenCount
+ # - type: rerank
+ params:
+ apiKey: "$PERPLEXITY_API_KEY"
+ model: llama-3.1-sonar-large-128k-online
+ baseUrl: "https://api.perplexity.ai"
+EOF
+```
diff --git a/content/docs/standalone/main/llm/providers/deepinfra.md b/content/docs/standalone/main/llm/providers/deepinfra.md
index d5fc7137d..71106cb9b 100644
--- a/content/docs/standalone/main/llm/providers/deepinfra.md
+++ b/content/docs/standalone/main/llm/providers/deepinfra.md
@@ -1,6 +1,6 @@
---
title: DeepInfra
-weight: 61
+weight: 20
description: Configuration and setup for DeepInfra LLM provider
---
@@ -19,8 +19,6 @@ llm:
provider: deepinfra
params:
apiKey: "$DEEPINFRA_API_KEY"
- # Optional. If omitted, agentgateway uses the default:
- # baseUrl: "https://api.deepinfra.com/v1/openai"
```
{{< reuse "agw-docs/snippets/review-configuration.md" >}}
diff --git a/content/docs/standalone/main/llm/providers/deepseek.md b/content/docs/standalone/main/llm/providers/deepseek.md
index e2cec7310..1daa94936 100644
--- a/content/docs/standalone/main/llm/providers/deepseek.md
+++ b/content/docs/standalone/main/llm/providers/deepseek.md
@@ -1,6 +1,6 @@
---
title: DeepSeek
-weight: 61
+weight: 20
description: Configuration and setup for DeepSeek LLM provider
---
@@ -19,8 +19,6 @@ llm:
provider: deepseek
params:
apiKey: "$DEEPSEEK_API_KEY"
- # Optional. If omitted, agentgateway uses the default:
- # baseUrl: "https://api.deepseek.com/v1"
```
{{< reuse "agw-docs/snippets/review-configuration.md" >}}
diff --git a/content/docs/standalone/main/llm/providers/fireworks.md b/content/docs/standalone/main/llm/providers/fireworks.md
index 94f47e990..5ed6e1db6 100644
--- a/content/docs/standalone/main/llm/providers/fireworks.md
+++ b/content/docs/standalone/main/llm/providers/fireworks.md
@@ -1,6 +1,6 @@
---
title: Fireworks AI
-weight: 61
+weight: 20
description: Configuration and setup for Fireworks AI LLM provider
---
@@ -19,8 +19,6 @@ llm:
provider: fireworks
params:
apiKey: "$FIREWORKS_API_KEY"
- # Optional. If omitted, agentgateway uses the default:
- # baseUrl: "https://api.fireworks.ai/inference/v1"
```
{{< reuse "agw-docs/snippets/review-configuration.md" >}}
diff --git a/content/docs/standalone/main/llm/providers/gemini.md b/content/docs/standalone/main/llm/providers/gemini.md
index 71cd3926b..b85202463 100644
--- a/content/docs/standalone/main/llm/providers/gemini.md
+++ b/content/docs/standalone/main/llm/providers/gemini.md
@@ -1,6 +1,6 @@
---
title: Gemini
-weight: 30
+weight: 15
description: Configuration and setup for Google Gemini provider
---
@@ -17,9 +17,8 @@ llm:
models:
- name: "*"
provider: gemini
- auth:
- key:
- value: "$GEMINI_API_KEY"
+ params:
+ apiKey: "$GEMINI_API_KEY"
```
{{< reuse "agw-docs/snippets/review-configuration.md" >}}
@@ -29,4 +28,4 @@ llm:
| `name` | The model name to match in incoming requests. When a client sends `"model": ""`, the request is routed to this provider. Use `*` to match any model name. |
| `provider` | The LLM provider, set to `gemini` for Google Gemini models. |
| `params.model` | The specific Gemini model to use. If set, this model is used for all requests. If not set, the request must include the model to use. |
-| `auth.key.value` | The Gemini API key for authentication. You can reference environment variables using the `$VAR_NAME` syntax. |
+| `params.apiKey` | The Gemini API key for authentication. You can reference environment variables using the `$VAR_NAME` syntax. |
diff --git a/content/docs/standalone/main/llm/providers/groq.md b/content/docs/standalone/main/llm/providers/groq.md
index 06ee1d92d..0e238e22f 100644
--- a/content/docs/standalone/main/llm/providers/groq.md
+++ b/content/docs/standalone/main/llm/providers/groq.md
@@ -1,6 +1,6 @@
---
title: Groq
-weight: 61
+weight: 20
description: Configuration and setup for Groq LLM provider
---
@@ -19,8 +19,6 @@ llm:
provider: groq
params:
apiKey: "$GROQ_API_KEY"
- # Optional. If omitted, agentgateway uses the default:
- # baseUrl: "https://api.groq.com/openai/v1"
```
{{< reuse "agw-docs/snippets/review-configuration.md" >}}
diff --git a/content/docs/standalone/main/llm/providers/huggingface.md b/content/docs/standalone/main/llm/providers/huggingface.md
index fffe17733..944b5c575 100644
--- a/content/docs/standalone/main/llm/providers/huggingface.md
+++ b/content/docs/standalone/main/llm/providers/huggingface.md
@@ -1,6 +1,6 @@
---
title: Hugging Face
-weight: 61
+weight: 20
description: Configuration and setup for Hugging Face LLM provider
---
@@ -19,8 +19,6 @@ llm:
provider: huggingface
params:
apiKey: "$HUGGINGFACE_API_KEY"
- # Optional. If omitted, agentgateway uses the default:
- # baseUrl: "https://router.huggingface.co/v1"
```
{{< reuse "agw-docs/snippets/review-configuration.md" >}}
diff --git a/content/docs/standalone/main/llm/providers/mistral.md b/content/docs/standalone/main/llm/providers/mistral.md
index e4dd0d44a..180eb077f 100644
--- a/content/docs/standalone/main/llm/providers/mistral.md
+++ b/content/docs/standalone/main/llm/providers/mistral.md
@@ -1,6 +1,6 @@
---
title: Mistral
-weight: 61
+weight: 20
description: Configuration and setup for Mistral LLM provider
---
@@ -19,8 +19,6 @@ llm:
provider: mistral
params:
apiKey: "$MISTRAL_API_KEY"
- # Optional. If omitted, agentgateway uses the default:
- # baseUrl: "https://api.mistral.ai/v1"
```
{{< reuse "agw-docs/snippets/review-configuration.md" >}}
diff --git a/content/docs/standalone/main/llm/providers/multiple-llms.md b/content/docs/standalone/main/llm/providers/multiple-llms.md
index 940cb12ea..22fc90672 100644
--- a/content/docs/standalone/main/llm/providers/multiple-llms.md
+++ b/content/docs/standalone/main/llm/providers/multiple-llms.md
@@ -1,96 +1,33 @@
---
title: Multiple LLM providers
-weight: 90
+weight: 30
description: Configure load balancing across multiple LLM providers.
---
-Create a group of LLM providers for the same route. agentgateway automatically load balances requests across the providers in the group using the **Power of Two Choices (P2C)** algorithm. This algorithm picks two random providers, scores each one based on health, latency, and pending requests, and routes the request to the higher-scoring provider. All providers in a single group are treated as equally preferred — P2C distributes traffic across healthy providers but does not implement failover.
-
-**Load balancing vs. failover:** The single-group configuration on this page is load balancing, not failover. Failover requires multiple priority groups and a health/eviction policy. When all providers in a priority group are evicted (for example, due to repeated errors or rate limiting), the gateway automatically routes to the next priority group. For a failover example, see the [Kubernetes deployment of agentgateway](https://agentgateway.dev/docs/kubernetes/latest/llm/failover/).
-
-The P2C algorithm provides better performance than simple round-robin, random, or least-connections strategies by adapting in real-time to each provider's health and performance characteristics.
-
-## Reusable providers in simplified LLM mode
-
-For simplified `llm` configuration, you can define named provider defaults once in `llm.providers[]` and reference them from multiple `llm.models[]` entries with `provider.reference`. This is different from the previous group example. Here, the reusable provider acts as a preset, not as a load-balancing pool.
+For simplified `llm` configuration, you can define named provider defaults once in `llm.providers[]` and reference them from multiple `llm.models[]` entries with `provider.reference`.
```yaml
llm:
providers:
- - name: openai-default
- provider: openAI
+ - name: openai-prod
+ provider: openai
params:
apiKey: "$OPENAI_API_KEY"
- - name: openai-backup
- provider: openAI
- params:
- apiKey: "$OPENAI_BACKUP_API_KEY"
models:
- name: fast
provider:
- reference: openai-default
+ reference: openai-prod
params:
model: gpt-4o-mini
- name: smart
provider:
- reference: openai-backup
- params:
- model: gpt-4o
-```
-
-When a model references a named provider with `provider.reference`, provider defaults are reused automatically. Keep shared settings on `llm.providers[]`, and only override `params.model` on the model itself.
-
-```yaml
-llm:
- providers:
- - name: openai-default
- provider: openAI
- params:
- apiKey: "$OPENAI_API_KEY"
-
- models:
- - name: smart
- provider:
- reference: openai-default
+ reference: openai-prod
params:
model: gpt-4o
```
In this example, `smart` inherits the upstream API key from `llm.providers[]` and only changes the model name.
-Named providers can hold shared upstream settings you want to reuse, such as authentication, host overrides, path overrides, or other model defaults. Keep the shared values on `llm.providers[]` and only set per-model differences on `llm.models[]`.
-
-## Configuration
-
-{{< callout type="info" >}}
-Provider groups with load balancing require the traditional `binds/listeners/routes` configuration format. For more information, see the [Routing-based configuration guide]({{< link-hextra path="/llm/configuration-modes/" >}}).
-{{< /callout >}}
-
-{{< reuse "agw-docs/snippets/review-configuration.md" >}} The example sets two providers, OpenAI and Gemini. Each provider can have its own individual settings, such as host and path overrides, API keys, backend TLS, and more.
-
-```yaml
-# yaml-language-server: $schema=https://agentgateway.dev/schema/config
-binds:
-- port: 3000
- listeners:
- - routes:
- - backends:
- - ai:
- groups:
- - providers:
- - name: openai
- provider:
- openAI:
- # Optional; overrides the model in requests
- model: gpt-3.5-turbo
- backendAuth:
- key: "$OPENAI_API_KEY"
- - name: gemini
- provider:
- gemini:
- # Optional; overrides the model in requests
- model: gemini-1.5-flash-latest
- backendAuth:
- key: "$GEMINI_API_KEY"
-```
+Named providers can hold shared upstream settings you want to reuse, such as authentication, host overrides, path overrides, or other model defaults.
+Keep the shared values on `llm.providers[]` and only set per-model differences on `llm.models[]`.
diff --git a/content/docs/standalone/main/llm/providers/ollama.md b/content/docs/standalone/main/llm/providers/ollama.md
index f848c7abd..0c99b4b5f 100644
--- a/content/docs/standalone/main/llm/providers/ollama.md
+++ b/content/docs/standalone/main/llm/providers/ollama.md
@@ -26,7 +26,6 @@ chmod +x "$HOME/.local/bin/agentgateway"
# Write and validate the ollama config from the guide
cat > /tmp/test-ollama-standalone.yaml << 'EOF'
llm:
- port: 3000
models:
- name: "*"
provider: ollama
diff --git a/content/docs/standalone/main/llm/providers/openai-compatible.md b/content/docs/standalone/main/llm/providers/openai-compatible.md
deleted file mode 100644
index 8e60fa933..000000000
--- a/content/docs/standalone/main/llm/providers/openai-compatible.md
+++ /dev/null
@@ -1,141 +0,0 @@
----
-title: OpenAI-compatible providers
-weight: 10
-description: Configure agentgateway for providers without built-in support that implement the OpenAI API format.
-test:
- openai-compatible-validate:
- - file: content/docs/standalone/main/llm/providers/openai-compatible.md
- path: openai-compat-validate
----
-
-Use this page for providers that implement the OpenAI API format but do not have a first-class `provider:` shortcut yet. For built-in providers such as [Baseten]({{< link-hextra path="/llm/providers/baseten/" >}}), [Cerebras]({{< link-hextra path="/llm/providers/cerebras/" >}}), [Cohere]({{< link-hextra path="/llm/providers/cohere/" >}}), [DeepInfra]({{< link-hextra path="/llm/providers/deepinfra/" >}}), [DeepSeek]({{< link-hextra path="/llm/providers/deepseek/" >}}), [Fireworks AI]({{< link-hextra path="/llm/providers/fireworks/" >}}), [Groq]({{< link-hextra path="/llm/providers/groq/" >}}), [Hugging Face]({{< link-hextra path="/llm/providers/huggingface/" >}}), [Mistral]({{< link-hextra path="/llm/providers/mistral/" >}}), [OpenRouter]({{< link-hextra path="/llm/providers/openrouter/" >}}), [Together AI]({{< link-hextra path="/llm/providers/togetherai/" >}}), [xAI]({{< link-hextra path="/llm/providers/xai/" >}}), and [Ollama]({{< link-hextra path="/llm/providers/ollama/" >}}), use the dedicated provider pages instead.
-
-If you need a different upstream endpoint for one of those built-in standalone providers, keep the first-class `provider:` value and set `params.baseUrl` on that provider instead of switching to `provider: openAI`.
-
-In standalone mode, configure upstream authentication per model with `llm.models[].auth` and upstream TLS with `llm.models[].tls`. For an overview of the available auth and TLS options, see [Providers]({{< link-hextra path="/llm/providers/" >}}).
-
-## Before you begin
-
-{{< reuse "agw-docs/snippets/prereq-agentgateway.md" >}}
-
-You also need the following prerequisites.
-
-- An API key for your chosen provider, unless you are pointing to a local endpoint such as vLLM or LM Studio.
-
-{{< doc-test paths="openai-compat-validate" >}}
-# Install agentgateway binary for testing
-mkdir -p "$HOME/.local/bin"
-export PATH="$HOME/.local/bin:$PATH"
-VERSION="v{{< reuse "agw-docs/versions/n-patch.md" >}}"
-BINARY_URL="https://github.com/agentgateway/agentgateway/releases/download/${VERSION}/agentgateway-$(uname -s | tr '[:upper:]' '[:lower:]')-$(uname -m | sed 's/x86_64/amd64/')"
-curl -sL "$BINARY_URL" -o "$HOME/.local/bin/agentgateway"
-chmod +x "$HOME/.local/bin/agentgateway"
-
-# Set placeholder API keys for validation (--validate-only still resolves env vars)
-export PERPLEXITY_API_KEY="${PERPLEXITY_API_KEY:-test}"
-{{< /doc-test >}}
-
-## Managed provider fallback
-
-### Perplexity
-
-[Perplexity](https://www.perplexity.ai/) exposes an OpenAI-compatible API for search-augmented models and does not currently have a first-class standalone provider shortcut.
-
-```yaml {paths="openai-compat-validate"}
-cat > /tmp/test-perplexity.yaml << 'EOF'
-# yaml-language-server: $schema=https://agentgateway.dev/schema/config
-llm:
- port: 3000
- models:
- - name: "*"
- provider: openAI
- auth:
- key:
- value: "$PERPLEXITY_API_KEY"
- params:
- model: llama-3.1-sonar-large-128k-online
- baseUrl: "https://api.perplexity.ai"
- tls: {}
-EOF
-```
-
-{{< doc-test paths="openai-compat-validate" >}}
-agentgateway -f /tmp/test-perplexity.yaml --validate-only
-{{< /doc-test >}}
-
-## Self-hosted OpenAI-compatible endpoints
-
-### vLLM
-
-[vLLM](https://github.com/vllm-project/vllm) is a high-performance model server for self-hosted OpenAI-compatible inference.
-
-```yaml {paths="openai-compat-validate"}
-cat > /tmp/test-vllm.yaml << 'EOF'
-# yaml-language-server: $schema=https://agentgateway.dev/schema/config
-llm:
- port: 3000
- models:
- - name: "*"
- provider: openAI
- params:
- baseUrl: "http://localhost:8000/v1"
-EOF
-```
-
-{{< doc-test paths="openai-compat-validate" >}}
-agentgateway -f /tmp/test-vllm.yaml --validate-only
-{{< /doc-test >}}
-
-If your vLLM server uses HTTPS, set `params.baseUrl` to the HTTPS endpoint and add `tls: {}` to the model configuration. (In agentgateway versions prior to 1.3, this model-level setting was called `backendTLS`.)
-
-### LM Studio
-
-[LM Studio](https://lmstudio.ai/) runs models locally and exposes an OpenAI-compatible API for desktop testing.
-
-```yaml {paths="openai-compat-validate"}
-cat > /tmp/test-lmstudio.yaml << 'EOF'
-# yaml-language-server: $schema=https://agentgateway.dev/schema/config
-llm:
- port: 3000
- models:
- - name: llama-3.2-90b
- provider: openAI
- params:
- baseUrl: "http://localhost:1234/v1"
-EOF
-```
-
-{{< doc-test paths="openai-compat-validate" >}}
-agentgateway -f /tmp/test-lmstudio.yaml --validate-only
-{{< /doc-test >}}
-
-Enable the local server in LM Studio: **Settings** > **Local Server** > **Start Server**.
-
-## Generic configuration
-
-Use the following template for any OpenAI-compatible provider without built-in support:
-
-```yaml
-llm:
- port: 3000
- models:
- - name: "*"
- provider: openAI
- auth:
- key:
- value: "$PROVIDER_API_KEY"
- params:
- model: ""
- baseUrl: "https://provider.example.com/v1"
- tls: {} # only for HTTPS providers
-```
-
-Set `params.baseUrl` to the provider's API root. This can include provider-specific prefixes such as `/v1`, `/openai/v1`, or another base path. If the provider already has a first-class page, use that provider shortcut and its documented default base URL instead.
-
-| Field | Description |
-|-------|-------------|
-| `provider` | Set to `openAI` for OpenAI-compatible providers without a first-class shortcut. |
-| `auth.key.value` | Optional. The API key for the provider. Reference environment variables with the `$VAR_NAME` syntax. Omit for local endpoints that do not require authentication. |
-| `params.model` | Optional. Override the upstream model name. Omit to pass the client-provided model through. |
-| `params.baseUrl` | The provider's API root URL, including scheme and any required base path prefix. |
-| `tls` | Enable TLS for the upstream connection. Required for HTTPS providers, omit for local HTTP providers. (In agentgateway versions prior to 1.3, this model-level setting was called `backendTLS`.) |
diff --git a/content/docs/standalone/main/llm/providers/openai.md b/content/docs/standalone/main/llm/providers/openai.md
index 36c2564c8..25adb8270 100644
--- a/content/docs/standalone/main/llm/providers/openai.md
+++ b/content/docs/standalone/main/llm/providers/openai.md
@@ -17,9 +17,8 @@ llm:
models:
- name: "*"
provider: openAI
- auth:
- key:
- value: "$OPENAI_API_KEY"
+ params:
+ apiKey: "$OPENAI_API_KEY"
```
{{< reuse "agw-docs/snippets/review-configuration.md" >}}
@@ -29,7 +28,7 @@ llm:
| `name` | The model name to match in incoming requests. When a client sends `"model": ""`, the request is routed to this provider. Use `*` to match any model name. |
| `provider` | The LLM provider, set to `openAI` for OpenAI models. |
| `params.model` | The specific OpenAI model to use. If set, this model is used for all requests. If not set, the request must include the model to use. |
-| `auth.key.value` | The OpenAI API key for authentication. You can reference environment variables using the `$VAR_NAME` syntax. |
+| `params.apiKey` | The OpenAI API key for authentication. You can reference environment variables using the `$VAR_NAME` syntax. |
{{< callout type="info" >}}
For advanced routing scenarios that require path-based routing or custom endpoints, use the traditional `binds/listeners/routes` configuration format. See the [Routing-based configuration guide]({{< link-hextra path="/llm/configuration-modes/" >}}) for more information.
diff --git a/content/docs/standalone/main/llm/providers/openrouter.md b/content/docs/standalone/main/llm/providers/openrouter.md
index 55a3efa5f..692793ee2 100644
--- a/content/docs/standalone/main/llm/providers/openrouter.md
+++ b/content/docs/standalone/main/llm/providers/openrouter.md
@@ -1,6 +1,6 @@
---
title: OpenRouter
-weight: 61
+weight: 20
description: Configuration and setup for OpenRouter LLM provider
---
@@ -19,8 +19,6 @@ llm:
provider: openrouter
params:
apiKey: "$OPENROUTER_API_KEY"
- # Optional. If omitted, agentgateway uses the default:
- # baseUrl: "https://openrouter.ai/api/v1"
```
{{< reuse "agw-docs/snippets/review-configuration.md" >}}
diff --git a/content/docs/standalone/main/llm/providers/togetherai.md b/content/docs/standalone/main/llm/providers/togetherai.md
index 052a6a43e..3a9a427ae 100644
--- a/content/docs/standalone/main/llm/providers/togetherai.md
+++ b/content/docs/standalone/main/llm/providers/togetherai.md
@@ -1,6 +1,6 @@
---
title: Together AI
-weight: 61
+weight: 20
description: Configuration and setup for Together AI LLM provider
---
@@ -19,8 +19,6 @@ llm:
provider: togetherai
params:
apiKey: "$TOGETHER_API_KEY"
- # Optional. If omitted, agentgateway uses the default:
- # baseUrl: "https://api.together.xyz/v1"
```
{{< reuse "agw-docs/snippets/review-configuration.md" >}}
diff --git a/content/docs/standalone/main/llm/providers/vertex.md b/content/docs/standalone/main/llm/providers/vertex.md
index 1938b4968..63cb62e5d 100644
--- a/content/docs/standalone/main/llm/providers/vertex.md
+++ b/content/docs/standalone/main/llm/providers/vertex.md
@@ -1,6 +1,6 @@
---
title: Vertex AI
-weight: 20
+weight: 15
description: Configuration and setup for Google Cloud Vertex AI provider
---
@@ -25,9 +25,6 @@ llm:
models:
- name: gemini-2.5-flash
provider: vertex
- auth:
- gcp:
- type: accessToken
params:
model: google/gemini-2.5-flash-lite-preview-06-17
vertexProject: my-project-id
diff --git a/content/docs/standalone/main/llm/providers/xai.md b/content/docs/standalone/main/llm/providers/xai.md
index 41e926f7c..6b4bd2edb 100644
--- a/content/docs/standalone/main/llm/providers/xai.md
+++ b/content/docs/standalone/main/llm/providers/xai.md
@@ -1,6 +1,6 @@
---
title: xAI
-weight: 61
+weight: 20
description: Configuration and setup for xAI (Grok) LLM provider
---
@@ -19,8 +19,6 @@ llm:
provider: xai
params:
apiKey: "$XAI_API_KEY"
- # Optional. If omitted, agentgateway uses the default:
- # baseUrl: "https://api.x.ai/v1"
```
{{< reuse "agw-docs/snippets/review-configuration.md" >}}
diff --git a/content/docs/standalone/main/llm/spending.md b/content/docs/standalone/main/llm/spending.md
index 92c947f52..e69de29bb 100644
--- a/content/docs/standalone/main/llm/spending.md
+++ b/content/docs/standalone/main/llm/spending.md
@@ -1,9 +0,0 @@
----
-title: Control spend
-weight: 50
-description: Control cost with token budgets and spend limits to prevent unexpected bills and LLM misuse.
-aliases:
- - /llm/spending/
----
-
-{{< redirect path="/llm/costs/" >}}
diff --git a/release.md b/release.md
new file mode 100644
index 000000000..faac71ee7
--- /dev/null
+++ b/release.md
@@ -0,0 +1,182 @@
+🎉 Welcome to the 1.3.0 release of the agentgateway project!
+
+This release is a major step forward for LLM, MCP, and agentic traffic. Agentgateway v1.3.0 adds a purpose-built UI, AI cost analysis, virtual models, reusable providers and guardrails, 13 new LLM providers, richer MCP support, and many improvements across traffic policy, TLS, telemetry, and operations.
+
+## Artifacts
+
+**Docker images** are available:
+* `cr.agentgateway.dev/agentgateway:v1.3.0`
+* `cr.agentgateway.dev/controller:v1.3.0`
+
+**Helm charts** are available:
+* `cr.agentgateway.dev/charts/agentgateway:v1.3.0`
+* `cr.agentgateway.dev/charts/agentgateway-crds:v1.3.0`
+
+**Binaries** are available below.
+
+## Quick Start
+
+Follow the [Kubernetes](https://agentgateway.dev/docs/kubernetes/latest/quickstart/) or [Standalone](https://agentgateway.dev/docs/standalone/latest/quickstart/) quick start guide to get started.
+
+## 🔥 Breaking changes
+
+### `agctl` commands reorganized under `proxy` and `controller`
+
+The experimental `agctl` CLI now groups its inspection, tracing, and management commands under the `proxy` and `controller` parent commands, and adds commands for log-level management and version information. Update any scripts or automation that call the previous top-level commands.
+
+Kubernetes examples:
+
+Before:
+
+```sh
+agctl config all gateway/agentgateway-proxy -n agentgateway-system -o yaml
+agctl config backends gateway/agentgateway-proxy -n agentgateway-system
+agctl trace gateway/agentgateway-proxy -n agentgateway-system --port 80 -- http://www.example.com/
+```
+
+Now:
+
+```sh
+agctl proxy config all gateway/agentgateway-proxy -n agentgateway-system -o yaml
+agctl proxy config backends gateway/agentgateway-proxy -n agentgateway-system
+agctl proxy trace gateway/agentgateway-proxy -n agentgateway-system --port 80 -- http://www.example.com/
+```
+
+Standalone examples:
+
+Before:
+
+```sh
+agctl config all --file /tmp/agw-dump.json -o yaml
+agctl trace --local --port 3000 -- http://example.com/headers
+```
+
+Now:
+
+```sh
+agctl proxy config all --file /tmp/agw-dump.json -o yaml
+agctl proxy trace --local --port 3000 -- http://example.com/headers
+```
+
+The reorganization also introduces the following capabilities:
+
+- `agctl version` prints version information for the `agctl` CLI.
+- `agctl proxy log` gets or sets the proxy log level at runtime.
+- `agctl controller log` gets or sets the agentgateway controller log level per component at runtime.
+
+For more information, see the Kubernetes docs for [installing `agctl`](https://agentgateway.dev/docs/kubernetes/main/operations/agctl/), [inspecting agentgateway configuration](https://agentgateway.dev/docs/kubernetes/main/operations/inspect-config/), [tracing requests with `agctl`](https://agentgateway.dev/docs/kubernetes/main/operations/trace-requests/), [debug logs](https://agentgateway.dev/docs/kubernetes/main/operations/debug/#debug-logs), and the [`agctl` CLI reference](https://agentgateway.dev/docs/kubernetes/main/reference/agctl/). For standalone mode, see [installing `agctl`](https://agentgateway.dev/docs/standalone/main/operations/agctl/), [inspecting agentgateway configuration](https://agentgateway.dev/docs/standalone/main/operations/inspect-config/), [tracing requests with `agctl`](https://agentgateway.dev/docs/standalone/main/operations/trace-requests/), and the [`agctl` CLI reference](https://agentgateway.dev/docs/standalone/main/reference/agctl/).
+
+## 🌟 New features
+
+### New UI for LLM, MCP, and traffic management
+
+Agentgateway now includes a rebuilt UI organized around three native views:
+
+- **LLM**: Models, providers, policies, guardrails, costs, virtual API keys, and analytics.
+- **MCP**: Servers, tools, resources, authentication, and MCP policy configuration.
+- **Traffic**: Gateway API traffic configuration and policy management.
+
+The UI includes onboarding for LLM, MCP, and API capabilities, model and provider setup, per-model policies, request and response guardrails, and unified logs for LLM, MCP, and A2A calls. For more information, see the [Kubernetes UI observability docs](https://agentgateway.dev/docs/kubernetes/main/observability/ui/) and the [LLM](https://agentgateway.dev/docs/kubernetes/main/llm/) and [MCP](https://agentgateway.dev/docs/kubernetes/main/mcp/) docs.
+
+### AI cost and token analysis
+
+Agentgateway can now calculate token usage and dollar cost for LLM requests, attribute usage, and surface the data in logs, traces, metrics, `agctl`, and the UI.
+
+Cost and token data can be grouped by model, provider, user, team, and client tool. This makes it possible to analyze spend, export reports, build chargeback workflows, and apply policy decisions such as budgets, alerts, quotas, or cost-sensitive routing at the gateway.
+
+For more information, see [Kubernetes LLM cost tracking](https://agentgateway.dev/docs/kubernetes/main/llm/cost-tracking/), [Standalone LLM costs](https://agentgateway.dev/docs/standalone/main/llm/costs/), and the [`agctl costs` reference](https://agentgateway.dev/docs/kubernetes/main/reference/agctl/agctl-costs/).
+
+### Virtual models
+
+Virtual models let clients send one model name while agentgateway chooses the real backend model at request time. This moves routing policy out of clients and into the gateway.
+
+Supported strategies include:
+
+- **Weighted routing** to split traffic across models for A/B testing, migrations, and cost optimization.
+- **Failover routing** to automatically retry the request against fallback models when a primary model fails or is rate-limited.
+- **Conditional routing** to select models with CEL expressions based on request attributes such as headers, user tier, or prompt shape.
+
+For more information, see [Standalone virtual models](https://agentgateway.dev/docs/standalone/main/llm/virtual-models/), [Kubernetes LLM load balancing](https://agentgateway.dev/docs/kubernetes/main/llm/load-balancing/), [Kubernetes LLM failover](https://agentgateway.dev/docs/kubernetes/main/llm/failover/), and [Kubernetes LLM content routing](https://agentgateway.dev/docs/kubernetes/main/llm/content-routing/).
+
+### Reusable providers and guardrails
+
+Providers and guardrails can now be defined once and referenced across many models. This simplifies large LLM deployments where many incoming model names share provider configuration, credentials, or policy.
+
+Standalone deployments can also declare shared guardrails as top-level resources instead of repeating guardrail configuration on every route. For more information, see [Standalone guardrails](https://agentgateway.dev/docs/standalone/main/llm/prompt-guards/overview/), [Standalone multi-layer guardrails](https://agentgateway.dev/docs/standalone/main/llm/prompt-guards/multi-layer/), and [Kubernetes guardrails](https://agentgateway.dev/docs/kubernetes/main/llm/guardrails/overview/).
+
+### New and improved LLM providers
+
+Agentgateway adds 13 new first-class LLM providers, including Mistral, Hugging Face, and Cohere, along with expanded custom provider support for providers without built-in integrations. For more information, see the [Standalone LLM provider docs](https://agentgateway.dev/docs/standalone/main/llm/providers/) and [Kubernetes LLM provider docs](https://agentgateway.dev/docs/kubernetes/main/llm/providers/).
+
+Additional LLM gateway improvements include:
+
+- Rerank request and response support across providers.
+- Custom LLM providers for InferencePool backends.
+- More precise per-model matching, with exact matches preferred.
+- Streaming guardrails for streaming requests.
+- Webhook guardrail `failureMode` support.
+- Per-model LLM authorization.
+- Local LLM TLS and CORS support.
+- Latency and throughput telemetry attributes on LLM requests.
+- Bedrock detect-passthrough support, Application Inference Profile prompt cache support, Anthropic beta-header allowlists, host override support, URL-encoded model IDs, and reasoning-signature replay.
+- Anthropic system messages and extra-high thinking support.
+
+### MCP improvements
+
+MCP support now includes Okta as a first-class authentication provider, MCP-aware external auth and external processing, resource subscribe and unsubscribe support, improved multiplexing behavior, and broader protocol compliance fixes.
+
+The UI also includes native MCP policy views for access control, traffic shaping, and mutation policies such as authorization, CORS, JWT, rate limiting, transformations, and external processing. For more information, see the [Kubernetes MCP docs](https://agentgateway.dev/docs/kubernetes/main/mcp/), [Standalone MCP docs](https://agentgateway.dev/docs/standalone/main/mcp/), [MCP authentication](https://agentgateway.dev/docs/kubernetes/main/mcp/auth/), and [MCP guardrails](https://agentgateway.dev/docs/kubernetes/main/mcp/guardrails/).
+
+### Request handling and extensibility
+
+Traffic policies can now buffer request bodies before forwarding, giving policies and extensions access to full request bodies before backend selection. For more information, see [Kubernetes body buffering](https://agentgateway.dev/docs/kubernetes/main/traffic-management/buffer/) and [Standalone body buffering](https://agentgateway.dev/docs/standalone/main/configuration/traffic-management/buffer/).
+
+External processing support is also expanded with richer processing-mode configuration, and external processors can return an immediate response from request-body and response-body phases. For more information, see [Kubernetes external processing](https://agentgateway.dev/docs/kubernetes/main/traffic-management/extproc/) and [Standalone external processing](https://agentgateway.dev/docs/standalone/main/configuration/traffic-management/extproc/).
+
+### Authentication and authorization
+
+Authorization can now run in the pre-routing phase, and external-auth cache TTL can be configured as an expression. This release also includes external-authz caching, expanded credential-location expressions, and scheme derivation from `X-Forwarded-Proto`. For more information, see [Kubernetes external auth](https://agentgateway.dev/docs/kubernetes/main/security/extauth/), [Standalone external auth](https://agentgateway.dev/docs/standalone/main/configuration/security/external-authz/), [Standalone HTTP authorization](https://agentgateway.dev/docs/standalone/main/configuration/security/http-authz/), and [Standalone JWT authentication](https://agentgateway.dev/docs/standalone/main/configuration/security/jwt-authn/).
+
+### TLS, networking, and policy
+
+This release adds dynamic SSL certificates for Kubernetes listener TLS, generalized backend TLS and backend references, a new `BackendReferenceGrantMode`, configurable policy inheritance strategy, and composable AI backend policies. For more information, see [Kubernetes TLS encryption](https://agentgateway.dev/docs/kubernetes/main/install/tls/), [Kubernetes backend TLS](https://agentgateway.dev/docs/kubernetes/main/security/backendtls/), and [Standalone backend TLS](https://agentgateway.dev/docs/standalone/main/configuration/security/backend-tls/).
+
+Additional networking and policy improvements include terminating inbound CONNECT, configurable admin interfaces including Unix domain sockets, AWS AssumeRole support, custom AWS service names, and mTLS certificate passthrough with CEL.
+
+### CEL and `agctl`
+
+CEL support is expanded with helpers for URL encode/decode, timestamp conversions, bit operations on bytes, raw JWT token access, gRPC response status, expressions in direct responses, and CEL-based retry conditions. For more information, see the [Standalone CEL reference](https://agentgateway.dev/docs/standalone/main/reference/cel/).
+
+The `agctl` CLI now includes proxy and controller log commands, version reporting with mismatch checks, route groups in config output, and evicted-backend visibility.
+
+### Operations and observability
+
+Agentgateway now exposes proxy timing measurements, a config-synchronization metric, request and connection IDs for troubleshooting, and richer distributed traces with JSON mode, body snapshots, effective gateway and route policies, and raw-output file opening. For more information, see [Kubernetes observability](https://agentgateway.dev/docs/kubernetes/main/observability/), [Kubernetes tracing](https://agentgateway.dev/docs/kubernetes/main/observability/tracing/), [Standalone metrics](https://agentgateway.dev/docs/standalone/main/reference/observability/metrics/), and [Standalone traces](https://agentgateway.dev/docs/standalone/main/reference/observability/traces/).
+
+## 🪲 Notable fixes
+
+- Fixed TCP route precedence.
+- Fixed Gateway status handling when no listeners are valid.
+- Fixed route-level OIDC cookie handling.
+- Fixed capacity-weighted load balancing.
+- Fixed backend eviction retries.
+- Fixed streaming-completion capture across Bedrock, Messages, and Responses API paths.
+- Fixed credential-location expression behavior.
+- Fixed scheme handling from `X-Forwarded-Proto`.
+- Improved MCP multiplexing and list behavior.
+- Improved MCP protocol compliance across tools, prompts, and resources.
+
+## Contributors
+
+Thank you to everyone who contributed code, reviews, documentation, bug reports, and CI improvements for this release, including more than twenty first-time contributors.
+
+Special thanks to the contributors who drove many of the changes in this release:
+
+- @howardjohn
+- @stevenctl
+- @keithmattix
+- @danehans
+- @TwilightTechie
+- @filintod
+
+See the full contributor list below.
diff --git a/static/integrations/providers/anthropic.svg b/static/integrations/providers/anthropic.svg
new file mode 100644
index 000000000..5640eaca8
--- /dev/null
+++ b/static/integrations/providers/anthropic.svg
@@ -0,0 +1,3 @@
+
diff --git a/static/integrations/providers/azure.svg b/static/integrations/providers/azure.svg
new file mode 100644
index 000000000..a8a297d96
--- /dev/null
+++ b/static/integrations/providers/azure.svg
@@ -0,0 +1,11 @@
+
diff --git a/static/integrations/providers/baseten.svg b/static/integrations/providers/baseten.svg
new file mode 100644
index 000000000..ffd4fbd8b
--- /dev/null
+++ b/static/integrations/providers/baseten.svg
@@ -0,0 +1,3 @@
+
diff --git a/static/integrations/providers/bedrock.svg b/static/integrations/providers/bedrock.svg
new file mode 100644
index 000000000..eef5460cc
--- /dev/null
+++ b/static/integrations/providers/bedrock.svg
@@ -0,0 +1,10 @@
+
diff --git a/static/integrations/providers/cerebras.svg b/static/integrations/providers/cerebras.svg
new file mode 100644
index 000000000..ac57eb2df
--- /dev/null
+++ b/static/integrations/providers/cerebras.svg
@@ -0,0 +1,4 @@
+
diff --git a/static/integrations/providers/cohere.svg b/static/integrations/providers/cohere.svg
new file mode 100644
index 000000000..20351993c
--- /dev/null
+++ b/static/integrations/providers/cohere.svg
@@ -0,0 +1,5 @@
+
diff --git a/static/integrations/providers/copilot.svg b/static/integrations/providers/copilot.svg
new file mode 100644
index 000000000..d29181a9a
--- /dev/null
+++ b/static/integrations/providers/copilot.svg
@@ -0,0 +1,3 @@
+
diff --git a/static/integrations/providers/deepinfra.svg b/static/integrations/providers/deepinfra.svg
new file mode 100644
index 000000000..096ea0aea
--- /dev/null
+++ b/static/integrations/providers/deepinfra.svg
@@ -0,0 +1,4 @@
+
diff --git a/static/integrations/providers/deepseek.svg b/static/integrations/providers/deepseek.svg
new file mode 100644
index 000000000..5f7cdcfa8
--- /dev/null
+++ b/static/integrations/providers/deepseek.svg
@@ -0,0 +1,3 @@
+
diff --git a/static/integrations/providers/fireworks.svg b/static/integrations/providers/fireworks.svg
new file mode 100644
index 000000000..11ed92373
--- /dev/null
+++ b/static/integrations/providers/fireworks.svg
@@ -0,0 +1,3 @@
+
diff --git a/static/integrations/providers/gemini.svg b/static/integrations/providers/gemini.svg
new file mode 100644
index 000000000..b1235b4a5
--- /dev/null
+++ b/static/integrations/providers/gemini.svg
@@ -0,0 +1,11 @@
+
diff --git a/static/integrations/providers/googlecloud.svg b/static/integrations/providers/googlecloud.svg
new file mode 100644
index 000000000..80def4a19
--- /dev/null
+++ b/static/integrations/providers/googlecloud.svg
@@ -0,0 +1 @@
+
\ No newline at end of file
diff --git a/static/integrations/providers/groq.svg b/static/integrations/providers/groq.svg
new file mode 100644
index 000000000..ff1105567
--- /dev/null
+++ b/static/integrations/providers/groq.svg
@@ -0,0 +1,3 @@
+
diff --git a/static/integrations/providers/huggingface.svg b/static/integrations/providers/huggingface.svg
new file mode 100644
index 000000000..ae94384ff
--- /dev/null
+++ b/static/integrations/providers/huggingface.svg
@@ -0,0 +1,8 @@
+
diff --git a/static/integrations/providers/mistral.svg b/static/integrations/providers/mistral.svg
new file mode 100644
index 000000000..0183909ba
--- /dev/null
+++ b/static/integrations/providers/mistral.svg
@@ -0,0 +1,7 @@
+
diff --git a/static/integrations/providers/ollama.svg b/static/integrations/providers/ollama.svg
new file mode 100644
index 000000000..2635b08db
--- /dev/null
+++ b/static/integrations/providers/ollama.svg
@@ -0,0 +1,8 @@
+
diff --git a/static/integrations/providers/openai.svg b/static/integrations/providers/openai.svg
new file mode 100644
index 000000000..308ed8ab7
--- /dev/null
+++ b/static/integrations/providers/openai.svg
@@ -0,0 +1 @@
+
diff --git a/static/integrations/providers/openrouter.svg b/static/integrations/providers/openrouter.svg
new file mode 100644
index 000000000..7e8abc81d
--- /dev/null
+++ b/static/integrations/providers/openrouter.svg
@@ -0,0 +1,8 @@
+
diff --git a/static/integrations/providers/togetherai.svg b/static/integrations/providers/togetherai.svg
new file mode 100644
index 000000000..68413386c
--- /dev/null
+++ b/static/integrations/providers/togetherai.svg
@@ -0,0 +1,4 @@
+
diff --git a/static/integrations/providers/vertex.svg b/static/integrations/providers/vertex.svg
new file mode 100644
index 000000000..4fc47470b
--- /dev/null
+++ b/static/integrations/providers/vertex.svg
@@ -0,0 +1,10 @@
+
diff --git a/static/integrations/providers/xai.svg b/static/integrations/providers/xai.svg
new file mode 100644
index 000000000..ccd22443c
--- /dev/null
+++ b/static/integrations/providers/xai.svg
@@ -0,0 +1,3 @@
+