From 1b1dbdeadb332d8b9daad29dd14412b140e84c25 Mon Sep 17 00:00:00 2001
From: Jim Bennett <jimbobbennett@mac.com>
Date: Thu, 18 Jun 2026 15:46:34 -0700
Subject: [PATCH 1/3] feat: add check-models skill + PR-diff-scoped CI gate

Skill + scanner that find outdated OpenAI / Anthropic / Google (Gemini)
model references and migrate them to current size-tier equivalents. The
CI gate scans only the lines a PR changes (paths: python typescript), so
it flags newly introduced outdated models and fails the check, without
blocking on pre-existing references. Notebook/code migrations land
separately as themed PRs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 .agents/skills/check-models/SKILL.md          | 149 ++++++++++++++++++
 .../skills/check-models/scripts/models.json   | 129 +++++++++++++++
 .../check-models/scripts/scan-models.mjs      | Bin 0 -> 19646 bytes
 .github/workflows/check-models.yml            |  97 ++++++++++++
 4 files changed, 375 insertions(+)
 create mode 100644 .agents/skills/check-models/SKILL.md
 create mode 100644 .agents/skills/check-models/scripts/models.json
 create mode 100644 .agents/skills/check-models/scripts/scan-models.mjs
 create mode 100644 .github/workflows/check-models.yml

diff --git a/.agents/skills/check-models/SKILL.md b/.agents/skills/check-models/SKILL.md
new file mode 100644
index 0000000..1c1e0d5
--- /dev/null
+++ b/.agents/skills/check-models/SKILL.md
@@ -0,0 +1,149 @@
+---
+name: check-models
+description: Find and update out-of-date OpenAI, Anthropic, and Google Gemini model references (in docs, MDX, notebooks, and code) to the latest size-equivalent models, and apply the code changes each new generation requires (e.g. max_tokens → max_completion_tokens for GPT-5). Use when asked to "check the models", "update model versions", "are these models current", "migrate the models in the docs/tutorials", or before publishing content that names a model.
+---
+
+# check-models
+
+Keep model references current. OpenAI and Anthropic ship new generations often, and docs/tutorials drift: a notebook pinned to `gpt-4o-mini` or `claude-3-5-sonnet` is teaching a model that's a generation (or three) behind, sometimes with code that no longer runs (GPT-5 rejects `max_tokens`).
+
+This skill does three things:
+
+1. **Look up** the latest models (don't trust memory — models change).
+2. **Migrate** every reference to the latest model **of the same size tier** (a `*-mini` becomes the latest mini, never the flagship).
+3. **Fix the code** each new generation requires (parameter renames/removals).
+
+## The golden rule: match the size tier
+
+Never "upgrade" a small model to a big one. `gpt-4o-mini` is a small/cheap model — its modern equivalent is the latest **mini**, not the flagship. Migrating it to `gpt-5.5` silently multiplies a reader's cost. Map within the tier:
+
+| Old tier | OpenAI → | Anthropic → |
+|---|---|---|
+| flagship / full | latest flagship | latest Opus |
+| pro | latest pro | latest Opus |
+| mini / small / cheap | latest **mini** | latest Haiku |
+| nano | latest **nano** | latest Haiku (smallest tier) |
+
+> The latest mini/nano are **not** always in the newest generation. As of the policy date below, the flagship is GPT-5.5 but there is no GPT-5.5-mini — so `gpt-4o-mini` → `gpt-5.4-mini`, **not** `gpt-5.5-mini`. Always confirm which generation actually has a mini/nano variant.
+
+## Current models (policy)
+
+The authoritative values live in [`models.json`](./scripts/models.json); the scanner and the GitHub Action both read it. As of `updated: 2026-06-18`:
+
+**OpenAI (reasoning — default targets)** — flagship `gpt-5.5`, pro `gpt-5.5-pro`, mini `gpt-5.4-mini`, nano `gpt-5.4-nano`.
+**OpenAI (non-reasoning — for temperature-dependent code)** — flagship `gpt-4.1`, mini `gpt-4.1-mini`, nano `gpt-4.1-nano`. Active, **not** deprecated, and they **accept `temperature`/`top_p`**.
+**Anthropic** — Opus `claude-opus-4-8`, Sonnet `claude-sonnet-4-6`, Haiku `claude-haiku-4-5`.
+**Google Gemini** — Pro `gemini-2.5-pro`, Flash `gemini-3.5-flash`, Flash-Lite `gemini-3.1-flash-lite`. The latest **stable** model per tier sits on *different* generations (like OpenAI), so the policy tracks Gemini with an explicit `google.current` allow-list and a `google.deprecated` → replacement map (in `models.json`), not a single version floor. Match the SIZE tier (pro/flash/flash-lite). `gemini-2.0-*` (shut down 2026-06-01), `gemini-1.5-*`, and `gemini-1.0-*` are deprecated; `gemini-2.5-*` and `gemini-3.x` are still current and are left alone.
+
+> ⚠️ **Claude Fable 5 / Mythos 5** were export-control-suspended on 2026-06-12 — they are NOT valid migration targets. ⚠️ GPT-5.5 has **no** mini/nano variant.
+
+**Specialised (non-chat) OpenAI models** — realtime, audio, image, transcribe, tts, embeddings, and moderation models don't map to the flagship/mini tiers, so they're tracked separately in `models.json` → `specialized`:
+- `specialized.current` (e.g. `gpt-realtime-1.5`, `gpt-realtime-2`, `gpt-audio-1.5`, `gpt-image-2`, `gpt-4o-transcribe`, `gpt-4o-mini-tts`, `text-embedding-3-*`, `omni-moderation`) are **valid** — the scanner leaves them alone.
+- `specialized.deprecated` maps a retired ID to its current replacement, e.g. `gpt-4o-realtime-preview` (shut down 2026-05-07) → `gpt-realtime-1.5`, `gpt-4o-audio-preview` → `gpt-audio-1.5`, `gpt-4o-search-preview` → `gpt-5.4-mini`. These are flagged `warn` with the target, since the realtime/audio API surface differs and the swap needs a human eye.
+
+These are checked **before** `review.openaiPatterns`, which stays the fallback for any unrecognised variant. When a new specialised model ships (or one is retired), update these two lists.
+
+### Reasoning vs non-reasoning — which OpenAI target?
+
+The GPT-5 family are **reasoning** models: they reject `temperature`, `top_p`, and the other sampling params. So the target depends on whether the call relies on those:
+
+- **General code** (no meaningful `temperature`, or `temperature` was incidental) → migrate to the **GPT-5** tier above (the default). This is what most code wants.
+- **Temperature-dependent code** → migrate to the **non-reasoning `gpt-4.1` tier** and **keep `temperature`**. This is the right call for:
+  - **LLM-as-judge / evaluators** that set `temperature=0` for reproducible scores (e.g. Phoenix `OpenAIModel(...)`, `LLM(...)`, eval classifiers),
+  - **deterministic extraction / classification** pinned at `temperature=0`.
+  Match the size tier: a `gpt-4o-mini` eval judge → `gpt-4.1-mini` (keep `temperature=0`), not `gpt-5.4-mini`.
+
+`gpt-4.1` / `gpt-4.1-mini` / `gpt-4.1-nano` are treated as **current** by the scanner (not flagged): they're a legitimate, non-deprecated choice. `gpt-4o` / `gpt-4o-mini` are also non-reasoning and active, but older; migrate them to the GPT-5 tier by default, or to `gpt-4.1*` when temperature matters.
+
+## Workflow
+
+### 1. Verify the latest models are still current
+
+`models.json` is a cache — refresh it before a big migration. **The scanner can't browse, so the lookup is yours to run; `--refresh` gives you the exact checklist:**
+
+```bash
+node .agents/skills/check-models/scripts/scan-models.mjs --refresh
+```
+
+It prints the current policy, the WebSearch queries, the authoritative source URLs, and which keys to edit. Then:
+
+- WebSearch `"latest OpenAI models"` and `"latest Anthropic Claude models"` (use the current month); cross-check Anthropic against the **`claude-api`** skill's `shared/models.md` (canonical Claude IDs + the Claude-side code changes).
+- Confirm each tier's latest model **and that the tier still exists** (a new flagship may ship with no `-mini` yet — keep mini/nano on the older generation). Skip any suspended/withdrawn model.
+- If anything changed, **propose the `models.json` diff and confirm before writing** (bump `flagship`/`mini`/`opus`/… and `updated`). That one edit updates the scanner, the skill, and the CI gate together.
+
+**When to refresh:** the scanner tells you. Every run prints a `⚠ Model policy may be out of date…` hint (and sets `stale.suggestRefresh` in `--json`) when either the policy is older than ~45 days **or** the content references a model newer than the policy knows (e.g. a `gpt-5.6` appears while the policy flagship is `gpt-5.5`). Newer-than-policy models are left untouched — never downgraded — so refresh first, then re-scan.
+
+### 2. Scan
+
+```bash
+node .agents/skills/check-models/scripts/scan-models.mjs python typescript   # scan paths
+node .agents/skills/check-models/scripts/scan-models.mjs --json > out.json   # machine-readable
+```
+
+Each finding is one of:
+- `✗ error` — an outdated lowercase canonical model ID (e.g. `gpt-4o-mini`). Migrate it.
+- `⚠ review` / `⚠ replace` (prose) — a model named in prose (`GPT-4o`) or a specialised variant (`*-codex`, `*-chat-latest`, `gpt-4v`). Use judgement (see below).
+- `⚠ param` — a GPT-5/o-series code change to apply (`max_tokens`, `temperature`, …).
+
+### 3. Migrate the model IDs
+
+For each finding, replace with the scanner's suggested target, **preserving local style**:
+- Keep the original separator/case for prose: `GPT-4o` → `GPT-5.5` (not `gpt-5.5`); `claude-sonnet-4.5` (dotted) → `claude-sonnet-4.6`.
+- Drop stale date snapshots: `gpt-5-2025-08-07` → `gpt-5.5` (use the bare alias, don't invent a date).
+- Keep both halves of any SDK-version tab in sync (v7/v8 examples).
+
+#### Platform-specific IDs (Bedrock, Databricks, OpenRouter, LiteLLM)
+
+The scanner matches the embedded `claude-*` / `gpt-*` substring inside a platform-wrapped ID and flags it. Bump the **version** but **keep the host platform's ID format** — these are not bare first-party IDs:
+
+- **Amazon Bedrock** — IDs look like `[region.]anthropic.claude-<...>[-vN:0]`. Claude 4.x (Opus 4.x, Sonnet 4.5+, Haiku 4.5) require a **cross-region inference profile**, so they take a `us.` / `eu.` / `apac.` prefix and **drop** the on-demand `-vN:0` suffix the Claude 3 IDs used. e.g. `anthropic.claude-3-haiku-20240307-v1:0` → `us.anthropic.claude-haiku-4-5`. Match the region prefix to the doc's endpoint (the repo's Bedrock docs default to `us.`).
+- **Databricks** — Foundation Model endpoint names look like `databricks-claude-sonnet-4-6`. Databricks owns these names and **availability is workspace/region-dependent** — bump the version following their pattern, but verify the endpoint actually exists rather than assuming it.
+- **OpenRouter / LiteLLM** — provider-prefixed: `anthropic/claude-sonnet-4-6`, `openai/gpt-5.4-mini`. Keep the `provider/` prefix; bump only the model half.
+
+When a migration changes more than the version number (a Bedrock region prefix, a Databricks endpoint name), **call it out for the reviewer** — it may need adjusting for their region/workspace.
+
+### 4. Apply the code changes
+
+**These changes apply to the _raw OpenAI SDK_ only** (`client.chat.completions.create(...)`, `openai.OpenAI()...`). When migrating such a call **to GPT-5 or o-series** (reasoning models), in the same example:
+- Rename `max_tokens` → `max_completion_tokens`.
+- Remove `temperature` (unless it's the default `1`), `top_p`, `presence_penalty`, `frequency_penalty`, `logprobs`, `top_logprobs`, `logit_bias` — reasoning models reject them. Steer with `reasoning_effort` (`low`/`medium`/`high`) and `verbosity` instead.
+
+**Wrapper libraries need a split rule** — `phoenix.evals.OpenAIModel(...)`, `langchain_openai.ChatOpenAI(...)`, `litellm.completion(...)`, etc. expose their own kwargs:
+- **`max_tokens`: keep it.** The wrapper owns this kwarg and maps it to `max_completion_tokens` internally; renaming it passes an unknown constructor arg and breaks the call.
+- **non-default `temperature` / `top_p`:** a GPT-5/o-series reasoning model rejects any `temperature` ≠ 1 with a 400 — wrapper or not. Two correct outcomes:
+  - if the value **matters** (eval judge / deterministic call) → re-target to the non-reasoning **`gpt-4.1`** tier and **keep** the `temperature` (see "Reasoning vs non-reasoning" above);
+  - if it was **incidental** → drop the explicit value and stay on the GPT-5 model.
+- Otherwise migrate the **model ID only**.
+
+> **`phoenix.evals` gotcha:** the legacy model classes (`OpenAIModel`, `LiteLLMModel`, `AnthropicModel`) accept `temperature` in their constructor (valid). The newer `LLM(provider=…, model=…)` does **not** — its `**kwargs` are forwarded to the *SDK client constructor* (for `api_key`/`base_url`), so `LLM(provider="openai", model="gpt-4.1", temperature=0)` raises `TypeError` at construction. Set sampling params on the **evaluator** instead: `ClassificationEvaluator(name=…, llm=…, prompt_template=…, choices=…, temperature=0)` (the `create_classifier(...)` helper takes no `temperature`).
+
+(The scanner flags any `max_tokens`/`temperature` token as a `⚠ param` — it can't tell a raw call from a wrapper or know the value, so this is the judgement call.)
+
+**Watch token caps on reasoning models.** GPT-5/o-series count *reasoning* tokens against `max_completion_tokens`, so a tiny cap that worked on gpt-4o (e.g. `max_tokens=20` for a short answer) can return empty/truncated output. When migrating such a call, raise the cap to a safe value (≥256) or set `reasoning_effort: "minimal"`.
+
+When migrating **to Claude Opus 4.8/4.7 or Sonnet 4.6**: `budget_tokens` and `temperature`/`top_p`/`top_k` are removed — use `thinking: {type: "adaptive"}` + `output_config.effort`. Defer to the **`claude-api`** skill (`/claude-api migrate`) for the full Claude code-change checklist; don't hand-edit Claude SDK calls from memory.
+
+### 5. Skip what shouldn't change
+
+Do **not** rewrite:
+- **Autogenerated code** — `api-clients/` is generated SDK reference; never hand-edit it (it's in `excludePaths`, so the scanner skips it). Fix the model references at their generator/source instead.
+- **Historical / release-notes / changelog / migration-guide** content — it documents what *was* true ("v1.2 added gpt-4o support"). These paths are in `excludePaths`; respect the same rule for any historical prose the scanner happens to catch.
+- **Non-model tokens** that share a prefix: `gpt-oss-*` (open-weight models, version-pinned), `claude-code`, `claude-agent-sdk`, web-crawler user-agents (`claude-web`, `claude-user`, `claude-searchbot`), image filenames. These are in the `ignore` list and won't be flagged.
+- **Comparative prose** that names an old model on purpose ("unlike GPT-4, GPT-5.5 can…"). Leave the historical reference; only update where the doc is telling the reader which model to *use*.
+- **Markdown image alt text** — `![…model gpt-4o-mini…](screenshot.png)` describes what a screenshot *shows*. Editing the alt text alone would make it misdescribe the image (the pixels still show the old model), so the scanner skips model names inside `![ … ]`. Update these by **regenerating the screenshot**, not by editing the alt text.
+
+To suppress a single line the scanner shouldn't touch, add a `check-models:ignore` comment on it.
+
+### 6. Verify
+
+Re-run the scanner — `✗ error` count should be 0. Remaining `⚠` are prose/variants you've consciously reviewed.
+
+Also guard against a **stale date carried onto a new alias**: when migrating a dated ID, drop the date (`claude-sonnet-4-5-20250929` → `claude-sonnet-4-6`, **not** `claude-sonnet-4-6-20250929` — that snapshot doesn't exist). The scanner can't catch this (it date-strips before classifying, so a wrong-dated current alias looks current), so grep for it:
+
+```bash
+grep -rnE 'claude-(opus-4-8|sonnet-4-6)-20[0-9]' python typescript   # these aliases have no dated form
+```
+
+## Updating for a new model launch
+
+The whole skill is driven by `models.json`. When a new model ships: WebSearch to confirm the IDs and which tiers exist, edit the relevant value(s) in `models.json` plus `updated`, re-run the scanner. No code changes to the scanner are normally needed — it derives the full old→new table from the policy numbers.
diff --git a/.agents/skills/check-models/scripts/models.json b/.agents/skills/check-models/scripts/models.json
new file mode 100644
index 0000000..988d0a0
--- /dev/null
+++ b/.agents/skills/check-models/scripts/models.json
@@ -0,0 +1,129 @@
+{
+  "_comment": "Single source of truth for the check-models skill. Update the numbers in `openai` and `anthropic` whenever a new model ships, then re-run the scanner. The scanner derives the full replacement table from these policy values — you should not need to enumerate every old ID.",
+  "updated": "2026-06-18",
+  "verifiedBy": "WebFetch developers.openai.com/api/docs/models + /deprecations (2026-06-18); Google via ai.google.dev/gemini-api/docs/models (2026-06-18); Anthropic via claude-api skill shared/models.md",
+
+  "openai": {
+    "flagship": "gpt-5.5",
+    "pro": "gpt-5.5-pro",
+    "mini": "gpt-5.4-mini",
+    "nano": "gpt-5.4-nano",
+    "flagshipMinVersion": 5.4,
+    "proMinVersion": 5.5,
+    "miniMinVersion": 5.4,
+    "nanoMinVersion": 5.4,
+    "notes": "GPT-5.5 is the latest flagship (released 2026-04-23, API id gpt-5.5-2026-04-23) and has no mini/nano variant — mini/nano stay on the 5.4 generation. GPT-5.4 also remains available as a cheaper flagship (still current, NOT deprecated per OpenAI's models page), so flagshipMinVersion is 5.4 — only gpt-5.3 and older flag for migration; gpt-5.4 and gpt-5.5 both pass. Migration target for outdated flagships is still gpt-5.5 (the `flagship` value). Match the SIZE tier: a *-mini model migrates to the latest mini, never the flagship. The GPT-5 family are REASONING models — they reject temperature/top_p (see codeChanges). For temperature-dependent code use the nonReasoning tier below.",
+    "nonReasoning": {
+      "flagship": "gpt-4.1",
+      "mini": "gpt-4.1-mini",
+      "nano": "gpt-4.1-nano",
+      "note": "Latest NON-reasoning OpenAI models — active (NOT deprecated, per https://developers.openai.com/api/docs/deprecations checked 2026-06-17) and they ACCEPT temperature/top_p. Use these (not GPT-5) for code that needs temperature control — LLM-as-judge evaluators, deterministic extraction/classification at temperature 0. gpt-4o / gpt-4o-mini are also non-reasoning + active but older; gpt-4.1 supersedes them.",
+      "treatAsCurrent": true
+    },
+    "deprecatedOpenAI": ["gpt-3.5-turbo (snapshots; shutdown 2026-10-23)", "o1", "o1-mini", "o1-preview", "o3-mini", "o4-mini", "gpt-5 / gpt-5-mini legacy snapshots (shutdown 2026-12-11)", "gpt-5.2-chat-latest (2026-08-10)"]
+  },
+
+  "anthropic": {
+    "opus": "claude-opus-4-8",
+    "sonnet": "claude-sonnet-4-6",
+    "haiku": "claude-haiku-4-5",
+    "notes": "Opus 4.8 / Sonnet 4.6 / Haiku 4.5 are the current GA tiers. Claude Fable 5 and Claude Mythos 5 exist but were export-control-suspended on 2026-06-12 — NEVER migrate to them. Match the SIZE tier (opus/sonnet/haiku)."
+  },
+
+  "google": {
+    "_comment": "Gemini tiers span generations (the latest STABLE per tier is on different generations, like OpenAI). `current` IDs are valid (not flagged). `deprecated` IDs map to their current-tier replacement. Verified against https://ai.google.dev/gemini-api/docs/models + /changelog (2026-06-18). Match the SIZE tier: pro/flash/flash-lite.",
+    "tiers": { "pro": "gemini-2.5-pro", "flash": "gemini-3.5-flash", "flash-lite": "gemini-3.1-flash-lite" },
+    "current": [
+      "gemini-2.5-pro", "gemini-3.1-pro", "gemini-3.5-pro",
+      "gemini-3.5-flash", "gemini-3-flash", "gemini-2.5-flash",
+      "gemini-3.1-flash-lite", "gemini-2.5-flash-lite"
+    ],
+    "deprecated": {
+      "gemini-2.0-flash": "gemini-3.5-flash",
+      "gemini-2.0-flash-lite": "gemini-3.1-flash-lite",
+      "gemini-1.5-pro": "gemini-2.5-pro",
+      "gemini-1.5-flash": "gemini-3.5-flash",
+      "gemini-1.5-flash-8b": "gemini-3.1-flash-lite",
+      "gemini-1.0-pro": "gemini-2.5-pro",
+      "gemini-pro": "gemini-2.5-pro",
+      "gemini-pro-vision": "gemini-2.5-pro"
+    }
+  },
+
+  "codeChanges": {
+    "_comment": "Reasoning-model (GPT-5 / o-series) parameter migrations. The scanner flags these; the skill applies them.",
+    "renames": [
+      { "from": "max_tokens", "to": "max_completion_tokens", "scope": "openai-reasoning", "note": "GPT-5 and o-series reject max_tokens in Chat Completions; use max_completion_tokens." }
+    ],
+    "removeForReasoning": [
+      "temperature", "top_p", "presence_penalty", "frequency_penalty",
+      "logprobs", "top_logprobs", "logit_bias"
+    ],
+    "removeNote": "GPT-5 / o-series reasoning models reject these sampling params (temperature must be the default 1 if sent). Remove them; steer with `reasoning_effort` (low|medium|high) and `verbosity` (low|medium|high) instead.",
+    "anthropicNote": "Anthropic Opus 4.8/4.7 reject temperature/top_p/top_k and budget_tokens (use thinking:{type:'adaptive'} + effort). See the claude-api skill `shared/model-migration.md` for full Claude code changes."
+  },
+
+  "specialized": {
+    "_comment": "Non-chat OpenAI families (realtime / audio / image / transcribe / tts / embeddings / moderation) that don't map to the flagship/mini tiers. `current` IDs are treated as valid (not flagged); `deprecated` IDs map to their current replacement. Checked before review.openaiPatterns, which stays as the fallback for unrecognised variants. Verified against https://developers.openai.com/api/docs/models and /deprecations (2026-06-18).",
+    "current": [
+      "gpt-realtime", "gpt-realtime-1.5", "gpt-realtime-2", "gpt-realtime-mini",
+      "gpt-realtime-translate", "gpt-realtime-whisper",
+      "gpt-audio", "gpt-audio-1.5", "gpt-audio-mini",
+      "gpt-image-2",
+      "gpt-4o-transcribe", "gpt-4o-mini-transcribe",
+      "gpt-4o-mini-tts",
+      "text-embedding-3-small", "text-embedding-3-large",
+      "omni-moderation", "omni-moderation-latest"
+    ],
+    "deprecated": {
+      "gpt-4o-realtime-preview": "gpt-realtime-1.5",
+      "gpt-4o-mini-realtime-preview": "gpt-realtime-mini",
+      "gpt-4o-audio-preview": "gpt-audio-1.5",
+      "gpt-4o-mini-audio-preview": "gpt-audio-mini",
+      "gpt-4o-search-preview": "gpt-5.4-mini",
+      "gpt-4o-mini-search-preview": "gpt-5.4-mini"
+    }
+  },
+
+  "review": {
+    "_comment": "Tokens that look like model IDs but need a human decision — the scanner flags them (severity: warn) and never auto-rewrites them. specialized.current/deprecated (above) take precedence for known non-chat models.",
+    "openaiPatterns": [
+      "codex", "chat-latest", "realtime", "audio", "search", "transcribe",
+      "image", "tts", "whisper", "embedding", "moderation", "instruct"
+    ],
+    "openaiExact": ["gpt-4v"]
+  },
+
+  "ignore": {
+    "_comment": "Known NON-model tokens that share the model prefix. The scanner never reports these.",
+    "openai": ["gpt-oss-20b", "gpt-oss-120b", "gpt-oss"],
+    "openaiPatterns": ["oss", "\\.png", "\\.jpg", "\\.jpeg", "\\.gif", "\\.svg"],
+    "anthropic": [
+      "claude-code", "claude-code-tracing", "claude-agent-sdk", "claude-web",
+      "claude-user", "claude-searchbot", "claude-powered", "claude-trace",
+      "claude-session", "claude-fable-5", "claude-mythos-5"
+    ],
+    "anthropicPatterns": [
+      "^claude-code", "^claude-md", "searchbot", "-bot$", "\\.png", "\\.jpg",
+      "\\.jpeg", "\\.gif", "\\.svg", "\\.log", "fable", "mythos", "-powered"
+    ]
+  },
+
+  "excludePaths": [
+    "**/release-notes/**",
+    "**/changelog/**",
+    "**/*release-note*",
+    "**/*-releases.*",
+    "**/on-premise-releases*",
+    "**/*changelog*",
+    "**/*migration*",
+    "**/*migrate*",
+    "**/api-clients/**",
+    "**/.agents/skills/check-models/**",
+    "**/node_modules/**",
+    "**/.git/**",
+    "**/.venv/**"
+  ],
+
+  "scanExtensions": [".mdx", ".md", ".ipynb", ".py", ".ts", ".tsx", ".js", ".jsx"]
+}
diff --git a/.agents/skills/check-models/scripts/scan-models.mjs b/.agents/skills/check-models/scripts/scan-models.mjs
new file mode 100644
index 0000000000000000000000000000000000000000..065599466b4cc9148582875b1c2756edbb3bf1e5
GIT binary patch
literal 19646
zcmcg!ZFAekcD|qWE7q(h0WA@b6+2Dk>o|%nC+>A@kEPo?x1wx<<dQ^00xSUPB@$=)
zsXw6YOlLa%(BGC{(&ss67XYajCvK;1Zv<d>&z?Pd-uEp3=3cZY@~EGrkxfrcnhotn
z6q&)q4o*5VyiE&J4B|AkdB}#BNjfY{wkU^jX@}<R+@^c4Ol0=ba*}8BWME{FdG(?&
zNot1Kpx6S`()P3Lq%d(hGy_20;*(4?jf)}~CAKg&9!yMJ?3&!pr}4m=AU3Hzv$-ka
zne7zGFV@(9S|q3O)TX7e=Sfi(?JaaaP3#$g1yh-1nw(e@J4E`65>PCVj?H8-i&Kn9
z#%Y!d{F5|0OMzdiYc*wdV%a0h1-V6@bE+Vo<~AN)n8h4D442*x&3i?2#vWyyk8O!L
z0Cs5e<P@|S<=M=X6Km$#G#OkFV7i#~Z7!5i+J-+DS!#a$=YKc-#SFkFSz)VQXA_$O
zYtX>0rI;l1g4;k?e=g#&-Q|;!ZIf+sTNciKF3iz9E+<78hR5h3|G1H6luT_A4HKXd
z4DBdhOv_y}I2*R_0M+RXlhMfhu^$)qTLG1&(+d+1hgjz{NwIsUg>Mety*Fc$=uUvL
zaEBNF%&_|a)FGkc{?wAk=8N)981Z!4fMEQ-+VjD5F|_Zv{5!{|S!Wa{HH>(A7GHo7
z&y&(@bL2js$gk$1(cMTdOwZ<dmiIb^J;hF!7bd90VV-BF3HBfC-v*)6VUmtbQkYS;
zNQdFZMlzdcd1*%YLgr{u+;&Gr>xpmSmNnb>UbkKXH9I)CNC#%w>;=kAhaWNdz!nW)
zm$KJ?e*gL~BP~|PEb=J;Zw#`uD9yqC>z5zj?H&GTp017w6ytDa%Q(c4_7gYa$Nl&F
z=uG@V@KOp14Js&AFMdI<plzOht9^4jjZ26};IIwBc>x5j9&4_yOpBb_;ALQ{h`xLK
z`qlHl0_5KxynPeS<GirJs0hh4FTvL&O&~b^oF!>MB)3efA!;p5)moIJZ(402$t~{S
zAH;TC$P?c_`=7&6fPpl+9U)#%!vY$>20ImDvU!O`TpCKjt{KJCf+}%3t)A2DB{;Ks
zrTFbaF-)h_cca`Eld6sDh1NVeHp?d)qb!FSw53S^fBOmk{83gHLT8T42|jFYw$0^+
z0b6tDV{{|JN6GOMd`w2vx=)`zHLXsE$hF!AxT>0ZYRY_J*$q~~uJ#4)t}BJ|uAWZl
zTyI_HoddXb*ITrZLwIvjQNE6gBaj>8-qxq!s)Takd{In-7+bU4;PUGRC7DoTTge~;
z0O_a|wm>iP&yU%e$)-c-QmBr0L*GNAU6{Vb|6@<AG&n<Qip6+rVZ1<nhh69e@o?|;
z{>K-4e?7n~j~;urR|~Q?wlAokyg=2vIK;OXP`OSbR7%3dT#PhUW-w2=$W!x13{aR9
zZ{jz>ux*|RPnliw69st$`<VxUK!Ra}*mev$Mdq8YA8qp=iPR7qPg)At=Zg=icnCA;
zsRPmfvNZiXI|Gx^exBLBm`Fa;!ovzfr_OB-!p>lT`WMm-=8(1;0uU3V8hev3(v(`b
zoFqjEBt6daJV~(wgelI>6yC%xlza?hp2;+@_FL5Yx!oeXvq=K~M0diRSnLD(*<n(O
zFC&0*JA=-L`bP`Ie@<i0350DOADqEt0H3hpHHXlOP>wA&bY;^6*+rGvj6&=7ID-Qg
zCq$wC7JFZ8%Z$hM=W&*eVOODEFR_VU0gS1L#^bO|=$NcDW>XtZvvJT{@doLvcp`Ff
z>hC!p=bgAZVncS#{Yz2Bx2b$-yj4%P#SS!8_j2QP42|Wun&4OYe0l#8EC6v;o_=Op
zEiiV^1oC|ew+(w1kF$1fxv}xwxps>8F390rSvVA<X*{N@_4NK_7DFiXZE3it`V9|w
z#bzh^&L@0G<J5KF(-J^yw>X2ELFYIP`Gn3;uBk2clRo3)Bu-8i`b{1Q;|JY0Ag1$=
zp}1szU{*}f86V^k<K=H>nv_<)AGPWatz#_vFst@15P8{!i!MA$AyeVn;(NNnz>RHC
zW~Lt>7bjl1Ii_xtjt@ig7u!FuaSm|_7ES~Pj^48j%qjiEb8EpHfF|_B*Lhk-;sn!j
zTQY{RR4${Cns}bHFwe^Pn$FLsF?>JmUsRBqUrjPBt^sgF{(%ylxY06`$EDc<7k{4H
z0UR0FPzP#n0p}gEE#3__cag&xqauLJj*@%^&WjURT#`usP0Dg!>_!n{osx<Sd_hqT
z2iYuQ*c2%-ZtXshh1T6|8pVBhq8)8;!=nsNs>DZY(_$u72;tD4Mq^ob2c&Jz?baR*
zCUH8pPz4?t(V8+phnV6dfzki<U;fn;6GT8mH+MK28WTXBXK6Bsr^amp$Zs~qtL`XZ
zi4Ti5fbWFnCG@vMf>cOb74<ly;}vCvy*Dqwr_|PHVP8{}#6rVDMKrMCICK#d1VMya
z%rd3hoF(vy)68^a%P-(FQMH{|I~U20D8mNP87F{+x48m^m9e>O{T5)Aj9O@2V80lk
z5);IRe_;dMeqgmIu&;=dBF;C=W5k=fvz$&rLImHuJp?qwtr`NxE}{&L<2zN)#T<IR
zMgU%)=dPMQ3i9A^@B7#LPHbQ*l|&TV88`y_1V$M$_T_HC5@x3224!uFc*U5nLi1__
zX}o~>#4>fihFfCyxM)bXm_mg5x+wJ&IgbRiEvKzp727*5ueKwCe<^$qod!?Cy~vhS
zZ?Y4?rg`vFlKunC6vU3)BL6C5CwdhS>JO?_S3ZeS^+Vl?g|y)V(K#fe`|LGV<Ue?!
zmyj3Un}XS!7MZh2!qd-1Rtms_GpWIQy<v`k1(Yxn8ITkNML9yJ5{n|aA2F-Yfy<%{
z1<?Vo30oo#g+~U)9E0~%u@F~hhyloDAi|BFcnY8<gTauoY#@GH9qkk)Gbm_<@=q`l
zaKJ_-y(hqfzmp>Q;`1k(fq|qcPHHcoiML0uXb)nSXJCyM!DuVJGbkdHjU-jlf@F+S
z$iIlI=3*dO&<K!=7-=Q-YGk5A%;S6rc-hBK45ql0qztT*D1^e=bjwY&bVF6+fivLB
z1aQI;Y+iCnFcvd>0*1+)PHj+xW%fElO6fTg89}=piU0>u&>gmq!fxl758Zy$egaB}
z!#snFEMex$X%?43@BZcN=;84)<lmj+Wv>kvgJIbd3BDHTfHpxr$~Vki1)5Q6L|jwO
z53HnN6(4$K3_+p~FnLiD(lj~%>K=NhjjIwkegPcN0rH!{v)yipzwKvjtU_3n5bw>}
zZbkY{i38IBK#k>bZElqIAkEPw5*>LzQjBC~#ttotDV?mSLlM2+&ac`Ie!Sj{k8Gur
zBDxu2K4ltve&AWu3~Yh~6Wv%cSUe~f;H1}khx-Rmn20;32CJY(AiL4YS51d30CEF$
z63zp59E$wRmgrmpZKxF|U<4<@P$-d!Jc40Ua55qjSi&>GMh%h&t=7h`rB+quarlVQ
z1AK4aND1MNJq{ml0p1{{o5XJTu4W8L&Sco$^01*};1^&`B!%-N1O-LyEL$K1l{5m<
za-2mPy$Y+QjSJ+Q$d)Z?8_!L;j8_aISE7m|?*6fP_H6ZY<q<boX-bwnfS{%0z{9J=
z>kj5rHq#M+05B`^HWiKpj|8ijK@N%BvZiwH4n*r-4Ur)8f9rgTwkDcu?W0M7OBGU8
zJqRV#I7M#m-j<;<+9d;ejkN2Xl%>qHtTQFi2j`bLu~s2k6>FwfL4;agX{TC@9T7Bt
z9oCjENhRZ11cV@wlPES@?4=j-ZOd=r`p|cA3FDI%?VI$2mwrYpPWE*Ih(5oGXseZ7
zZ`QBUL2T}NyL@D`C{JNV(NmoACR9jikfze~x>6zg%4bKRiZsf@jcpe|_gswl#TT?B
zrtvDx$W><3-^dfyOlxdWa|CB_?0^&#jV3dUW@4xC%l@}%M`r;mmH{z>k`@poErqN|
zj>>UL5Ouo4%STK8cgOwreuQ)=@?=!Got^ESuRGgcbsqlh4dcF9!m@Nu#j;eRs&S7V
zZtv{C(ISjT+h6Z&cTRWKfvAkTu~p`IJk9!t*?Y2mFm`xDL%|Ii@1M^D0n+-=GMn;b
zOcjEddR<51=7;;2%b<OA)IIKYkE1bG)$QK@qE&f`jo#e{-3QX+54SW+<I#uc_`!`G
z{vhok8Y`q)_Zb9&$yA%U^N06v(<+VHJd=RYo(F#6)u^P{*K-T%;Q&b!M~!<-NHVDk
zX00KIMTU$!x{0@wTeZwG3OOAA-ikmuyiFzBt=QF?f%D*WPJ(KBTLOx24)|O?G<E~|
zuvXG)>s(M7jZ`26Q_k@;DE{1J;Kr!(pcLN4TLcGtuL6Wk%>>ysMTz!RS4t#a7koWk
z2E5hLTGgj6_W0e(RL9ktR9MydkC${=;7-8HQPta`qo_X4+zt|$dAcb9k023UL3Bs)
z3?7e_bI^ENCNs+jW-&|{9wjrTxFRIt<Fpv$3G&N$K^`+wm=@=YL!LU}i?~M`sfKi8
zU|WV6Q>$P<CMM|mc+ViBCnug07y`})+AhA#tpbHlKAZwc<XPbY+4DILGvEh-C$qDt
zii(tv8rg0MdvGAE=d<)qru}$u!blM6rstfKo*We;Yhc9Xd187M#G$s;=q9=GE{#)a
zJba}qqdQ&9Yk@=5E!6!5h*w!nnwX%NhaL(V!1m$lj(hWi7^2!cH^6Nk!2n8)C}s0)
z|7Sap)LfaB7jqj<;^OUD`VM(uo0k_M;+^U0D7bz!*Q%th^NU+z3+@L_Xe_eRD{~%k
z@G*T5_#Zfi1c#l)d5Z2h4TJf`djmmy6JCmhlcGY?bLv1dapNh{plf5Xwa!@eCUk3E
z=Bg3^(%VVaN{9V(&~u&G*u3g=qCs#Sh{fEoJio;dpy(8?F-)-w37E)1Ey9WLc%x<j
z986}eWssifqwt{$-38rmN`-Sr-@JXp<7m*5Ubi?y1Wcwt3I!rtR*681V-hAk;=#bq
zB{cvRFYOFlA0v*m5I{hJ%oG)+DP?ts4)%&^A}m49n;C3O?HFeTpuL4t&j=wAUMKfS
zgxGo14Ic&)nq5f%)_&$=EM+Wkf&{S*x;7cPk{jpeaZm^(ioYOJ6|@8}telVYWT$q8
zXPdnw#*<$1bW1OWdeVsxu1{-(z!@<gUO}x!mL$?#nrOlcwDLStWq_(vWqte@vMv}4
zb2^Fyk6i$a_*de1^e4T>k5@fyI{zz?@Az>Y`f`Ahgy*2tj6hxI#~_Ito*JACu%!+y
zKqvaLaA{qWy6x5RWDU_#>;{NzxdkGdZiT4Sy#XK^Zh?oE2DlrBded;rc;8GQD`5^r
z>Z_m|K5jq54ydo-*zpp5;g*Uv10_A3xcgs)k2~PiK91DN0dltf#B>I&iTtyr$zm>o
zy2^>?4CEJpj>}=(EFA@}FX5~RF5$T;2<AorEpZgCE#OfTaq6?^Lw5AA^XT{!OR6`S
z*tNhDZjoS%x=k2B+xWQB#;w#QJSj8XKx(Drb@Bc!YIBj|>>?Yd;&^~UQHD~MoE2D-
zl5)xskxHePuT$2?-`8kb$Huo%jK}PZfxcg$W<`CFgvmi?cp`6zAZEj4fWuX0sgd-Q
zQW}s1YA{&Lp;reI>O$!f+UgSI+@(ir*mKT|dHzajl911bF2y0mP%3|rr$n+2Tc8IN
z5^fVNeUh+tMz615%>wGg4wP~zn_&%yDjDjJd&=dX_9axCCB-{g8d7aXd!2tgcI-#D
zS8|}OAKo2y9)U9=(|WE4F9&jd*b<4b!-YEDAf(U;d*@stU!?RD#!cq{H+!_9+Qnx8
zpS7f_NiuLZv6Y07l`B%4W$$g!tHs7%^WXRUS@gOKFSLddxaPT2p(yUu8t75QwkYpI
z>Z;+{!$wvCRuzZe;@ns5YYqs7`!@IH_GC#@?=-Ay!^N*wp<-{M|Bob!a$X~a()3|}
zw{Fm6OG7JEYZ_+A8Zl0dx<jU$o7z(NRfl#o3j`1AG3x`@K6V}hUTxX|ElxFELl)pt
zbee|kRiUaQf2O?K@*l}7bnZV4@9$8hI{?Z-u;OmuYWDvV?vZ#UFC=2}I;cJt;aSF)
zg{x1+OY?NZTR1w4QyarqWV8fDB@kuBa5Ae?7(FeZ+|F>&9XD!Jf?2{+Lv>1_0eq^v
zBU#^eBk=i+*)dOGks(s}H)X@56y?*J>{c`UNn-(Y)F}L*r4D-L37;99jD+~wp{E3>
zffV^_b^tx<SWk~3r4WmT!R7l0aV{1P`&7lRbp#6MAUi|V1d9vk#IMGnbU7Mf5yNLf
zhGUP~AXqTKjN!1REUtmolCuasJh-k7b%E=;@S38#rW6}W?~aNd?r1Rnzld(fpA+0R
z2i+9g)@HQX)5o#RXw@BhHv-d+s8QKmb=ht*Qmk{hJ7iq(ecUWA2yTXq-U*ROn?a+p
z=^?AcQRIYpEf3Axl;^Nxdv4&S^r7a!LkTPI=tVrC*3%029MR$c)xj#xDB(mjr-ttE
z?VtAFe0;xeo>E10`-oxaxgKqI{_glF?)-wkJe+)Rr9LP#v2-u5IykE5n)#5mpItcx
z>xQfj9czWhHJA*O#tAg$HPhWZx+G_<pJ3?8W~CS0IxlLAuN}NHN60EDte<PTjr7Bv
zgw$i#+EwHoZeDLlLY;AQ2S;Rjo~~LIHSYAB9>Z_)E+UE=@A}uE1K{zx1G8yRH>vaP
zbjx_YEhp5@(qPwS@%cwVyQqQ$9fjo3y@;(m!h`xpI?#1=c2l}(8VGRx+0zN;hF>Nt
zx{<48GW4h?MmP0L@m1Dc`S!e3!6I#FFOQ{ojyIx(5I__T>sJC@@0`c0E~!I8`!WZV
zJffZiMLX|(qQnQ<WSK_pxYzCWQ4qMqpPtJgH<oQ!Fz|a9j&!tNZSVSOrOt&jB3?)f
zgbb*>htC3sUk5R#<BUbGI-jXR9-y<rT)Z{G2m&gf1MlLXJKAdIQc$c}FM>SFaFL}^
zB4E#v(ZvxJ&Or`r>5o2HAG?2!G7VaoqG1u_Mn}bsx?gAU^aP&*&EadfNYA6dvLp!4
zFW?)(;30A;Lii%3*3Ki`SVDubdg&5`V)_VQWQh0+$lnz&P`NTFv-|=T6oNGYiiQY2
zTy>*1GGjyg5wu-kbywXGSuFhb6!(kTK0h^IJ8wkgV$EwPw;B;E>=Sa?xkIdNxTZ#R
zKeivqQ(aCYfG6>od)1zJle-C+u*Ovgrrb970I)<!%*$qkz-Av8QUXpXJ)9Pjk)AP{
z*Iy8jO4X}g%jjXrQYeY))9N-%rC?bOyvM6EuX%IEo!N4WbnBS!*3JJ3=R_L+;Rpxm
z)rskRD;z<|(Nb<Ho8!g`!y>uDeU<IP2Y~SxMuG|@@a`hzjjQ3)ZFB#UzQ+=mgnr!L
zdw~;eTro)@-{5hm5V<J9D!23R7b6q~?V7J2nFj`kebpa-&aEsHI31}=Vg0Z~l|k>h
z6euub%Hk$ZovM<-p~0G7i@PQTuQs8!M)wxBtuAHa6$<PW{>MebPLot}EdtU)dg6VX
zT^jsEYLTU=VhS8yltHVDiw&M1l<Ff4WXGzarn|}iGyQH9gb%>D4jc0M2t(c6Bcu&+
zsRAGGVc&%BfD~Z!t$^_24Hw{?B!gYK$tVEFJqjmQI{jpQ`tQCoIH_el&z2mzn(ij+
zd)oDf@4jo|G$Bqk4W#_#RayoUAh?AsyJ2QxO$x$3$;zPx^3o-LH<9wB-MZeei6+j)
z1>#hK5ZWO`fy(|h2#AT?)8?jMW0eGT(_IJKQJ`xhxzA4WFrAk>Dp<KK@VRQZ*3ixY
zb62&fbWwEX&(t-j0!kut*nizBtQGyp1gtXyhWZ(Gkob&PXB1LPEO+@t^68TpYkkyH
zCDzr9LK^vHj||J*U^pf`1YjVt)4ynx1`?SK!rsm?Q7tyYH)yX)!_ZW4FL;d1172v0
z3~5m)LuGn|F9#<mfu-<6q$Aep#hpLDe`WeOkB8F)PX(3QyuleJ^NY0a4jv&vxRY13
zcM<VSQS}GR0BVuqvIae3W@=y@A+X3t!O!(BmnO#$i=li9M<;+Ug}JTG>>-#b#8pNF
z)(T5gz3%4CCYUTgfo7Ep2vP@$U<=dnIvs4edfi}K+*M_rGor3DN$JgSsWIvN>3dna
zR92F@T)(>qh032D+-H%@?h~cL))k4cD<`bo=*s;K=IQwA=ed10z8WW^tKxLrc21j6
zv$;p0#7{t1t7_$(7DsMEbiurL@5n&U;v4>JjsxDN>q!ej0y)anz+j%7LtLn@5LFIx
za6~c5<lqMr?70IjapmZYND1f^=D2WoSCCoILt>0NL->J8Vl2B+*9O-35}FfXD^8LD
zLg_AJ^d<}Mv~U~IhBIEbf*9M4n>eDzrA`?JZk7}tb?J-~H$CeT-jYIQ1<nW_o6hZ$
z+G&}$u{fSaYoRdV%o>Re%>SN<ddXbT>oXEbHdPXMdpKW>?sboXqYvHV;|Fcjf&|eR
zVpIW9VL6&11vo^wwrO@?-0By6V7ZxNkz<iFmF*<m10f*vi3oiSP^&PGaA<C|d?GR5
zB4bI_$(wgjhM+*h6y1Qb5B{iB(ODINOc*%a&;m5nXh;u5N;C?PnjLLJXK|#6$(u&m
zspR77M3bw-WnKzqnAcKKYY(}>{VB>XE=Efo88^QLP0hQ{EjY|pOPzJr0cP~ES`~$P
zv<IQnZE}OqXG4);U2X8N*Vjb_r<Yb4t~kwwk~*)y6iJ8=?!O9UX&)NY9l*wqK&wXF
z=A`r!kO^`mFcILpb)Hend*4HDsXk%aN8GStfJ1%W7K758!!Of|-(%>u6!#~6wR}ba
z1xLH&;7~7H8{8d4@hRi~90Eg_V1M=Dsn$5c&O-B=_sJ=P{^&xLpd4T%Xkzg`*@8#L
zs!x0IIj^kX0!V_71;Q+E$%yXoJ;MyFslMQdNm2Y4xFBJUIyjtn0y}T%lJtNhF{V+E
zBx_e-UYT;YsY|gN%~ll-jX<!Sa*=XQm5UdK9NyWOH;$7dIVGOsKw=JYME#Of8Bu!^
zZF<fsCb~(`Y`-l~YB0%B#aYEM*=5Zeucm2=<2^?SB?DG)k27D7G5{LFiuSsgONOj7
zy7;cTcPhE!wpZIJF&U$umvAH`{1Kep#pR|C+l<!83{A(fvF52%M<0RBDNDsM3cT_%
z2x4UERH5ObXTMTFs)8JE9bs(8JUkZWAb-MFg6V~V<FRb@BN4kQ?9Mi>jlHJ8$R{fv
z3Utrf7`jI&O;qww&X*f5$fK5QtW9LT8BSOHT`dTp5$S@4lxT=6B1@bh^A<Yfddq;N
zuw^5iimdjD-BWNvNdR2Gem=svv)(XY*(e~5n1$4@f=h8opmZxo(yP%sD0|K8;Fwin
zCJ>y5&f9b3bIMY)m+C%O1@`Ws^|O3w?-Ad8Ulp1SUfKc*1U|IQJJf_7myqY3T~+6D
z;fi{0l8jNLR{9*egDpe9zc>^Jvl$laRd?MBVQ=Yqh&kTLCDgV?WzF#-uqg@!`6C~$
zzSn`4Sj{ydUaM4~496~0c2+>4T5W;gy<LY3q$qllJrTb0rZ(K<F)j}qRL=4GiM%$8
zjakcwfWEr(tzDn1VnAs$spfU;w`QBo@~vK^>$FAhU+jG7?v}sT9|-_H`dKj`pZj+=
zjo0<=Cee?$3x+}Ziw#_gge6>B*uQW?yk=C|!;zA00jl~d2isfbE0+GxgCG^wMoi==
zyEK8+R^#VT5bEP24tSRGq-?Ql!)VEsU}Q4A@Vfx8_v?TDxA_US9Y@@H%h+A8XQG38
zTqq2jRxdL7GdfB2@p4dQj~x{L1WX}*IFb+N%dndo^BzfxZjWg!AjhAk2--^sh@(-r
z*Kk-hg(%nFLW$!VG3#}dQDD}*jVwPL;I7D_!R1%ohb4bwMi8NVH+~Ic9TP}!GD7hh
z4<wyyMO45NbS^~Pael=g9;pN0&w6qqDLEb5T^}|x;1x$gW8x8ni8$e>QN$0_%7RLy
zSaI+>MoWPL23Hs6`Lh(Y6W}kY%MXHR019HjjdWDp^;v884t4KBXZY+qT}5gtMQx=n
z%1ImU{R2tYBKTGfB0gpb9U)n#9r`B?!=y6Nzo6w-xj<0JEFMH@-Dr74+9uEel26g}
z0cz!*L2VF>s3!foCFQEjy0~8LI{sHt^)S8K5ZA0Ju%2G6U;pdh%$tl!F-i{ag;KPt
z3xNJdYg7swD_mK}5q2KGTrw&?iZ4Yi>5zs8$`e;8NOd+UV_fG3<+la8De@!$Ql<M=
z;EQnw^lJ`_I5PBF<e>rFr(;UofdYr#$cm`1!t(z^?w&!bwA4z{lIr%266&<kN-*P2
zfLWW>GjenSfgOcfSg`EDVbx@gvb-MJa2}Zi*P!*JuDv#w-wl97<y1UgnmTR7Pndwl
zYDpTM<9@$<(({W3!KiXHDS88OR~UO)?e`WyH931pou)|Q)Ib&N9hym#vz)j#5UZv~
xeBPp3AosapW1vt80v-v+%N+;zQ!wx>S5St9#ILXTK#ccUpHnYCH4mR`{0};xtS0~f

literal 0
HcmV?d00001

diff --git a/.github/workflows/check-models.yml b/.github/workflows/check-models.yml
new file mode 100644
index 0000000..f93ade9
--- /dev/null
+++ b/.github/workflows/check-models.yml
@@ -0,0 +1,97 @@
+name: Check model versions
+
+# Flags out-of-date OpenAI / Anthropic model references introduced by a PR.
+# Only added lines are inspected, so existing references don't block unrelated PRs.
+# Policy (latest models + ignore lists) lives in .agents/skills/check-models/scripts/models.json.
+# Platform-wrapped IDs (Bedrock `[region.]anthropic.claude-…`, Databricks
+# `databricks-claude-…`, OpenRouter/LiteLLM `provider/model`) are flagged on their
+# embedded model substring — migrate the version but keep the platform's ID format.
+# See the "Platform-specific IDs" section of the check-models skill.
+
+on:
+  pull_request:
+    branches: [main]
+
+permissions:
+  contents: read
+  pull-requests: write
+
+jobs:
+  check-models:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd
+        with:
+          fetch-depth: 0 # need the base commit to diff against
+
+      - uses: actions/setup-node@48b55a011bda9f5d6aeb4c2d9c7362e8dae4041e
+        with:
+          node-version: 20
+
+      - name: Scan changed lines for outdated models
+        id: scan
+        run: |
+          node .agents/skills/check-models/scripts/scan-models.mjs \
+            --diff "${{ github.event.pull_request.base.sha }}" \
+            python typescript \
+            --json --no-fail > model-scan.json
+          cat model-scan.json
+
+      - name: Comment and gate
+        uses: actions/github-script@f28e40c7f34bde8b3046d885e986cb6290c5673b
+        with:
+          script: |
+            const fs = require('fs');
+            const r = JSON.parse(fs.readFileSync('model-scan.json', 'utf8'));
+            const marker = '<!-- check-models-bot -->';
+
+            const errors = r.findings.filter(f => f.severity === 'error');
+            const warns  = r.findings.filter(f => f.severity === 'warn');
+
+            const row = f => `| \`${f.file}\`:${f.line} | \`${f.token}\` | ${f.action === 'replace' ? '`' + f.target + '`' : '—'} | ${f.reason} |`;
+            let body = `${marker}\n## 🤖 Model version check\n\n`;
+
+            // Surface a stale-policy warning (content references a newer model than the policy knows, or the policy is old)
+            if (r.stale && r.stale.suggestRefresh) {
+              const newer = (r.stale.newerThanPolicy || []).map(n => `\`${n.token}\``).join(', ');
+              body += newer
+                ? `> ⚠️ **The model policy looks out of date** — this PR references ${newer}, newer than \`models.json\` knows. Run \`scan-models.mjs --refresh\` and update the policy.\n\n`
+                : `> ⚠️ **The model policy is ${r.stale.ageDays} days old** (updated ${r.updated}). Consider \`scan-models.mjs --refresh\`.\n\n`;
+            }
+
+            if (errors.length === 0 && warns.length === 0) {
+              body += `✓ No outdated model references in the changed lines. _(policy ${r.updated})_`;
+            } else {
+              if (errors.length) {
+                body += `### ✗ ${errors.length} outdated model reference(s) — please update\n\n`;
+                body += `| Location | Found | Suggested | Why |\n|---|---|---|---|\n`;
+                body += errors.map(row).join('\n') + '\n\n';
+              }
+              if (warns.length) {
+                body += `### ⚠ ${warns.length} item(s) to review (not blocking)\n\n`;
+                body += `Prose mentions, specialised variants (\`*-codex\`, \`*-chat-latest\`), or GPT-5/o-series code changes (\`max_tokens\` → \`max_completion_tokens\`, drop \`temperature\`).\n\n`;
+                body += `| Location | Found | Suggested | Why |\n|---|---|---|---|\n`;
+                body += warns.map(row).join('\n') + '\n\n';
+              }
+              body += `_See the [\`check-models\`](.agents/skills/check-models/SKILL.md) skill. Policy date: ${r.updated}. Add \`check-models:ignore\` to a line to skip it._\n`;
+              body += `\n> ℹ️ **Platform-wrapped IDs** (Bedrock \`[region.]anthropic.claude-…\`, Databricks \`databricks-claude-…\`, OpenRouter/LiteLLM \`provider/model\`) are flagged on their embedded model name — bump the version but keep the platform's ID format (e.g. Bedrock 4.x needs a \`us.\`/\`eu.\`/\`apac.\` inference-profile prefix). See the skill's _Platform-specific IDs_ section.`;
+            }
+
+            // upsert a single bot comment
+            const { owner, repo } = context.repo;
+            const prNumber = context.payload.pull_request.number;
+            const comments = await github.paginate(github.rest.issues.listComments, { owner, repo, issue_number: prNumber });
+            const existing = comments.find(c => c.body && c.body.includes(marker));
+            // Post when there are findings, or when this PR concretely introduces a
+            // newer-than-policy model. Don't open a fresh comment for date-staleness
+            // alone (it would fire on every PR once the policy ages) — only update an existing one.
+            const worthPosting = errors.length || warns.length || (r.stale && r.stale.byContent);
+            if (existing) {
+              await github.rest.issues.updateComment({ owner, repo, comment_id: existing.id, body });
+            } else if (worthPosting) {
+              await github.rest.issues.createComment({ owner, repo, issue_number: prNumber, body });
+            }
+
+            if (errors.length) {
+              core.setFailed(`${errors.length} outdated model reference(s) introduced by this PR.`);
+            }

From 96cac1ed846fa71cd3266f1634b421498e74d765 Mon Sep 17 00:00:00 2001
From: Jim Bennett <jimbobbennett@mac.com>
Date: Thu, 18 Jun 2026 15:57:58 -0700
Subject: [PATCH 2/3] two-tier PR gate: fail on introduced outdated models,
 warn on pre-existing
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The CI gate now scans the whole of each touched file and tags each finding by
whether the PR changed its line: an introduced reference fails the check, while
a pre-existing one (on an unchanged line of a touched file) is a non-blocking
warning. The scan paths are unchanged (`python typescript`).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 .agents/skills/check-models/SKILL.md          |   3 ++
 .../check-models/scripts/scan-models.mjs      | Bin 19646 -> 21318 bytes
 .github/workflows/check-models.yml            |  43 +++++++++++-------
 3 files changed, 29 insertions(+), 17 deletions(-)
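
A minimal sketch of the three-bucket split that the workflow script in the
diff below applies, assuming each finding carries the `severity` and `changed`
fields that script reads. The sample findings and the `bucket` helper name are
hypothetical and exist only for illustration.

```javascript
// Hypothetical findings, shaped like the fields the workflow script reads.
const findings = [
  { file: 'a.py', line: 3, severity: 'error', changed: true },
  { file: 'a.py', line: 9, severity: 'error', changed: false },
  { file: 'b.ts', line: 1, severity: 'warn', changed: true },
  { file: 'c.py', line: 5, severity: 'error' }, // no `changed` field at all
];

// Split findings the same way the workflow's three filters do.
function bucket(items) {
  return {
    // Errors on changed lines. `!== false` also catches a missing field.
    introduced: items.filter(f => f.severity === 'error' && f.changed !== false),
    // Errors on unchanged lines of a touched file.
    preExisting: items.filter(f => f.severity === 'error' && f.changed === false),
    // Prose mentions, specialised variants, parameter hints.
    warns: items.filter(f => f.severity === 'warn'),
  };
}

const buckets = bucket(findings);
// Only the introduced bucket decides whether the check fails.
const shouldFail = buckets.introduced.length > 0;
console.log(buckets.introduced.length, buckets.preExisting.length, buckets.warns.length, shouldFail);
```

Because the filter is `changed !== false` rather than `changed === true`, a
finding with no `changed` field lands in the failing bucket, which matches
full-scan mode failing on any error.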

diff --git a/.agents/skills/check-models/SKILL.md b/.agents/skills/check-models/SKILL.md
index 1c1e0d5..7d112b4 100644
--- a/.agents/skills/check-models/SKILL.md
+++ b/.agents/skills/check-models/SKILL.md
@@ -78,8 +78,11 @@ It prints the current policy, the WebSearch queries, the authoritative source UR
 ```bash
 node .agents/skills/check-models/scripts/scan-models.mjs python typescript   # scan paths
 node .agents/skills/check-models/scripts/scan-models.mjs --json > out.json   # machine-readable
+node .agents/skills/check-models/scripts/scan-models.mjs --diff <base> python typescript  # PR-gate mode
 ```
 
+**`--diff <base>` (the CI gate)** scans the whole of each *touched* file but tags every finding with `changed` (was the line added/modified by the PR?). It splits outdated-model errors into two tiers: **introduced** (on a changed line) → fails the run; **pre-existing** (on an unchanged line of a touched file) → reported as a non-blocking warning. Untouched files are never read. Full-scan mode (no `--diff`) tags everything `changed:true`, so it fails on any error as before.
+
 Each finding is one of:
 - `✗ error` — an outdated lowercase canonical model ID (e.g. `gpt-4o-mini`). Migrate it.
 - `⚠ review` / `⚠ replace` (prose) — a model named in prose (`GPT-4o`) or a specialised variant (`*-codex`, `*-chat-latest`, `gpt-4v`). Use judgement (see below).
diff --git a/.agents/skills/check-models/scripts/scan-models.mjs b/.agents/skills/check-models/scripts/scan-models.mjs
index 065599466b4cc9148582875b1c2756edbb3bf1e5..4684f6f0e926e9e7d47e25df6f62c30304a406f8 100644
GIT binary patch
delta 2015
zcmah~ON%2_6sECN$Yum}<>pMLXRN|X1yKe&Aw6m)&@oM?>26V(O>U}Ar7m54Z@6_U
znb0)RrEY5>zF=^nTcusOa3i?VAK=Et_y^ny3Z8qbs*|P_10j#|`p$R0bLxjrHh%hQ
z<C|Z$m(3=Of|!X3h9hte4jUnlA~=;Hx^T+WNec5Ab&Lvh%u`6XK$x=>SQuj1BtIT;
z3}L(=JHo-CU!#me0huVX+K?1v1Prvyf|$G`o=_Hq6I|jHhWmT(fy{KsG=>dRIssHl
zsuY}5+Po7onZyATR9y*RV+w$&!UjI%spet=A&#{ROoR|BIVGD+u{mXmoi-KDIL<*x
zo}v$Z0a2DDjg+#XZPlb+u@GX!vP7q#RfZlA1e~E-Q2$hgI5@$nwska{5N%L4+A{9o
z2nl11*^En7T3kQ>d3);_uX2Qr&3y64_7|H=9)WYSqUz2a2>jGk<9Y!%I~}N7BI+&#
zQc#x}wt=u->APrj#uR6QKr@BnMGbSQbPQ>xA{G$t8qkYo=A2pbTc9xUXf*kwooAP2
zc@KQw58&KwllM8j*Ri=2p))0l`j%B&n1F@YC^7&sL<pov7Ni}mNi<TF^Doo^EYKNC
z$X-f;szmvZJKd*C3>TaCe%~;nJTjv;LGSDi_j*P?$bz^EvrTwl24IF_O%WHN`7!6z
zcqLj+cq+|OEhC{q>}!gJZt+*cIuhYaBfW_6<WhU!)TuQk!z{qC?zSs^N>L>J7quJg
zDLchN>@*c3a7sfV!O;2$DH`KM1}7FVDVjb>?sYraH%dttxjCu5NV1SwML^y44Rxe3
zhm?_Ocwh%=xAF*e{`c#@-8GUyp?&h|5qFNG@uBvByU^KzGYU7aB03QyN)9i<%kBK5
zZhtEgM6UI&GWsz~okX}+93&67eew01PdsHtbv&mf<w%?n)W+(7v?`X$+4uL44~O^P
z?)L5*SyXc{q*LdHZr6iZYId>QejPJSC#Bcuz1u%JHbK{TUS2yZj=2xs6@rN?O1i%G
zcI%6^CTa`a&}NsI>?YgAAFZZ70$N9sZVAq>S>~41bEONSumJAUeIxNZPGqT${Hj;8
z(Gg2hBo+(eY2Ll+fc<ZHXDz&DmQzmiZ+D$dGs^te-CM<%zjj}JV28Ux)|Hvitu?5A
zf?}d$&s>tKe(ALKPvvE^rJ5z^)!^c8|GSOX)@~ErgZkr7AJw4+t4=SWTI>5?8f*RW
z=O5>rgR@uc;U>2J!yO#E1tl%64b7*6!IN8OWXQiCys&RG>`RT5<vL)hIr6}ISIw2J
zoL@PeBLk}5uI1CWp1-|f>(aM+CfQ7;+Ijxx;6o$%p*eADdM*Wfc{5Zdd^lKtI+Y&I
rH3di2N*j%u2V?53FOj{b^jM?&QQnLjSC59Df3WvlZG}GnZ12`TtpKn;

delta 393
zcmYjMu}T9$5am!(1hEv1&EzhG3nVK68;xfS7;SRmVHqP?b9?b%-Icoy2qE^?Q;1@~
z%Em@GYfC#DKS0nA@E3IN(qx($X5M@A-usjBc4xeNY+}7(C4OVNh!?x%B|^vmCY44W
zV1mo7gXFaJZL9<`0(&4ute_h)UIiWs7UT8aSAXur1;_xpS|U65NGu~|+q4R26>xTd
zLQPK<7;Evg!XOh>OsKBr)}Y>lqT|3w_$<&5d?=Rzr|mUQzn({FjdT-=5lfAx6%VNB
zXpTLzC#o4@Ub)!>)69bDXvtrDA%|D)IzGD=e!3euh!y25p@*+V!O<p67DX~LL9>1y
zZcW(I{r(Z%qcP^aZR~i3lH`nVIkfvHqBPFnG2u)M)%gVENcmJTAAh}@)ih{PUMeEw
OJ`H&qYX(Q_*!l<Rtbro{

diff --git a/.github/workflows/check-models.yml b/.github/workflows/check-models.yml
index f93ade9..4bc50f8 100644
--- a/.github/workflows/check-models.yml
+++ b/.github/workflows/check-models.yml
@@ -1,7 +1,9 @@
 name: Check model versions
 
-# Flags out-of-date OpenAI / Anthropic model references introduced by a PR.
-# Only added lines are inspected, so existing references don't block unrelated PRs.
+# Flags out-of-date OpenAI / Anthropic / Google model references.
+# Two tiers: outdated models a PR INTRODUCES on changed lines FAIL the check; outdated
+# models that already exist on unchanged lines of a file the PR touches are reported as
+# non-blocking WARNINGS. Untouched files are never inspected, so unrelated PRs aren't blocked.
 # Policy (latest models + ignore lists) lives in .agents/skills/check-models/scripts/models.json.
 # Platform-wrapped IDs (Bedrock `[region.]anthropic.claude-…`, Databricks
 # `databricks-claude-…`, OpenRouter/LiteLLM `provider/model`) are flagged on their
@@ -28,7 +30,7 @@ jobs:
         with:
           node-version: 20
 
-      - name: Scan changed lines for outdated models
+      - name: Scan touched files for outdated models
         id: scan
         run: |
           node .agents/skills/check-models/scripts/scan-models.mjs \
@@ -45,10 +47,16 @@ jobs:
             const r = JSON.parse(fs.readFileSync('model-scan.json', 'utf8'));
             const marker = '<!-- check-models-bot -->';
 
-            const errors = r.findings.filter(f => f.severity === 'error');
-            const warns  = r.findings.filter(f => f.severity === 'warn');
+            // Three buckets:
+            //  introduced  — outdated model on a line this PR added/changed  → FAILS the check
+            //  preExisting — outdated model on an unchanged line of a touched file → warn, non-blocking
+            //  warns       — prose / specialised variants / GPT-5 param hints (changed lines) → review
+            const introduced  = r.findings.filter(f => f.severity === 'error' && f.changed !== false);
+            const preExisting = r.findings.filter(f => f.severity === 'error' && f.changed === false);
+            const warns       = r.findings.filter(f => f.severity === 'warn');
 
             const row = f => `| \`${f.file}\`:${f.line} | \`${f.token}\` | ${f.action === 'replace' ? '`' + f.target + '`' : '—'} | ${f.reason} |`;
+            const table = items => `| Location | Found | Suggested | Why |\n|---|---|---|---|\n` + items.map(row).join('\n') + '\n\n';
             let body = `${marker}\n## 🤖 Model version check\n\n`;
 
             // Surface a stale-policy warning (content references a newer model than the policy knows, or the policy is old)
@@ -59,19 +67,20 @@ jobs:
                 : `> ⚠️ **The model policy is ${r.stale.ageDays} days old** (updated ${r.updated}). Consider \`scan-models.mjs --refresh\`.\n\n`;
             }
 
-            if (errors.length === 0 && warns.length === 0) {
-              body += `✓ No outdated model references in the changed lines. _(policy ${r.updated})_`;
+            if (!introduced.length && !preExisting.length && !warns.length) {
+              body += `✓ No outdated model references in this PR. _(policy ${r.updated})_`;
             } else {
-              if (errors.length) {
-                body += `### ✗ ${errors.length} outdated model reference(s) — please update\n\n`;
-                body += `| Location | Found | Suggested | Why |\n|---|---|---|---|\n`;
-                body += errors.map(row).join('\n') + '\n\n';
+              if (introduced.length) {
+                body += `### ✗ ${introduced.length} outdated model reference(s) introduced — please update (this fails the check)\n\n`;
+                body += `On lines this PR adds or changes:\n\n` + table(introduced);
+              }
+              if (preExisting.length) {
+                body += `### ⚠️ ${preExisting.length} pre-existing outdated model(s) in files this PR touches — not blocking\n\n`;
+                body += `These are on unchanged lines, so the check still passes — but since you're already editing these files, consider updating them too.\n\n` + table(preExisting);
               }
               if (warns.length) {
                 body += `### ⚠ ${warns.length} item(s) to review (not blocking)\n\n`;
-                body += `Prose mentions, specialised variants (\`*-codex\`, \`*-chat-latest\`), or GPT-5/o-series code changes (\`max_tokens\` → \`max_completion_tokens\`, drop \`temperature\`).\n\n`;
-                body += `| Location | Found | Suggested | Why |\n|---|---|---|---|\n`;
-                body += warns.map(row).join('\n') + '\n\n';
+                body += `Prose mentions, specialised variants (\`*-codex\`, \`*-chat-latest\`), or GPT-5/o-series code changes (\`max_tokens\` → \`max_completion_tokens\`, drop \`temperature\`).\n\n` + table(warns);
               }
               body += `_See the [\`check-models\`](.agents/skills/check-models/SKILL.md) skill. Policy date: ${r.updated}. Add \`check-models:ignore\` to a line to skip it._\n`;
               body += `\n> ℹ️ **Platform-wrapped IDs** (Bedrock \`[region.]anthropic.claude-…\`, Databricks \`databricks-claude-…\`, OpenRouter/LiteLLM \`provider/model\`) are flagged on their embedded model name — bump the version but keep the platform's ID format (e.g. Bedrock 4.x needs a \`us.\`/\`eu.\`/\`apac.\` inference-profile prefix). See the skill's _Platform-specific IDs_ section.`;
@@ -85,13 +94,13 @@ jobs:
             // Post when there are findings, or when this PR concretely introduces a
             // newer-than-policy model. Don't open a fresh comment for date-staleness
             // alone (it would fire on every PR once the policy ages) — only update an existing one.
-            const worthPosting = errors.length || warns.length || (r.stale && r.stale.byContent);
+            const worthPosting = introduced.length || preExisting.length || warns.length || (r.stale && r.stale.byContent);
             if (existing) {
               await github.rest.issues.updateComment({ owner, repo, comment_id: existing.id, body });
             } else if (worthPosting) {
               await github.rest.issues.createComment({ owner, repo, issue_number: prNumber, body });
             }
 
-            if (errors.length) {
-              core.setFailed(`${errors.length} outdated model reference(s) introduced by this PR.`);
+            if (introduced.length) {
+              core.setFailed(`${introduced.length} outdated model reference(s) introduced by this PR.`);
             }

From 71df10b601af810b231a7e6b956bc8533bf9c263 Mon Sep 17 00:00:00 2001
From: Jim Bennett <jimbobbennett@mac.com>
Date: Thu, 18 Jun 2026 16:03:33 -0700
Subject: [PATCH 3/3] test: demo the check-models gate (touch one model line;
 do not merge)

---
 .../example_arize_ax_self_optimizing_loop_dag.py                | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
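
How the scanner decides that a line counts as changed is not visible in this
series, because scan-models.mjs is carried as a binary patch. One plausible
approach, sketched here purely as an assumption, is to walk the unified diff
and record the new-side line numbers of `+` lines. The `addedLines` helper and
the sample hunk are hypothetical.

```javascript
// Collect new-side line numbers of added lines from unified diff text.
// Assumption: this is an illustration, not the scanner's actual logic.
function addedLines(diffText) {
  const added = [];
  let newLine = 0;
  let inHunk = false;
  for (const line of diffText.split('\n')) {
    // A new file header ends the current hunk, so `+++ b/...` is not miscounted.
    if (line.startsWith('diff --git')) { inHunk = false; continue; }
    const m = /^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@/.exec(line);
    if (m) { newLine = Number(m[1]); inHunk = true; continue; }
    if (!inHunk) continue;
    if (line.startsWith('+')) { added.push(newLine); newLine += 1; }
    else if (line.startsWith(' ')) { newLine += 1; }
    // '-' lines exist only on the old side, so the new-side counter stays put.
  }
  return added;
}

// Hypothetical hunk with the same shape as the one in this patch.
const sample = [
  '@@ -83,7 +83,7 @@',
  ' context a',
  ' context b',
  ' context c',
  '-old line naming a model',
  '+new line naming a model',
  ' context d',
  ' context e',
].join('\n');

const changedLines = addedLines(sample);
console.log(changedLines);
```

Under that assumption, a hunk header of `-83,7 +83,7` with three leading
context lines puts the edited line at new-side line 86, so a finding there
would be tagged as changed and a finding on the following context line would
not.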

diff --git a/python/cookbooks/airflow_example_dags/example_arize_ax_self_optimizing_loop_dag.py b/python/cookbooks/airflow_example_dags/example_arize_ax_self_optimizing_loop_dag.py
index 7a1beef..dec4677 100644
--- a/python/cookbooks/airflow_example_dags/example_arize_ax_self_optimizing_loop_dag.py
+++ b/python/cookbooks/airflow_example_dags/example_arize_ax_self_optimizing_loop_dag.py
@@ -83,7 +83,7 @@
   ``"false"`` so you can inspect the artifacts in the Arize UI between
   runs.
 - ``arize_ax_self_optimizing_model`` — the model used by the server-side
-  experiment tasks (default ``"gpt-4o-mini"``). The optimizer always
+  experiment tasks (default ``"gpt-4o-mini"``). The optimizer always  # demo: gate test edit
   uses ``gpt-4o`` regardless.
 """