
Add Live Evaluations web UI guide#1870

Merged
dmontagu merged 8 commits into main from dmontagu/live-evals-docs
Apr 23, 2026
Conversation

@dmontagu
Contributor

Summary

  • Add docs/guides/web-ui/live-evals.md — a walkthrough of the Live Evaluations page (directory, target detail, time window, sort, evaluator shapes, trace integration). Mirrors the shape of the existing evals.md doc.
  • Register the new page in mkdocs.yml under the Evaluate: nav section (right after the Evals guide) and in the search-grouping list.
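
A minimal sketch of the mkdocs.yml nav registration described above. The neighboring entries are assumed for context, and the labels shown are the ones this PR starts with; a later commit in this thread renames them to "Evals: Datasets & Experiments" and "Evals: Live Monitoring".

```yaml
# Hypothetical excerpt of mkdocs.yml: only the Live Evaluations line is
# what this PR adds; the surrounding structure is illustrative.
nav:
  - Evaluate:
      - Evals: guides/web-ui/evals.md
      - Live Evaluations: guides/web-ui/live-evals.md
```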

The Python side is already documented in ai.pydantic.dev/evals/online-evaluation/, so this guide links out there rather than duplicating it.
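
For orientation, a hedged sketch of what that Python side looks like: the `@evaluate` decorator mentioned later in this thread wraps a function so each live call is scored. The import path, parameter names, and the toy evaluator are assumptions, not the documented API; the linked online-evaluation doc is authoritative.

```python
# Hedged sketch only: import location and signature are assumptions;
# see ai.pydantic.dev/evals/online-evaluation/ for the real API.
from pydantic_evals import evaluate  # assumed import path


def contains_greeting(output: str) -> bool:
    """Toy boolean evaluator: did the reply greet the user?"""
    return "hello" in output.lower()


@evaluate(evaluators=[contains_greeting])  # scores each live call
def answer(question: str) -> str:
    return f"Hello! You asked: {question}"
```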

Test plan

  • uv run mkdocs build succeeds with no warnings
  • New page renders at /guides/web-ui/live-evals/
  • Sidebar nav shows Live Evaluations directly below Evals

@dmontagu self-assigned this Apr 18, 2026

@devin-ai-integration (bot) left a comment


✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no bugs or issues to report.

Open in Devin Review


@cloudflare-workers-and-pages (bot) commented Apr 18, 2026

Deploying logfire-docs with Cloudflare Pages

Latest commit: 02111b9
Status: ✅  Deploy successful!
Preview URL: https://9ad08d96.logfire-docs.pages.dev
Branch Preview URL: https://dmontagu-live-evals-docs.logfire-docs.pages.dev

View logs

- Type column: derive `agent` vs `function` from `gen_ai.agent.name`
  on the event itself (stamped by the OnlineEvaluation capability via
  OTel baggage), not from the parent span. The platform query was
  reworked to drop the parent-span join, so decorator targets always
  classify as `function` now.
- Recent events table cap: 50, not 20 (matches the detail page).
- Rename "failure count" → "error count" in the detail-page and
  Evaluator Shapes sections, matching the UI rename from "failures"
  to "errors" (exception-driven errors read distinctly from boolean
  fails).
- Attribute list: fix `evaluator_version` → `evaluator.version`
  (SDK emits with dots, matching `score.value` / `score.label`).
  Add `gen_ai.evaluation.evaluator.source` (JSON-serialized
  EvaluatorSpec) and `gen_ai.agent.name` (what drives the kind
  classification) to the list.
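
As a concrete reference for the attribute names discussed in this commit, here is an illustrative event shape. Only the keys quoted above come from the commit; the values and the prefix assumed for `score.value` / `score.label` are guesses.

```python
# Illustrative live-evaluation event attributes; values are invented,
# and key prefixes beyond those quoted in the commit are assumptions.
event_attributes = {
    "gen_ai.evaluation.score.value": 0.92,        # dotted, not score_value
    "gen_ai.evaluation.score.label": "relevant",
    "gen_ai.evaluation.evaluator.version": 3,     # dotted, not evaluator_version
    # JSON-serialized EvaluatorSpec (shape hypothetical):
    "gen_ai.evaluation.evaluator.source": '{"name": "relevance", "version": 3}',
    "gen_ai.agent.name": "support_agent",         # drives agent-vs-function kind
}
```
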
… link wording

- Intro paragraph: "one row per evaluator" is wrong — each row is one
  target, carrying a sparkline per evaluator. Rewritten to describe
  target-rows with per-evaluator sparklines inside.
- Sidebar label: "Evals: Live" → "Evals: Live Monitoring" to match
  nav-items.ts. Also mention the chevron that expands a row to show
  per-evaluator detail in the directory.
- Trace link label: match the UI's actual "Open trace in live view"
  text rather than paraphrasing.
- Add a short note on evaluator-version badges — a feature we added
  during the review pass but wasn't yet documented.
…-naming drift

- mkdocs nav: rename old "Evals" entry to "Evals: Datasets & Experiments"
  and new "Live Evaluations" entry to "Evals: Live Monitoring" to match
  the platform sidebar labels.
- Sweep stale UI-label references across evaluate/datasets/*.md and
  guides/web-ui/evals.md (sidebar click instructions, fictional
  breadcrumbs, and "Evals tab" phrasing that doesn't exist in the UI).
- live-evals.md: link to semconv + @evaluate + evaluator versioning
  docs; replace "---" bullet separators with em-dashes; correct the
  Type-bullet to explain that an @evaluate function nested in a
  Pydantic AI agent run is classified as agent (baggage propagation);
  rewrite the evaluator.source bullet (UI groups by (target, name), so
  the source doesn't distinguish rows).
Both pages documented the same UI surface with ~30% overlap. Clarify
their scope:

- guides/web-ui/evals.md = reference for the Evals: Datasets &
  Experiments page (what you see, not how to create it). New intro
  states scope and cross-links to ui.md / sdk.md / PAI.
- evaluate/datasets/ui.md = dataset/case lifecycle tasks (create,
  edit, manage cases, export). Drop the "Navigating Datasets" overlap
  block and the duplicate "Viewing Experiments" block; point readers
  at evals.md for those. New intro states the task-oriented scope and
  cross-links to evals.md.
Pair with platform PR that changes the Live Evaluations classifier to
require gen_ai.agent.name == gen_ai.evaluation.target (so an @evaluate
function called from inside a Pydantic AI agent no longer classifies
as agent). Drops the baggage-propagation caveat from the Type bullet
and clarifies that agent.name is still propagated for drill-down but
is no longer the classifier.
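
A sketch of the new classification rule, under the assumption that events are plain attribute mappings with the keys named above; the real implementation lives in the platform query, not in SDK code.

```python
def classify_target(event: dict) -> str:
    """Return 'agent' or 'function' for a live-evaluation event.

    Sketch of the rule this commit pairs with: an event counts as an
    agent only when the propagated gen_ai.agent.name equals the
    evaluation target, so an @evaluate function called inside an
    agent run still classifies as 'function'.
    """
    agent_name = event.get("gen_ai.agent.name")
    target = event.get("gen_ai.evaluation.target")
    return "agent" if agent_name is not None and agent_name == target else "function"
```
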
@dmontagu merged commit fb6eb84 into main Apr 23, 2026
18 checks passed
@dmontagu deleted the dmontagu/live-evals-docs branch April 23, 2026 05:21
