
Add Live Evaluations web UI guide#1870

Merged
dmontagu merged 8 commits into main from dmontagu/live-evals-docs
Apr 23, 2026
Conversation

@dmontagu
Contributor

Summary

  • Add docs/guides/web-ui/live-evals.md — a walkthrough of the Live Evaluations page (directory, target detail, time window, sort, evaluator shapes, trace integration). Mirrors the shape of the existing evals.md doc.
  • Register the new page in mkdocs.yml under the Evaluate: nav section (right after the Evals guide) and in the search-grouping list.
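
A minimal sketch of the mkdocs.yml nav registration described above. The neighboring entries are assumed for context, and the labels shown are the ones this PR starts with; a later commit in this thread renames them to "Evals: Datasets & Experiments" and "Evals: Live Monitoring".

```yaml
# Hypothetical excerpt of mkdocs.yml: only the Live Evaluations line is
# what this PR adds; the surrounding structure is illustrative.
nav:
  - Evaluate:
      - Evals: guides/web-ui/evals.md
      - Live Evaluations: guides/web-ui/live-evals.md
```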

The Python side is already documented in ai.pydantic.dev/evals/online-evaluation/, so this guide links out there rather than duplicating it.
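
For orientation, a hedged sketch of what that Python side looks like: the `@evaluate` decorator mentioned later in this thread wraps a function so each live call is scored. The import path, parameter names, and the toy evaluator are assumptions, not the documented API; the linked online-evaluation doc is authoritative.

```python
# Hedged sketch only: import location and signature are assumptions;
# see ai.pydantic.dev/evals/online-evaluation/ for the real API.
from pydantic_evals import evaluate  # assumed import path


def contains_greeting(output: str) -> bool:
    """Toy boolean evaluator: did the reply greet the user?"""
    return "hello" in output.lower()


@evaluate(evaluators=[contains_greeting])  # scores each live call
def answer(question: str) -> str:
    return f"Hello! You asked: {question}"
```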

Test plan

  • uv run mkdocs build succeeds with no warnings
  • New page renders at /guides/web-ui/live-evals/
  • Sidebar nav shows Live Evaluations directly below Evals

@dmontagu self-assigned this Apr 18, 2026

@devin-ai-integration (bot) left a comment


✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no bugs or issues to report.

Open in Devin Review


@cloudflare-workers-and-pages (bot) commented Apr 18, 2026

Deploying logfire-docs with Cloudflare Pages

Latest commit: 02111b9
Status: ✅  Deploy successful!
Preview URL: https://9ad08d96.logfire-docs.pages.dev
Branch Preview URL: https://dmontagu-live-evals-docs.logfire-docs.pages.dev

View logs

- Type column: derive `agent` vs `function` from `gen_ai.agent.name`
  on the event itself (stamped by the OnlineEvaluation capability via
  OTel baggage), not from the parent span. The platform query was
  reworked to drop the parent-span join, so decorator targets always
  classify as `function` now.
- Recent events table cap: 50, not 20 (matches the detail page).
- Rename "failure count" → "error count" in the detail-page and
  Evaluator Shapes sections, matching the UI rename from "failures"
  to "errors" (exception-driven errors read distinctly from boolean
  fails).
- Attribute list: fix `evaluator_version` → `evaluator.version`
  (SDK emits with dots, matching `score.value` / `score.label`).
  Add `gen_ai.evaluation.evaluator.source` (JSON-serialized
  EvaluatorSpec) and `gen_ai.agent.name` (what drives the kind
  classification) to the list.
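
As a concrete reference for the attribute names discussed in this commit, here is an illustrative event shape. Only the keys quoted above come from the commit; the values and the prefix assumed for `score.value` / `score.label` are guesses.

```python
# Illustrative live-evaluation event attributes; values are invented,
# and key prefixes beyond those quoted in the commit are assumptions.
event_attributes = {
    "gen_ai.evaluation.score.value": 0.92,        # dotted, not score_value
    "gen_ai.evaluation.score.label": "relevant",
    "gen_ai.evaluation.evaluator.version": 3,     # dotted, not evaluator_version
    # JSON-serialized EvaluatorSpec (shape hypothetical):
    "gen_ai.evaluation.evaluator.source": '{"name": "relevance", "version": 3}',
    "gen_ai.agent.name": "support_agent",         # drives agent-vs-function kind
}
```
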
… link wording

- Intro paragraph: "one row per evaluator" is wrong — each row is one
  target, carrying a sparkline per evaluator. Rewritten to describe
  target-rows with per-evaluator sparklines inside.
- Sidebar label: "Evals: Live" → "Evals: Live Monitoring" to match
  nav-items.ts. Also mention the chevron that expands a row to show
  per-evaluator detail in the directory.
- Trace link label: match the UI's actual "Open trace in live view"
  text rather than paraphrasing.
- Add a short note on evaluator-version badges — a feature we added
  during the review pass but wasn't yet documented.
…-naming drift

- mkdocs nav: rename old "Evals" entry to "Evals: Datasets & Experiments"
  and new "Live Evaluations" entry to "Evals: Live Monitoring" to match
  the platform sidebar labels.
- Sweep stale UI-label references across evaluate/datasets/*.md and
  guides/web-ui/evals.md (sidebar click instructions, fictional
  breadcrumbs, and "Evals tab" phrasing that doesn't exist in the UI).
- live-evals.md: link to semconv + @evaluate + evaluator versioning
  docs; replace "---" bullet separators with em-dashes; correct the
  Type-bullet to explain that an @evaluate function nested in a
  Pydantic AI agent run is classified as agent (baggage propagation);
  rewrite the evaluator.source bullet (UI groups by (target, name), so
  the source doesn't distinguish rows).
Both pages documented the same UI surface with ~30% overlap. Clarify
their scope:

- guides/web-ui/evals.md = reference for the Evals: Datasets &
  Experiments page (what you see, not how to create it). New intro
  states scope and cross-links to ui.md / sdk.md / PAI.
- evaluate/datasets/ui.md = dataset/case lifecycle tasks (create,
  edit, manage cases, export). Drop the "Navigating Datasets" overlap
  block and the duplicate "Viewing Experiments" block; point readers
  at evals.md for those. New intro states the task-oriented scope and
  cross-links to evals.md.
Pair with platform PR that changes the Live Evaluations classifier to
require gen_ai.agent.name == gen_ai.evaluation.target (so an @evaluate
function called from inside a Pydantic AI agent no longer classifies
as agent). Drops the baggage-propagation caveat from the Type bullet
and clarifies that agent.name is still propagated for drill-down but
is no longer the classifier.
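
A sketch of the new classification rule, under the assumption that events are plain attribute mappings with the keys named above; the real implementation lives in the platform query, not in SDK code.

```python
def classify_target(event: dict) -> str:
    """Return 'agent' or 'function' for a live-evaluation event.

    Sketch of the rule this commit pairs with: an event counts as an
    agent only when the propagated gen_ai.agent.name equals the
    evaluation target, so an @evaluate function called inside an
    agent run still classifies as 'function'.
    """
    agent_name = event.get("gen_ai.agent.name")
    target = event.get("gen_ai.evaluation.target")
    return "agent" if agent_name is not None and agent_name == target else "function"
```
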
@dmontagu merged commit fb6eb84 into main Apr 23, 2026
18 checks passed
@dmontagu deleted the dmontagu/live-evals-docs branch April 23, 2026 05:21
