
[WIP][OOT HUD] Full pipeline: API endpoint, ClickHouse schema, replicator mapping, and frontend pages#1

Draft: subinz1 wants to merge 2 commits into main from oot-hud-pipeline



subinz1 (Owner) commented Apr 24, 2026

Summary

Implements the HUD-side ingestion and display for Out-of-Tree (OOT) CI results, as described in the OOT HUD RFC V3. It covers the end-to-end pipeline, from receiving relay callbacks to displaying results on HUD pages.

Write Path

  • API endpoint (torchci/pages/api/oot/results.ts): Receives {trusted, untrusted} payloads from the result Lambda (pytorch/test-infra#7967, "[WIP][CRCR] Initial implementation of L2"), validates auth via the x-hud-internal-bot header, enforces a 2MB payload cap and a daily budget, extracts and flattens fields, and writes them to the DynamoDB table torchci-oot-workflow-job
  • ClickHouse schema (clickhouse_db_schema/default.oot_workflow_job/schema.sql): New table with OOT-specific columns (test counts, artifact URL, environment, relay-measured timing metrics)
  • Replicator mapping: Added torchci-oot-workflow-job → default.oot_workflow_job to the clickhouse-replicator-dynamo Lambda
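A minimal TypeScript sketch of the request guard described for the API endpoint, assuming a plain function that receives lowercase header names (as Node's IncomingMessage.headers provides) and the raw body string. checkRequest, OotEnvelope, and the 401 status for a bad token are hypothetical and not taken from results.ts; the 400 for an oversized body follows the test plan. A later commit in this PR replaces the header with a dedicated X-OOT-Relay-Token.

```typescript
// Hypothetical guard for /api/oot/results; names are illustrative, not from results.ts.
const AUTH_HEADER = "x-hud-internal-bot"; // a later commit switches to x-oot-relay-token
const MAX_PAYLOAD_BYTES = 2 * 1024 * 1024; // 2MB cap from the PR description

interface OotEnvelope {
  trusted: Record<string, unknown>;
  untrusted: Record<string, unknown>;
}

type GuardResult =
  | { ok: true; envelope: OotEnvelope }
  | { ok: false; status: number; message: string };

function isObject(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

function checkRequest(
  headers: Record<string, string | string[] | undefined>,
  rawBody: string,
  expectedToken: string
): GuardResult {
  // Auth: plain equality for brevity; a constant-time comparison is preferable.
  const token = headers[AUTH_HEADER];
  if (typeof token !== "string" || token !== expectedToken) {
    return { ok: false, status: 401, message: "unauthorized" };
  }
  // Size cap measured in UTF-8 bytes, not UTF-16 string length.
  if (new TextEncoder().encode(rawBody).length > MAX_PAYLOAD_BYTES) {
    return { ok: false, status: 400, message: "payload exceeds 2MB" };
  }
  let parsed: unknown;
  try {
    parsed = JSON.parse(rawBody);
  } catch {
    return { ok: false, status: 400, message: "body is not valid JSON" };
  }
  // Shape check: the body and both halves of the envelope must be JSON objects.
  if (!isObject(parsed)) {
    return { ok: false, status: 400, message: "expected {trusted, untrusted}" };
  }
  const { trusted, untrusted } = parsed;
  if (!isObject(trusted) || !isObject(untrusted)) {
    return { ok: false, status: 400, message: "expected {trusted, untrusted}" };
  }
  return { ok: true, envelope: { trusted, untrusted } };
}
```

Measuring the cap in bytes rather than characters matters because a body containing non-ASCII text is larger on the wire than its string length suggests.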

Read Path

  • Global OOT Summary (/oot): Table of all OOT backend repos sorted by pass rate, with avg duration and last run time
  • Per-Backend Dashboard (/oot/[org]/[repo]): Matrix view — rows = PyTorch PRs, columns = downstream CI jobs, color-coded status chips
  • PR View Integration: Collapsible "Out-of-Tree Backends" accordion on existing PR pages, showing OOT results when they exist
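The aggregation behind the global summary and the per-backend matrix can be sketched as follows, assuming the saved queries return one row per downstream job. OotJobRow, summarize, and buildMatrix are hypothetical names, the field names are illustrative, and treating "success" as the passing conclusion is an assumption.

```typescript
// Hypothetical row shape; field names are illustrative, not the saved queries' columns.
interface OotJobRow {
  repo: string; // downstream backend repo, "org/repo"
  prNumber: number; // PyTorch PR the downstream run was triggered for
  jobName: string; // downstream CI job
  conclusion: string; // "success" is assumed to mean the job passed
  durationSec: number;
}

interface BackendSummary {
  repo: string;
  runs: number;
  passRate: number; // fraction of jobs whose conclusion is "success"
  avgDurationSec: number;
}

// Global summary (/oot): one entry per backend repo, highest pass rate first.
function summarize(rows: OotJobRow[]): BackendSummary[] {
  const byRepo: Record<string, OotJobRow[]> = {};
  for (const r of rows) {
    (byRepo[r.repo] = byRepo[r.repo] || []).push(r);
  }
  return Object.keys(byRepo)
    .map((repo) => {
      const jobs = byRepo[repo];
      const passed = jobs.filter((j) => j.conclusion === "success").length;
      const totalSec = jobs.reduce((sum, j) => sum + j.durationSec, 0);
      return {
        repo,
        runs: jobs.length,
        passRate: passed / jobs.length,
        avgDurationSec: totalSec / jobs.length,
      };
    })
    .sort((a, b) => b.passRate - a.passRate);
}

interface Matrix {
  prs: number[]; // row labels, highest PR number first
  jobs: string[]; // column labels, alphabetical
  cell: (pr: number, job: string) => string | undefined; // conclusion, if any
}

// Per-backend dashboard (/oot/[org]/[repo]): pivot one backend's rows into a
// PR x job grid. When a (PR, job) pair repeats, the later row wins, so rows
// are assumed to arrive in chronological order.
function buildMatrix(rows: OotJobRow[]): Matrix {
  const cells: Record<string, string> = {};
  const prSeen: Record<string, number> = {};
  const jobSeen: Record<string, boolean> = {};
  for (const r of rows) {
    cells[r.prNumber + "|" + r.jobName] = r.conclusion;
    prSeen[String(r.prNumber)] = r.prNumber;
    jobSeen[r.jobName] = true;
  }
  const prs = Object.keys(prSeen)
    .map((k) => prSeen[k])
    .sort((a, b) => b - a);
  const jobs = Object.keys(jobSeen).sort();
  return { prs, jobs, cell: (pr, job) => cells[pr + "|" + job] };
}
```

An empty cell (undefined) is distinct from a failed job, which lets the dashboard render "no run" separately from a red status chip.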

Files Changed

| File | Action | Description |
| --- | --- | --- |
| torchci/pages/api/oot/results.ts | New | API endpoint |
| torchci/lib/oot/ootUtils.ts | New | Types, validation, extraction |
| clickhouse_db_schema/default.oot_workflow_job/schema.sql | New | ClickHouse schema |
| aws/lambda/clickhouse-replicator-dynamo/lambda_function.py | Edit | +1 line to SUPPORTED_TABLES |
| torchci/pages/oot/index.tsx | New | Global summary page |
| torchci/pages/oot/[org]/[repo].tsx | New | Per-backend dashboard |
| torchci/components/oot/OotPrSection.tsx | New | PR view OOT section |
| torchci/pages/[repoOwner]/[repoName]/pull/[prNumber].tsx | Edit | Added OotPrSection |
| torchci/clickhouse_queries/oot_summary/* | New | Saved query |
| torchci/clickhouse_queries/oot_backend_dashboard/* | New | Saved query |
| torchci/clickhouse_queries/oot_pr_results/* | New | Saved query |

Test plan

  • Verify API endpoint accepts valid {trusted, untrusted} payloads and writes to DynamoDB
  • Verify auth rejects requests without x-hud-internal-bot header
  • Verify payload > 2MB is rejected with 400
  • Verify daily budget enforcement (429 after limit)
  • Verify ClickHouse schema creates successfully
  • Verify replicator picks up new DynamoDB records and inserts into ClickHouse
  • Verify /oot page renders summary table with correct pass rates
  • Verify /oot/[org]/[repo] renders matrix view with correct status chips
  • Verify PR page shows OOT accordion when results exist, hides when empty
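The daily budget check in the test plan (429 after the limit) could look like the fixed-window counter below. It is a sketch assuming a single in-memory counter keyed by UTC date; DailyBudget and its limit are hypothetical, and a serverless deployment would need the count kept in shared storage. A later commit in this PR removes daily budget enforcement from HUD.

```typescript
// Hypothetical fixed-window daily budget; the in-memory counter is an assumption.
class DailyBudget {
  private readonly limit: number;
  private day = ""; // current window, as a UTC "YYYY-MM-DD" string
  private used = 0; // writes accepted in the current window

  constructor(limit: number) {
    this.limit = limit;
  }

  // Returns the status the endpoint would send: 200 while the day's budget
  // lasts, 429 once it is exhausted. The window resets at UTC midnight.
  tryConsume(now: Date = new Date()): number {
    const today = now.toISOString().slice(0, 10);
    if (today !== this.day) {
      this.day = today;
      this.used = 0;
    }
    if (this.used >= this.limit) {
      return 429;
    }
    this.used += 1;
    return 200;
  }
}
```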

Authored with Claude.

…mapping, and frontend pages

Implements the HUD-side ingestion and display for Out-of-Tree CI results,
as described in the OOT HUD RFC V3. The relay (pytorch/test-infra#7967) forwards
{trusted, untrusted} payloads to the new /api/oot/results endpoint, which
validates, extracts fields, and writes to DynamoDB. DynamoDB Streams
replicates to ClickHouse via the existing replicator Lambda. Three frontend
views display the results: a global OOT summary, a per-backend matrix
dashboard, and a collapsible section on PR pages.

Authored with Claude.
subinz1 marked this pull request as draft on April 25, 2026 08:03
subinz1 changed the title from "[OOT HUD] Full pipeline: API endpoint, ClickHouse schema, replicator mapping, and frontend pages" to "[WIP][OOT HUD] Full pipeline: API endpoint, ClickHouse schema, replicator mapping, and frontend pages" on Apr 25, 2026
subinz1 added a commit to subinz1/rfcs that referenced this pull request Apr 28, 2026
Defines the HUD-side ingestion and display layer for OOT CI results,
building on RFC-0050 (Cross-Repository CI Relay). Covers the complete
write path (Result Lambda → HUD API → DynamoDB → ClickHouse), three
frontend views (global summary, per-backend dashboard, PR integration),
storage schemas, DB protection (rate limits, payload caps, daily budgets),
and security design (OIDC, trusted/untrusted split, callback token proposal).

Reference implementation: subinz1/test-infra#1
subinz1 added a commit to subinz1/rfcs that referenced this pull request Apr 28, 2026
Rename from RFC-0051 to RFC-0001. Defines the HUD-side ingestion and
display layer for OOT CI results, building on the Cross-Repository CI
Relay. Covers write path, storage schemas, DB protection, security,
and three frontend views.

Reference implementation: subinz1/test-infra#1
Address @ZainRizvi's review on pytorch/rfcs#96:

- Auth: X-Hud-Internal-Bot → dedicated X-OOT-Relay-Token header
- Validation: removed schema validation from HUD (moved to relay)
- Removed daily budget enforcement
- DynamoDB: PutItem → UpdateItem to prevent null clobbering
- DynamoKey: expanded to {repo}/{delivery_id}/{workflow_name}/{job_name}/{run_attempt}
- Timestamps: use downstream-reported started_at/completed_at instead of now()
- Timing metrics: only set queue_time/execution_time when non-null
- ClickHouse schema: added job_name, run_attempt columns
- Queries: select job_name, run_attempt as proper columns
- Frontend: updated interfaces to include new fields
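Two of the review changes, the expanded DynamoKey and the switch from PutItem to UpdateItem to prevent null clobbering, can be illustrated by building UpdateItem parameters that leave null fields out of the SET clause. This is a sketch, not the endpoint's code: buildUpdate and OotJobRecord are hypothetical, the key attribute name dynamoKey is an assumption, and the object only mirrors the shape of DynamoDB UpdateItem input (Key, UpdateExpression, ExpressionAttributeNames, ExpressionAttributeValues) as passed to the DocumentClient.

```typescript
// Hypothetical record shape; only the five key components come from the commit message.
interface OotJobRecord {
  repo: string;
  delivery_id: string;
  workflow_name: string;
  job_name: string;
  run_attempt: number;
  [field: string]: unknown; // other flattened fields; null/undefined = not reported
}

// Shape of DocumentClient-style UpdateItem input (plain object, no SDK call here).
interface UpdateItemParams {
  TableName: string;
  Key: Record<string, string>;
  UpdateExpression: string;
  ExpressionAttributeNames: Record<string, string>;
  ExpressionAttributeValues: Record<string, unknown>;
}

// {repo}/{delivery_id}/{workflow_name}/{job_name}/{run_attempt}, per the review change.
function dynamoKey(r: OotJobRecord): string {
  return [r.repo, r.delivery_id, r.workflow_name, r.job_name, String(r.run_attempt)].join("/");
}

// SET only the fields that carry a value. UpdateItem leaves attributes that are
// not named in the expression untouched, so a null here cannot erase a value
// stored by an earlier write.
function buildUpdate(table: string, r: OotJobRecord): UpdateItemParams {
  const names: Record<string, string> = {};
  const values: Record<string, unknown> = {};
  const assignments: string[] = [];
  let i = 0;
  for (const field of Object.keys(r).sort()) {
    const value = r[field];
    if (value === null || value === undefined) {
      continue; // e.g. queue_time when the callback did not report it
    }
    // Alias every attribute name so reserved words in field names are safe.
    names["#f" + i] = field;
    values[":v" + i] = value;
    assignments.push("#f" + i + " = :v" + i);
    i += 1;
  }
  return {
    TableName: table,
    Key: { dynamoKey: dynamoKey(r) }, // key attribute name is an assumption
    UpdateExpression: "SET " + assignments.join(", "),
    ExpressionAttributeNames: names,
    ExpressionAttributeValues: values,
  };
}
```

Because UpdateItem only touches the attributes named in the expression, a later callback that omits queue_time keeps whatever value an earlier callback stored, which PutItem (a full-item replace) would not.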