[WIP][OOT HUD] Full pipeline: API endpoint, ClickHouse schema, replicator mapping, and frontend pages#1
Draft
…mapping, and frontend pages Implements the HUD-side ingestion and display for Out-of-Tree CI results, as described in the OOT HUD RFC V3. The relay (PR pytorch#7967) forwards {trusted, untrusted} payloads to the new /api/oot/results endpoint, which validates, extracts fields, and writes to DynamoDB. DynamoDB Streams replicates to ClickHouse via the existing replicator Lambda. Three frontend views display the results: a global OOT summary, a per-backend matrix dashboard, and a collapsible section on PR pages. Authored with Claude.
subinz1 added a commit to subinz1/rfcs that referenced this pull request on Apr 28, 2026
Defines the HUD-side ingestion and display layer for OOT CI results, building on RFC-0050 (Cross-Repository CI Relay). Covers the complete write path (Result Lambda → HUD API → DynamoDB → ClickHouse), three frontend views (global summary, per-backend dashboard, PR integration), storage schemas, DB protection (rate limits, payload caps, daily budgets), and security design (OIDC, trusted/untrusted split, callback token proposal). Reference implementation: subinz1/test-infra#1
subinz1 added a commit to subinz1/rfcs that referenced this pull request on Apr 28, 2026
Rename from RFC-0051 to RFC-0001. Defines the HUD-side ingestion and display layer for OOT CI results, building on the Cross-Repository CI Relay. Covers write path, storage schemas, DB protection, security, and three frontend views. Reference implementation: subinz1/test-infra#1
Address @ZainRizvi's review on pytorch/rfcs#96:
- Auth: X-Hud-Internal-Bot → dedicated X-OOT-Relay-Token header
- Validation: removed schema validation from HUD (moved to relay)
- Removed daily budget enforcement
- DynamoDB: PutItem → UpdateItem to prevent null clobbering
- DynamoKey: expanded to {repo}/{delivery_id}/{workflow_name}/{job_name}/{run_attempt}
- Timestamps: use downstream-reported started_at/completed_at instead of now()
- Timing metrics: only set queue_time/execution_time when non-null
- ClickHouse schema: added job_name, run_attempt columns
- Queries: select job_name, run_attempt as proper columns
- Frontend: updated interfaces to include new fields
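The key expansion and the PutItem → UpdateItem change in this commit can be illustrated with a minimal sketch. It is not the code from this PR: `OotJobIdentity`, `buildDynamoKey`, and `buildUpdateParts` are hypothetical names, and the sketch only assembles the pieces of a DynamoDB `SET` update expression rather than calling the AWS SDK. The idea is that attributes whose value is null or undefined are never named in the expression, so a later partial update cannot overwrite a value that an earlier one stored.

```typescript
// Hypothetical identifying fields of one OOT job result
// (names taken from the key format in the commit message).
interface OotJobIdentity {
  repo: string;
  delivery_id: string;
  workflow_name: string;
  job_name: string;
  run_attempt: number;
}

// Compose {repo}/{delivery_id}/{workflow_name}/{job_name}/{run_attempt}.
function buildDynamoKey(id: OotJobIdentity): string {
  return [
    id.repo,
    id.delivery_id,
    id.workflow_name,
    id.job_name,
    String(id.run_attempt),
  ].join("/");
}

// The three UpdateItem parameters that describe which attributes to set.
interface UpdateParts {
  UpdateExpression: string;
  ExpressionAttributeNames: Record<string, string>;
  ExpressionAttributeValues: Record<string, unknown>;
}

// Build a SET expression that skips null/undefined fields, so existing
// attribute values are left untouched. Pass only non-key attributes here.
// Returns null when there is nothing to set.
function buildUpdateParts(fields: Record<string, unknown>): UpdateParts | null {
  const names: Record<string, string> = {};
  const values: Record<string, unknown> = {};
  const assignments: string[] = [];

  for (const key of Object.keys(fields)) {
    const value = fields[key];
    if (value === null || value === undefined) {
      continue; // e.g. queue_time not reported yet
    }
    const i = assignments.length;
    names["#f" + i] = key;
    values[":v" + i] = value;
    assignments.push("#f" + i + " = :v" + i);
  }

  if (assignments.length === 0) {
    return null;
  }
  return {
    UpdateExpression: "SET " + assignments.join(", "),
    ExpressionAttributeNames: names,
    ExpressionAttributeValues: values,
  };
}
```

With a full-item PutItem, a second callback that lacks `queue_time` would replace the whole item and drop the earlier value; with an expression built this way, the second call simply does not mention that attribute.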
Summary
Implements the HUD-side ingestion and display for Out-of-Tree CI results, as described in the OOT HUD RFC V3. This is the end-to-end pipeline: from receiving relay callbacks to displaying results on HUD pages.
Write Path
- … (torchci/pages/api/oot/results.ts): Receives {trusted, untrusted} payloads from the result Lambda (PR pytorch/test-infra#7967, "[WIP][CRCR] Initial implementation of L2"), validates auth (x-hud-internal-bot), enforces a 2MB payload cap and daily budget, extracts/flattens fields, and writes to DynamoDB (torchci-oot-workflow-job)
- … (clickhouse_db_schema/default.oot_workflow_job/schema.sql): New table with OOT-specific columns (test counts, artifact URL, environment, relay-measured timing metrics)
- … torchci-oot-workflow-job → default.oot_workflow_job to clickhouse-replicator-dynamo Lambda
Read Path
- … (/oot): Table of all OOT backend repos sorted by pass rate, with avg duration and last run time
- … (/oot/[org]/[repo]): Matrix view: rows = PyTorch PRs, columns = downstream CI jobs, color-coded status chips
Files Changed
- torchci/pages/api/oot/results.ts
- torchci/lib/oot/ootUtils.ts
- clickhouse_db_schema/default.oot_workflow_job/schema.sql
- aws/lambda/clickhouse-replicator-dynamo/lambda_function.py
- torchci/pages/oot/index.tsx
- torchci/pages/oot/[org]/[repo].tsx
- torchci/components/oot/OotPrSection.tsx
- torchci/pages/[repoOwner]/[repoName]/pull/[prNumber].tsx
- torchci/clickhouse_queries/oot_summary/*
- torchci/clickhouse_queries/oot_backend_dashboard/*
- torchci/clickhouse_queries/oot_pr_results/*
Test plan
- … {trusted, untrusted} payloads and writes to DynamoDB
- … x-hud-internal-bot header
- /oot page renders summary table with correct pass rates
- /oot/[org]/[repo] renders matrix view with correct status chips
Authored with Claude.
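For reviewers, the request checks named under Write Path (auth header, 2MB payload cap, {trusted, untrusted} shape) might look roughly like the sketch below. It is an illustration under assumptions, not the contents of torchci/pages/api/oot/results.ts: `checkRequest`, `utf8ByteLength`, and `CheckResult` are hypothetical names, the cap is assumed to mean 2 × 1024 × 1024 bytes, the status codes are a guess, and the daily budget, field flattening, and DynamoDB write are left out. The header name is a parameter because a later commit in this thread moves auth to a dedicated X-OOT-Relay-Token header.

```typescript
// Assumption: "2MB" is read as 2 MiB of UTF-8 encoded body.
const MAX_PAYLOAD_BYTES = 2 * 1024 * 1024;

interface RelayPayload {
  trusted: Record<string, unknown>;
  untrusted: Record<string, unknown>;
}

// Hypothetical result type; the HTTP status codes are illustrative.
interface CheckResult {
  ok: boolean;
  status: number;
  error?: string;
  payload?: RelayPayload;
}

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Count UTF-8 bytes without relying on Node- or DOM-specific globals.
function utf8ByteLength(s: string): number {
  let bytes = 0;
  for (let i = 0; i < s.length; i++) {
    const c = s.charCodeAt(i);
    if (c < 0x80) {
      bytes += 1;
    } else if (c < 0x800) {
      bytes += 2;
    } else if (c >= 0xd800 && c <= 0xdbff) {
      bytes += 4; // high surrogate: the pair encodes to 4 bytes
      i++; // skip the low surrogate
    } else {
      bytes += 3;
    }
  }
  return bytes;
}

// Header keys are expected in lowercase, as Node's HTTP layer provides them.
function checkRequest(
  headers: Record<string, string | undefined>,
  rawBody: string,
  expectedToken: string,
  authHeader: string = "x-hud-internal-bot"
): CheckResult {
  // Plain comparison for brevity; a real handler would likely prefer a
  // constant-time comparison.
  if (headers[authHeader] !== expectedToken) {
    return { ok: false, status: 401, error: "unauthorized" };
  }
  if (utf8ByteLength(rawBody) > MAX_PAYLOAD_BYTES) {
    return { ok: false, status: 413, error: "payload too large" };
  }
  let parsed: unknown;
  try {
    parsed = JSON.parse(rawBody);
  } catch (_err) {
    return { ok: false, status: 400, error: "invalid JSON" };
  }
  if (!isRecord(parsed) || !isRecord(parsed.trusted) || !isRecord(parsed.untrusted)) {
    return { ok: false, status: 400, error: "expected {trusted, untrusted}" };
  }
  return {
    ok: true,
    status: 200,
    payload: { trusted: parsed.trusted, untrusted: parsed.untrusted },
  };
}
```

Running the auth and size checks before `JSON.parse` keeps unauthenticated or oversized requests from costing any parsing work.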