[WIP] RFC-0001: HUD Integration for Out-of-Tree CI Results #1
Draft
Defines the HUD-side ingestion and display layer for OOT CI results, building on RFC-0050 (Cross-Repository CI Relay). Covers the complete write path (Result Lambda → HUD API → DynamoDB → ClickHouse), three frontend views (global summary, per-backend dashboard, PR integration), storage schemas, DB protection (rate limits, payload caps, daily budgets), and security design (OIDC, trusted/untrusted split, callback token proposal). Reference implementation: subinz1/test-infra#1
Rename from RFC-0051 to RFC-0001. Defines the HUD-side ingestion and display layer for OOT CI results, building on the Cross-Repository CI Relay. Covers write path, storage schemas, DB protection, security, and three frontend views. Reference implementation: subinz1/test-infra#1
…WS refs
- Artifact URLs now flow through Result Handler (not directly to HUD)
- Removed daily budget enforcement
- Split implementation plan into 6 clearly defined phases with task tables
- Removed AWS/Vercel/Terraform/IAM-specific references throughout
- Clarified that only completed records are replicated to ClickHouse (in_progress stays in DynamoDB only for mutable state tracking)
Summary
This RFC defines the HUD-side ingestion and display layer for Out-of-Tree (OOT) CI results, building on RFC-0050 (Cross-Repository CI Relay for PyTorch Out-of-Tree Backends).
Data Flow
flowchart LR
    subgraph Downstream["Downstream CI (OOT Backend)"]
        DS["Run tests\n+ upload artifacts"]
    end
    subgraph ART["Artifact Storage (org-managed)"]
        STORE[("Logs, test reports,\nJUnit XML")]
    end
    subgraph Relay["Relay Server"]
        RH["Result Handler\n• OIDC verify\n• Allowlist check\n• Rate limit"]
    end
    subgraph HUD["HUD"]
        API["/api/oot/results\n• Auth check\n• Payload validation\n• Payload caps (2MB)"]
    end
    subgraph Storage["Storage"]
        DDB[("DynamoDB\ntorchci-oot-workflow-job\n(in_progress + completed)")]
        STR["DynamoDB Stream"]
        REP["clickhouse-replicator-dynamo"]
        CH[("ClickHouse\ndefault.oot_workflow_job\n(completed only)")]
    end
    subgraph Frontend["HUD Frontend"]
        P1["/oot — Global Summary"]
        P2["/oot/org/repo — Per-Backend"]
        P3["/pr/N — OOT Section"]
    end
    DS -->|"Upload artifacts"| STORE
    DS -->|"① POST in_progress\n② POST completed\n+ artifact_url\n(OIDC token)"| RH
    RH -->|"X-Hud-Internal-Bot\n{trusted, untrusted}"| API
    API -->|"PutItem"| DDB
    DDB --> STR --> REP -->|"completed only"| CH
    CH -->|"Query results +\nartifact_url"| P1 & P2 & P3
    P2 & P3 -.->|"User clicks\nexternal link"| STORE

Key points:
- Artifact URLs are included in the completed callback payload and flow through the Result Handler → HUD API → DynamoDB → ClickHouse
- HUD frontend pages read artifact_url from ClickHouse and render it as an external link; there is no direct connection between HUD and downstream storage
- Only completed records are replicated to ClickHouse; in_progress stays in DynamoDB for mutable state tracking

What this RFC covers
- in_progress callbacks → DynamoDB only (mutable state tracking)
- completed callbacks → DynamoDB → replicated to ClickHouse (dashboard queries)
- /oot: Global OOT CI summary (cross-repo health overview, repos sorted by pass rate)
- /oot/[org]/[repo]: Per-backend dashboard (matrix view: PRs × jobs, failure drill-down, external artifact links)
- /pr/[number]: Collapsible "Out-of-Tree Backends" section in existing PR pages

Reference implementation
A working reference implementation is available at subinz1/test-infra#1, which includes the API endpoint, ClickHouse schema, replicator mapping, saved ClickHouse queries, and all three frontend pages.
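For readers who want a feel for the write path before opening the reference implementation, here is a minimal sketch of the ingestion decision described above: enforce the 2MB payload cap, validate the callback, always write accepted records to DynamoDB, and mark only completed records for replication to ClickHouse. It is not taken from the reference implementation. The function name `decide_ingest`, the `IngestDecision` type, the rejection messages, and the `status` field name are all assumptions made for illustration.

```python
import json
from dataclasses import dataclass

# 2MB payload cap, as shown on the /api/oot/results node of the data-flow diagram.
MAX_PAYLOAD_BYTES = 2 * 1024 * 1024
# The two callback states named in this RFC.
VALID_STATUSES = {"in_progress", "completed"}


@dataclass
class IngestDecision:
    """Hypothetical result type: where an incoming callback should be stored."""
    accepted: bool
    write_to_dynamodb: bool = False
    replicate_to_clickhouse: bool = False
    reason: str = ""


def decide_ingest(raw_body: bytes) -> IngestDecision:
    """Validate a callback body and decide which stores should receive it.

    Assumes the callback state is carried in a top-level "status" field;
    the real payload schema may differ.
    """
    if len(raw_body) > MAX_PAYLOAD_BYTES:
        return IngestDecision(False, reason="payload too large")
    try:
        payload = json.loads(raw_body)
    except ValueError:  # covers JSONDecodeError and UnicodeDecodeError
        return IngestDecision(False, reason="invalid JSON")
    if not isinstance(payload, dict):
        return IngestDecision(False, reason="payload must be an object")
    status = payload.get("status")
    if not isinstance(status, str) or status not in VALID_STATUSES:
        return IngestDecision(False, reason="unknown status")
    # Every accepted record goes to DynamoDB; only completed ones are replicated.
    return IngestDecision(
        accepted=True,
        write_to_dynamodb=True,
        replicate_to_clickhouse=(status == "completed"),
    )
```

Under these assumptions, an in_progress callback yields a decision with `write_to_dynamodb=True` and `replicate_to_clickhouse=False`, which mirrors the "completed only" edge between the replicator and ClickHouse in the diagram.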
Status
This is a WIP draft. Feedback welcome.