feat: add autonomous E2E CI failure fix workflow with skills and Playwright agents by zdrapela · Pull Request #4516 · redhat-developer/rhdh

zdrapela · 2026-03-31T15:52:05Z

Summary

- Add 7 AI agent skills for autonomous E2E CI failure investigation and fix workflow
- Add /fix-e2e command that orchestrates the full lifecycle: parse failure, create branch, deploy RHDH, reproduce, diagnose/fix with Playwright agents, verify, and submit PR with Qodo review
- Initialize Playwright Test Agents (healer/generator/planner) with MCP server for live browser interaction
- All skills managed via rulesync and synced to OpenCode, Claude Code, and Cursor

Skills

| Skill | Purpose |
| --- | --- |
| `parse-ci-failure` | Parse Prow URL or Jira ticket/URL to extract failure details |
| `setup-fix-branch` | Create branch from correct upstream release branch |
| `deploy-rhdh` | Deploy RHDH via local-run.sh with error recovery |
| `reproduce-failure` | Run failing test, classify as consistent/flaky/unreproducible |
| `diagnose-and-fix` | Root cause analysis + fix using Playwright healer agent |
| `verify-fix` | Multi-run stability check + code quality validation |
| `submit-and-review` | Create cross-fork PR, trigger Qodo review, monitor CI |
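
The `reproduce-failure` row classifies a failure as consistent, flaky, or unreproducible. A minimal sketch of how such a classification could work, assuming the failing test is simply re-run several times and the outcomes tallied. The function name `classifyFailure` and the boolean result array are hypothetical and are not taken from the skill itself:

```typescript
// Hypothetical sketch: classify a failure from repeated local runs.
// `runs` holds one boolean per run: true = test passed, false = test failed.
type Classification = "consistent" | "flaky" | "unreproducible";

function classifyFailure(runs: boolean[]): Classification {
  if (runs.length === 0) {
    throw new Error("at least one run is required to classify a failure");
  }
  const failures = runs.filter((passed) => !passed).length;
  if (failures === runs.length) return "consistent"; // failed on every run
  if (failures === 0) return "unreproducible"; // never failed locally
  return "flaky"; // mixed results across runs
}

console.log(classifyFailure([false, false, false]));
console.log(classifyFailure([true, false, true]));
console.log(classifyFailure([true, true, true]));
```

The number of runs is left to the caller; more runs make the "flaky" versus "unreproducible" distinction more reliable at the cost of cluster time.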

openshift-ci · 2026-03-31T15:52:09Z

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

github-actions · 2026-03-31T16:16:53Z

The container image build workflow finished with status: cancelled.

github-actions · 2026-03-31T16:56:46Z

Image was built and published successfully. It is available at:

…wright agents

Add a 7-phase skill-based workflow for autonomously investigating and fixing failing E2E CI tests. Includes /fix-e2e command, Playwright Test Agent definitions (healer/generator/planner), and supporting rules.

Skills: parse-ci-failure, setup-fix-branch, deploy-rhdh, reproduce-failure, diagnose-and-fix, verify-fix, submit-and-review.

Managed via rulesync (skills feature added) and synced to OpenCode, Claude Code, and Cursor.

- Use full Prow CI job name for -j flag (not shortened names)
- Add required -r flag for CLI mode (prevents interactive mode)
- Add image repo mapping table per release branch

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The rulesync:generate hook reverted the parse-ci-failure changes in the previous commit. Re-applying to .rulesync source so the hook propagates correctly. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…bles

Skills now reference the e2e-fix-workflow rule for mapping tables (job→branch, job→platform, branch→image repo/tag) instead of duplicating them. This reduces token usage by ~40% while keeping the unique procedural value each skill provides:

- parse-ci-failure: structured output template with derivation details, GCS URLs, and flag breakdown table
- deploy-rhdh: concrete example command, prominent CLI mode warning
- diagnose-and-fix: "Try Healer Agent First" callout, removed duplicated coding conventions (~80 lines)
- reproduce-failure: removed duplicated project reference table
- submit-and-review: prominent dynamic GITHUB_USER extraction, removed CI check types table

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

github-actions · 2026-04-01T06:57:03Z

The container image build workflow finished with status: cancelled.

…it hooks

- reproduce-failure: ask user for approval when bug cannot be reproduced
- verify-fix: ask user for approval when verification cannot be run
- submit-and-review: always create draft PRs (--draft flag), add Step 0 to resolve pre-commit hooks with yarn install
- fix-e2e command: reflect all changes in orchestrator
- e2e-fix-workflow rule: add pre-commit hooks and draft PR sections

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

github-actions · 2026-04-01T07:37:38Z

Image was built and published successfully. It is available at:

- Phase 3: verify cluster connectivity before deployment, ask user before skipping if no cluster available
- Phase 4: ask user before skipping reproduction if no deployment
- Rule: add critical rule that no phase may be skipped silently
- Submit-and-review: add CI job triggering step after PR creation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

github-actions · 2026-04-01T08:12:34Z

Image was built and published successfully. It is available at:

The affected presubmit CI job should be triggered after addressing Qodo review feedback, not before. This avoids wasting CI resources on code that may still change from review comments. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

github-actions · 2026-04-01T08:19:09Z

The container image build workflow finished with status: cancelled.

github-actions · 2026-04-01T08:57:45Z

Image was built and published successfully. It is available at:

…loyments Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

github-actions · 2026-04-01T14:33:26Z

Image was built and published successfully. It is available at:

Never guess presubmit job names — always comment /test ?, wait for the openshift-ci bot response, and only trigger jobs from that list. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

github-actions · 2026-04-02T06:42:54Z

Image was built and published successfully. It is available at:

…al workspace

- diagnose-and-fix: make healer agent mandatory for ALL failure categories, add initialization and .env setup instructions, only supplement with manual diagnosis for data/platform/deployment/product-bug issues
- reproduce-failure: use healer agent for test reproduction with richer diagnostics, add initialization and .env instructions, keep direct execution as fallback
- verify-fix: use healer agent for single-run verification and for debugging stability check failures
- Delete fix-e2e-skills-workspace directory (eval artifacts)

… init

github-actions · 2026-04-02T09:27:34Z

The container image build workflow finished with status: cancelled.

sonarqubecloud · 2026-04-02T09:28:02Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

github-actions · 2026-04-02T10:06:54Z

Image was built and published successfully. It is available at:

gustavolira · 2026-04-06T19:14:06Z

e2e-tests/playwright/seed.spec.ts

@@ -0,0 +1,22 @@
+import { test, expect } from "@playwright/test";


This file sits in playwright/ alongside real tests and no project in projects.json explicitly includes or excludes it. It'll get picked up by any project with a broad testMatch glob. If it's scaffolding for the Playwright Test Generator agent, should it live somewhere outside the test directory, or be excluded in playwright.config.ts?
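
One way to act on the second option is Playwright's `testIgnore` setting, which excludes files matching a glob from test discovery. Below is a minimal sketch of a `playwright.config.ts` fragment, assuming the seed file keeps the name `seed.spec.ts`. How this interacts with the project definitions in `projects.json` is not shown in this PR, and a project that sets its own `testIgnore` may need the same entry.

```typescript
// playwright.config.ts (fragment): a sketch, not the project's actual config.
// A plain object keeps the example dependency-free; a real config would
// typically wrap this in defineConfig() from "@playwright/test".
const config = {
  // Skip the generator agent's seed file during normal test discovery.
  testIgnore: ["**/seed.spec.ts"],
};

export default config;
```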

gustavolira · 2026-04-06T19:14:06Z

.rulesync/skills/diagnose-and-fix/SKILL.md

+env | grep -E "^(BASE_URL|K8S_CLUSTER|CONTAINER_PLATFORM|IS_OPENSHIFT|JOB_NAME|IMAGE_|TAG_NAME|NAME_SPACE|GITHUB_APP|...)" > .env
+```
+
+The `.env` file is already in `.gitignore` — never commit it.


This says .env is already in .gitignore, but e2e-tests/.gitignore doesn't have a .env entry. The instructions above dump Vault secrets, K8S tokens, and GitHub app credentials into that file — needs an actual gitignore entry to back up this claim.

gustavolira · 2026-04-06T19:14:06Z

rulesync.jsonc

    "rules",
-    "commands"
+    "commands",
+    "skills"


Is the rulesync CI check (rulesync-check.yaml) validated to work with the skills feature? This is the first time it's being added. Worth confirming the check handles skills sync the same way it handles rules and commands.

gustavolira · 2026-04-06T19:14:06Z

.rulesync/skills/diagnose-and-fix/SKILL.md

+```bash
+cd e2e-tests
+source local-test-setup.sh <showcase|rbac>
+env | grep -E "^(BASE_URL|K8S_CLUSTER|CONTAINER_PLATFORM|IS_OPENSHIFT|JOB_NAME|IMAGE_|TAG_NAME|NAME_SPACE|GITHUB_APP|...)" > .env


That trailing |... in the grep regex will match any three characters, not just serve as a placeholder. If someone copy-pastes this, it'll capture unintended variables. Either list the actual env var prefixes exhaustively or use a different approach (like env > .env with a note to review).
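
To illustrate the point, here is a small TypeScript sketch. In a regular expression each `.` matches any single character, so the `...` alternative matches any variable whose name is at least three characters long. The sample variable and the helper `filterEnv` are hypothetical, and the prefixes are only a subset of those visible in the snippet above, not an exhaustive list.

```typescript
// The alternation from the snippet, shortened; the trailing `...` is regex, not an ellipsis.
const withPlaceholder = /^(BASE_URL|K8S_CLUSTER|...)/;
// The same pattern without the placeholder alternative.
const explicitOnly = /^(BASE_URL|K8S_CLUSTER)/;

// Hypothetical variable that should NOT end up in .env.
const unrelated = "AWS_SECRET_ACCESS_KEY=example";

console.log(withPlaceholder.test(unrelated)); // true: "AWS" satisfies `...`
console.log(explicitOnly.test(unrelated)); // false

// One possible alternative: keep only lines that start with an explicitly listed prefix.
function filterEnv(lines: string[], prefixes: string[]): string[] {
  return lines.filter((line) => prefixes.some((prefix) => line.startsWith(prefix)));
}

const sample = ["BASE_URL=https://example.test", unrelated, "IMAGE_REPO=example/repo"];
console.log(filterEnv(sample, ["BASE_URL", "IMAGE_"]));
```

An explicit allow-list fails closed: a variable that nobody listed is left out of the file instead of being captured by accident.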

gustavolira · 2026-04-06T19:14:06Z

.rulesync/skills/diagnose-and-fix/SKILL.md

+
+See https://playwright.dev/docs/test-agents for the full list of supported tools and options.
+
+This creates configuration files with the Playwright MCP server and agent definitions. The generated files are local tooling — do NOT commit them.


This says "do NOT commit them", but the PR commits e2e-tests/.mcp.json, e2e-tests/opencode.json, and all the agent/prompt markdown files in e2e-tests/.claude/agents/ and e2e-tests/.opencode/prompts/ — which are exactly what npx playwright init-agents generates. If these are meant to be checked in as the project's canonical agent configs, this note needs updating. If they shouldn't be committed, they shouldn't be in this PR.

gustavolira · 2026-04-06T19:14:06Z

e2e-tests/opencode.json

+    }
+  },
+  "tools": {
+    "playwright*": false


This silently disables all Playwright MCP tools outside agent contexts. If a developer tries to use them directly (without going through a subagent), they'll get no feedback about why they're unavailable. Worth a comment in the file or the skill docs explaining this design choice.

gustavolira · 2026-04-06T19:14:06Z

.rulesync/skills/submit-and-review/SKILL.md

+   ```bash
+   # Make the change locally
+   # Then commit and push
+   git add -A


git add -A can accidentally stage the .env file from the earlier step (especially since it's not actually gitignored — see other comment on diagnose-and-fix). Should use git add <specific-files> instead, which is also what the project's own commit conventions recommend.

gustavolira · 2026-04-06T19:14:06Z

.rulesync/skills/diagnose-and-fix/SKILL.md

+
+1. **Mark the test as `test.fixme()`** with a descriptive comment:
+   ```typescript
+   test.fixme('RHIDP-XXXX: Button no longer visible after version upgrade');


Playwright's test.fixme() signature is test.fixme(condition?, description?). This string-only form works (the string is truthy) but it's non-standard and could confuse contributors checking the Playwright docs. The canonical form would be test.fixme(true, 'RHIDP-XXXX: description').

openshift-ci bot added the do-not-merge/work-in-progress label Mar 31, 2026

zdrapela and others added 4 commits April 1, 2026 08:53

fix: correct parse-ci-failure local-run.sh job name mapping

3f7ddc4

zdrapela force-pushed the skill-agent-e2e-fix branch from e8cd40c to d237b44 Compare April 1, 2026 06:56

fix(skills): add deployment execution rules to prevent concurrent dep…

ceb2dd5

…loyments Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

fix(skills): enforce /test ? workflow before triggering CI jobs

6f5826b

zdrapela added 2 commits April 2, 2026 11:26

chore: add Claude Code loop option and Playwright docs link to healer…

7d8ebfd

… init

gustavolira reviewed Apr 6, 2026
