feat: add autonomous E2E CI failure fix workflow with skills and Playwright agents #4516

Draft
zdrapela wants to merge 11 commits into redhat-developer:main from zdrapela:skill-agent-e2e-fix

@zdrapela
Member

Summary

  • Add 7 AI agent skills for autonomous E2E CI failure investigation and fix workflow
  • Add /fix-e2e command that orchestrates the full lifecycle: parse failure, create branch, deploy RHDH, reproduce, diagnose/fix with Playwright agents, verify, and submit PR with Qodo review
  • Initialize Playwright Test Agents (healer/generator/planner) with MCP server for live browser interaction
  • All skills managed via rulesync and synced to OpenCode, Claude Code, and Cursor
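
For illustration, the lifecycle in the /fix-e2e bullet can be read as a fixed sequence of phases that stops at the first failure. The sketch below is hypothetical: the phase names are the skill names from this PR, but `run_phase` and `run_workflow` are stand-ins, and the real skills are agent instructions rather than shell functions.

```bash
# Hypothetical sketch: phase names are the seven skills added in this PR.
PHASES="parse-ci-failure setup-fix-branch deploy-rhdh reproduce-failure diagnose-and-fix verify-fix submit-and-review"

# Stand-in for invoking one skill; a real orchestrator would hand off to the agent here.
run_phase() {
  echo "running: $1"
}

# Run every phase in order and stop at the first failure instead of skipping ahead.
run_workflow() {
  for phase in $PHASES; do
    if ! run_phase "$phase"; then
      echo "phase failed, stopping: $phase" >&2
      return 1
    fi
  done
}

run_workflow
```

Stopping on the first failed phase keeps later phases from running against a missing branch or deployment.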

Skills

| Skill | Purpose |
| --- | --- |
| parse-ci-failure | Parse Prow URL or Jira ticket/URL to extract failure details |
| setup-fix-branch | Create branch from correct upstream release branch |
| deploy-rhdh | Deploy RHDH via local-run.sh with error recovery |
| reproduce-failure | Run failing test, classify as consistent/flaky/unreproducible |
| diagnose-and-fix | Root cause analysis + fix using Playwright healer agent |
| verify-fix | Multi-run stability check + code quality validation |
| submit-and-review | Create cross-fork PR, trigger Qodo review, monitor CI |
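
The consistent/flaky/unreproducible classification used by reproduce-failure can be sketched as a simple rerun loop. This is a minimal sketch under assumptions: `classify_runs` is a hypothetical helper, and the rule (all runs fail = consistent, some fail = flaky, none fail = unreproducible) is an assumed reading of those labels, not the skill's documented thresholds.

```bash
# classify_runs N CMD [ARGS...]: run CMD N times and classify the failure pattern.
classify_runs() {
  runs=$1
  shift
  failures=0
  i=0
  while [ "$i" -lt "$runs" ]; do
    # Count a failure whenever the command exits non-zero.
    "$@" >/dev/null 2>&1 || failures=$((failures + 1))
    i=$((i + 1))
  done
  if [ "$failures" -eq "$runs" ]; then
    echo "consistent"
  elif [ "$failures" -gt 0 ]; then
    echo "flaky"
  else
    echo "unreproducible"
  fi
}

# Stand-in commands; a real run would pass the failing test command instead.
classify_runs 3 false
classify_runs 3 true
```

The same loop shape would also fit the multi-run stability check in verify-fix, where zero failures across all runs is the passing condition.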

@openshift-ci

openshift-ci bot commented Mar 31, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@github-actions
Contributor

The container image build workflow finished with status: cancelled.

@github-actions
Contributor

Image was built and published successfully. It is available at:

zdrapela and others added 4 commits April 1, 2026 08:53
…wright agents

Add a 7-phase skill-based workflow for autonomously investigating and fixing
failing E2E CI tests. Includes /fix-e2e command, Playwright Test Agent
definitions (healer/generator/planner), and supporting rules.

Skills: parse-ci-failure, setup-fix-branch, deploy-rhdh, reproduce-failure,
diagnose-and-fix, verify-fix, submit-and-review.

Managed via rulesync (skills feature added) and synced to OpenCode, Claude
Code, and Cursor.

- Use full Prow CI job name for -j flag (not shortened names)
- Add required -r flag for CLI mode (prevents interactive mode)
- Add image repo mapping table per release branch

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The rulesync:generate hook reverted the parse-ci-failure changes
in the previous commit. Re-applying to .rulesync source so the
hook propagates correctly.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…bles

Skills now reference the e2e-fix-workflow rule for mapping tables
(job→branch, job→platform, branch→image repo/tag) instead of
duplicating them. This reduces token usage by ~40% while keeping
the unique procedural value each skill provides:

- parse-ci-failure: structured output template with derivation
  details, GCS URLs, and flag breakdown table
- deploy-rhdh: concrete example command, prominent CLI mode warning
- diagnose-and-fix: "Try Healer Agent First" callout, removed
  duplicated coding conventions (~80 lines)
- reproduce-failure: removed duplicated project reference table
- submit-and-review: prominent dynamic GITHUB_USER extraction,
  removed CI check types table

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

zdrapela force-pushed the skill-agent-e2e-fix branch from e8cd40c to d237b44 April 1, 2026 06:56
@github-actions
Contributor

github-actions bot commented Apr 1, 2026

The container image build workflow finished with status: cancelled.

…it hooks

- reproduce-failure: ask user for approval when bug cannot be reproduced
- verify-fix: ask user for approval when verification cannot be run
- submit-and-review: always create draft PRs (--draft flag), add
  Step 0 to resolve pre-commit hooks with yarn install
- fix-e2e command: reflect all changes in orchestrator
- e2e-fix-workflow rule: add pre-commit hooks and draft PR sections

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions
Contributor

github-actions bot commented Apr 1, 2026

Image was built and published successfully. It is available at:

- Phase 3: verify cluster connectivity before deployment, ask user
  before skipping if no cluster available
- Phase 4: ask user before skipping reproduction if no deployment
- Rule: add critical rule that no phase may be skipped silently
- Submit-and-review: add CI job triggering step after PR creation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions
Contributor

github-actions bot commented Apr 1, 2026

Image was built and published successfully. It is available at:

The affected presubmit CI job should be triggered after addressing
Qodo review feedback, not before. This avoids wasting CI resources
on code that may still change from review comments.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions
Contributor

github-actions bot commented Apr 1, 2026

The container image build workflow finished with status: cancelled.

@github-actions
Contributor

github-actions bot commented Apr 1, 2026

Image was built and published successfully. It is available at:

…loyments

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions
Contributor

github-actions bot commented Apr 1, 2026

Image was built and published successfully. It is available at:

Never guess presubmit job names — always comment /test ?, wait for the
openshift-ci bot response, and only trigger jobs from that list.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
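
One way to enforce this rule is to check a requested job name against the saved bot reply before posting a trigger comment. A minimal sketch, assuming the reply to `/test ?` has been saved to a text file and lists jobs as `/test <job-name>`; `list_jobs`, `is_listed_job`, and the sample job names are hypothetical.

```bash
# Print every job name that appears after "/test " in the saved reply file.
list_jobs() {
  grep -o '/test [A-Za-z0-9._-][A-Za-z0-9._-]*' "$1" | sed 's|^/test ||'
}

# Succeed only if the requested job is an exact match for a listed job.
is_listed_job() {
  list_jobs "$2" | grep -Fxq -- "$1"
}

# Demo with a made-up reply; real job names must come from the bot.
reply=$(mktemp)
printf '%s\n' '* `/test example-job-a`' '* `/test example-job-b`' > "$reply"

if is_listed_job example-job-a "$reply"; then
  echo "example-job-a: listed, safe to trigger"
fi
if ! is_listed_job example-job "$reply"; then
  echo "example-job: not listed, do not trigger"
fi
rm -f "$reply"
```

Exact-line matching (`grep -Fx`) is deliberate: a shortened or guessed name that is only a prefix of a real job is rejected.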
@github-actions
Contributor

github-actions bot commented Apr 2, 2026

Image was built and published successfully. It is available at:

zdrapela added 2 commits April 2, 2026 11:26
…al workspace

- diagnose-and-fix: make healer agent mandatory for ALL failure categories,
  add initialization and .env setup instructions, only supplement with
  manual diagnosis for data/platform/deployment/product-bug issues
- reproduce-failure: use healer agent for test reproduction with richer
  diagnostics, add initialization and .env instructions, keep direct
  execution as fallback
- verify-fix: use healer agent for single-run verification and for
  debugging stability check failures
- Delete fix-e2e-skills-workspace directory (eval artifacts)
@github-actions
Contributor

github-actions bot commented Apr 2, 2026

The container image build workflow finished with status: cancelled.

@sonarqubecloud

sonarqubecloud bot commented Apr 2, 2026

@github-actions
Contributor

github-actions bot commented Apr 2, 2026

Image was built and published successfully. It is available at:

@@ -0,0 +1,22 @@
import { test, expect } from "@playwright/test";
Member

This file sits in playwright/ alongside real tests and no project in projects.json explicitly includes or excludes it. It'll get picked up by any project with a broad testMatch glob. If it's scaffolding for the Playwright Test Generator agent, should it live somewhere outside the test directory, or be excluded in playwright.config.ts?

```bash
env | grep -E "^(BASE_URL|K8S_CLUSTER|CONTAINER_PLATFORM|IS_OPENSHIFT|JOB_NAME|IMAGE_|TAG_NAME|NAME_SPACE|GITHUB_APP|...)" > .env
```

The `.env` file is already in `.gitignore` — never commit it.
Member

This says .env is already in .gitignore, but e2e-tests/.gitignore doesn't have a .env entry. The instructions above dump Vault secrets, K8S tokens, and GitHub app credentials into that file — needs an actual gitignore entry to back up this claim.
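
A quick way to confirm the claim before any secrets are written is to ask git directly. A minimal sketch using `git check-ignore`; the `is_ignored` helper is hypothetical and the path is the one discussed in this comment.

```bash
# Succeed only if git would ignore the given path in the current repository.
is_ignored() {
  git check-ignore -q "$1" 2>/dev/null
}

# Guard to run from the repository root before dumping the environment to .env.
if is_ignored e2e-tests/.env; then
  echo "e2e-tests/.env is ignored"
else
  echo "e2e-tests/.env is NOT ignored; add it to e2e-tests/.gitignore first" >&2
fi
```

`git check-ignore` does not require the file to exist, so the check can run before the `.env` dump step.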

"rules",
"commands"
"commands",
"skills"
Member

Is the rulesync CI check (rulesync-check.yaml) validated to work with the skills feature? This is the first time it's being added. Worth confirming the check handles skills sync the same way it handles rules and commands.

```bash
cd e2e-tests
source local-test-setup.sh <showcase|rbac>
env | grep -E "^(BASE_URL|K8S_CLUSTER|CONTAINER_PLATFORM|IS_OPENSHIFT|JOB_NAME|IMAGE_|TAG_NAME|NAME_SPACE|GITHUB_APP|...)" > .env
```
Member

That trailing |... in the grep regex will match any three characters, not just serve as a placeholder. If someone copy-pastes this, it'll capture unintended variables. Either list the actual env var prefixes exhaustively or use a different approach (like env > .env with a note to review).
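
The effect is easy to demonstrate on a small sample. This is a sketch for illustration only: `filter_loose` and `filter_strict` are hypothetical names, the sample values are made up, and the prefix list is a shortened subset of the one in the snippet.

```bash
# Sample environment; HOME and PWD stand in for variables nobody meant to capture.
sample='BASE_URL=https://example.test
K8S_CLUSTER_TOKEN=secret
HOME=/home/dev
PWD=/tmp'

# With "..." as an alternative, any line of three or more characters matches.
filter_loose() {
  grep -E '^(BASE_URL|K8S_CLUSTER|...)'
}

# Explicit prefixes only: unrelated variables are filtered out.
filter_strict() {
  grep -E '^(BASE_URL|K8S_CLUSTER)'
}

echo "loose:"
printf '%s\n' "$sample" | filter_loose
echo "strict:"
printf '%s\n' "$sample" | filter_strict
```

The loose filter prints all four sample lines, while the strict one prints only the `BASE_URL` and `K8S_CLUSTER_TOKEN` lines.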


See https://playwright.dev/docs/test-agents for the full list of supported tools and options.

This creates configuration files with the Playwright MCP server and agent definitions. The generated files are local tooling — do NOT commit them.
Member

This says "do NOT commit them", but the PR commits e2e-tests/.mcp.json, e2e-tests/opencode.json, and all the agent/prompt markdown files in e2e-tests/.claude/agents/ and e2e-tests/.opencode/prompts/ — which are exactly what npx playwright init-agents generates. If these are meant to be checked in as the project's canonical agent configs, this note needs updating. If they shouldn't be committed, they shouldn't be in this PR.

}
},
"tools": {
"playwright*": false
Member

This silently disables all Playwright MCP tools outside agent contexts. If a developer tries to use them directly (without going through a subagent), they'll get no feedback about why they're unavailable. Worth a comment in the file or the skill docs explaining this design choice.

```bash
# Make the change locally
# Then commit and push
git add -A
```
Member

git add -A can accidentally stage the .env file from the earlier step (especially since it's not actually gitignored — see other comment on diagnose-and-fix). Should use git add <specific-files> instead, which is also what the project's own commit conventions recommend.
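
A guard along these lines would catch a stray `.env` even if someone does stage too broadly. This is a sketch, not the project's tooling: `no_env_staged` is a hypothetical helper, and the file path in the usage comment is a placeholder.

```bash
# Fail if any staged path is a .env file, at the repository root or in a subdirectory.
no_env_staged() {
  if git diff --cached --name-only | grep -Eq '(^|/)\.env$'; then
    echo "refusing to commit: a .env file is staged" >&2
    return 1
  fi
}

# Intended usage (placeholder path, run inside the repository):
#   git add path/to/changed-file.spec.ts
#   no_env_staged && git commit -m "fix(e2e): ..."
```

Staging explicit paths remains the primary fix; the guard is only a second line of defence.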


1. **Mark the test as `test.fixme()`** with a descriptive comment:
```typescript
test.fixme('RHIDP-XXXX: Button no longer visible after version upgrade');
```
Member

Playwright's test.fixme() signature is test.fixme(condition?, description?). This string-only form works (the string is truthy) but it's non-standard and could confuse contributors checking the Playwright docs. The canonical form would be test.fixme(true, 'RHIDP-XXXX: description').
