ci: Add Docker setup for docker driver tests by akshaylive · Pull Request #810 · UiPath/skills

akshaylive · 2026-05-16T17:04:22Z

Summary

Add Docker environment setup to GitHub Actions smoke test workflow to support tasks using driver: docker sandbox configuration.

Verify Docker is available and running
Build coder-eval-agent Docker image required by coder_eval's docker driver
Temporarily tag the skill-agents-bindings-sync test to validate Docker setup before running full smoke suite

Changes

smoke-skills.yml: Add Docker verification and image build steps
bindings_sync.yaml: Add temporary tag for isolated testing

Test Plan

Verify workflow runs successfully with Docker image build
Confirm skill-agents-bindings-sync test passes with docker driver
Remove temporary tag and restore --tags smoke once validated

🤖 Generated with Claude Code

github-actions · 2026-05-16T17:04:35Z

Claude finished @akshaylive's task in 1m 49s —— View job

PR Review: ci: Add Docker setup for docker driver tests

Read review criteria and project conventions
Read full diff and existing PR conversation
Read each changed file in full context
Perform review checks
Post review feedback

Summary

Adds Docker verification and coder-eval-agent image build steps to the smoke-skills workflow, switches bindings_sync to driver: docker, and adds env_passthrough_extra for SKILLS_REPO_PATH to the experiment defaults. Uses a temporary tag to isolate the Docker test while validating the setup.

Change-by-Change Review

1. `.github/workflows/smoke-skills.yml` — Docker setup steps

Severity: OK

The "Verify Docker is available" and "Build coder-eval Docker image" steps are well-placed (after Node setup, before test execution). The Docker verification step (docker run --rm hello-world > /dev/null) is a reasonable smoke check. Credentials for the Docker image build are correctly sourced from secrets.

2. `.github/workflows/smoke-skills.yml` — `--tags smoke` → `--tags temporary`

Severity: High

This changes the entire smoke suite from --tags smoke to --tags temporary. While the PR description explains this is intentional for isolated validation, merging this to main means all other smoke tests stop running in CI until reverted. This is a risky intermediate state:

If the revert is forgotten, CI provides zero smoke coverage for every other skill.
There's no tracking mechanism (no issue link, no TODO comment, no follow-up PR) to ensure the cleanup happens.

Recommendation: Consider one of these alternatives to avoid disabling all smoke tests:

Run both tag sets: --tags smoke,temporary (if coder-eval supports OR logic), or add a second step that runs --tags temporary alongside the existing --tags smoke step.
Add a clear # TODO(akshaylive): revert to --tags smoke after Docker validation — tracking issue #NNN comment so it's visible and tracked.
Keep this change on a feature branch and only merge once validated, then submit the final PR with --tags smoke restored and temporary tag removed.

3. `tests/experiments/default.yaml` — `sandbox.docker.env_passthrough_extra`

Severity: Low

Adding SKILLS_REPO_PATH to env_passthrough_extra under sandbox.docker is necessary for Docker containers to access the plugin path. This is a global default that applies to all tasks using driver: docker.

One minor note: this only takes effect for driver: docker tasks, so it won't affect existing driver: tempdir tasks. Clean and scoped.

4. `tests/tasks/uipath-agents/bindings_sync.yaml` — driver and tag changes

Severity: OK (with caveat)

driver: tempdir → driver: docker — core change, correct for the purpose.
Adding temporary tag — fine as a temporary measure.

Caveat: The temporary tag and the --tags temporary workflow change in item #2 are coupled — both must be reverted together. Consider adding a comment in the YAML noting this is temporary.

What's Missing

No tracking for the revert: The test plan says "Remove temporary tag and restore --tags smoke once validated" but there's no issue, TODO, or follow-up PR to track this. A merged --tags temporary on main silently disables all other smoke tests.
No validation that make docker-image exists: .coder_eval is a submodule — the Makefile target docker-image isn't verifiable from this diff. If it doesn't exist or has a different name, the workflow will fail at that step.

Area Ratings

Area	Status	Notes
Frontmatter	N/A	No skill files modified
E2E Tests	OK	Task YAML changes are valid
Skill Body	N/A	No skill files modified
References & Assets	N/A	No references modified
Repo Hygiene	Issue	`--tags temporary` disables all other smoke tests on `main`

Issues for Manual Review

Verify make docker-image target exists in .coder_eval/Makefile and produces the expected coder-eval-agent image name.
Confirm GitHub Actions runners have Docker available and sufficient disk space for image builds.
Ensure there's a plan to revert --tags temporary → --tags smoke promptly after validation.

Conclusion

The Docker setup steps (verification + image build) and the env_passthrough_extra addition are clean and well-scoped. The main concern is the --tags smoke → --tags temporary change — merging this to main disables all other smoke tests until reverted, with no tracking mechanism for the revert. Recommend either running both tag sets in parallel, or adding a tracked follow-up before merge.

github-actions · 2026-05-16T17:04:39Z

Claude finished @akshaylive's task in 1m 2s —— View job

Coder-eval task lint (advisory)

1 task YAML changed; verdicts: 0 Critical, 0 High, 1 Medium, 0 Low, 0 OK.

Rubric: .claude/commands/lint-task.md. This check is advisory and never blocks merge.

Evidence of passing run

❌ High — PR body does not claim the changed task has been run and passed. The test plan mentions verifying the workflow and confirming the test passes, but as future TODOs (checkboxes are unchecked), not as evidence of a completed run. Please edit the PR description to add a line like: Ran skill-agents-bindings-sync locally and it passed.

Per-task lint

`tests/tasks/uipath-agents/bindings_sync.yaml` — verdict: Medium

The actual diff is small: tags gained temporary and driver changed from tempdir to docker. The task itself (prompt, criteria) is unchanged from main. Linting the full file against the rubric:

Issues:

[Medium] Prompt over-specification (lines 91-97): The prompt prescribes the exact procedure the bindings skill should follow — "1. Scan graph.py AND storage.py", "2. Resolve the BUCKET_NAME constant…", "3. Resolve the SCRAPER_PROCESS constant…", "4. Produce bindings for: process (data-scraper), bucket (invoice-data), asset (api-key)", "5. Bind all resources to the single entrypoint." This is a 5-step recipe that tells the agent how to sync rather than just what the goal is ("sync bindings.json so it reflects all resource calls"). An agent that follows these steps doesn't need the skill. However, steps 2-4 also serve as ground-truth anchors (naming the expected values the criteria check), which partially mitigates. Net: Medium.

Suggested fixes:

Consider condensing lines 91-97 to a goal statement: "Sync bindings.json so it reflects all resource calls found across ALL Python files (not just the main entry point). The skill should resolve module-level constants to their literal values." The criteria already validate the specific resource names and count — repeating them in the prompt is borderline but acceptable as ground-truth anchors; the procedural steps are the real concern.

Within-PR duplicates

No duplicate clusters detected (only 1 task YAML changed).

Conclusion

⚠ 1 task has issues, max severity Medium (prompt over-specification). Additionally, no evidence of passing run was found in the PR body (High). Advisory only — not blocking merge.

…task and experiment defaults

Update all tasks with 'smoke' tag to use driver: docker for consistent Docker sandbox validation in CI. This includes 62 files with 'smoke' in the filename plus 53 additional smoke tests identified by tag. Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

Add UIPATH_CLI_* environment variables to env_passthrough_extra so smoke tests have pre-authenticated CLI access without hanging on interactive login prompts. Also pass TRACES_SMOKE_PROCESS_KEY. This fixes timeouts in skill tests that invoke uip commands requiring authentication. Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

akshaylive requested review from AlexandruCGhimisi, AlvinStanescu, RaduAna-Maria, adrian-badea, al3xanndru, andreibalas-uipath, cosmyo, dmetzgar, gabrielavaduva, gozhang2, marius-bughiu, rockymadden and uipreliga as code owners May 16, 2026 17:04

akshaylive force-pushed the akshaya/docker branch from 047b879 to c676671 Compare May 17, 2026 15:21

akshaylive requested review from CalinMPopa, IsabellaCapriottiUIPath, ZuerWang99, ashleyetheridge, bansal-anushree, chandhanshanth, dushyant-uipath, jianjunwang2, litheon, sriramva-uipath, t-hsia, uipathswapnil and yadvender-uipath as code owners May 18, 2026 15:25

akshaylive force-pushed the akshaya/docker branch from 41b741d to f33ff06 Compare May 18, 2026 15:32

akshaylive requested review from amoluipath, andrei-uipath, bai-uipath, baishalighosh, cezara98t, chandusailella, jiyangzh, misupantea-uipath, nikhil-maryala, paul-ciobanu, tmatup and vaishalisharma-uipath as code owners May 18, 2026 15:32

akshaylive and others added 7 commits May 18, 2026 16:31

feat: add docker sandbox with env_passthrough_extra to bindings_sync …

1becfd1

…task and experiment defaults

feat: add docker sandbox with env_passthrough_extra to bindings_sync …

bb35f84

…task and experiment defaults

ci: include workflow changes in smoke test detection logic

e93fb65

ci: revert to --tags smoke for full test suite

3fd35ab

fix: pass bedrock model and task parallelism to container

275fb5c

akshaylive force-pushed the akshaya/docker branch from a56e1ec to e66cffd Compare May 18, 2026 23:32

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ci: Add Docker setup for docker driver tests#810

ci: Add Docker setup for docker driver tests#810
akshaylive wants to merge 7 commits into
mainfrom
akshaya/docker

akshaylive commented May 16, 2026

Uh oh!

github-actions Bot commented May 16, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented May 16, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

akshaylive commented May 16, 2026

Summary

Changes

Test Plan

Uh oh!

github-actions Bot commented May 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review: ci: Add Docker setup for docker driver tests

Summary

Change-by-Change Review

1. .github/workflows/smoke-skills.yml — Docker setup steps

2. .github/workflows/smoke-skills.yml — --tags smoke → --tags temporary

3. tests/experiments/default.yaml — sandbox.docker.env_passthrough_extra

4. tests/tasks/uipath-agents/bindings_sync.yaml — driver and tag changes

What's Missing

Area Ratings

Issues for Manual Review

Conclusion

Uh oh!

github-actions Bot commented May 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Coder-eval task lint (advisory)

Evidence of passing run

Per-task lint

tests/tasks/uipath-agents/bindings_sync.yaml — verdict: Medium

Within-PR duplicates

Conclusion

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

github-actions Bot commented May 16, 2026 •

edited

Loading

1. `.github/workflows/smoke-skills.yml` — Docker setup steps

2. `.github/workflows/smoke-skills.yml` — `--tags smoke` → `--tags temporary`

3. `tests/experiments/default.yaml` — `sandbox.docker.env_passthrough_extra`

4. `tests/tasks/uipath-agents/bindings_sync.yaml` — driver and tag changes

github-actions Bot commented May 16, 2026 •

edited

Loading

`tests/tasks/uipath-agents/bindings_sync.yaml` — verdict: Medium