Skip to content

ci: Add Docker setup for docker driver tests#810

Open
akshaylive wants to merge 7 commits into
mainfrom
akshaya/docker
Open

ci: Add Docker setup for docker driver tests#810
akshaylive wants to merge 7 commits into
mainfrom
akshaya/docker

Conversation

@akshaylive
Copy link
Copy Markdown
Collaborator

Summary

Add Docker environment setup to GitHub Actions smoke test workflow to support tasks using driver: docker sandbox configuration.

  • Verify Docker is available and running
  • Build coder-eval-agent Docker image required by coder_eval's docker driver
  • Temporarily tag the skill-agents-bindings-sync test to validate Docker setup before running full smoke suite

Changes

  • smoke-skills.yml: Add Docker verification and image build steps
  • bindings_sync.yaml: Add temporary tag for isolated testing

Test Plan

  • Verify workflow runs successfully with Docker image build
  • Confirm skill-agents-bindings-sync test passes with docker driver
  • Remove temporary tag and restore --tags smoke once validated

🤖 Generated with Claude Code

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 16, 2026

Claude finished @akshaylive's task in 1m 49s —— View job


PR Review: ci: Add Docker setup for docker driver tests

  • Read review criteria and project conventions
  • Read full diff and existing PR conversation
  • Read each changed file in full context
  • Perform review checks
  • Post review feedback

Summary

Adds Docker verification and coder-eval-agent image build steps to the smoke-skills workflow, switches bindings_sync to driver: docker, and adds env_passthrough_extra for SKILLS_REPO_PATH to the experiment defaults. Uses a temporary tag to isolate the Docker test while validating the setup.

Change-by-Change Review

1. .github/workflows/smoke-skills.yml — Docker setup steps

Severity: OK

The "Verify Docker is available" and "Build coder-eval Docker image" steps are well-placed (after Node setup, before test execution). The Docker verification step (docker run --rm hello-world > /dev/null) is a reasonable smoke check. Credentials for the Docker image build are correctly sourced from secrets.

2. .github/workflows/smoke-skills.yml--tags smoke--tags temporary

Severity: High

This changes the entire smoke suite from --tags smoke to --tags temporary. While the PR description explains this is intentional for isolated validation, merging this to main means all other smoke tests stop running in CI until reverted. This is a risky intermediate state:

  • If the revert is forgotten, CI provides zero smoke coverage for every other skill.
  • There's no tracking mechanism (no issue link, no TODO comment, no follow-up PR) to ensure the cleanup happens.

Recommendation: Consider one of these alternatives to avoid disabling all smoke tests:

  1. Run both tag sets: --tags smoke,temporary (if coder-eval supports OR logic), or add a second step that runs --tags temporary alongside the existing --tags smoke step.
  2. Add a clear # TODO(akshaylive): revert to --tags smoke after Docker validation — tracking issue #NNN comment so it's visible and tracked.
  3. Keep this change on a feature branch and only merge once validated, then submit the final PR with --tags smoke restored and temporary tag removed.

3. tests/experiments/default.yamlsandbox.docker.env_passthrough_extra

Severity: Low

Adding SKILLS_REPO_PATH to env_passthrough_extra under sandbox.docker is necessary for Docker containers to access the plugin path. This is a global default that applies to all tasks using driver: docker.

One minor note: this only takes effect for driver: docker tasks, so it won't affect existing driver: tempdir tasks. Clean and scoped.

4. tests/tasks/uipath-agents/bindings_sync.yaml — driver and tag changes

Severity: OK (with caveat)

  • driver: tempdirdriver: docker — core change, correct for the purpose.
  • Adding temporary tag — fine as a temporary measure.

Caveat: The temporary tag and the --tags temporary workflow change in item #2 are coupled — both must be reverted together. Consider adding a comment in the YAML noting this is temporary.

What's Missing

  • No tracking for the revert: The test plan says "Remove temporary tag and restore --tags smoke once validated" but there's no issue, TODO, or follow-up PR to track this. A merged --tags temporary on main silently disables all other smoke tests.
  • No validation that make docker-image exists: .coder_eval is a submodule — the Makefile target docker-image isn't verifiable from this diff. If it doesn't exist or has a different name, the workflow will fail at that step.

Area Ratings

Area Status Notes
Frontmatter N/A No skill files modified
E2E Tests OK Task YAML changes are valid
Skill Body N/A No skill files modified
References & Assets N/A No references modified
Repo Hygiene Issue --tags temporary disables all other smoke tests on main

Issues for Manual Review

  • Verify make docker-image target exists in .coder_eval/Makefile and produces the expected coder-eval-agent image name.
  • Confirm GitHub Actions runners have Docker available and sufficient disk space for image builds.
  • Ensure there's a plan to revert --tags temporary--tags smoke promptly after validation.

Conclusion

The Docker setup steps (verification + image build) and the env_passthrough_extra addition are clean and well-scoped. The main concern is the --tags smoke--tags temporary change — merging this to main disables all other smoke tests until reverted, with no tracking mechanism for the revert. Recommend either running both tag sets in parallel, or adding a tracked follow-up before merge.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 16, 2026

Claude finished @akshaylive's task in 1m 2s —— View job


Coder-eval task lint (advisory)

1 task YAML changed; verdicts: 0 Critical, 0 High, 1 Medium, 0 Low, 0 OK.

Rubric: .claude/commands/lint-task.md. This check is advisory and never blocks merge.

Evidence of passing run

High — PR body does not claim the changed task has been run and passed. The test plan mentions verifying the workflow and confirming the test passes, but as future TODOs (checkboxes are unchecked), not as evidence of a completed run. Please edit the PR description to add a line like: Ran skill-agents-bindings-sync locally and it passed.

Per-task lint

tests/tasks/uipath-agents/bindings_sync.yaml — verdict: Medium

The actual diff is small: tags gained temporary and driver changed from tempdir to docker. The task itself (prompt, criteria) is unchanged from main. Linting the full file against the rubric:

Issues:

  • [Medium] Prompt over-specification (lines 91-97): The prompt prescribes the exact procedure the bindings skill should follow — "1. Scan graph.py AND storage.py", "2. Resolve the BUCKET_NAME constant…", "3. Resolve the SCRAPER_PROCESS constant…", "4. Produce bindings for: process (data-scraper), bucket (invoice-data), asset (api-key)", "5. Bind all resources to the single entrypoint." This is a 5-step recipe that tells the agent how to sync rather than just what the goal is ("sync bindings.json so it reflects all resource calls"). An agent that follows these steps doesn't need the skill. However, steps 2-4 also serve as ground-truth anchors (naming the expected values the criteria check), which partially mitigates. Net: Medium.

Suggested fixes:

  • Consider condensing lines 91-97 to a goal statement: "Sync bindings.json so it reflects all resource calls found across ALL Python files (not just the main entry point). The skill should resolve module-level constants to their literal values." The criteria already validate the specific resource names and count — repeating them in the prompt is borderline but acceptable as ground-truth anchors; the procedural steps are the real concern.

Within-PR duplicates

  • No duplicate clusters detected (only 1 task YAML changed).

Conclusion

  • ⚠ 1 task has issues, max severity Medium (prompt over-specification). Additionally, no evidence of passing run was found in the PR body (High). Advisory only — not blocking merge.

akshaylive and others added 7 commits May 18, 2026 16:31
Update all tasks with 'smoke' tag to use driver: docker for consistent
Docker sandbox validation in CI. This includes 62 files with 'smoke' in
the filename plus 53 additional smoke tests identified by tag.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Add UIPATH_CLI_* environment variables to env_passthrough_extra so smoke tests
have pre-authenticated CLI access without hanging on interactive login prompts.
Also pass TRACES_SMOKE_PROCESS_KEY. This fixes timeouts in skill tests that
invoke uip commands requiring authentication.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant