[TMHUB-31803] docs(uipath-test): align skill + smoke tests with @uipath/test-manager-tool v1.0 CLI surface by ganeshborle · Pull Request #730 · UiPath/skills

ganeshborle · 2026-05-13T10:35:39Z

Summary

Rewrite skills/uipath-test/SKILL.md so every uip tm command table matches the release/v1.0 branch of UiPath/cli (packages/test-manager-tool v1.0.1). Plural top-level groups (testcases, testsets, executions), new rows for testcases run/add/remove, testsets run, executions get-stats/run/list-filtered, executions testcaselogs list (nested), testcaselog start/finish, teststeplog list, user get. Optional flags surfaced on report get, attachment download, result download, wait. Anti-patterns now flag the v0.9 → v1.0 renames and moved verbs.
Align command-pattern regexes in the three task YAMLs under tests/tasks/uipath-test/ to the same plural surface (auth_project_discovery.yaml unchanged; only testset_hierarchy_discovery.yaml and report_generation.yaml touched).

Out of scope

requirement group (only on cli/main, v1.1) — explicitly excluded so the skill stays a faithful match for release/v1.0.
No change to env_packages in the task sandboxes — they remain @uipath/cli / @uipath/test-manager-tool (unpinned). Pinning to @beta is a separate decision tied to the v1.0 npm publish.

⚠️ Why this is draft

@uipath/test-manager-tool@latest on npm is still 0.9.0. Until v1.0 is promoted to the @latest dist-tag, the smoke CI workflow's npm install -g @uipath/cli@latest will still pull the v0.9.x tool — which means:

the agent loads this skill,
runs uip tm testsets list ...,
the v0.9 tool returns unknown command 'testsets',
the test fails.

Do not merge until one of:

@uipath/test-manager-tool v1.0 is on the @latest dist-tag on npm, or
The env_packages lines in the three task YAMLs are pinned to @beta (one-liner per file) for the interim period.
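The merge gate above can be sketched as a small check. This is only an illustration under stated assumptions: the helper names `parse_version` and `safe_to_merge` are hypothetical, and the exact pinned spelling `@uipath/test-manager-tool@beta` is assumed rather than taken from the task YAMLs.

```python
TOOL = "@uipath/test-manager-tool"


def parse_version(text):
    """Turn '1.0.1' into (1, 0, 1). Pre-release suffixes are not handled."""
    return tuple(int(part) for part in text.split("."))


def safe_to_merge(latest_version, env_packages_per_task):
    """Return True when either merge condition from the PR body holds.

    latest_version: version behind the @latest dist-tag, e.g. "0.9.0".
    env_packages_per_task: one list of package specs per task YAML.
    """
    # Condition 1: v1.0 (or later) is what @latest resolves to.
    if parse_version(latest_version) >= (1, 0, 0):
        return True
    # Condition 2: every task YAML pins the tool to @beta (assumed spelling).
    pinned = f"{TOOL}@beta"
    return bool(env_packages_per_task) and all(
        pinned in packages for packages in env_packages_per_task
    )
```

With `@latest` still at 0.9.0 and unpinned `env_packages`, the check returns `False`, which is the state that keeps this PR in draft.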

Source of truth

CLI surface read from origin/release/v1.0 of UiPath/cli (packages/test-manager-tool@1.0.1), files:

attachment.ts  execution.ts   project.ts   report.ts   requirement.ts (not on v1.0)
result.ts      testcase.ts    testcaselog.ts  testset.ts  teststeplog.ts  user.ts  wait.ts

Test plan

Verify SKILL.md description still passes hooks/validate-skill-descriptions.sh (≤ 1024 chars).

After v1.0 publish (or with @beta pin), run smoke + integration locally:

SKILLS_REPO_PATH=/path/to/skills coder-eval run \
  tests/tasks/uipath-test/auth_project_discovery.yaml \
  tests/tasks/uipath-test/testset_hierarchy_discovery.yaml \
  -e tests/experiments/default.yaml --tags smoke

SKILLS_REPO_PATH=/path/to/skills coder-eval run \
  tests/tasks/uipath-test/report_generation.yaml \
  -e tests/experiments/integration.yaml --tags integration

Smoke-skills CI workflow green on the same SHA that flips this PR out of draft.

🤖 Generated with Claude Code

uipreliga · 2026-05-14T15:14:36Z

Code Review Summary

This PR renames the uip tm CLI surface across SKILL.md from singular→plural (testcase→testcases, testset→testsets, execution→executions), renames execute→run, adds new groups (executions list-filtered, executions get-stats, executions retry, teststeplog list, testcaselog start/finish, user get), and reshapes list-testcaselogs into nested executions testcaselogs list. Tests get matching regex updates. Three reviewers (Opus + Codex + Gemini) converge on the same picture.
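The group and verb renames summarized above can be sketched as a mechanical rewrite, for example when sweeping a reference guide. This is a minimal sketch, not a tool from this repo: `migrate` is a hypothetical helper, it covers only the singular→plural groups and execute→run, and it does not handle reshaped commands such as list-testcaselogs becoming the nested executions testcaselogs list.

```python
import re

# Renames named in the review: singular groups became plural, `execute` became `run`.
GROUP_RENAMES = {
    "testcase": "testcases",
    "testset": "testsets",
    "execution": "executions",
}
VERB_RENAMES = {"execute": "run"}

# Matches `uip tm <group> <verb>`. The group token is matched whole, so
# `testcaselog` (still singular on the new surface) is not mistaken for `testcase`.
_CMD = re.compile(r"\buip tm (?P<group>[a-z]+) (?P<verb>[a-z][a-z-]*)")


def migrate(line):
    """Rewrite old-style `uip tm` commands in a line to the plural surface."""

    def fix(match):
        group = GROUP_RENAMES.get(match["group"], match["group"])
        verb = VERB_RENAMES.get(match["verb"], match["verb"])
        return f"uip tm {group} {verb}"

    return _CMD.sub(fix, line)
```

For instance, `migrate("uip tm testcase execute")` yields `uip tm testcases run`, while a line that already uses the plural surface passes through unchanged.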

🔴 Critical

references/publish-and-link-guide.md was not updated (lines 11–13, 21, 56, 64, 79, 82, 88, 105, 109). It still uses uip tm testcase list-automations, testcase link-automation, testcase execute, testset execute, testcase list, etc. SKILL.md:202 links this guide as the canonical workflow for "Publish a project and link it to a Test Manager test case". An agent that follows the link will run commands that no longer exist on the new CLI. (Flagged independently by Codex and Gemini.)
references/test-result-report-guide.md:84 was not updated — still uses uip tm testcase list-result-history. SKILL.md:201 links this guide as the canonical workflow for the QA report generation flow that tests/tasks/uipath-test/report_generation.yaml exercises. (Flagged independently by Codex and Gemini.)

🟠 High

SKILL.md:70 hint paragraph is incomplete. Lists --test-case-key consumers as update, delete, link-automation, unlink-automation, list-testsets, but the PR adds testcases add and testcases remove at lines 67–68 which take --test-case-keys (plural, comma-separated PROJECT_KEY:NUMBER values). The singular/plural flag name is precisely the kind of landmine the note exists to prevent. (Flagged independently by Codex and Gemini.)
SKILL.md:142 Critical Rule #1 has a dangling fragment. Reads "...before any Test Manager operation. Use `uip login`." The trailing sentence has no antecedent. Suggest: "...If not authenticated, run `uip login` to sign in."
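To make the singular/plural landmine concrete, here is a hedged sketch of validating a `--test-case-keys` value before passing it to testcases add or testcases remove. The helper `parse_test_case_keys` is hypothetical, and the allowed characters in the project key part are an assumption; only the comma-separated PROJECT_KEY:NUMBER shape comes from the finding above.

```python
import re

# Assumed shape of one key: PROJECT_KEY ":" NUMBER (character set is a guess).
_KEY = re.compile(r"^[A-Za-z0-9_]+:\d+$")


def parse_test_case_keys(value):
    """Split a comma-separated --test-case-keys value and validate each key."""
    keys = [part.strip() for part in value.split(",") if part.strip()]
    bad = [key for key in keys if not _KEY.match(key)]
    if not keys or bad:
        raise ValueError(f"expected comma-separated PROJECT_KEY:NUMBER values, got {value!r}")
    return keys
```

A space-separated or UUID-style value fails fast here, which is the mistake an agent is most likely to make when it copies the flag style from a sibling command.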

🟡 Medium

Heading hierarchy (SKILL.md:22–35). ## Concepts → ### What is Testmanager? → then ### Project Commands, ### Test Cases Commands, … all live as siblings of "What is Testmanager?", so the full command catalogue formally sits under ## Concepts. Either drop the ### What is Testmanager? subhead so the prose becomes the Concepts intro and introduce a sibling ## Commands header, or restructure.
Spelling: "Testmanager" vs "Test Manager" (SKILL.md:23, 25). Heading and lead sentence write the product as one word; every other place in the file (and the brand) uses two words.
executions list vs executions list-filtered (SKILL.md:89–90) overlap heavily. Agent has no decision rule on when to pick which.
Trailing newline missing (SKILL.md:212) — \ No newline at end of file in diff. Cosmetic, but most lints flag it.

🟢 Low / informational

Three different run verbs (SKILL.md:66 testcases run, :81 testsets run, :92 executions run) all have distinct semantics. Inherited from the CLI, but a one-line disambiguation in the doc would prevent confusion.
testcases run flag style ambiguity (SKILL.md:66). Note says space-separated UUIDs; sibling testcases add/remove uses comma-separated. Worth confirming against the CLI to make sure the example matches.
Coverage gap. PR specifically flags executions testcaselogs list (nested subcommand) as a landmine but adds no smoke test for it; both YAML changes still cover singular→plural renames only.
Token-optimization rule drift. Repo .claude/rules/token-optimization.md says strip articles; many table Purpose cells retain "a new test case", "the failed test cases", "a summary report". Low priority on a doc-alignment PR.

Test changes

report_generation.yaml:52 and testset_hierarchy_discovery.yaml:44,52 regex updates correctly mirror the SKILL.md renames. The two-alternation pattern accepts both --flag-a ... --flag-b and --flag-b ... --flag-a orderings; extra flags interleaved still match via .*. ✅
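The two-alternation idea can be illustrated with a small regex. This is a sketch in the same style, not the pattern from the task YAMLs: `PATTERN` and `matches` are hypothetical names, and `--extra` in the usage note is a placeholder flag used only to show interleaving.

```python
import re

# One alternation per flag ordering; `.*` lets other flags sit in between.
PATTERN = re.compile(
    r"uip\s+tm\s+testsets\s+list\s"
    r"(?:.*--project-key\s+\S+.*--output\s+json"
    r"|.*--output\s+json.*--project-key\s+\S+)"
)


def matches(command):
    """Return True if the command uses the plural group with both flags, in any order."""
    return PATTERN.search(command) is not None
```

Both `uip tm testsets list --project-key DEMO --output json` and `uip tm testsets list --output json --extra x --project-key DEMO` match, while the singular `uip tm testset list ...` form and `testsets list-testcases` do not.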

Overall assessment

The SKILL.md rewrite itself is solid — table structure is clean, the Anti-patterns section explicitly calls out the nested-subcommand mistake, and the test YAMLs are aligned. The PR is materially incomplete because the two reference guides under references/ were not swept; SKILL.md actively points agents into those guides, so the workflows they teach are now broken. Treat the two critical findings as merge blockers; the high-severity ones are quick textual fixes in SKILL.md itself.

🤖 Generated with Claude Code (Opus 4.7) + multi-model review (Codex, Gemini)

uipreliga

Provided comments; ask the codeowners for the skill to approve.

Critical fixes:
- references/publish-and-link-guide.md: sweep singular testcase/testset/execute → plural testcases/testsets/run so the linked workflow actually matches the v1.0 CLI surface.
- references/test-result-report-guide.md:84: fix `testcase` → `testcases` in the commented decision table.

High:
- SKILL.md hint paragraph rewritten to call out all THREE test-case identifier flag shapes (--test-case-id UUID, --test-case-key singular, --test-case-keys plural-comma for testcases add/remove); this is the landmine the note exists for.
- SKILL.md Critical Rule #1 dangling `Use uip login.` fragment replaced with a complete sentence.

Medium:
- Heading hierarchy: drop the 'What is Testmanager?' h3 so the command tables sit under a sibling '## Commands' rather than under '## Concepts'.
- 'Testmanager' → 'Test Manager' to match the brand and the rest of the file.
- Add a decision rule for executions list vs executions list-filtered.
- Add trailing newline at EOF.

Low:
- One-block disambiguation note for the three different `run` verbs (testcases run / testsets run / executions run) so the agent picks the right one.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-15T19:37:44Z

Claude finished @ganeshborle's task in 2m 23s

Coder-eval task lint (advisory)

2 task YAMLs changed; 0 Critical, 0 High, 1 Medium, 0 Low, 1 OK.

Rubric: .claude/commands/lint-task.md. This check is advisory and never blocks merge.

Evidence of passing run

❌ High — PR body does not claim the changed tasks have been run and passed. The test plan checkboxes are all unchecked, and the PR notes that @uipath/test-manager-tool@latest on npm is still v0.9.0 (which would fail against the updated plural command surface). Please edit the PR description to add a line like: Ran skill-test-testset-hierarchy-discovery and skill-test-report-generation-qa locally and they passed. (after v1.0 is on @latest or with @beta pin).

Per-task lint

`tests/tasks/uipath-test/report_generation.yaml` — verdict: OK

Changes are minimal: execution → executions in the command_pattern regex (line 53) and description (line 51). The task's design is solid — command_executed criteria verify the agent invoked uip login status and uip tm executions list with correct flags, while run_command criteria validate both file existence and substantive report content (regression section, pass/fail/none breakdown, test set reference via Python script). No issues.

`tests/tasks/uipath-test/testset_hierarchy_discovery.yaml` — verdict: Medium

Changes: testset → testsets in three command_pattern regexes (lines 45, 53) and descriptions (lines 43, 51).

Issues:

[Medium] Meaningful coverage: all three success_criteria (lines 34–56) are command_executed with min_count: 1 and no output validation. The test proves the agent ran the right discovery-chain commands (project list, testsets list, testsets list-testcases) with required flags, but does not verify that any command succeeded or returned data. An agent that runs uip tm testsets list --project-key INVALID --output json (which errors) would still pass.

Suggested fixes:

Consider adding a run_command criterion that re-runs one of the discovery commands (e.g., uip tm project list --output json) and checks expected_exit_code: 0, or pipes the output through a json_check asserting the response array is non-empty. Even one output-validation criterion would raise this to OK. Example:
```
- type: run_command
  description: "project list returns valid JSON array"
  command: "uip tm project list --output json | python3 -c \"import json,sys; d=json.load(sys.stdin); assert isinstance(d,list) and len(d)>0\""
  timeout: 30
  expected_exit_code: 0
  weight: 1.0
  pass_threshold: 1.0
```
(This is advisory — a smoke test with command_executed only is common and acceptable for this tier.)

Within-PR duplicates

No duplicate clusters detected. The two changed tasks exercise materially distinct operations (hierarchy discovery vs. report generation) at different tiers (smoke vs. integration).

Conclusion

⚠ 1 task has issues, max severity Medium. Evidence of passing run not yet claimed (expected — v1.0 CLI not yet published). Advisory only — not blocking merge.

github-actions · 2026-05-15T19:37:45Z

PR Review: `uipath-test` skill v1.0 CLI alignment

Read review criteria and project conventions
Read full diff and existing PR comments
Read each changed file in full for context
Run skill-specific checks (links, CODEOWNERS, cross-refs, secrets)
Post review findings

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b5040b4f8c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7beba20d42

ganeshborle · 2026-05-16T16:47:44Z

@bai-uipath Can you please have a look at the smoke run for this PR? I'm not sure why it is failing with "Test Manager is not enabled for this tenant"; it passes locally, and Test Manager is enabled in the codereval org. Previously it was picking up some other org, something related to Autopilot.

vaishalisharma-uipath · 2026-05-18T05:58:24Z

 |---|---|
 | `uip tm testcases create --project-key <PROJECT_KEY> --name <TEST_CASE_NAME>` | Create a new test case in a Test Manager project. |
-| `uip tm testcases list --project-key <PROJECT_KEY>` | List all test cases in a Test Manager project. |
+| `uip tm testcases list --project-key <PROJECT_KEY>` | List all test cases in a Test Manager project. Optional `--filter <text>` to search by name/key. |


About filter, let's add a generic instruction about --filter or --search in the 'Critical Rules' section.

Is there anything specific we would mention in the Critical Rules section? As far as I understand, this is very command-specific; if we add it to the Critical Rules section, a coding agent might try these flags on other commands as well. I think we should keep it attached to the individual commands.

…nt `uip tm` surface

Refresh `skills/uipath-test/SKILL.md` and its two reference guides so every `uip tm` command, flag, and identifier shape matches the current Test Manager CLI (`@uipath/test-manager-tool`):
- Top-level command groups: `testcases` / `testsets` / `executions` (plural). The previous singular forms (`testcase`, `testset`, `execution`) no longer exist on the CLI.
- Run verb: `run` everywhere (`testcases run`, `testsets run`, `executions run`), no longer `execute`. The three are distinct; added a disambiguation block calling out that `executions run` is the *re-run* variant, while `testcases run` / `testsets run` start new executions.
- Test case logs: surfaced under `uip tm executions testcaselogs list` (nested) and `uip tm testcaselog start / finish / list-assertions` (top-level, singular). Anti-patterns section names the nested-subcommand landmine explicitly.
- Test step logs: documented under `uip tm teststeplog list`.
- New verbs: `executions get-stats`, `executions list-filtered` (with a decision rule vs the simpler `executions list`), `executions retry`, `testcaselog start` / `finish`, `user get`.
- Three test-case identifier flag shapes called out as a single block: `--test-case-id` (UUID, for `run` / `list-steps` / `list-result-history`), `--test-case-key` (singular, for `update` / `delete` / `link-automation` / `unlink-automation` / `list-testsets`), and `--test-case-keys` (plural, comma-separated, for `testcases add` / `testcases remove`).

Reference guides swept for the same surface:
- `publish-and-link-guide.md`: pipeline diagram, Steps 4–6, and Common Pitfalls all updated to plural commands + `run`.
- `test-result-report-guide.md`: decision table updated to plural `testcases list-result-history`; Prerequisites prose tightened to reference `Test Manager project key` and `Test Manager test set key` (the canonical key terminology, not "id").

Smoke tests realigned to the new surface:
- `report_generation.yaml` and `testset_hierarchy_discovery.yaml` success-criteria regexes match the plural commands.

Other cleanup:
- Heading hierarchy: dropped `### What is Testmanager?` so command tables live under a sibling `## Commands` header (was nesting under `## Concepts`).
- Brand spelled `Test Manager` (two words) consistently.
- Critical Rule #1 reworded to a complete sentence instead of trailing `Use uip login.`.
- Trailing newline at EOF.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 55839f0514

ganeshborle force-pushed the docs/uipath-test-skill-v1-cli branch from 64eb031 to 1194849 Compare May 13, 2026 10:44

ganeshborle self-assigned this May 14, 2026

ganeshborle added the uipath-case-management UiPath skill area: uipath-case-management label May 14, 2026

uipreliga self-requested a review May 14, 2026 15:17

uipreliga reviewed May 14, 2026

ganeshborle force-pushed the docs/uipath-test-skill-v1-cli branch 2 times, most recently from aa90da0 to b5040b4 Compare May 15, 2026 19:34

ganeshborle marked this pull request as ready for review May 15, 2026 19:37

ganeshborle requested review from amoluipath and vaishalisharma-uipath as code owners May 15, 2026 19:37

chatgpt-codex-connector Bot reviewed May 15, 2026

Comment thread tests/tasks/uipath-test/report_generation.yaml

ganeshborle force-pushed the docs/uipath-test-skill-v1-cli branch from b5040b4 to 7beba20 Compare May 15, 2026 19:53

chatgpt-codex-connector Bot reviewed May 15, 2026

Comment thread tests/tasks/uipath-test/testset_hierarchy_discovery.yaml

vaishalisharma-uipath reviewed May 18, 2026

ganeshborle force-pushed the docs/uipath-test-skill-v1-cli branch from 7beba20 to 55839f0 Compare May 18, 2026 07:29

chatgpt-codex-connector Bot reviewed May 18, 2026

Comment thread tests/tasks/uipath-test/report_generation.yaml

Comment thread tests/tasks/uipath-test/testset_hierarchy_discovery.yaml
