Merged
22 commits
2ffa3c8
docs: recommend skill-creator for new skill drafting
marc0olo May 29, 2026
9cc224c
docs: extend skill-creator guidance to cover skill improvements
marc0olo May 29, 2026
01a818f
docs: require eval review and testrun before any skill PR
marc0olo May 29, 2026
6093eb5
docs: fix inconsistencies in contributing guide and agent instructions
marc0olo May 29, 2026
a90c869
docs: clarify IC evals are kept as regression safety net
marc0olo May 29, 2026
d416a1a
feat: install skill-creator and fix remaining doc inconsistencies
marc0olo May 29, 2026
941fe4f
fix: correct upstream diff direction in CONTRIBUTING.md (introduced b…
marc0olo May 29, 2026
1a96305
chore: remove legacy skill template; use skill-creator to draft new s…
marc0olo May 29, 2026
410b2bc
feat: add improve-ic-skill, patch skill-creator bugs, remove skills-lock
marc0olo May 29, 2026
2f8b972
docs: add targeted eval guidance to upstream sync checklist
marc0olo May 29, 2026
91145ba
feat(improve-ic-skill): require explicit problem statement before sta…
marc0olo May 29, 2026
cd95759
docs(skill-creator): add upstream commit SHA and install date to PATC…
marc0olo May 29, 2026
c1dfb4d
fix(skill-creator): replace PyYAML with stdlib parser in quick_valida…
marc0olo May 29, 2026
8c30d24
fix(skill-creator): add explicit field name warning to grader.md (tex…
marc0olo May 29, 2026
f0162f1
fix: address all review findings — improve-ic-skill, CONTRIBUTING.md,…
marc0olo May 29, 2026
2e3339e
fix: correct diff notation, clarify Cowork in PATCHES.md, improve Ste…
marc0olo May 29, 2026
d43a9ea
fix: resolve all remaining review findings
marc0olo May 29, 2026
02a9918
fix: final consistency pass — all third-review findings addressed
marc0olo May 30, 2026
8c8fd98
fix: clarify branch naming placeholder and PATCHES.md commit SHA label
marc0olo May 30, 2026
260bf11
feat(improve-ic-skill): hold the line — do not proceed without a clea…
marc0olo May 30, 2026
c6aed3f
fix(improve-ic-skill): hard stop in Step 0 — no work without specific…
marc0olo May 30, 2026
768416e
feat(icp-cli): add instructions for running parallel pocket-ics in wo…
raymondk Jun 1, 2026
175 changes: 175 additions & 0 deletions .agents/skills/improve-ic-skill/SKILL.md
@@ -0,0 +1,175 @@
---
name: improve-ic-skill
description: Improve an existing skill in the IC Skills repo (skills.internetcomputer.org). Load this skill whenever asked to improve, fix, update, enhance, review, or add content to an existing skill at skills/<skill-name>/SKILL.md — including adding pitfalls, updating canister IDs, fixing code examples, strengthening the description, or updating evals. Do NOT use for creating a brand-new skill (use skill-creator for that). Trigger examples: "the motoko skill is missing a pitfall", "update icp-cli for the new recipe format", "the ckbtc description is weak", "add an eval for the canister-security skill".
---

# Improve IC Skill

A token-efficient workflow for improving an existing skill in the IC Skills repo. The guiding principle: **understand the problem first, change precisely, verify only what changed.** Evals exist as regression safety nets — run them selectively, not automatically on every pass.

## Step 0 — Understand the problem before touching anything

**This is a hard stop. Do not read any file, run any command, or make any change until you have a specific problem statement from the user.**

You need to know:
1. **What is the specific issue?** (e.g. "canister ID X is stale", "missing pitfall about Y", "description doesn't trigger on Z", "code example uses deprecated API W")
2. **Why does it matter?** (e.g. "agents generate broken code", "skill never triggers for this use case", "upstream sync added a new command we don't cover")

If the request is vague ("improve the motoko skill", "the ckbtc skill feels weak") — ask:

> "What specific issue should I focus on? For example: a missing pitfall, a stale canister ID, a broken code example, or a description that isn't triggering correctly."

**If the user cannot or will not provide a specific issue — stop. Do not proceed.** Do not read the skill. Do not run commands. Do not "explore and find improvements yourself." Explain clearly:

> "I need a specific problem to solve before I can start. Blind improvements risk changing things that don't need changing. To find a concrete issue, try: running `npm run validate`, checking `evaluations/<skill-name>.json` for failing cases, or pointing me to a GitHub issue or upstream diff."

If the user pushes back, repeats the vague request, or expresses frustration — hold the position. The answer is always the same: no specific issue = no work started.

The only exception: if the reason is unambiguous from context already in the conversation (a GitHub issue body, an upstream diff, a specific error message) — proceed directly to Step 1 without asking.

## Step 1 — Understand the skill

Read `skills/<skill-name>/SKILL.md` fully. Check:
- What it covers and what it's missing
- Whether it's upstream-tracked: `grep <skill-name> .claude/upstream.md` — if it appears, note which sections are owned by icskills (those cannot be overwritten from upstream)
- Recent git history: `git log --oneline -5 -- skills/<skill-name>/SKILL.md`
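
The upstream check above can be wrapped in a small helper so the result is explicit rather than read off grep output. This is only a sketch: the function name `is_upstream_tracked` is hypothetical, and it assumes a tracked skill's name appears verbatim somewhere in `.claude/upstream.md`.

```bash
# Hypothetical helper: succeed if the skill name is mentioned in the upstream file.
# Assumption: a tracked skill's name appears verbatim in that file.
is_upstream_tracked() {
  skill="$1"
  upstream_file="${2:-.claude/upstream.md}"
  # -q quiet, -w whole word, -F fixed string; "--" guards names starting with "-".
  # Note: "-" counts as a word boundary, so "motoko" also matches "migrating-motoko".
  grep -qwF -- "$skill" "$upstream_file" 2>/dev/null
}

# Example use (run from the repo root):
#   is_upstream_tracked motoko && echo "upstream-tracked: read .claude/upstream.md first"
```

A match only says the name is mentioned; which sections are icskills-owned still has to be read from the file itself.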

## Step 2 — Validate baseline

```bash
npm run validate
```

Fix any errors before making content changes. Warnings are acceptable — don't chase them unless they're relevant to your improvement.

## Step 3 — Identify improvements

Read with fresh eyes. Common high-value areas:

**Pitfalls** — highest-value content; every pitfall documented prevents an agent hallucination. Look for: incorrect usage patterns you know about, error messages agents typically misread, wrong defaults, API gotchas.

**Canister IDs and version numbers** — verify against mainnet and current release tags. Stale IDs silently break agent-generated code.

**Code examples** — verify syntax compiles, APIs still exist, no deprecated patterns.

**Description** — must state what it does, when to use it, AND when NOT to use it. Include specific keywords agents match on. A weak description means the skill never triggers.

**Required frontmatter** — `metadata.title` and `metadata.category` are required by CI. Missing these blocks deployment.

For upstream-tracked skills (`motoko`, `migrating-motoko`, `migrating-motoko-enhanced`, `mops-cli`): read `.claude/upstream.md` carefully. Only modify icskills-owned sections freely; changes to shared content should also be filed upstream.
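
Before relying on CI, the required-frontmatter rule can be spot-checked locally. The sketch below is not the repo's validator. It assumes the frontmatter is the block between the first two `---` lines and that `title` and `category` are indented keys nested under a top-level `metadata:` key. The function name `missing_metadata_keys` is hypothetical.

```bash
# Hypothetical pre-check: print each required metadata key missing from frontmatter.
# Assumptions: frontmatter sits between the first two "---" lines, and
# title/category are indented keys (nested under "metadata:") with non-empty values.
missing_metadata_keys() {
  skill_file="$1"
  # Keep only the lines between the first and second "---" delimiters.
  frontmatter="$(awk '/^---[[:space:]]*$/ { n++; next } n == 1 { print }' "$skill_file")"
  for key in title category; do
    printf '%s\n' "$frontmatter" \
      | grep -Eq "^[[:space:]]+$key:[[:space:]]*[^[:space:]]" \
      || printf 'metadata.%s\n' "$key"
  done
}

# Example use:
#   missing_metadata_keys skills/<skill-name>/SKILL.md
```

Empty output means both keys were found. `npm run validate` remains the authoritative check.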

## Step 4 — Apply improvements

Edit `skills/<skill-name>/SKILL.md`. Stay focused — don't refactor things you weren't asked to touch.

## Step 5 — Validate again

```bash
npm run validate
```

Fix any errors. Warnings from unrelated sections are fine to leave.

Optionally run LLM quality scoring if the changes were substantial:

```bash
skill-validator score evaluate --provider claude-cli skills/<skill-name>
```

## Step 6 — Check and update evals

Open `evaluations/<skill-name>.json` at the repo root (not inside the skill directory).

**If the file doesn't exist yet**, create it as part of this improvement session. A skill being improved is the right moment to seed its first evals. Start lean — 1-2 output_evals covering the most important pitfalls, and 2-3 trigger_evals for the description. Don't try to be comprehensive; a small, accurate eval file beats an overwhelming one that never gets run.

For **upstream sync sessions** where no eval file exists: seed cases directly from the upstream diff — new commands, changed defaults, renamed APIs, and new pitfalls in the diff are exactly where agents will hallucinate without updated guidance. The diff is your best source.

```bash
# Check whether an eval file already exists
ls evaluations/<skill-name>.json 2>/dev/null || echo "No evals yet — create them"
```

**If the file exists**, review the existing cases alongside your changes.

**Always update evals when:**
- You added a new pitfall — pitfalls are exactly where agents hallucinate without guidance; every new pitfall deserves a case
- You changed a command, API, or canister ID that an existing case now describes incorrectly — fix the case
- You introduced a new behavior that agents would likely get wrong without explicit guidance

**Keep the suite lean.** Don't add cases for things that are obvious or already well-covered. Each case is a future regression test that costs tokens to run.
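
Seeding a first eval file can be sketched as a helper that writes a lean skeleton and refuses to overwrite existing work. The function name `seed_eval_file` and the TODO placeholder strings are hypothetical. The JSON keys follow the eval format reference in Step 7.

```bash
# Hypothetical helper: create a lean starter eval file, never overwriting one.
seed_eval_file() {
  skill="$1"
  eval_dir="${2:-evaluations}"
  eval_file="$eval_dir/$skill.json"
  if [ -e "$eval_file" ]; then
    echo "exists: $eval_file"
    return 0
  fi
  mkdir -p "$eval_dir"
  # Placeholder text only: replace every TODO before committing.
  cat > "$eval_file" <<'EOF'
{
  "output_evals": [
    {
      "name": "TODO: most important pitfall",
      "prompt": "TODO: scoped prompt asking for one thing",
      "expected_behaviors": ["TODO: specific, checkable behavior"]
    }
  ],
  "trigger_evals": [
    { "query": "TODO: prompt that should trigger", "expected_behavior": "should trigger" },
    { "query": "TODO: similar prompt that should not trigger", "expected_behavior": "should NOT trigger" }
  ]
}
EOF
  echo "created: $eval_file"
}
```

The skeleton deliberately starts at the small end of the suggested range (one output eval, two trigger evals), so every case has to be filled in with real content before it is useful.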

## Step 7 — Run existing evals (only when relevant)

Running the existing suite is a regression check — it answers "did I break something that was already tested?" It is separate from adding new cases (Step 6). You can add a new eval case for a new pitfall without running the existing suite at all.

**Run evals when you modified existing tested content:**
- You changed or removed content that an eval explicitly tests (canister ID, command name, error message)
- You rewrote a section that several evals cover
- The description changed significantly — run trigger evals only
- You're unsure whether a change is safe

**Skip running evals when your changes were purely additive:**
- New pitfall, new section, new example — nothing you touched is already tested

Check what evals exist before deciding:

```bash
node scripts/evaluate-skills.js <skill-name> --list
```

Run only the relevant subset:

```bash
# Single eval by index (cheapest)
node scripts/evaluate-skills.js <skill-name> --eval <N>

# Trigger evals only — for description changes
node scripts/evaluate-skills.js <skill-name> --triggers-only

# Skip baseline to halve token cost when correctness is what matters, not the delta
node scripts/evaluate-skills.js <skill-name> --eval <N> --no-baseline

# Full suite — only when you made broad changes across the whole skill
node scripts/evaluate-skills.js <skill-name>
```
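
The run-or-skip rules above can be condensed into a small lookup. This is a sketch: the function name `suggest_eval_command` and the change labels (`additive`, `description`, `targeted`, `broad`) are hypothetical. Only the script path and flags come from the commands listed above.

```bash
# Hypothetical helper: print the eval command suggested for a kind of change.
# The change labels are illustrative, not part of the repo tooling.
suggest_eval_command() {
  change="$1"
  skill="$2"
  index="$3"
  [ -n "$index" ] || index='<N>'
  case "$change" in
    additive)    echo "skip: nothing already tested was touched" ;;
    description) echo "node scripts/evaluate-skills.js $skill --triggers-only" ;;
    targeted)    echo "node scripts/evaluate-skills.js $skill --eval $index" ;;
    broad)       echo "node scripts/evaluate-skills.js $skill" ;;
    # Anything else: list the existing evals first, then decide.
    *)           echo "node scripts/evaluate-skills.js $skill --list" ;;
  esac
}
```

The helper only prints a suggestion, so the decision to spend tokens on a run stays with whoever is making the change.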

Eval format reference:

```json
{
  "output_evals": [
    {
      "name": "descriptive name",
      "prompt": "scoped prompt asking for one thing — specify what to exclude to keep it fast",
      "expected_behaviors": [
        "Specific, checkable behavior the model should exhibit",
        "Another specific behavior"
      ]
    }
  ],
  "trigger_evals": [
    {
      "query": "realistic user prompt that should trigger this skill",
      "expected_behavior": "should trigger"
    },
    {
      "query": "realistic prompt that looks similar but should NOT trigger this skill",
      "expected_behavior": "should NOT trigger"
    }
  ]
}
```

Keep prompts tightly scoped — open-ended prompts generate long responses and risk timeouts. Ask for one thing; explicitly exclude what you don't want ("just the function, no deploy steps").

## Step 8 — Final check

```bash
npm run validate
```

Once validation passes, the skill is ready for a PR. Include a brief summary of what changed and, if you ran evals in Step 7, paste the relevant results collapsed in a `<details>` block (see [Submit a PR](../../../CONTRIBUTING.md#8-submit-a-pr) for the exact format).

## Upstream-tracked skills

Before editing `motoko`, `migrating-motoko`, `migrating-motoko-enhanced`, or `mops-cli`, read `.claude/upstream.md`. It lists which sections are icskills-owned. You can freely improve owned sections. For shared content, improvements should also be filed as issues upstream so they flow back on the next sync.