feat: improve implement-spec skill score (53% → 96%) #1531

Open: yogesh-tessl wants to merge 1 commit into a16z:main from yogesh-tessl:improve/skill-review-optimization

@yogesh-tessl

Hey 👋 @moodlezoup

I ran your skills through `tessl skill review` at work and found some targeted improvements. Here's the full before/after:

<!-- Upload score_card.png to this PR and replace with the actual image URL -->

| Skill | Before | After | Change |
|-------|--------|-------|--------|
| implement-spec | 53% | 96% | +43% |
| update-docs | 16% | 62% | +46% |
| ci-code-review | 90% | 90% | — |
| jolt | 66% | 66% | — |
| new-spec | 63% | 63% | — |
| analyze-spec | 53% | 53% | — |
| new-invariant | 53% | 53% | — |
| new-objective | 53% | 53% | — |

## Summary

Focused on `implement-spec` — it had the most improvement headroom (tied at 53% with several others) and is the most central workflow skill (spec-to-code implementation is the core development action). Also fixed a missing `name` field in `update-docs` that was causing it to fail validation entirely.

## Changes

<details>
<summary>Changes made</summary>

**`implement-spec` (53% → 96%, +43%)**
- Expanded the frontmatter description from a vague one-liner ("Autonomous one-shot implementation from an approved spec") to a structured description listing concrete actions (plans changes, executes code modifications, runs QA cycles, validates correctness, posts PR summaries)
- Added explicit USE FOR and TRIGGERS sections with natural language trigger terms ("implement spec", "build from spec", "execute spec", "implement feature", etc.)
- Folded the `<Purpose>` section content into the expanded description — it was duplicating information already covered
- Removed the `<Examples>` section that added minimal value for an autonomous workflow skill

**`update-docs` (16% → 62%, +46%)**
- Added the missing `name: update-docs` field to the frontmatter — the skill was failing validation entirely because this required field was absent, which prevented the LLM judge from scoring it

</details>
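
For reviewers who want to reproduce the `update-docs` validation failure locally, here is a minimal sketch of a required-field check on skill frontmatter. It is not how `tessl skill review` validates files; the helper names are hypothetical, the parser only handles flat `key: value` lines, and treating `description` as required is an assumption (this PR only confirms that `name` is required).

```python
# Hypothetical helper, not Tessl's validator. Only `name` is confirmed as a
# required field by this PR; `description` is included as an assumption.
REQUIRED_FIELDS = ("name", "description")


def parse_frontmatter(text: str) -> dict[str, str]:
    """Return top-level `key: value` pairs from a leading `---` block.

    This is a deliberately small parser: it ignores nested YAML and
    returns an empty dict when the block is missing or never closed.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        # Only unindented lines are treated as top-level keys.
        if line and not line[0].isspace() and ":" in line:
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
    return {}  # closing delimiter never found


def missing_fields(text: str, required: tuple[str, ...] = REQUIRED_FIELDS) -> list[str]:
    """List the required fields that are absent or empty in the frontmatter."""
    fields = parse_frontmatter(text)
    return [name for name in required if not fields.get(name)]


# Illustrative SKILL.md contents, not the real update-docs file.
before = "---\ndescription: Update project docs\n---\n# Body\n"
after = "---\nname: update-docs\ndescription: Update project docs\n---\n# Body\n"

print(missing_fields(before))  # ['name']
print(missing_fields(after))   # []
```

A check like this could run before any scoring step, so that a missing field is reported as a validation error rather than surfacing as a low score.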

## Testing

- [x] `tessl skill review` confirms `implement-spec` improved from 53% → 96%
- [x] `tessl skill review` confirms `update-docs` improved from 16% → 62%
- [x] All validation checks pass (no errors)
- [ ] No modified crates — changes are skill metadata only

## Security Considerations

No security implications. Changes are limited to skill metadata (frontmatter descriptions) and removal of redundant documentation sections. No code, API, proof system, or verifier changes.

## Breaking Changes

None

---

I also stress-tested your `implement-spec` skill against a few real-world task evals and it held up really well on autonomous spec implementation requiring dual-mode (host+zk) QA validation with jolt-eval invariant scaffolding. Kudos for that.

Honest disclosure — I work at @tesslio where we build tooling around skills like these. Not a pitch — just saw room for improvement and wanted to contribute.

Want to self-improve your skills? Just point your agent (Claude Code, Codex, etc.) at [this Tessl guide](https://docs.tessl.io/evaluate/optimize-a-skill-using-best-practices) and ask it to optimize your skill. Ping me — [@yogesh-tessl](https://github.com/yogesh-tessl) — if you hit any snags.

Thanks in advance 🙏
@github-actions (bot) added the `no-spec` label ("PR has no spec file") on May 15, 2026
@moodlezoup
Collaborator

Hi @yogesh-tessl, thanks for the PR. What are these percentages of? Is there any documentation you can point me to?

@yogesh-tessl
Author

@moodlezoup, good question.

The percentages come from Tessl's skill review, which scores SKILL.md files across a few dimensions: whether the frontmatter has the required fields, how clear the trigger/routing language is for an LLM to pick the right skill, and how actionable the instructions are once the skill fires. Each dimension gets a 0-100 score and the overall number is a weighted average.
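
To make the "weighted average" part concrete, here is a rough sketch of the arithmetic. The dimension names, weights, and scores below are made up for illustration; they are not Tessl's actual rubric or the numbers behind this PR.

```python
def overall_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Combine per-dimension scores (each 0-100) into one weighted average."""
    if scores.keys() != weights.keys():
        raise ValueError("scores and weights must cover the same dimensions")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    for name, value in scores.items():
        if not 0 <= value <= 100:
            raise ValueError(f"score for {name!r} is outside 0-100")
    return sum(scores[name] * weights[name] for name in scores) / total_weight


# Hypothetical dimensions and weights, chosen only to show the calculation.
weights = {"frontmatter": 1, "routing": 2, "actionability": 2}
scores = {"frontmatter": 100, "routing": 40, "actionability": 50}

print(overall_score(scores, weights))  # 56.0
```

With these illustrative weights, the two heavier dimensions dominate the total, which is the general shape of why a weak routing description can pull the overall number down even when the rest of the file is in good condition.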

The big jump on implement-spec (53 to 96) came mostly from expanding the frontmatter description, which had been too short for agents to reliably match the skill.

The update-docs one (16 to 62) was just a missing name field that was failing validation entirely.

I'll be happy to point you to the scoring docs if you want more detail on the rubric.
