feat: improve implement-spec skill score (53% → 96%) #1531
Hey 👋 @moodlezoup

I ran your skills through `tessl skill review` at work and found some targeted improvements. Here's the full before/after:

<!-- Upload score_card.png to this PR and replace with the actual image URL -->

| Skill | Before | After | Change |
|-------|--------|-------|--------|
| implement-spec | 53% | 96% | +43% |
| update-docs | 16% | 62% | +46% |
| ci-code-review | 90% | 90% | no change |
| jolt | 66% | 66% | no change |
| new-spec | 63% | 63% | no change |
| analyze-spec | 53% | 53% | no change |
| new-invariant | 53% | 53% | no change |
| new-objective | 53% | 53% | no change |

## Summary

Focused on `implement-spec`: it had the most improvement headroom (tied at 53% with several others) and is the most central workflow skill (spec-to-code implementation is the core development action). Also fixed a missing `name` field in `update-docs` that was causing it to fail validation entirely.

## Changes

<details>
<summary>Changes made</summary>

**`implement-spec` (53% → 96%, +43%)**

- Expanded the frontmatter description from a vague one-liner ("Autonomous one-shot implementation from an approved spec") to a structured description listing concrete actions (plans changes, executes code modifications, runs QA cycles, validates correctness, posts PR summaries)
- Added explicit USE FOR and TRIGGERS sections with natural language trigger terms ("implement spec", "build from spec", "execute spec", "implement feature", etc.)
- Folded the `<Purpose>` section content into the expanded description, since it was duplicating information already covered
- Removed the `<Examples>` section that added minimal value for an autonomous workflow skill

**`update-docs` (16% → 62%, +46%)**

- Added the missing `name: update-docs` field to the frontmatter. The skill was failing validation entirely because this required field was absent, which prevented the LLM judge from scoring it

</details>

## Testing

- [x] `tessl skill review` confirms `implement-spec` improved from 53% → 96%
- [x] `tessl skill review` confirms `update-docs` improved from 16% → 62%
- [x] All validation checks pass (no errors)
- [ ] No modified crates: changes are skill metadata only

## Security Considerations

No security implications. Changes are limited to skill metadata (frontmatter descriptions) and removal of redundant documentation sections. No code, API, proof system, or verifier changes.

## Breaking Changes

None

---

I also stress-tested your `implement-spec` skill against a few real-world task evals and it held up really well on autonomous spec implementation requiring dual-mode (host+zk) QA validation with jolt-eval invariant scaffolding. Kudos for that.

Honest disclosure: I work at @tesslio where we build tooling around skills like these. Not a pitch, just saw room for improvement and wanted to contribute.

Want to self-improve your skills? Just point your agent (Claude Code, Codex, etc.) at [this Tessl guide](https://docs.tessl.io/evaluate/optimize-a-skill-using-best-practices) and ask it to optimize your skill. Ping me ([@yogesh-tessl](https://github.com/yogesh-tessl)) if you hit any snags.

Thanks in advance 🙏
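For anyone who wants to catch the `update-docs` kind of failure locally, here is a minimal sketch of a frontmatter check. It is not Tessl's validator: the function name `missing_frontmatter_fields` is hypothetical, treating `description` as required alongside `name` is an assumption, and the parser only handles flat top-level `key: value` lines between the `---` delimiters.

```python
# Hypothetical local check, not the validator that `tessl skill review` runs.
# Assumption: `name` (per this PR) and `description` are the required fields.
REQUIRED_FIELDS = ("name", "description")


def missing_frontmatter_fields(skill_md: str, required=REQUIRED_FIELDS) -> list[str]:
    """Return the required keys absent from the leading '---' frontmatter block."""
    lines = skill_md.splitlines()
    # No opening delimiter means there is no frontmatter at all.
    if not lines or lines[0].strip() != "---":
        return list(required)

    keys = set()
    for line in lines[1:]:
        if line.strip() == "---":  # closing delimiter ends the frontmatter
            break
        # Only consider top-level `key: value` lines; indented lines are skipped.
        if line and not line[0].isspace() and ":" in line:
            keys.add(line.split(":", 1)[0].strip())

    return [field for field in required if field not in keys]


if __name__ == "__main__":
    before = "---\ndescription: Update docs\n---\n# Body\n"
    after = "---\nname: update-docs\ndescription: Update docs\n---\n# Body\n"
    print(missing_frontmatter_fields(before))
    print(missing_frontmatter_fields(after))
```

The two sample strings are made up for illustration and are not the real `update-docs` frontmatter; a real check would likely use a proper YAML parser rather than line splitting.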
Hi @yogesh-tessl, thanks for the PR. What are these percentages of? Is there any documentation you can point me to?
@moodlezoup, good question. The percentages come from Tessl's skill review, which scores SKILL.md files across a few dimensions: whether the frontmatter has the required fields, how clear the trigger/routing language is for an LLM to pick the right skill, and how actionable the instructions are once the skill fires. Each dimension gets a 0-100 score and the overall number is a weighted average. The big jump on implement-spec (53 to 96) was mostly due to the frontmatter description being too short for agents to reliably match it. The update-docs one (16 to 62) was just a missing `name` field that was failing validation entirely. I'm happy to point you to the scoring docs if you want more detail on the rubric.
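To make the "weighted average" part concrete, here is a rough sketch of how per-dimension scores could roll up into one number. The dimension names, the weights, and the example scores are all hypothetical placeholders; the actual rubric and weighting live in Tessl's scoring docs and are not reproduced here.

```python
def overall_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of 0-100 dimension scores, normalised by total weight."""
    total_weight = sum(weights[dim] for dim in scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(scores[dim] * weights[dim] for dim in scores) / total_weight


if __name__ == "__main__":
    # Hypothetical dimensions and weights, for illustration only.
    weights = {"frontmatter": 1.0, "routing": 2.0, "actionability": 1.0}
    scores = {"frontmatter": 100.0, "routing": 40.0, "actionability": 80.0}
    print(overall_score(scores, weights))
```

Dividing by the total weight means the weights do not need to sum to 1, so a low score on a heavily weighted dimension (routing, in this made-up example) pulls the overall number down more than the others.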