Skip fenced code blocks in skill file-reference extraction #1658

Open
johnsonr wants to merge 1 commit into main from fix/skill-extractor-skip-fenced-code-blocks

Contributor

johnsonr commented May 7, 2026

Summary

  • InstructionFileReferenceExtractor scans the entire skill body for markdown links and resource paths, including content inside ``` / ~~~ fenced code blocks. Any skill that teaches code can accidentally surface "references" that are just illustrations.
  • Concrete trigger: a JS template literal of shape `- [${hit.title}](${hit.url})` inside a code fence is parsed as a markdown link with local target ${hit.url}, and skill load fails with references missing files: ${hit.url}.
  • Fix: pre-strip fenced code blocks (CommonMark fence rules — ``` or ~~~, up to 3 leading spaces of indent, matching closing fence, implicit close at EOF) before running the link and resource-path regexes. Inline code spans (single backticks) are deliberately left in place — they're commonly used for real filename references in prose.

Test plan

  • New tests cover: markdown link inside fence, resource path inside fence, tilde fence, fence with language tag, unclosed fence (CommonMark implicit close), multiple interleaved fences, prose immediately after a closing fence.
  • All 14 pre-existing extractor tests still pass.
  • Full embabel-agent-skills test suite green (205 tests).

🤖 Generated with Claude Code

InstructionFileReferenceExtractor scans the entire skill body for
markdown links and resource paths, including content inside ``` and
~~~ fenced code blocks. JS/TS/Python examples that interpolate
variables or mention illustrative paths get treated as real local
file references — failing skill load with "references missing files".

Concrete trigger: a JS template literal of the shape
`\`- [${hit.title}](${hit.url})\`` inside a code fence is parsed as a
markdown link with local target `${hit.url}`. Any skill teaching code
runs into this.
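
The false positive is easy to reproduce in isolation. The sketch below is a Python illustration, not the extractor's own code: `LINK` is a hypothetical stand-in for its markdown-link regex, and `known_files` is an assumed stand-in for the files that actually ship with the skill.

```python
import re

# Hypothetical stand-in for the extractor's markdown-link pattern (an assumption).
LINK = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")

FENCE = "`" * 3  # three backticks, built programmatically for readability

skill_body = "\n".join([
    "See the [guide](docs/guide.md) first.",
    FENCE + "js",
    "const line = `- [${hit.title}](${hit.url})`;",
    FENCE,
])

known_files = {"docs/guide.md"}  # assumed contents of the skill directory

# Scanning the raw body also picks up the template literal inside the fence.
targets = LINK.findall(skill_body)
missing = [t for t in targets if t not in known_files]
print(missing)
```

Scanning the raw body this way reports `${hit.url}` as a missing file, even though it only ever appears inside the code fence.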

Fix: pre-strip fenced code blocks (CommonMark fence rules — ``` or
~~~ with up to 3 leading spaces of indent, matching closing fence,
implicit close at EOF) before running the link and resource-path
regexes. Inline code spans (single backticks) are left untouched —
they're commonly used for real filename references in prose.
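
A minimal sketch of the pre-strip step, written in Python for illustration only; it is not the code in this PR. The names `strip_fenced_code_blocks`, `link_targets`, and `LINK` are hypothetical, and the link regex is an assumed simplification of whatever the extractor really uses. The fence handling follows the CommonMark rules listed above: an opening fence of three or more backticks or tildes with at most 3 spaces of indent, a closing fence of the same character that is at least as long, and an implicit close at end of input.

```python
import re

# Opening fence: 0-3 spaces of indent, a run of 3+ backticks or tildes,
# then an optional info string such as a language tag.
OPEN_FENCE = re.compile(r"^ {0,3}(`{3,}|~{3,})(.*)$")

# Hypothetical stand-in for the extractor's markdown-link pattern (an assumption).
LINK = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def strip_fenced_code_blocks(text: str) -> str:
    """Return text with fenced code blocks (fence lines included) removed."""
    kept = []
    fence = None  # (fence character, fence length) while inside a block
    for line in text.split("\n"):
        if fence is None:
            m = OPEN_FENCE.match(line)
            # A backtick fence's info string may not itself contain backticks.
            if m and not (m.group(1)[0] == "`" and "`" in m.group(2)):
                fence = (m.group(1)[0], len(m.group(1)))
            else:
                kept.append(line)
            continue
        char, length = fence
        body = line.strip()
        indent = len(line) - len(line.lstrip(" "))
        # Closing fence: same character, at least as long, nothing else on the line.
        if indent <= 3 and len(body) >= length and set(body) == {char}:
            fence = None
        # Lines inside the block are dropped; an unclosed block runs to end of input.
    return "\n".join(kept)


def link_targets(text: str) -> list[str]:
    """Extract link targets from prose only; inline code spans are left in place."""
    return LINK.findall(strip_fenced_code_blocks(text))


if __name__ == "__main__":
    tick3 = "`" * 3
    demo = "\n".join([
        "See [guide](docs/guide.md).",
        tick3 + "js",
        "const line = `- [${hit.title}](${hit.url})`;",
        tick3,
        "Then [api](docs/api.md).",
    ])
    print(link_targets(demo))
```

Stripping whole lines before the regexes run keeps the link and resource-path patterns unchanged; only their input differs. Inline spans such as `config.yaml` pass through untouched, which matches the stated decision to keep them as potential real references.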

Tests cover: link inside fence, resource path inside fence, tilde
fence, fence with language tag, unclosed fence (CommonMark implicit
close), interleaved prose-and-fence, prose immediately after a
closing fence.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

sonarqubecloud Bot commented May 7, 2026

Quality Gate failed

Failed conditions
B Reliability Rating on New Code (required ≥ A)
C Maintainability Rating on New Code (required ≥ A)

