fix: handle URLs with balanced parentheses in URL detection #152

Open
ericjypark wants to merge 2 commits into coder:main from ericjypark:fix/url-parens-detection
@ericjypark

Problem

URLs containing parentheses — such as Wikipedia links like https://en.wikipedia.org/wiki/Rust_(programming_language) — are incorrectly truncated. The URL regex character class excludes ( and ), so the match stops at the first parenthesis. Additionally, TRAILING_PUNCTUATION unconditionally strips ), breaking URLs where parentheses are part of the path.
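
The truncation can be reproduced with a minimal sketch. The pattern below is a hypothetical stand-in, since the project's actual URL regex is not shown in this description; it only assumes a character class that excludes `(` and `)`.

```typescript
// Hypothetical stand-in for the old pattern: the character class
// excludes parentheses, so a match ends at the first "(".
const OLD_URL_REGEX = /https?:\/\/[^\s()<>"']+/;

// Returns the first URL-like match in the text, or null if none is found.
function firstUrlOld(text: string): string | null {
  const match = OLD_URL_REGEX.exec(text);
  return match ? match[0] : null;
}

const truncatedUrl = firstUrlOld(
  "See https://en.wikipedia.org/wiki/Rust_(programming_language) for details"
);
// truncatedUrl is "https://en.wikipedia.org/wiki/Rust_"
```

Under that assumption, everything from `(programming_language)` onward is lost before any trailing-punctuation handling runs.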

Fix

  • Add () to the URL regex character class so parentheses are captured
  • Remove ) from TRAILING_PUNCTUATION to preserve balanced parens
  • Add a balanced-paren stripping pass: only strip trailing ) when the URL has more close-parens than open-parens (handles the common case of a URL wrapped in prose parentheses like (see https://...))
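
The three steps above could look roughly like the following sketch. It is not the code from this PR: `SKETCH_URL_REGEX`, `SKETCH_TRAILING_PUNCTUATION`, `trimUrl`, and `detectUrlsSketch` are hypothetical names, and the exact patterns are assumptions chosen only to illustrate the balanced-paren rule.

```typescript
// Hypothetical patterns: parentheses are allowed inside a URL, and ")"
// is no longer treated as ordinary trailing punctuation.
const SKETCH_URL_REGEX = /https?:\/\/[^\s<>"']+/g;
const SKETCH_TRAILING_PUNCTUATION = /[.,;:!?]+$/;

// Counts occurrences of a single character in a string.
function countChar(s: string, ch: string): number {
  return s.split(ch).length - 1;
}

// Strips trailing punctuation, then strips a trailing ")" only while the
// URL has more close-parens than open-parens. Repeats until nothing changes,
// so mixed endings such as ")." are handled.
function trimUrl(raw: string): string {
  let url = raw;
  for (;;) {
    const before = url;
    url = url.replace(SKETCH_TRAILING_PUNCTUATION, "");
    while (
      url.charAt(url.length - 1) === ")" &&
      countChar(url, ")") > countChar(url, "(")
    ) {
      url = url.slice(0, -1);
    }
    if (url === before) {
      return url;
    }
  }
}

// Finds every URL-like match in the text and trims each one.
function detectUrlsSketch(text: string): string[] {
  return (text.match(SKETCH_URL_REGEX) || []).map(trimUrl);
}
```

With this sketch, `detectUrlsSketch("(see https://example.com)")` yields `https://example.com`, while a Wikipedia-style URL such as `https://en.wikipedia.org/wiki/Rust_(programming_language)` keeps its closing parenthesis because its parentheses are balanced.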

Test Plan

  • bun test lib/url-detection.test.ts — 24/24 pass (4 new tests for parens handling)
  • biome check . — clean
  • tsc --noEmit — clean
  • Existing test 'strips trailing parenthesis' with (see https://example.com) still passes

URLs containing parentheses (e.g., Wikipedia links like
https://en.wikipedia.org/wiki/Rust_(programming_language)) were
truncated at the first `(` because the URL regex character class
excluded parentheses, and TRAILING_PUNCTUATION unconditionally
stripped `)`.

- Add `()` to URL_REGEX character class
- Remove `)` from TRAILING_PUNCTUATION to avoid stripping balanced parens
- Add balanced-paren stripping: only strip trailing `)` when unbalanced
- Add tests for Wikipedia URLs, nested parens, and wrapped URLs