feat(extensions): authenticate GitHub-hosted catalog and download requests with GITHUB_TOKEN/GH_TOKEN#2087
Conversation
…uests with GITHUB_TOKEN/GH_TOKEN
There was a problem hiding this comment.
Pull request overview
Adds GitHub-token authentication to extension catalog fetching and extension ZIP downloads so catalogs/assets hosted in private GitHub repos work when GITHUB_TOKEN/GH_TOKEN is set, while aiming to avoid leaking credentials to non-GitHub hosts.
Changes:
- Introduces
ExtensionCatalog._make_request(url)to attach anAuthorizationheader for GitHub-hosted URLs when a token is available. - Updates all
urllib.request.urlopen(...)call sites inExtensionCatalogto use the new request builder. - Adds unit/integration tests for the request/header behavior and updates user docs to reflect token usage for both catalogs and downloads.
Show a summary per file
| File | Description |
|---|---|
src/specify_cli/extensions.py |
Adds _make_request and routes catalog/download urlopen calls through it to support authenticated GitHub fetches. |
tests/test_extensions.py |
Adds tests validating auth header behavior and that urlopen receives a Request containing the header. |
extensions/EXTENSION-USER-GUIDE.md |
Updates env var documentation and adds an example for private GitHub-hosted catalogs. |
Copilot's findings
Tip
Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Files reviewed: 3/3 changed files
- Comments generated: 2
src/specify_cli/extensions.py
Outdated
|
|
||
| headers: Dict[str, str] = {} | ||
| token = os.environ.get("GITHUB_TOKEN") or os.environ.get("GH_TOKEN") | ||
| if token and any( | ||
| host in url | ||
| for host in ("raw.githubusercontent.com", "github.com", "api.github.com") | ||
| ): |
There was a problem hiding this comment.
The GitHub-hosted URL check uses substring matching (host in url), which can incorrectly attach the token to non-GitHub hosts (e.g., https://github.com.evil.com/... or https://internal.example.com/path/github.com/...) and violates the stated goal of preventing credential leakage. Parse the URL and compare urlparse(url).hostname (lowercased) against an allowlist of exact hostnames instead of scanning the full URL string.
| headers: Dict[str, str] = {} | |
| token = os.environ.get("GITHUB_TOKEN") or os.environ.get("GH_TOKEN") | |
| if token and any( | |
| host in url | |
| for host in ("raw.githubusercontent.com", "github.com", "api.github.com") | |
| ): | |
| from urllib.parse import urlparse | |
| headers: Dict[str, str] = {} | |
| token = os.environ.get("GITHUB_TOKEN") or os.environ.get("GH_TOKEN") | |
| hostname = (urlparse(url).hostname or "").lower() | |
| github_hosts = {"raw.githubusercontent.com", "github.com", "api.github.com"} | |
| if token and hostname in github_hosts: |
| catalog = self._make_catalog(temp_dir) | ||
| req = catalog._make_request("https://internal.example.com/catalog.json") | ||
| assert "Authorization" not in req.headers | ||
|
|
There was a problem hiding this comment.
Current tests cover a generic non-GitHub domain, but they don't cover common spoofing cases that would slip through the current substring-based domain check (e.g., https://github.com.evil.com/... or a non-GitHub host whose path/query contains github.com). Add negative tests for these URL shapes to ensure the auth header is never attached outside the intended allowlist.
| def test_make_request_token_not_added_for_github_lookalike_host(self, temp_dir, monkeypatch): | |
| """Auth header is not attached to non-GitHub hosts that only contain github.com in the hostname.""" | |
| monkeypatch.setenv("GITHUB_TOKEN", "ghp_testtoken") | |
| catalog = self._make_catalog(temp_dir) | |
| req = catalog._make_request("https://github.com.evil.com/org/repo/releases/download/v1/ext.zip") | |
| assert "Authorization" not in req.headers | |
| def test_make_request_token_not_added_for_non_github_host_with_github_in_path(self, temp_dir, monkeypatch): | |
| """Auth header is not attached when a non-GitHub host includes github.com only in the URL path.""" | |
| monkeypatch.setenv("GITHUB_TOKEN", "ghp_testtoken") | |
| catalog = self._make_catalog(temp_dir) | |
| req = catalog._make_request("https://evil.example.com/github.com/org/repo/releases/download/v1/ext.zip") | |
| assert "Authorization" not in req.headers | |
| def test_make_request_token_not_added_for_non_github_host_with_github_in_query(self, temp_dir, monkeypatch): | |
| """Auth header is not attached when a non-GitHub host includes github.com only in the query string.""" | |
| monkeypatch.setenv("GITHUB_TOKEN", "ghp_testtoken") | |
| catalog = self._make_catalog(temp_dir) | |
| req = catalog._make_request("https://evil.example.com/download?source=https://github.com/org/repo/releases/download/v1/ext.zip") | |
| assert "Authorization" not in req.headers |
|
Did you also make sure presets are covered similarly? |
…stname parsing & added 3 new spoofing tests -> Issue # 2037
…direct leakage -> Issue # 2037
|
I'd prefer to do it at the same time as it would the same kind of change. Thanks for working with us on this! |
mnriem
left a comment
There was a problem hiding this comment.
Let me know if you can deliver it as part of this PR without too much hassle. Thanks!
…s with GITHUB_TOKEN/GH_TOKEN -> Issue # 2037
There was a problem hiding this comment.
Pull request overview
Adds GitHub-token authentication for GitHub-hosted extension (and preset) catalog/ZIP downloads so private GitHub repositories can be used as sources without failing unauthenticated.
Changes:
- Add
_make_request()and_open_url()helpers to attachAuthorization: token …for GitHub-hosted URLs and use them for catalog fetch + ZIP downloads. - Add unit/integration-style tests asserting auth headers are (or are not) attached based on URL host and env vars.
- Update user docs for extensions and presets to describe
GH_TOKEN/GITHUB_TOKENbehavior and provide private-catalog examples.
Show a summary per file
| File | Description |
|---|---|
src/specify_cli/extensions.py |
Adds request/open helpers and routes catalog fetch + ZIP download through them. |
src/specify_cli/presets.py |
Mirrors the same authenticated request/open behavior for preset catalogs and ZIP downloads. |
tests/test_extensions.py |
Adds test coverage for request header behavior and that catalog fetch / extension download use it. |
tests/test_presets.py |
Adds similar test coverage for preset catalogs and preset downloads. |
extensions/EXTENSION-USER-GUIDE.md |
Updates env var docs + adds private GitHub catalog usage example. |
presets/README.md |
Adds token env var docs + private GitHub catalog usage example. |
Copilot's findings
Tip
Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Files reviewed: 6/6 changed files
- Comments generated: 5
| _github_hosts = {"raw.githubusercontent.com", "github.com", "api.github.com"} | ||
|
|
||
| class _StripAuthOnRedirect(urllib.request.HTTPRedirectHandler): | ||
| def redirect_request(_self, req, fp, code, msg, headers, newurl): | ||
| new_req = super().redirect_request(req, fp, code, msg, headers, newurl) | ||
| if new_req is not None: | ||
| hostname = (urlparse(newurl).hostname or "").lower() | ||
| if hostname not in _github_hosts: | ||
| new_req.headers.pop("Authorization", None) | ||
| return new_req |
There was a problem hiding this comment.
_open_url() strips the Authorization header on redirects to any host outside {raw.githubusercontent.com, github.com, api.github.com}. GitHub archive URLs (e.g. https://github.com///archive/refs/tags/.zip) redirect to codeload.github.com, so with a token set this logic will drop the header and private-repo archive downloads will still 404. Consider including codeload.github.com (and any other GitHub-owned redirect targets you need to support) in the allowlist used for redirect decisions, or otherwise preserving Authorization for redirects that remain within trusted GitHub domains.
| _github_hosts = {"raw.githubusercontent.com", "github.com", "api.github.com"} | ||
|
|
||
| class _StripAuthOnRedirect(urllib.request.HTTPRedirectHandler): | ||
| def redirect_request(_self, req, fp, code, msg, headers, newurl): | ||
| new_req = super().redirect_request(req, fp, code, msg, headers, newurl) | ||
| if new_req is not None: | ||
| hostname = (urlparse(newurl).hostname or "").lower() | ||
| if hostname not in _github_hosts: | ||
| new_req.headers.pop("Authorization", None) | ||
| return new_req |
There was a problem hiding this comment.
_open_url() currently removes the Authorization header when redirected to any host not in the small GitHub host set. GitHub “archive/refs” ZIP URLs commonly redirect to codeload.github.com; stripping auth there will break downloading presets/extensions from private repos even when GITHUB_TOKEN/GH_TOKEN is set. Update the redirect allowlist/logic to treat codeload.github.com (and any other GitHub-owned redirect endpoints you want to support) as trusted so auth is preserved when required.
| | Variable | Description | Default | | ||
| |----------|-------------|---------| | ||
| | `SPECKIT_PRESET_CATALOG_URL` | Override the full catalog stack with a single URL (replaces all defaults) | Built-in default stack | | ||
| | `GH_TOKEN` / `GITHUB_TOKEN` | GitHub token for authenticated requests to GitHub-hosted URLs (`raw.githubusercontent.com`, `github.com`, `api.github.com`). Required when your catalog JSON or preset ZIPs are hosted in a private GitHub repository. | None | | ||
|
|
||
| #### Example: Using a private GitHub-hosted catalog | ||
|
|
||
| ```bash | ||
| # Authenticate with a token (gh CLI, PAT, or GITHUB_TOKEN in CI) | ||
| export GITHUB_TOKEN=$(gh auth token) | ||
|
|
||
| # Search a private catalog added via `specify preset catalog add` | ||
| specify preset search my-template | ||
|
|
||
| # Install from a private catalog | ||
| specify preset add my-template | ||
| ``` | ||
|
|
||
| The token is attached automatically to requests targeting GitHub domains. Non-GitHub catalog URLs are always fetched without credentials. |
There was a problem hiding this comment.
The PR title/description focus on ExtensionCatalog, but this change also introduces the same GitHub-token behavior for PresetCatalog and documents it here. Please update the PR title and/or description to reflect that presets are included as well, so reviewers and release notes capture the full scope.
| def _make_request(self, url: str) -> "urllib.request.Request": | ||
| """Build a urllib Request, adding a GitHub auth header when available. | ||
|
|
||
| Reads GITHUB_TOKEN or GH_TOKEN from the environment and attaches an | ||
| ``Authorization: token <value>`` header for requests to GitHub-hosted | ||
| domains (``raw.githubusercontent.com``, ``github.com``, | ||
| ``api.github.com``). Non-GitHub URLs are returned as plain requests | ||
| so credentials are never leaked to third-party hosts. | ||
| """ | ||
| import os | ||
| import urllib.request | ||
| from urllib.parse import urlparse | ||
|
|
||
| headers: Dict[str, str] = {} | ||
| token = os.environ.get("GITHUB_TOKEN") or os.environ.get("GH_TOKEN") | ||
| hostname = (urlparse(url).hostname or "").lower() | ||
| github_hosts = {"raw.githubusercontent.com", "github.com", "api.github.com"} | ||
| if token and hostname in github_hosts: | ||
| headers["Authorization"] = f"token {token}" | ||
| return urllib.request.Request(url, headers=headers) | ||
|
|
||
| def _open_url(self, url: str, timeout: int = 10): | ||
| """Open a URL using _make_request, stripping auth on cross-host redirects. | ||
|
|
||
| When the request carries an Authorization header, a custom redirect | ||
| handler is used to drop that header if the redirect target is not a | ||
| GitHub-hosted domain, preventing token leakage to CDNs or other | ||
| third-party hosts that GitHub may redirect to. | ||
| """ | ||
| import urllib.request | ||
| from urllib.parse import urlparse | ||
|
|
||
| req = self._make_request(url) | ||
|
|
||
| if not req.get_header("Authorization"): | ||
| return urllib.request.urlopen(req, timeout=timeout) | ||
|
|
||
| _github_hosts = {"raw.githubusercontent.com", "github.com", "api.github.com"} | ||
|
|
||
| class _StripAuthOnRedirect(urllib.request.HTTPRedirectHandler): | ||
| def redirect_request(_self, req, fp, code, msg, headers, newurl): | ||
| new_req = super().redirect_request(req, fp, code, msg, headers, newurl) | ||
| if new_req is not None: | ||
| hostname = (urlparse(newurl).hostname or "").lower() | ||
| if hostname not in _github_hosts: | ||
| new_req.headers.pop("Authorization", None) | ||
| return new_req | ||
|
|
||
| opener = urllib.request.build_opener(_StripAuthOnRedirect) | ||
| return opener.open(req, timeout=timeout) | ||
|
|
There was a problem hiding this comment.
_make_request() / _open_url() are duplicated (nearly identically) between ExtensionCatalog and PresetCatalog. To reduce the risk of future divergence (especially around auth/redirect edge cases), consider extracting this into a shared helper (e.g., a small internal urllib utility) used by both catalogs.
| assert captured["req"].get_header("Authorization") == "token ghp_testtoken" | ||
|
|
||
| def test_download_extension_sends_auth_header(self, temp_dir, monkeypatch): | ||
| """download_extension passes Authorization header to urlopen for GitHub URLs.""" |
There was a problem hiding this comment.
This test docstring says the auth header is passed to urlopen, but the implementation path with Authorization uses urllib.request.build_opener(...).open(). Adjust the docstring to match the behavior being asserted.
| """download_extension passes Authorization header to urlopen for GitHub URLs.""" | |
| """download_extension passes Authorization header via opener for GitHub URLs.""" |
mnriem
left a comment
There was a problem hiding this comment.
Please address Copilot feedback. If not applicable, please explain why
Description
Fixes #2037. Closes the authentication gap introduced when multi-catalog support landed in #1707.
Before this change, all network requests in
ExtensionCatalogused bareurllib.request.urlopen(url)with no headers. Any catalog or extension ZIP hosted in a private GitHub repository would silently fail with HTTP 404, regardless of whetherGITHUB_TOKENorGH_TOKENwas set in the environment.This PR adds a
_make_request(url)helper onExtensionCatalogthat attaches anAuthorization: token <value>header when:GITHUB_TOKENorGH_TOKENis present in the environment, andraw.githubusercontent.com,github.com, orapi.github.com)Non-GitHub URLs are always fetched without credentials to prevent token leakage to third-party hosts.
The three affected call sites are:
_fetch_single_catalog— fetches catalog JSON from a configured catalog URLfetch_catalog— legacy single-catalog path used whenSPECKIT_CATALOG_URLis setdownload_extension— downloads extension ZIP from a release asset URLNo behavior change for users without a token set — the code path is identical to before.
Documentation in
EXTENSION-USER-GUIDE.mdhas been updated: the existingGH_TOKEN/GITHUB_TOKENtable entry (which described the token as "for downloads" only) now accurately reflects that it covers catalog fetches as well, and a private-catalog usage example has been added.Testing
uv run specify --help— CLI loads correctly, all commands presentmainbefore this change:TestManifestPathTraversal::test_record_file_rejects_absolute_pathTestCommandRegistrar::test_codex_skill_registration_uses_fallback_script_variant_without_init_optionsTestExtensionCatalogintests/test_extensions.py:_make_request: no-token path,GITHUB_TOKEN,GH_TOKENfallback, precedence when both are set, non-GitHub URL never gets header (security),api.github.comdomainurlopenand assert the capturedRequestobject carries the auth header — one for_fetch_single_catalog, one fordownload_extensionAI Disclosure
This PR was implemented with AI assistance via Claude Code.