fix: redact credentials in logs at the model layer#921
Conversation
Follow-up to #902. A customer reported credentials still leaking into logs, this time via opal_client/data/updater.py (logger.exception serializing the full DataSourceEntry, whose `config` carries auth headers and `data` the payload). A full codebase scan found several more sites with the same pattern. Root cause: log statements interpolate credential-bearing pydantic models (`{entry}`, `{config}`, `model=...`) and loguru's serialize=True sink dumps record["extra"] as JSON, falling back to str() for non-JSON objects. Central fix: a RedactedReprMixin that overrides __repr_args__ on the credential-bearing models so repr()/str() (and thus the loguru serialize path) mask secret fields: - FetcherConfig / HttpFetcherConfig -> headers, data - DataSourceEntry -> config, data - FetchEvent -> config This protects every current and future log site at once. Wire serialization (.json()/.dict()), hashing (calc_hash uses config.json()), dedup keys, and equality are untouched, so transport and behavior are unaffected. Belt-and-suspenders spot fixes (log only safe fields): - opal-client data/updater.py (x3): log entry.url instead of the entry - opal-server data_update_publisher.py: log entry.url instead of the entry - opal-server data/api.py: drop the data_sources_config model from the error - opal-common fastapi_rpc_fetch_provider.py: stop logging rpc_arguments Adds opal-common tests/redaction_test.py covering repr/str redaction, wire-payload integrity, and the loguru serialize=True leak scenario. Note for reviewers: the publish-data-update CLI (cli/commands.py:188) echoes str(update), which now shows config/data as <redacted>. The published payload itself (update.json(), line 180) is unaffected; redacting the stdout echo is intentional since it can land in shell history / CI logs. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
✅ Deploy Preview for opal-docs ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
… userinfo) Addresses findings from a multi-agent adversarial review of the redaction fix: - RedactedReprMixin now unions `_redacted_repr_fields` across the whole MRO, so a subclass can only ever ADD fields to redact - it can never silently drop a parent's protection by redeclaring/omitting the set. Prevents the exact failure mode below. - FastApiRpcFetchConfig: redact `rpc_arguments` (a secret-bearing FetcherConfig subclass that was missed - a live leak of the same class as the original bug). - Add http_utils.redact_url() to strip embedded `user:password@` credentials from URLs, and apply it to every `entry.url` log/exception in data/updater.py and data_update_publisher.py (DataSourceEntry.url is a free string and may be a basic-auth URL). - Tests: add nested-container (DataUpdate), RPC-config, diagnose=True, and redact_url cases. - Nits: fix pre-existing "uniqueue" typo, drop unused imports in schemas/data.py. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Two leak vectors outside the model layer, found in the expert review:
1. LOG_DIAGNOSE defaulted to True. loguru's diagnose=True renders raw local
variable VALUES in tracebacks (e.g. a bare headers dict, or the Bearer token
returned by opa_client._get_auth_headers), bypassing the model-level repr
redaction entirely. Default flipped to False (matches loguru's own
production guidance); operators can still opt in for local debugging.
2. Git subsystem leaked credentials independently of the data models:
- repo_cloner.py logged the raw clone URL (https://token@github.com/...) and
the GitCommandError text (which embeds the failing command incl. the URL).
- git_fetcher.py (scopes) logged source/remote URLs in many places.
- git_utils/env.py forced GIT_TRACE=1 + GIT_CURL_VERBOSE=1, making git dump
HTTP Authorization headers to captured stderr.
Fix: add http_utils.redact_url_in_text(); redact every git URL log and the
git error text; gate GIT_TRACE/GIT_CURL_VERBOSE behind LOG_DIAGNOSE. The
functional clone_repository() call still receives the real URL.
Adds redact_url_in_text tests.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Second adversarial review pass found residual leaks on the ACTUAL fetch path
that the first sweep missed, plus correctness gaps in redact_url:
- Fetch-path URL logs now redacted: http_fetch_provider._fetch_ (logged
self._url raw), opal_client/data/fetcher.handle_url (3 sites), and
sources/git_policy_source remote_urls.
- redact_url hardened:
* fix IPv6 host (brackets were dropped, producing an invalid URL)
* also mask sensitive query-string params (?token=, ?access_token=, ...),
not just user:password@ userinfo
* username-only userinfo is stripped; non-sensitive URLs returned unchanged
- redact_url_in_text is now regex-based (scrubs any scheme://userinfo@ in the
text), so it catches credentials even when git normalizes/encodes the URL
differently from the value we pass; still also replaces the known URL.
- LOG_DIAGNOSE description now notes it also gates git protocol tracing.
- Tests: IPv6, query-param, username-only, byte-identity, regex-scrub, and
env-gating (GIT_TRACE/GIT_CURL_VERBOSE) cases.
- Nits: drop unused FetcherConfig import; List[str] over list[str]; black/isort.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The docformatter pre-commit hook (v1.7.5) reformats docstrings: capitalizes the summary line and splits summary from body. Pure formatting, no logic change. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR hardens credential redaction across OPAL by centralizing model-layer repr()/str() redaction for secret-bearing pydantic models, and by systematically redacting credentialed URLs in log messages/exceptions (plus reducing loguru/git diagnostic leakage vectors).
Changes:
- Introduces
RedactedReprMixinto mask sensitive model fields inrepr()/str()across the MRO (protecting log interpolation +serialize=Truesinks). - Adds URL redaction utilities (
redact_url,redact_url_in_text) and applies them across git/data fetcher logging paths. - Flips
LOG_DIAGNOSEdefault toFalseand gates verbose git tracing env vars behind diagnosis mode; adds regression tests for the above.
Reviewed changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| packages/opal-server/opal_server/git_fetcher.py | Redacts repo URLs in debug/info/error logs to prevent credentialed URL leakage. |
| packages/opal-server/opal_server/data/data_update_publisher.py | Avoids logging full entries; logs redacted URL instead. |
| packages/opal-server/opal_server/data/api.py | Stops logging full configuration model on validation failure; logs safer message. |
| packages/opal-common/opal_common/tests/redaction_test.py | Adds regression tests for repr/str redaction, URL redaction, loguru serialize/diagnose, and git env gating. |
| packages/opal-common/opal_common/sources/git_policy_source.py | Redacts remote URLs when reporting repo mismatch / existing remotes. |
| packages/opal-common/opal_common/schemas/data.py | Applies RedactedReprMixin to DataSourceEntry and redacts config/data in repr/str. |
| packages/opal-common/opal_common/logging_utils/redaction.py | Adds the core RedactedReprMixin implementation. |
| packages/opal-common/opal_common/http_utils.py | Adds redact_url() and redact_url_in_text() utilities + sensitive query param handling. |
| packages/opal-common/opal_common/git_utils/repo_cloner.py | Redacts URLs and scrubs credentialed URLs from git error text before logging. |
| packages/opal-common/opal_common/git_utils/env.py | Gates GIT_TRACE/GIT_CURL_VERBOSE behind LOG_DIAGNOSE to avoid auth header leakage. |
| packages/opal-common/opal_common/fetcher/providers/http_fetch_provider.py | Redacts fetch URL in debug logs; marks headers/data as redacted fields on config model. |
| packages/opal-common/opal_common/fetcher/providers/fastapi_rpc_fetch_provider.py | Redacts RPC arguments in repr/str and stops logging them directly. |
| packages/opal-common/opal_common/fetcher/events.py | Applies RedactedReprMixin to FetcherConfig/FetchEvent; redacts config field in event repr/str. |
| packages/opal-common/opal_common/config.py | Changes default LOG_DIAGNOSE to False and clarifies credential-leak rationale. |
| packages/opal-client/opal_client/data/updater.py | Redacts URLs in logs/errors and avoids logging full DataSourceEntry objects. |
| packages/opal-client/opal_client/data/fetcher.py | Redacts URLs in fetcher logs/exceptions. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Review: credential-leak sweep is incompleteThe model-layer redaction ( HIGH — still leak credentials:
MEDIUM:
Operational note: flipping |
…ordering, and fragment leaks - redact_url no longer raises from a log/except path: accessing parts.username/.password/.port is what actually validates a URL (urlsplit is lazy), so an out-of-range port raised ValueError straight out of an except block. Guard the credential-detection block. - redact_url_in_text now does the known-URL replacement *before* the userinfo regex; previously the regex rewrote user:pw@ -> ***@ first, destroying the verbatim URL so its query token was never masked. - redact_url now masks sensitive params in the URL *fragment* too, and masks query/fragment values in-place so untouched params keep their exact original encoding (no urlencode %20 -> + normalization). Addresses review comments: - #921 (comment) (@zeevmoney) - #921 (comment) (@zeevmoney) - #921 (comment) (@zeevmoney) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
DataSourceEntry.url and FetchEvent.url can embed credentials
(user:token@host or ?token=...), but RedactedReprMixin only masked
config/data - so whole-model logging (repr/str, loguru serialize) still
leaked the URL. Add a generic _redacted_url_fields set to the mixin that runs
the value through redact_url (keeps host/path visible, unlike blanket
masking), unioned across the MRO like the existing set, and declare
{"url"} on both models. Wire serialization (.dict()/.json()) is untouched.
Addresses review comments:
- #921 (comment) (@zeevmoney)
- #921 (comment) (@zeevmoney)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ond)
Wrap raw URL log/exception sites with redact_url / redact_url_in_text so a
credentialed data-source, policy-repo or callback URL (user:token@host or
?token=...) never reaches the logs:
In-diff inline review:
- fastapi_rpc_fetch_provider: redact self._url in the fetch log line.
- updater._fetch_data: drop the raw {exc} binding (logger.exception already
records the traceback; str(e) on a 3rd-party error can carry a credentialed
URL and would serialize verbatim under LOG_SERIALIZE) and redact the
re-raised message.
- git_policy_source: drop a stray ")" in the "found existing repo" log format.
Out-of-diff sites (PR-level review):
- api_policy_source: redact the 404 bundle URL (log + HTTPException detail)
and the "Fetching changes from remote" log.
- scopes/service + scopes/loader: redact source.url / POLICY_REPO_URL.
- updater: redact entry.url passed to _update_remote_status (it is persisted
into the transaction log) and the error strings; stop logging the raw inline
data payload in calc_hash; redact the data-sources config URL.
- callbacks/reporter: redact the user-supplied callback URLs.
Addresses review comments:
- #921 (comment) (@copilot-pull-request-reviewer)
- #921 (comment) (@copilot-pull-request-reviewer)
- #921 (comment) (@copilot-pull-request-reviewer)
- #921 (comment) (@zeevmoney)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The diagnose test raised a bare ValueError("boom") that never referenced the
model, so loguru's diagnose never rendered it - the assertion held even for an
unprotected model (a tautology). Raise ValueError(entry) so the whole model is
rendered in the traceback. Deliberately do NOT reach into entry.config: loguru
diagnose evaluates that sub-expression and renders the raw dict, the one vector
the mixin documents it cannot cover (verified - the reviewer's suggested
entry.config[...] variant leaks).
Addresses review comments:
- #921 (comment) (@zeevmoney)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
provide_git_ssh_environment early-returns for non-SSH URLs, so it only ever runs for SSH clones - which use key auth, not HTTP Authorization headers. The comment (and the LOG_DIAGNOSE docstring) claimed gating GIT_TRACE/ GIT_CURL_VERBOSE here closes an HTTP Authorization-header leak; it does not (that path is HTTPS-only and is not handled here). Correct the comment, the config docstring and the test docstring to state the real value: reducing verbose SSH protocol/host disclosure in logs. Addresses review comments: - #921 (comment) (@zeevmoney) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Third-party exceptions (most notably aiohttp ClientResponseError) render the full request URL - https://user:pw@host/path?token=... - inside the traceback loguru formats. The model-layer redaction can't reach those (the leaking object isn't an OPAL model), and this fires under default config (backtrace=True). Close the vector at the sink layer: - CredentialScrubbingStream wraps the stdout/stderr sink so the final formatted record (message + exception traceback, and the JSON blob under serialize=True) is run through redact_url_in_text before write. - A loguru patcher additionally scrubs record["message"] for every record, so the optional rotating file sink (which is path-based and can't be wrapped) still gets message-level scrubbing without losing rotation/retention. - Document that OPAL_LOG_DIAGNOSE now defaults to False (operators upgrading opt back in via OPAL_LOG_DIAGNOSE=true). Addresses review comments: - #921 (comment) (@zeevmoney) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
@zeevmoney thanks for the out-of-diff sweep — all of it is now addressed. Redacted at the missed call sites (
Third-party exception leak ( LOG_DIAGNOSE default flip: documented in One caveat worth flagging: the sink wrapper scrubs |
CI fixes: - pre-commit / docformatter: reflow the CredentialScrubbingStream summary line so it conforms to docformatter (new file was missed locally because it was untracked when the formatters ran). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Context
Follow-up to #902. A customer reported credentials still leaking into logs after that fix — this time via
opal_client/data/updater.py:logger.exception(...)serialized the fullDataSourceEntry, whoseconfigcarries auth headers anddatathe payload.Rather than patch one line and wait for the next path, this addresses the root cause at the model layer and then sweeps the codebase (with multi-agent adversarial review) for every related vector.
1. Central fix — model-layer redaction
A small
RedactedReprMixin(opal_common/logging_utils/redaction.py) overrides__repr_args__sorepr()/str()— and therefore loguru's serialize path — mask secret-bearing fields. The redaction set is unioned across the MRO, so a subclass can only ever add protection, never silently drop it.FetcherConfig/HttpFetcherConfigheaders,dataFastApiRpcFetchConfigrpc_argumentsDataSourceEntryconfig,dataFetchEventconfigThis protects every current and future log site at once, and propagates transitively to container models (
DataUpdate,DataSourceConfig,DataUpdateReport).Why it's safe (no functional impact)
__repr_args__only feedsrepr()/str(). Every functional path uses something else and is unaffected — verified empirically against pydantic 1.10 and 2.13:update.dict(by_alias=True)/.json(); event reconstruction via.dict().calc_hashusesconfig.json();repos_last_fetchedkeys on source id/url.jsonable_encoder: none call__repr_args__.2. URL credential redaction
DataSourceEntry.urland git repo URLs can embed credentials (https://user:token@host). Addedhttp_utils.redact_url()/redact_url_in_text()and applied them to every URL log/exception indata/updater.py,data_update_publisher.py,repo_cloner.py, andgit_fetcher.py. The functionalclone_repository()call still gets the real URL.3. Traceback leak —
LOG_DIAGNOSEdefault flipped toFalseloguru's
diagnose=Truerenders raw local-variable values in tracebacks (e.g. a bareheadersdict, oropa_client._get_auth_headers()'s Bearer token), which bypasses model-level redaction entirely. Default flippedTrue→False(matches loguru's own production guidance); operators can opt back in for local debugging.4. Verbose git tracing gated
git_utils/env.pyforcedGIT_TRACE=1+GIT_CURL_VERBOSE=1on every SSH clone, making git dump HTTPAuthorizationheaders to captured stderr. Now only enabled whenLOG_DIAGNOSEis on.Spot fixes (belt-and-suspenders)
Sites that logged whole credential-bearing objects now log only safe, redacted fields:
data/updater.py(×3),data_update_publisher.py,data/api.py(dropped the config model),fastapi_rpc_fetch_provider.py(stopped loggingrpc_arguments).The
publish-data-updateCLI (cli/commands.py:188) echoesstr(update), which now rendersconfig='<redacted>' data='<redacted>'. The published payload is unchanged (usesupdate.json()). Redacting the stdout echo is intentional — CLI output lands in shell history / CI logs.Tests
opal-common/.../tests/redaction_test.pycovers: repr/str redaction (incl. f-strings & nestedDataUpdate), the RPC subclass, wire-payload integrity, loguruserialize=Trueanddiagnose=True, the MRO-union footgun, andredact_url/redact_url_in_text. Verified locally against pydantic 1.10 + loguru: 58 tests pass (new + #902 reporter regression + opal-common + git_utils suites).Review
Reviewed by a 4-agent adversarial panel (security/leak-completeness, pydantic-internals, behavioral-regression, code-quality); all findings above were surfaced and addressed.
🤖 Generated with Claude Code