Skip to content

feat(authz): introduce conditional access control via CEL#4040

Open
matheuscscp wants to merge 1 commit intoproject-zot:mainfrom
matheuscscp:conditional-access-control
Open

feat(authz): introduce conditional access control via CEL#4040
matheuscscp wants to merge 1 commit intoproject-zot:mainfrom
matheuscscp:conditional-access-control

Conversation

@matheuscscp
Copy link
Copy Markdown
Contributor

Closes: #4039

Supersedes: #4036

@codecov
Copy link
Copy Markdown

codecov Bot commented May 6, 2026

Codecov Report

❌ Patch coverage is 96.11307% with 11 lines in your changes missing coverage. Please review.
✅ Project coverage is 91.68%. Comparing base (ddb6279) to head (bd53acb).

Files with missing lines Patch % Lines
pkg/api/controller.go 37.50% 2 Missing and 3 partials ⚠️
pkg/api/authz.go 99.14% 1 Missing and 1 partial ⚠️
pkg/api/config/config.go 83.33% 1 Missing and 1 partial ⚠️
pkg/common/http_server.go 71.42% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4040      +/-   ##
==========================================
+ Coverage   91.66%   91.68%   +0.02%     
==========================================
  Files         199      199              
  Lines       28602    28828     +226     
==========================================
+ Hits        26217    26432     +215     
- Misses       1535     1540       +5     
- Partials      850      856       +6     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@matheuscscp matheuscscp force-pushed the conditional-access-control branch from 98d31e3 to 8e79bdb Compare May 7, 2026 08:20
@matheuscscp matheuscscp changed the title Introduce conditional access control feat(authz): introduce conditional access control via CEL May 7, 2026
@rchincha rchincha added this to the v2.1.17 milestone May 8, 2026
Copy link
Copy Markdown
Contributor

@andaaron andaaron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a very interesting proposal.

Comment thread pkg/api/controller.go Outdated
Comment thread pkg/api/authz.go Outdated
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds conditional access control to Zot’s accessControl policies by introducing CEL-based conditions that must evaluate to true for a policy entry to grant access, with request/user/TLS/network/claims context exposed via a req input object.

Changes:

  • Extend access-control policy model to include conditions (CEL expressions + operator message) and compile them at config-load/startup/hot-reload time.
  • Evaluate policy conditions during authorization and surface an operator-provided deny reason in 403 responses when a condition explicitly evaluates to false.
  • Expose authn-time attributes (OIDC token claims) to authz-time conditions via req.claims, and document the new feature with examples.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
pkg/requestcontext/user_access_control.go Adds claim storage on request context and makes glob-pattern storage more robust to setter ordering.
pkg/common/http_server.go Adds AuthzFailWithReason to include a deny reason in DENIED error details.
pkg/cli/server/root.go Validates config by compiling access-control policy conditions at load time.
pkg/cli/server/root_test.go Adds config-load tests ensuring policy conditions decode and invalid CEL fails load.
pkg/cel/expression.go Adds option to declare map<string, dyn> CEL variables for richer runtime typing.
pkg/cel/claim_processor.go Carries raw OIDC claim set through claim processing for later authz use.
pkg/api/controller.go Stores compiled condition programs in the controller and refreshes them on hot reload.
pkg/api/config/config.go Extends access-control policy schema with Conditions and documents the req.* inputs.
pkg/api/authz.go Implements condition compilation, lookup, evaluation, and deny-reason propagation; wires middleware to use it.
pkg/api/authz_internal_test.go Adds focused unit tests covering condition semantics, request-field exposure, and reload recompilation.
pkg/api/authn.go Plumbs OIDC bearer auth results (including claims) into request context and updates AccessController construction.
examples/README.md Documents conditional policy configuration, available req.* fields, and deny-message behavior.
examples/config-policy.json Adds an example conditions configuration under a repository policy.
errors/errors.go Adds ErrPolicyConditionNotCompiled for fail-closed lookups.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread pkg/api/authz.go
Comment thread pkg/api/authz.go
@matheuscscp matheuscscp force-pushed the conditional-access-control branch 3 times, most recently from 1a92bbe to 92ecdc0 Compare May 8, 2026 14:31
@matheuscscp
Copy link
Copy Markdown
Contributor Author

@andaaron Fixed your comments here, thanks! Can you pls trigger the CI? 🙏

@rchincha
Copy link
Copy Markdown
Contributor

rchincha commented May 8, 2026

#4045
^ for the binary size increase failure

@andaaron
Copy link
Copy Markdown
Contributor

andaaron commented May 8, 2026

@andaaron Fixed your comments here, thanks! Can you pls trigger the CI? 🙏

Can you please rebase? Last commit on main should be fixing the binary size increase.

@matheuscscp matheuscscp force-pushed the conditional-access-control branch from 92ecdc0 to 5ee9f1f Compare May 8, 2026 16:47
@matheuscscp
Copy link
Copy Markdown
Contributor Author

Done 🙏

@matheuscscp
Copy link
Copy Markdown
Contributor Author

Just pushed a commit to attempt fixing the binary size increase 🙏

@matheuscscp matheuscscp force-pushed the conditional-access-control branch from 7984dbd to cc4d996 Compare May 8, 2026 17:24
@matheuscscp
Copy link
Copy Markdown
Contributor Author

Looks like the binary size increase issue is now fixed, only the bats flake failed this time, can you pls retrigger it? 🙏

https://github.com/project-zot/zot/actions/runs/25569525149/job/75061677582?pr=4040

@andaaron andaaron requested a review from Copilot May 8, 2026 19:59
andaaron
andaaron previously approved these changes May 8, 2026
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated 3 comments.

Comment thread pkg/api/authz.go
Comment thread pkg/api/authz.go Outdated
Comment thread pkg/api/config/config.go
@rchincha
Copy link
Copy Markdown
Contributor

rchincha commented May 8, 2026

Just a note that we are really betting that CEL is safe and not subject to CEL-injection attacks etc.

"expression": "req.time < timestamp("2099-12-31T23:59:59Z")",

@rchincha
Copy link
Copy Markdown
Contributor

rchincha commented May 8, 2026

@matheuscscp pls also take a look at copilot reviews.

@matheuscscp
Copy link
Copy Markdown
Contributor Author

Just a note that we are really betting that CEL is safe and not subject to CEL-injection attacks etc.

Yeah it's pretty hermetic, we use it in Flux in at least 3 different features and controllers, they run on very privileged Flux controller pods in Kubernetes. The Kubernetes API Server also runs CEL for several built-in features, including CRD custom validation and the ValidatingAdmissionPolicy API. These were probably influenced from Google Cloud IAM security features. CEL is often used for security features, you can see by the influence chain.

@matheuscscp pls also take a look at copilot reviews.

👍 Looking right now

@rchincha
Copy link
Copy Markdown
Contributor

rchincha commented May 8, 2026

@matheuscscp pls also take a look at copilot reviews.

👍 Looking right now

A few outstanding and they look relevant. Will wait for those to be addressed. Otherwise lgtm.

@matheuscscp
Copy link
Copy Markdown
Contributor Author

@matheuscscp pls also take a look at copilot reviews.

👍 Looking right now

A few outstanding and they look relevant. Will wait for those to be addressed. Otherwise lgtm.

👍 I'm fixing one and replying to the other. I also replied to the first one earlier.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated 2 comments.

Comment thread pkg/api/authz.go
Comment thread pkg/api/authz_internal_test.go
Signed-off-by: Matheus Pimenta <matheuscscp@gmail.com>
@matheuscscp
Copy link
Copy Markdown
Contributor Author

matheuscscp commented May 8, 2026

Hey @andaaron sorry, I think I may have mislclicked something here:

Screenshot from 2026-05-08 23-00-57

@rchincha @andaaron All copilot comments addressed. Among the last two, one was repeated from the first three (the one I implemented). For the other two comments from the first batch and the other one from the second batch I just replied, they are not applicable.

Pls approve CI 🙏

@rchincha rchincha requested a review from andaaron May 8, 2026 22:41
@rchincha
Copy link
Copy Markdown
Contributor

rchincha commented May 9, 2026

Potential risks I see in this change:

  1. /v2/_catalog may over-list repositories
    Conditional policies are intentionally included during glob-time filtering without evaluating conditions, so users may see repos in _catalog that they still cannot access later. That can confuse clients/automation that interpret catalog visibility as effective access.

  2. 403 responses now expose operator-authored deny reasons
    Conditional denies surface conditions[].message in the response body under detail.reason. This is useful, but it can leak internal policy logic, naming conventions, trusted proxy assumptions, or other sensitive details if operators write overly specific messages.

  3. Authz now depends on runtime CEL evaluation
    Access decisions now rely on compiled-condition lookup plus runtime evaluation against request data. The code fails closed, which is good for security, but increases the chance of operational regressions where config/cache/type mismatches result in unexpected denies.

  4. req.claims increases authn/authz coupling
    Policies can now depend on OIDC claims. That’s powerful, but claim shape and typing can vary across providers/environments, making policies more fragile and less portable.

  5. Dynamic typing may cause production-only failures
    Because CEL input is dyn-typed, some mistakes won’t be caught at compile time and may only appear at runtime when a field is missing or differently typed.

  6. Admin policy behavior changes
    Admin access can now be conditionally denied. Misconfigured admin conditions could unintentionally block intended admin flows.

  7. Hot reload/cache correctness is important
    Since compiled conditions are cached and refreshed on reload, any sequencing/staleness issue could lead to stale or missing programs and fail-closed authorization behavior.

Overall I’d call this medium risk: strong feature addition, but with meaningful behavior changes around authz, visibility semantics, and denial surfacing.

@rchincha
Copy link
Copy Markdown
Contributor

rchincha commented May 9, 2026

Security-focused review:

Overall this is a positive hardening feature — conditional authz via CEL, load-time compilation, and fail-closed behavior are all good directions. My main security concerns are around disclosure and operator foot-guns:

  1. Conditional deny messages are now client-visible.
    conditions[].message is surfaced in 403 response details. That is useful for UX, but it also creates a disclosure channel for internal policy logic, naming conventions, network trust assumptions, or identity expectations if operators write detailed messages. I’d recommend treating message as public-facing text in docs, or making client exposure optional.

  2. /v2/_catalog may now leak repo existence.
    Conditional policies are intentionally included optimistically during glob-time filtering, so _catalog may list repos that the caller still cannot actually access later. Even if content access is denied, repo enumeration/naming can itself be sensitive. This should be documented clearly as a security/privacy tradeoff.

  3. req.client.forwardedFor is an operator foot-gun.
    The docs correctly say it is untrusted, but exposing it directly in policy conditions makes it easy to write unsafe rules unless req.client.ip is also checked as a trusted proxy. If possible, I’d add stronger guidance/examples for safe vs unsafe usage, or even validation/linting for conditions that reference forwarded-for without a trusted-peer check.

  4. req.claims increases authz power, but also trust/type ambiguity.
    Claim-based authorization depends heavily on issuer semantics and claim shape stability across environments/providers. Missing or differently typed claims appear to fail closed, which is good, but this still deserves explicit documentation so operators don’t over-trust claim contents.

  5. Admin policy conditions are high impact.
    Allowing admin grants to be conditionally denied is useful, but misconfiguration here can lock out intended recovery/admin paths. I’d suggest calling that out explicitly in docs/release notes.

From a pure security standpoint I like the overall fail-closed model and the fact that internal CEL lookup/eval failures are not leaked to clients. My main ask is stronger guardrails/documentation around what is now intentionally exposed and what operators should treat as untrusted input.

@rchincha
Copy link
Copy Markdown
Contributor

rchincha commented May 9, 2026

| /v2/_catalog may over-list repositories

This is worrisome. Also, does this happen if there is no CEL in the config?

@andaaron
Copy link
Copy Markdown
Contributor

andaaron commented May 9, 2026

Note on /v2/_catalog with conditional policies:

  1. Pattern/user/group “read” semantics are unchanged — _catalog is still filtered by the same repo pattern + action checks as before.
  2. CEL conditions act as an additional per-request gate, but they generally cannot be enforced meaningfully at _catalog time because the evaluation inputs for _catalog differ from real repo operations: there is no tag/digest reference (so req.reference* is empty) and many conditions are only well-defined when authorizing a concrete HEAD/GET/PUT against a specific repo+reference.

Result: _catalog may over-list repositories that will later be denied by conditions (i.e., potential repo-name disclosure depending on how conditions are used).

WRT Authz depending on runtime CEL evaluation - this is true, but it is a decision of the server admin, if he wants to use this feature, he needs to properly test the expressions he configures.

@matheuscscp
Copy link
Copy Markdown
Contributor Author

matheuscscp commented May 9, 2026

| /v2/_catalog may over-list repositories

This is worrisome.

I had long discussions about this with Opus 4.7, and Copilot posted two comments about it. I think the option I chose is the best trade off considering UX, performance, implementation complexity and security. See the summary from Opus 4.7 below. Essentially, users will only see repos but will not have access to them. This is much better than not seeing repos they do have access to. (Exactly what Andrei posted above!)

Also, does this happen if there is no CEL in the config?

Not at all. This is only a thing when using conditions. The current behavior is 100% unchanged and conditions are 100% optional!

Again, no need to worry about CEL, they are safe and designed for security features like this by Google! Google, Kubernetes and Flux deeply trust CEL for features like this.


Summary from Opus 4.7 about the discussion.

● Setup: alice's policy is users:[alice], actions:[read],
conditions:[req.repository.startsWith('prod/')] matching pattern **. The registry has prod/api,
prod/db, staging/web. Alice should only see/pull prod/*.

When alice hits /v2/_catalog, the server has to filter the listing. Three ways to do it:

A — evaluate per-repo at listing time. For each repo, build a fresh evalRequest with that repo
name, evaluate alice's conditions, include if permitted.

  • alice sees exactly prod/api, prod/db. Never staging/web.
  • Cost: O(repos × policies × conditions) per /v2/_catalog call. For a registry with thousands of
    repos this becomes a real latency problem.

B — skip conditions at listing time, include optimistically (what zot does). Alice's policy matches
on user+action so all three repos are candidate-included; conditions are not evaluated.

  • alice sees prod/api, prod/db, staging/web in the catalog.
  • When she runs pull staging/web → 403 with the operator's deny message ("only prod allowed" or
    whatever).
  • Real enforcement happens in ac.can per request, against the real repo/reference. The catalog is
    just a hint.

C — evaluate at listing time with empty placeholders. Set req.repository = "", evaluate
''.startsWith('prod/') → false. All alice's repos hidden.

  • alice sees an empty catalog even though pull prod/api would succeed.
  • Silent under-listing. No 403, no message — alice probably concludes she has no access at all and
    gives up.

Trade-off:

  • A is correct but expensive. Acceptable for small deploys, painful at scale.
  • B is the chosen approach: over-list at worst, never under-list. Every actual operation is
    correctly authorized; the catalog is approximate.
  • C is broken — it actively hides accessible repos with zero feedback.

The two copilot comments worry about A→B (you lost precision) and B→C (you're underspecifying
conditions). The answer to both is the same: per-request enforcement is the source of truth; the
catalog is an optimization, and the failure mode that's been chosen is "alice sometimes sees a repo
she can't actually pull, and gets a clear 403 when she tries" rather than "alice silently can't
see things she can pull".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Feature Request: Conditional access control

4 participants