RFC: Orthogonal Authorization #2921

@jshearer

Context

The current authorization graph is represented by user_grants and role_grants. Each row is an edge:

user_grants: user_id      -> object_role
role_grants: subject_role -> object_role

The current capability column is a scalar read/write/admin value over that edge:

read < write < admin

SQL/RLS/PostgREST authorization logic is largely composed of policies that compare capability >= 'read' or capability >= 'admin'. GraphQL and other control-plane API paths increasingly authorize through the Rust snapshot instead.

This proposal keeps the grant graph as-is (i.e., still uses user_grants and role_grants) and introduces a GraphQL-only capability set on each existing grant edge. The scalar capability column remains the representation used by SQL/RLS/PostgREST. The new capability set is the authorization source for GraphQL authorization.

During migration, the GraphQL capability set is additive: GraphQL authorization combines capabilities derived from the scalar capability with explicit capabilities from capability_set, while SQL/RLS/PostgREST continue to ignore capability_set.

Objectives

  • Preserve existing authorization behavior for current SQL/RLS/PostgREST and existing GraphQL checks.
  • Avoid duplicate grant tables or a second grant graph.
  • Allow new GraphQL APIs to authorize billing, spec editing, grant management, and future powers independently.
  • Avoid a flag day: the GraphQL authorization path can read the new column while existing callers continue to ask for the access they require.
  • Preserve the current security invariant that read/write grants do not become transitive.

Non-Objectives

  • This does not replace SQL/RLS authorization in the initial migration.
  • This does not make PostgREST understand the new capability set.
  • This does not require backfilling every existing grant before use.

Data Model

Add a GraphQL-only capability set to the existing grant rows.

CREATE TYPE internal.authz_capability AS ENUM (
  'catalog_read',
  'journal_append',
  'spec_edit',
  'billing',
  'manage_grants',
  'assume'
);

ALTER TABLE public.user_grants
  ADD COLUMN capability_set internal.authz_capability[] NOT NULL DEFAULT '{}';

ALTER TABLE public.role_grants
  ADD COLUMN capability_set internal.authz_capability[] NOT NULL DEFAULT '{}';

capability_set is the explicit set of orthogonal capabilities stored on the grant edge. This additive schema change allows new GraphQL APIs to authorize against explicit capabilities on rows that already have a current capability. For example, a user can keep current read access while GraphQL also authorizes billing from capability_set.

Some grants should carry authority only in capability_set and should have no SQL/RLS/PostgREST effect. A billing user, for example, may need billing without catalog_read, and therefore should not receive current read. Before writing grants of that shape, make the current capability nullable:

ALTER TABLE public.user_grants
  ALTER COLUMN capability DROP NOT NULL;

ALTER TABLE public.role_grants
  ALTER COLUMN capability DROP NOT NULL;

The existing valid_capability check constraints continue to allow only read, write, and admin. Making capability nullable allows a row to carry authority only in capability_set without adding a neutral enum value.

NOTE: Do not use x_00 as the capability value for a grant whose authority is carried only in capability_set. Some existing SQL calls auth_roles() with the default minimum. A row with capability = 'x_00' would be reachable by those callers. A row with capability = NULL is not reachable through capability >= ... comparisons.

The meanings are:

capability = read/write/admin
  SQL/RLS/PostgREST capability for the row.

capability = NULL
  No SQL/RLS/PostgREST authorization effect.

capability_set = '{}'
  No GraphQL-visible capabilities beyond those derived from `capability`.

capability_set = non-empty array
  Explicit GraphQL-visible capabilities on this grant row, in addition to any
  capabilities derived from `capability`.

The GraphQL-visible capability set for a row is the union of the scalar capability projection and capability_set. This means new GraphQL-only capabilities can be added to existing grant rows without replacing or backfilling the current capability.

PostgREST Isolation

The new column must not be exposed through PostgREST.

Today authenticated has broad table-level privileges on user_grants and role_grants. The migration must replace those table-level privileges with column-level privileges for the columns that PostgREST should continue to expose. The capability_set column should not be granted to authenticated.

Current RLS policies on user_grants and role_grants should also exclude rows that have no current capability. This prevents grants whose authority is carried only in capability_set from appearing in current grant-management views and prevents existing PostgREST mutation paths from updating or deleting them. Insert and update policies should also reject capability IS NULL rows for current PostgREST callers; otherwise a hidden GraphQL-only row could still be created through an old path and occupy the unique grant key. The service role and GraphQL server retain access to the full row.

Capability Set Values

The new values are operation-specific capabilities, not product roles.

Initial values:

catalog_read
journal_append
spec_edit
billing
manage_grants
assume

Checks are literal membership tests:

read catalog/spec/data metadata  -> catalog_read
append through low-level paths    -> journal_append
create or modify specs            -> spec_edit
view or modify billing state      -> billing
modify grants                     -> manage_grants
traverse role_grants              -> assume

Product roles expand into these individual capabilities. They are not stored as roles in capability_set.

Example product profile expansions:

Viewer  = {catalog_read}
Editor  = {catalog_read, spec_edit}
Billing = {billing}
Manager = {manage_grants, billing}
Owner   = {catalog_read, journal_append, spec_edit,
           billing, manage_grants, assume}

The exact profile names and expansions are product/API decisions. The storage model stores the expanded capability list.
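As a rough illustration of expansion before storage, the sketch below maps the example profiles above to the capability lists that would be written to capability_set. The `Cap` and `Profile` enums and the `expand` function are hypothetical names for this sketch, and the expansions simply copy the examples; the real profiles remain a product/API decision.

```rust
use std::collections::BTreeSet;

// Mirrors the values of internal.authz_capability (Rust names are hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Cap {
    CatalogRead,
    JournalAppend,
    SpecEdit,
    Billing,
    ManageGrants,
    Assume,
}

// Example product profiles, copied from the expansions listed above.
#[derive(Debug, Clone, Copy)]
enum Profile {
    Viewer,
    Editor,
    Billing,
    Manager,
    Owner,
}

// Expand a profile into the individual capabilities that would be stored.
// The profile itself is never stored in capability_set.
fn expand(profile: Profile) -> BTreeSet<Cap> {
    let caps: &[Cap] = match profile {
        Profile::Viewer => &[Cap::CatalogRead],
        Profile::Editor => &[Cap::CatalogRead, Cap::SpecEdit],
        Profile::Billing => &[Cap::Billing],
        Profile::Manager => &[Cap::ManageGrants, Cap::Billing],
        Profile::Owner => &[
            Cap::CatalogRead,
            Cap::JournalAppend,
            Cap::SpecEdit,
            Cap::Billing,
            Cap::ManageGrants,
            Cap::Assume,
        ],
    };
    caps.iter().copied().collect()
}
```

Because only the expanded list is stored, renaming or redefining a profile later does not change what existing grant rows authorize.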

manage_grants is an intentionally separate capability. A manager can change grants and may be able to grant additional capabilities to themselves, but the initial grant does not directly authorize catalog reads, journal appends, spec edits, etc. This distinction is useful for product presentation, audit trails, and APIs that want to check whether a user currently has direct catalog or billing authority without treating grant management as an implicit read/edit/billing capability.

Whether a grant-management API allows a manager to escalate themselves is a policy decision for that API. The important storage distinction is that manage_grants does not itself authorize non-grant operations.

Canonical Explicit Capability Sets

Some capability combinations are not valid product states. For example, spec_edit without catalog_read is not a useful editor state for authoring derivations: testing a derivation normally requires flowctl preview, which must read source data. It also does not provide a meaningful data-access boundary, because the same user could publish a materialization that sends the data to a destination they control.

The design does not model these as read-time implications. It also does not use the scalar capability projection to repair an incomplete capability_set array. Instead, the explicit capability set must be canonical on its own.

Initial rule:

spec_edit      requires catalog_read

These rules apply at write time. Authorization reads do not expand or imply capabilities. We should enforce these invariants at the API layer, and possibly also as a check constraint in the database. A non-canonical explicit set such as {spec_edit} is not inherently impossible to evaluate, but it is not a valid product state for this authorization model.
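A minimal sketch of what write-time validation at the API layer could look like, assuming a table of "requires" rules that currently holds only the initial rule above. `Cap`, `REQUIRES`, and `validate_canonical` are hypothetical names for illustration, not existing APIs.

```rust
use std::collections::BTreeSet;

// Mirrors the values of internal.authz_capability (Rust names are hypothetical).
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Cap {
    CatalogRead,
    JournalAppend,
    SpecEdit,
    Billing,
    ManageGrants,
    Assume,
}

// (capability, capability it requires). Only the initial rule is listed.
const REQUIRES: &[(Cap, Cap)] = &[(Cap::SpecEdit, Cap::CatalogRead)];

// Check an explicit capability set before it is written to a grant row.
// Returns the first violated rule as Err((capability, missing requirement)).
fn validate_canonical(set: &BTreeSet<Cap>) -> Result<(), (Cap, Cap)> {
    for &(cap, needs) in REQUIRES {
        if set.contains(&cap) && !set.contains(&needs) {
            return Err((cap, needs));
        }
    }
    Ok(())
}
```

Here `{spec_edit}` is rejected and `{catalog_read, spec_edit}` is accepted. Relaxing a rule later only means removing its entry from the table; stored rows are untouched.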

Why normalize at write-time, not read-time

I wasn't able to come up with a hard correctness argument against read-time widening of capabilities: it is technically possible to define an authorization evaluator where spec_edit implies catalog_read at read time. On the other hand, read-time widening is not required, and the arguments against it come down to the following:

The data is not self-describing

With write-time validation, a grant row contains the complete set of explicit capabilities it authorizes. To answer "what does this grant authorize?", you only need the row itself.

With read-time implication, a grant row contains a partial set. The complete set is the stored capabilities plus whatever implication rules produce at evaluation time. To answer "what does this grant authorize?", you must read the row and apply the implication rules that are current for that evaluator. You cannot determine what the explicit grant authorizes by inspecting the grant alone, nor can you easily answer what it authorized at some point in the past.

Separation of concerns

The traversal algorithm should only handle graph structure: which edges exist, how to follow them, and how to intersect capabilities. The API layer should handle product rules: which explicit capability sets are valid product states.

Read-time normalization breaks that boundary. Traversal must combine an incoming capability set with an edge capability set. If spec_edit implies catalog_read during evaluation, the traversal code must also decide when to apply that rule: to each side before intersection, to the intersection result, or only later when checking a requested operation. Those choices can be specified, but they are extra authorization semantics inside graph traversal. With write-time normalization, there is no phase choice: every edge already stores the capabilities it can carry, and traversal computes only incoming ∩ edge.

Lock-in

Write-time validation is easy to relax later because it constrains what we store now without constraining what we can decide later. If we decide that spec_edit no longer requires catalog_read, we remove the write-time validation rule. Existing grants that already contain both capabilities are unaffected. New grants can omit catalog_read if the product allows it.

Read-time implication is hard to remove later. If spec_edit implies catalog_read at read time and we decide to stop implying it, every grant that relied on the implication to provide catalog_read silently loses that capability. The removal is a breaking change to every affected grant, and there is no way to know which grants were written with the expectation that the implication would supply catalog_read versus which ones happen to also have catalog_read explicitly. A migration would need to scan every grant, apply the old implication rules, and backfill the implied capabilities into the stored set before the implication can be safely removed.

GraphQL Capability Set

GraphQL authorization reads each grant row as a set of capabilities. During the migration, that set is computed by combining the row's current scalar capability with the explicit capability_set stored on the same row.

Scalar capability projection:

capability = read
  -> {catalog_read}

capability = write
  -> {catalog_read, journal_append}

capability = admin
  -> {catalog_read, journal_append, spec_edit,
      billing, manage_grants, assume}

capability = NULL
  -> {}

GraphQL row capabilities:

graphql_capabilities(row) =
  scalar_projection(row.capability) ∪ row.capability_set

This is a compatibility adapter for existing rows plus an additive extension point for new GraphQL capabilities. It is not a read-time implication rule. After the row's GraphQL capability set has been computed, authorization uses that set exactly. It does not further expand spec_edit into catalog_read, or any other capability into another capability.

The union is what allows a new GraphQL API to add a capability to an existing grant row without migrating the row's scalar capability:

capability = read
capability_set = {catalog_read, spec_edit}

SQL/RLS/PostgREST sees read.
GraphQL authorization sees catalog_read + spec_edit.

catalog_read appears in both inputs to the union in this example. The computed set de-duplicates it, while the explicit capability_set value remains canonical on its own.

As long as a row still has capability = admin, GraphQL authorization checks that read the new column will include the capabilities that admin maps to: catalog_read, journal_append, spec_edit, billing, manage_grants, and assume. Values in capability_set can add GraphQL-visible authority to the row, but they cannot remove authority that still comes from capability. Changing that behavior later requires either changing the row's scalar capability or first backfilling capability_set with the projected capabilities and then removing the scalar projection from GraphQL evaluation.

For a grant whose authority should be carried only in capability_set:

capability = NULL
capability_set = {billing}

SQL/RLS/PostgREST sees no grant.
GraphQL authorization sees billing access.
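The row-level computation can be sketched as below. This is an illustration under assumptions rather than the snapshot code: the type and function names are hypothetical, a NULL capability is modeled as `None`, and the admin projection is taken to be the full six-capability list that the text gives for admin rows.

```rust
use std::collections::BTreeSet;

// Mirrors the values of internal.authz_capability (Rust names are hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Cap {
    CatalogRead,
    JournalAppend,
    SpecEdit,
    Billing,
    ManageGrants,
    Assume,
}

// The scalar capability column; a NULL column is represented as None.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Scalar {
    Read,
    Write,
    Admin,
}

// Compatibility adapter: project the scalar capability into a capability set.
fn scalar_projection(capability: Option<Scalar>) -> BTreeSet<Cap> {
    let caps: &[Cap] = match capability {
        None => &[],
        Some(Scalar::Read) => &[Cap::CatalogRead],
        Some(Scalar::Write) => &[Cap::CatalogRead, Cap::JournalAppend],
        Some(Scalar::Admin) => &[
            Cap::CatalogRead,
            Cap::JournalAppend,
            Cap::SpecEdit,
            Cap::Billing,
            Cap::ManageGrants,
            Cap::Assume,
        ],
    };
    caps.iter().copied().collect()
}

// graphql_capabilities(row) = scalar_projection(row.capability) ∪ row.capability_set.
// No further implication is applied after the union.
fn graphql_capabilities(capability: Option<Scalar>, capability_set: &[Cap]) -> BTreeSet<Cap> {
    let mut caps = scalar_projection(capability);
    caps.extend(capability_set.iter().copied());
    caps
}
```

With these definitions, a row with capability = read and capability_set = {catalog_read, spec_edit} evaluates to {catalog_read, spec_edit}, and a row with capability = NULL and capability_set = {billing} evaluates to {billing}.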

Critical Path: Direct GraphQL checks first

The Rust authorization logic keeps its current transitivity rule: start from the user's direct user_grants, treat each reached grant as effective for names under its object_role, and continue through role_grants only from reached grants whose scalar capability is admin. Reached read, write, and NULL grants remain terminal results; they do not become traversal states.

The first part of this proposal "just" changes how capabilities are computed on each reached row. It computes scalar_projection(capability) ∪ capability_set and tests for the capability required by the caller. Existing callers can continue to ask whether the user has the access they require for a prefix; new callers can ask for orthogonal capabilities such as billing or manage_grants. A capability = NULL row can therefore authorize a GraphQL-only operation such as billing, while still having no SQL/RLS/PostgREST effect and no ability to carry traversal.
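A simplified sketch of this direct check follows. It is not the existing snapshot code: all names are hypothetical, both grant tables are modeled with one `Grant` struct, and it assumes a role_grants row is followed when its subject_role starts with the reached object_role. The admin projection is again taken to be the full six-capability list.

```rust
use std::collections::BTreeSet;

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Cap { CatalogRead, JournalAppend, SpecEdit, Billing, ManageGrants, Assume }

#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Scalar { Read, Write, Admin }

// One grant row. `subject` is a user id for user_grants rows and a role
// prefix for role_grants rows (a simplification for this sketch).
struct Grant {
    subject: &'static str,
    object_role: &'static str,
    capability: Option<Scalar>, // None models capability = NULL
    capability_set: Vec<Cap>,
}

// scalar_projection(capability) ∪ capability_set for one row.
fn row_caps(g: &Grant) -> BTreeSet<Cap> {
    let projected: &[Cap] = match g.capability {
        None => &[],
        Some(Scalar::Read) => &[Cap::CatalogRead],
        Some(Scalar::Write) => &[Cap::CatalogRead, Cap::JournalAppend],
        Some(Scalar::Admin) => &[
            Cap::CatalogRead, Cap::JournalAppend, Cap::SpecEdit,
            Cap::Billing, Cap::ManageGrants, Cap::Assume,
        ],
    };
    projected.iter().chain(g.capability_set.iter()).copied().collect()
}

// Does `user` hold `required` for `name`? Traversal continues only from
// reached rows whose scalar capability is admin; every other reached row is
// a terminal result.
fn is_authorized(
    user: &str,
    name: &str,
    required: Cap,
    user_grants: &[Grant],
    role_grants: &[Grant],
) -> bool {
    let mut pending: Vec<&Grant> =
        user_grants.iter().filter(|g| g.subject == user).collect();
    // Indices of role_grants rows already queued, so cycles terminate.
    let mut visited: BTreeSet<usize> = BTreeSet::new();

    while let Some(g) = pending.pop() {
        if name.starts_with(g.object_role) && row_caps(g).contains(&required) {
            return true;
        }
        if g.capability == Some(Scalar::Admin) {
            for (i, edge) in role_grants.iter().enumerate() {
                if edge.subject.starts_with(g.object_role) && visited.insert(i) {
                    pending.push(edge);
                }
            }
        }
    }
    false
}
```

In this sketch a capability = NULL row with capability_set = {billing} answers a billing check for names under its object_role, but never contributes role_grants edges, which matches the terminal behavior described above.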

Follow-On: Assume Traversal

This proposal reserves the capability assume for future traversal through role_grants. assume is the answer to a different problem: bounded delegation through roles without collapsing the delegated authority back into scalar admin. It is not needed to support direct billing or manage_grants checks.

assume should not mean "take all capabilities of the target role." That interpretation collapses back into the current admin bundle.

The proposed meaning is:

assume permits the current bounded capability set to be carried through role_grants.

Traversal rule:

if incoming capabilities do not contain assume:
  do not traverse role_grants from this state

next capabilities = incoming capabilities ∩ edge capabilities

Examples:

user -> contractor/:  {catalog_read, assume}
contractor/ -> acme/: {catalog_read, spec_edit}

effective on acme/:   {catalog_read}

The user can act through contractor/, but only with the width already present on the incoming grant.

user -> contractor/:  {catalog_read, spec_edit, assume}
contractor/ -> acme/: {catalog_read, spec_edit}

effective on acme/:   {catalog_read, spec_edit}

An editor-shaped assumption can carry editor-shaped authority.

user -> acme/:        current admin projection, including assume
acme/ -> shared/:     {catalog_read}

effective on shared/: {catalog_read}
traversal stops because assume was not on the shared edge

This matches the current behavior where traversal continues only through admin edges but terminal read/write grants can still be effective results.
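The traversal rule for a single role edge can be sketched as one function. `Cap` and `step` are hypothetical names; the function returns `None` when the incoming state lacks assume and the edge may not be followed.

```rust
use std::collections::BTreeSet;

// Mirrors the values of internal.authz_capability (Rust names are hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Cap {
    CatalogRead,
    JournalAppend,
    SpecEdit,
    Billing,
    ManageGrants,
    Assume,
}

// Follow one role_grants edge from a state carrying `incoming` capabilities.
// Without assume the edge is not traversed; otherwise the next state carries
// incoming ∩ edge.
fn step(incoming: &BTreeSet<Cap>, edge: &BTreeSet<Cap>) -> Option<BTreeSet<Cap>> {
    if !incoming.contains(&Cap::Assume) {
        return None;
    }
    Some(incoming.intersection(edge).copied().collect())
}
```

Applied to the first example, `step({catalog_read, assume}, {catalog_read, spec_edit})` yields `{catalog_read}`. A second `step` from that result returns `None`, because assume did not survive the intersection.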

Traversal Is Per Capability State

Once assume traversal exists, the traversal state must include both the prefix and the capabilities carried to that prefix:

(prefix, capabilities)

The evaluator must not union capabilities from multiple paths to the same prefix before deciding whether traversal can continue.

Otherwise two independent paths could combine depth and width:

path A to prefix: {assume}
path B to prefix: {spec_edit}

If these are unioned before traversal, the algorithm creates {assume, spec_edit} even though no single path granted that combination.

After traversal, effective capabilities for answering "what can the user do at this prefix?" can be unioned across all reached states. During traversal, a state may be pruned only when another state at the same prefix has a superset of its capabilities.
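A minimal sketch of per-state traversal, under the same assumptions as the rest of this proposal plus two hypothetical choices: all names are illustrative, and an edge is followed when its subject_role starts with the state's prefix. States are kept per (prefix, capabilities), dominated states are pruned, and the union happens only when answering the final question.

```rust
use std::collections::BTreeSet;

#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Cap { CatalogRead, JournalAppend, SpecEdit, Billing, ManageGrants, Assume }

// A role_grants edge with its already-computed capability set.
struct Edge {
    subject_role: &'static str,
    object_role: &'static str,
    caps: BTreeSet<Cap>,
}

// Traversal state: (prefix, capabilities carried to that prefix).
type State = (&'static str, BTreeSet<Cap>);

// Explore from the user's direct grants. Capabilities from different paths
// are never merged while deciding whether traversal may continue.
fn traverse(start: Vec<State>, edges: &[Edge]) -> Vec<State> {
    let mut reached: Vec<State> = Vec::new();
    let mut pending = start;
    while let Some((prefix, caps)) = pending.pop() {
        // Prune only when a reached state at the same prefix is a superset.
        if reached.iter().any(|(p, c)| *p == prefix && c.is_superset(&caps)) {
            continue;
        }
        reached.push((prefix, caps.clone()));
        if !caps.contains(&Cap::Assume) {
            continue; // terminal result: effective here, but not traversable
        }
        for e in edges.iter().filter(|e| e.subject_role.starts_with(prefix)) {
            let next: BTreeSet<Cap> = caps.intersection(&e.caps).copied().collect();
            if !next.is_empty() {
                pending.push((e.object_role, next));
            }
        }
    }
    reached
}

// After traversal, union across all reached states at one prefix.
fn effective(reached: &[State], prefix: &str) -> BTreeSet<Cap> {
    reached
        .iter()
        .filter(|(p, _)| *p == prefix)
        .flat_map(|(_, c)| c.iter().copied())
        .collect()
}
```

For the two-path example above, starting states `("team/", {assume})` and `("team/", {spec_edit})` never combine: the first intersects to nothing on a `{catalog_read, spec_edit}` edge and the second cannot traverse, so the target prefix gets no capabilities even though the union at `team/` is `{assume, spec_edit}`. Termination follows because intersection only narrows, so a revisited prefix is eventually dominated by an earlier state.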

Role Edges Narrow Capability Width

When assume traversal follows a role edge, the next state carries only the capabilities present on both the incoming state and the edge:

incoming: {catalog_read, assume}
edge:     {catalog_read, spec_edit, billing}
next:     {catalog_read}

This rule prevents a role edge from widening authority that was not already present on the incoming path. Future capability additions to a role edge should not automatically affect every existing assumer of that edge. assume controls depth; the remaining capabilities control width.

Migration Plan

Phase 1: Database migration

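  • Add capability_set columns to user_grants and role_grants, and replace the table-level privileges of authenticated with column-level privileges so that capability_set is not exposed through PostgREST.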

No existing authorization behavior changes in this phase.

Phase 2: Direct GraphQL Authorization

  • Extend snapshot loading to include capability_set.
  • Update GraphQL authorization to compute scalar_projection(capability) ∪ capability_set.
  • Preserve existing GraphQL check behavior for rows with empty capability_set.
  • Add parity tests showing rows with empty capability_set produce the GraphQL capabilities expected from their scalar capability.
  • Make capability nullable on both grant tables.
  • Write GraphQL-only grants, such as billing-only grants, as capability = NULL and capability_set = {billing}.
  • Keep existing billing/PostgREST surfaces on current admin until migrated or removed.

After this phase, product/API work can migrate independently onto the new authorization path. Billing APIs can ask for billing, publication APIs can ask for spec_edit, and grant-management APIs can ask for manage_grants and write normalized capability_set arrays. Those are consumers of the authorization model, not separate authorization migration seams.

Phase 3: GraphQL-Owned Traversal

  • Add assume traversal with per-state intersection semantics when GraphQL is ready to own grant traversal end-to-end, or earlier if another API needs bounded role delegation.
  • The scalar projection of admin includes assume.
  • The scalar projections of read and write do not include assume.
  • Add graph tests covering current admin-chain behavior, read/write terminal behavior, multi-path non-composition, and edge narrowing.
