fix(rbac): reduce pod permissions and add cert validity opt-in #892

Open
d-padmanabhan wants to merge 2 commits into aws:main from d-padmanabhan:fix/rbac-webhook-cert-validity-optin

@d-padmanabhan
Contributor

What type of PR is this?

bug

Which issue does this PR fix:

None linked. Split from the original hardening PR to isolate RBAC and webhook behavior changes.

What does this PR do / Why do we need it:

This PR focuses on the higher-risk changes that need deeper validation.

  • Reduces controller pod RBAC permissions to least privilege
    • removes broad pod create/delete and finalizer rules
    • keeps required pods/status update and patch access
  • Keeps webhook behavior behind explicit enablement
    • aligns Helm templates and controller config for WEBHOOK_ENABLED
  • Makes webhook certificate validity configurable and backward-compatible
    • adds webhookTLS.validityDays in Helm values
    • default is 36500 days to preserve existing behavior
    • supports opt-in 365 days for annual rotation
  • Updates webhook cert generation script
    • scripts/gen-webhook-secret.sh now supports WEBHOOK_CERT_VALIDITY_DAYS override
    • default remains 36500
  • Documents annual rotation guidance for the 365-day opt-in in
    docs/guides/pod-readiness-gates.md
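
To illustrate the override described above, here is a minimal sketch of how a script could resolve the validity period from WEBHOOK_CERT_VALIDITY_DAYS and fall back to 36500. It is not copied from scripts/gen-webhook-secret.sh. The function names, the output file names, and the certificate subject are assumptions made for illustration.

```shell
# Sketch only: the real scripts/gen-webhook-secret.sh may be structured differently.
DEFAULT_VALIDITY_DAYS=36500

# Print the validity to use: WEBHOOK_CERT_VALIDITY_DAYS if set, else the default.
# Returns non-zero if the value is not a positive integer without a leading zero.
resolve_validity_days() {
  local days="${WEBHOOK_CERT_VALIDITY_DAYS:-$DEFAULT_VALIDITY_DAYS}"
  case "$days" in
    ''|*[!0-9]*|0*)
      echo "invalid WEBHOOK_CERT_VALIDITY_DAYS: $days" >&2
      return 1
      ;;
  esac
  printf '%s\n' "$days"
}

# Generate a self-signed certificate with the resolved validity.
# File names and the CN are hypothetical placeholders.
generate_webhook_cert() {
  local days
  days="$(resolve_validity_days)" || return 1
  openssl req -x509 -newkey rsa:2048 -nodes \
    -keyout tls.key -out tls.crt \
    -days "$days" -subj "/CN=webhook-service.example.svc"
}
```

With this shape, running `WEBHOOK_CERT_VALIDITY_DAYS=365 generate_webhook_cert` would opt in to a one-year certificate, while leaving the variable unset keeps the 36500-day default.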

If an issue # is not available please add repro steps and logs from aws-gateway-controller showing the issue:

Not applicable. No controller-runtime bug reproduction. This is a scoped hardening and configuration change.

Testing done on this change:

  • go test ./...
  • bash -n scripts/gen-webhook-secret.sh
  • helm template gateway-api-controller ./helm
  • make e2e-test attempted but blocked by environment
  • make webhook-e2e-test attempted but blocked by environment

e2e blocker output:

CLUSTER_VPC_ID environment variable must be set to run integration tests
An error occurred (InvalidClientTokenId) when calling the GetCallerIdentity operation:
The security token included in the request is invalid

Automation added to e2e:

No.

Will this PR introduce any new dependencies?:

No.

Will this break upgrades or downgrades. Has updating a running cluster been tested?:

Default behavior remains unchanged because generated cert validity still defaults to 36500 days.
Clusters that opt in to webhookTLS.validityDays=365 must plan annual certificate rotation.
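
For clusters that opt in to 365 days, a periodic expiry check is one way to plan rotation. The sketch below relies on `openssl x509 -checkend`, which exits non-zero when a certificate expires within the given number of seconds. The helper names and the 30-day default window are assumptions, not part of this PR.

```shell
# Convert a number of days to seconds for `openssl x509 -checkend`.
days_to_seconds() {
  printf '%s\n' "$(( $1 * 86400 ))"
}

# Succeeds if the certificate file $1 is still valid $2 days from now.
# The 30-day default window is an arbitrary choice for this sketch.
cert_valid_for_days() {
  local cert="$1" days="${2:-30}"
  openssl x509 -in "$cert" -noout \
    -checkend "$(days_to_seconds "$days")" >/dev/null
}

# Example usage (secret name and namespace are placeholders):
#   kubectl get secret <secret> -n <namespace> \
#     -o jsonpath='{.data.tls\.crt}' | base64 -d > tls.crt
#   cert_valid_for_days tls.crt 30 || echo "rotate the webhook certificate"
```

Running a check like this on a schedule gives some lead time before the annual expiry instead of discovering it when the webhook starts failing TLS handshakes.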

Does this PR introduce any user-facing change?:

Yes. Helm users can now configure webhook cert validity via webhookTLS.validityDays.

Add webhookTLS.validityDays Helm value for generated webhook certificates.
Default remains 36500 days for backward compatibility.
Users can set 365 days to opt in to annual certificate rotation.
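
As a rough sketch of the opt-in, the value can be passed with `--set` when rendering the chart. The release name and chart path are taken from the `helm template` command in the testing section. The helper functions are hypothetical wrappers added for illustration.

```shell
# Build the Helm --set expression for a given validity (default mirrors the chart default).
validity_set_flag() {
  printf 'webhookTLS.validityDays=%s\n' "${1:-36500}"
}

# Render the chart with the chosen validity so the generated manifests can be inspected.
render_with_validity() {
  helm template gateway-api-controller ./helm \
    --set "$(validity_set_flag "$1")"
}

# Example: render_with_validity 365
```

Rendering with and without the flag and diffing the output is a quick way to confirm that only the certificate validity changes.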

Do all end-to-end tests successfully pass when running make e2e-test?:

No. In this environment the run was blocked by a missing cluster context and an invalid AWS auth token.

CLUSTER_VPC_ID environment variable must be set to run integration tests
InvalidClientTokenId for sts:GetCallerIdentity

- Limit pods to get/list/watch
- Keep only pods/status update/patch where needed
- Remove unused pods/finalizers rules
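
The reduced pod permissions listed above could be spot-checked on a live cluster with `kubectl auth can-i` while impersonating the controller's service account. This is a sketch under assumptions: the expectation table is derived from the bullets rather than from the chart, the `update` verb used for pods/finalizers is a guess at what the removed rule granted, and the namespace and service account name must be supplied by the caller.

```shell
# Expected pod access after this change, one "verb resource subresource expected" per line.
# "-" means no subresource. Derived from the PR description, not from the chart itself.
expected_pod_access() {
  cat <<'EOF'
get pods - yes
list pods - yes
watch pods - yes
create pods - no
delete pods - no
update pods status yes
patch pods status yes
update pods finalizers no
EOF
}

# Compare each expectation with `kubectl auth can-i`.
# $1 = namespace, $2 = service account name of the controller.
check_pod_access() {
  local ns="$1" sa="$2" rc=0
  local subject="system:serviceaccount:${ns}:${sa}"
  local verb resource sub expected actual target
  while read -r verb resource sub expected; do
    if [ "$sub" = "-" ]; then
      target="$resource"
      actual="$(kubectl auth can-i "$verb" "$resource" \
        --as="$subject" </dev/null 2>/dev/null)" || true
    else
      target="$resource/$sub"
      actual="$(kubectl auth can-i "$verb" "$resource" --subresource="$sub" \
        --as="$subject" </dev/null 2>/dev/null)" || true
    fi
    if [ "$actual" != "$expected" ]; then
      echo "MISMATCH: $verb $target: expected $expected, got ${actual:-<no answer>}" >&2
      rc=1
    fi
  done <<EOF
$(expected_pod_access)
EOF
  return "$rc"
}
```

Calling `check_pod_access <namespace> <service-account>` returns non-zero if any answer differs from the table. The caller needs permission to impersonate service accounts for `--as` to work.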
Use webhookTLS.validityDays for generated webhook certificates and default
to 36500 days to preserve existing behavior. Document 365-day opt-in and
annual rotation guidance. Allow gen-webhook-secret.sh validity override
via WEBHOOK_CERT_VALIDITY_DAYS.
@d-padmanabhan marked this pull request as ready for review on February 16, 2026 01:55