Skip to content

[OTAGENT-980] Reference v1.0.5 WorkloadAllowlist exemption when OTel collector is enabled#3022

Open
songy23 wants to merge 3 commits into
mainfrom
yang.song/OTAGENT-980-ddot-collector-allowlist
Open

[OTAGENT-980] Reference v1.0.5 WorkloadAllowlist exemption when OTel collector is enabled#3022
songy23 wants to merge 3 commits into
mainfrom
yang.song/OTAGENT-980-ddot-collector-allowlist

Conversation

@songy23
Copy link
Copy Markdown
Member

@songy23 songy23 commented May 15, 2026

Summary

Mirror of DataDog/helm-charts#2667 for the operator-managed AllowlistSynchronizer. When the OTel collector is enabled on GKE Autopilot, the otel-agent (ddot-collector image) container's hostPath mounts (e.g. /var/run/containerd) are rejected by Warden because no current partner exemption permits them.

This PR has CreateAllowlistSynchronizer append the v1.0.5 exemption path whenever the rendered pod template contains the otel-agent container. The matching v1.0.5 exemption YAML is published separately in datadog-gke-workload-allowlist.

Follow-up to OTAGENT-448. Issue: OTAGENT-980.

Changes

  • pkg/allowlistsynchronizer/allowlistsynchronizer.go:
    • New ComputeAllowlistPaths(otelCollectorEnabled bool) returns the right set of exemption paths.
    • CreateAllowlistSynchronizer takes the otelCollectorEnabled flag and uses it.
  • internal/controller/datadogagent/experimental/autopilot.go:
    • New hasOtelAgentContainer helper inspects the rendered PodTemplateSpec for the otel-agent container name and passes the flag into CreateAllowlistSynchronizer.
  • New unit tests cover ComputeAllowlistPaths, newAllowlistSynchronizer, and hasOtelAgentContainer.

Note: The operator only references v1.0.1 today (helm-charts already references v1.0.1/v1.0.2/v1.0.3). Bringing the operator fully in sync with helm-charts is out of scope here — this PR only addresses OTAGENT-980 by adding v1.0.5.

Test plan

  • go test ./internal/controller/datadogagent/experimental/... ./pkg/allowlistsynchronizer/... -v passes
  • go build ./... succeeds
  • CI green
  • End-to-end: deploy a DatadogAgent with features.otelCollector.enabled: true on GKE Autopilot and verify Warden admits the pod (depends on the matching v1.0.5 partner allowlist YAML being published)

🤖 Generated with Claude Code

…l collector is enabled

When the OTel collector is enabled on GKE Autopilot, the otel-agent
(ddot-collector image) container's hostPath mounts (e.g. /var/run/containerd)
are rejected by Warden because no current partner exemption permits them. Add
the v1.0.5 exemption path to the AllowlistSynchronizer that the operator
provisions whenever the DatadogAgent renders an otel-agent container.

The matching v1.0.5 exemption YAML is published separately in the partner
allowlist repo. Mirrors the helm-charts change in OTAGENT-980 (follow-up to
OTAGENT-448).

Changes:
- ComputeAllowlistPaths returns the set of exemption paths for the
  synchronizer, appending v1.0.5 when otelCollectorEnabled is true.
- CreateAllowlistSynchronizer takes an otelCollectorEnabled flag.
- applyExperimentalAutopilotOverrides detects the otel-agent container in
  the rendered pod template and passes the flag through.
- New unit tests cover ComputeAllowlistPaths and hasOtelAgentContainer.

Refs: OTAGENT-980

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented May 15, 2026

Codecov Report

❌ Patch coverage is 93.93939% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 41.75%. Comparing base (20ecb9e) to head (9437b3c).

Files with missing lines Patch % Lines
pkg/allowlistsynchronizer/allowlistsynchronizer.go 92.59% 2 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #3022      +/-   ##
==========================================
+ Coverage   41.50%   41.75%   +0.24%     
==========================================
  Files         335      335              
  Lines       28714    28735      +21     
==========================================
+ Hits        11919    11998      +79     
+ Misses      16001    15936      -65     
- Partials      794      801       +7     
Flag Coverage Δ
unittests 41.75% <93.93%> (+0.24%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
.../controller/datadogagent/experimental/autopilot.go 40.67% <100.00%> (+40.67%) ⬆️
pkg/allowlistsynchronizer/allowlistsynchronizer.go 79.71% <92.59%> (+79.71%) ⬆️

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 20ecb9e...9437b3c. Read the comment docs.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@datadog-datadog-prod-us1
Copy link
Copy Markdown

datadog-datadog-prod-us1 Bot commented May 15, 2026

Code Coverage

🎯 Code Coverage (details)
Patch Coverage: 96.30%
Overall Coverage: 42.06% (+0.24%)

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: 9437b3c | Docs | Datadog PR Page | Give us feedback!

@songy23
Copy link
Copy Markdown
Member Author

songy23 commented May 15, 2026

GKE Autopilot plumbing test

Tested on a fresh GKE Autopilot cluster (otagent-980-test in datadog-sandbox, GKE 1.35.3-gke.1737000, rapid channel) by invoking allowlistsynchronizer.CreateAllowlistSynchronizer(...) directly from a small program built against this branch, pointed at the cluster's kubeconfig. Goal was to verify the operator-side plumbing — the matching v1.0.5 partner exemption YAML has not been published yet, so Ready pods are out of scope.

CreateAllowlistSynchronizer(otelCollectorEnabled=true):

$ kubectl get allowlistsynchronizer datadog-synchronizer -o jsonpath='{.spec.allowlistPaths}'
[
  "Datadog/datadog/datadog-datadog-daemonset-exemption-v1.0.1.yaml",
  "Datadog/datadog/datadog-datadog-daemonset-exemption-v1.0.5.yaml"
]

$ kubectl get allowlistsynchronizer datadog-synchronizer -o jsonpath='{.status.conditions[0]}'
{
  "reason": "SyncError",
  "status": "False",
  "message": "some allowlist paths failed to sync: path Datadog/datadog/datadog-datadog-daemonset-exemption-v1.0.5.yaml not found"
}

As expected, v1.0.5 is the only path that fails because the partner YAML is not yet published — that's the prerequisite called out in the PR description. v1.0.1 syncs (phase: Installed).

CreateAllowlistSynchronizer(otelCollectorEnabled=false):

$ kubectl get allowlistsynchronizer datadog-synchronizer -o jsonpath='{.spec.allowlistPaths}'
["Datadog/datadog/datadog-datadog-daemonset-exemption-v1.0.1.yaml"]

v1.0.5 correctly absent; matches the operator's prior baseline (v1.0.1 only).

Result: PASS. The refactor correctly threads the otelCollectorEnabled flag through, and ComputeAllowlistPaths produces the right path set in both branches. Final end-to-end (Warden admitting the otel-agent pod) is blocked on the matching v1.0.5 YAML being published in datadog-gke-workload-allowlist.

Cluster has been deleted.

@songy23 songy23 added this to the v1.29.0 milestone May 15, 2026
Extract reconcileAllowlistSynchronizer from CreateAllowlistSynchronizer so the
Get/Create reconciliation can be exercised against a fake client (the k8s
plumbing wrapper that loads kubeconfig and builds the rest client remains
out of scope for unit tests).

In the experimental autopilot package, expose a package-level
createAllowlistSynchronizer var so tests can stub the synchronizer creator
and cover applyExperimentalAutopilotOverrides without hitting kubeconfig.

New tests:
- TestReconcileAllowlistSynchronizer: covers the create-when-absent,
  no-op-when-exists, Get-error, AlreadyExists, and other-Create-error
  branches (100% line coverage).
- TestCreateAllowlistSynchronizerResource: verifies the rendered resource
  for both the OTel-enabled and disabled cases.
- TestApplyExperimentalAutopilotOverrides_SynchronizerInvocation: asserts
  the autopilot override path invokes the synchronizer creator with the
  correct otelCollectorEnabled flag derived from the pod template.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@songy23 songy23 marked this pull request as ready for review May 15, 2026 17:12
@songy23 songy23 requested a review from a team May 15, 2026 17:12
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7f3b538ac1

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread pkg/allowlistsynchronizer/allowlistsynchronizer.go Outdated
Address PR review (#3022): the previous early-return on Get success meant
that an AllowlistSynchronizer left over from an older operator version (or
from a prior install with OTel disabled) would never get the v1.0.5 path
added when OTel is enabled later. Warden would then keep rejecting the
otel-agent pod even though the operator had reconciled.

reconcileAllowlistSynchronizer now compares the existing
spec.allowlistPaths against ComputeAllowlistPaths(otelCollectorEnabled) and
issues an Update when they differ. Same direction in reverse: if OTel is
disabled later, v1.0.5 is dropped from the spec.

New test cases:
- no-op when existing paths match desired
- update when stale paths are missing v1.0.5
- update when stale paths still have v1.0.5 after OTel is disabled
- silent return when Update fails

Refs: OTAGENT-980

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants