
feat(k8s): add kubernetes deployment manifests and documentation #531

Open
b-mackenzie-alexander wants to merge 3 commits into plastic-labs:main from b-mackenzie-alexander:feat/kubernetes-deployment

Conversation

Contributor

@b-mackenzie-alexander b-mackenzie-alexander commented Apr 8, 2026

Summary

  • Adds /k8s directory with Kustomize-based manifests for deploying all Honcho services on Kubernetes
  • Adds tests/test_k8s_manifests.py — a self-contained pytest suite that validates the manifests without a running cluster

Motivation

Honcho ships a docker-compose.yml.example but has no supported path for teams already running Kubernetes infrastructure. This is the gap tracked in #514.

What's included

Services

  • postgres — StatefulSet (pgvector/pgvector:pg15, 10 Gi PVC) + headless Service; differentiated readiness (exec pg_isready) and liveness (tcpSocket) probes
  • redis — StatefulSet (redis:8.2, 2 Gi PVC) + headless Service; differentiated probes
  • honcho-api — Deployment (no static replicas — HPA-managed), ClusterIP Service, HPA (1–5 replicas at 70% CPU), PDB (maxUnavailable: 1); httpGet readiness + tcpSocket liveness to avoid the identical-probe anti-pattern
  • honcho-deriver — Deployment with no HTTP probe — the deriver is a queue worker, not an HTTP server; health is managed by the Deployment restart policy
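The differentiated-probe pattern described above can be sketched for the postgres container roughly as follows (an illustration of the pattern, not the exact manifest; timings are placeholders):

```yaml
# Readiness asks Postgres itself whether it accepts connections;
# liveness only checks that the process still holds the TCP socket,
# so a temporary query overload cannot trigger a restart loop.
readinessProbe:
  exec:
    command: ["pg_isready", "-U", "postgres"]
  periodSeconds: 5
livenessProbe:
  tcpSocket:
    port: 5432
  periodSeconds: 10
  failureThreshold: 3
```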

Security

  • NetworkPolicies: default-deny ingress + explicit allow rules; postgres and redis reachable only from API and deriver pods
  • Pod hardening: runAsNonRoot, readOnlyRootFilesystem, allowPrivilegeEscalation: false, capabilities.drop: ALL, seccompProfile: RuntimeDefault, automountServiceAccountToken: false
  • Init containers (busybox) run as UID/GID 65534 (nobody/nogroup)
  • Sensitive values (DB_CONNECTION_URI, AUTH_JWT_SECRET, LLM keys) in a gitignored Secret; non-sensitive config in a ConfigMap
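The default-deny half of the NetworkPolicy setup can be sketched as follows (illustrative; the name and namespace mirror the PR's conventions but may differ from the actual file):

```yaml
# An empty podSelector matches every pod in the namespace, and an
# Ingress policy with no rules admits no traffic; separate allow
# policies then punch explicit holes for API, Postgres, and Redis.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: honcho
spec:
  podSelector: {}
  policyTypes:
    - Ingress
```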

Documentation

  • k8s/README.md — prerequisites, per-distro image loading (kind, k3s, Docker Desktop), secrets setup, deploy, verify, useful commands, autoscaling, networking, and production notes (managed DB/cache, external secrets, Ingress, TLS, observability)

Test plan

  • Manifest unit tests (no cluster needed, but kubectl must be on PATH):

    pytest tests/test_k8s_manifests.py --noconftest -p no:xdist --override-ini="addopts=" -v

    31 tests: namespace isolation, NetworkPolicy exact source/port assertions, headless Services, HPA contract, PDB semantics, deriver probe absence, API probe differentiation, security contexts, init container hardening, secret/ConfigMap split, and resource limits.

  • Schema validation (requires kubeconform):

    kubectl kustomize k8s/ | kubeconform -strict -summary
  • Security scoring (requires kube-score):

    kubectl kustomize k8s/ | kube-score score -
  • End-to-end (requires a local k8s cluster):

    1. docker build -t honcho:latest .
    2. Load image per README instructions (kind / k3s / Docker Desktop)
    3. cp k8s/secrets.yaml.example k8s/secrets.yaml and fill in values
    4. kubectl apply -f k8s/namespace.yaml && kubectl apply -f k8s/secrets.yaml && kubectl apply -k k8s/
    5. kubectl get pods -n honcho --watch — all pods reach Running
    6. kubectl port-forward svc/honcho-api 8000:80 -n honcho + curl http://localhost:8000/openapi.json
    7. kubectl logs deploy/honcho-deriver -n honcho — queue worker starts, no health probe errors
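The manifest suite works by rendering the Kustomize output once and asserting on the resulting objects; a minimal sketch of two helpers in that style (hypothetical names, not the suite's exact code) looks like:

```python
from typing import Any


def by_kind_name(docs: list[dict[str, Any]], kind: str, name: str) -> dict[str, Any]:
    """Return the single rendered manifest matching kind and metadata.name."""
    matches = [
        d
        for d in docs
        if d.get("kind") == kind and d.get("metadata", {}).get("name") == name
    ]
    assert len(matches) == 1, f"expected exactly 1 {kind}/{name}, got {len(matches)}"
    return matches[0]


def assert_no_http_probes(container: dict[str, Any]) -> None:
    """A queue worker such as the deriver should carry no kubelet probes."""
    for probe in ("readinessProbe", "livenessProbe", "startupProbe"):
        assert probe not in container, f"unexpected {probe} on {container.get('name')}"
```

Tests then feed these helpers the parsed multi-document output of `kubectl kustomize k8s/`, so the whole suite runs without a cluster.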

References #514

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Full Kubernetes deployment for Honcho: API, deriver, Postgres, Redis, Services, StatefulSets, HPA (1–5), and PodDisruptionBudget
    • Namespace-wide NetworkPolicies enforcing default-deny ingress with scoped allow rules
    • ConfigMap-driven runtime config and a secrets template for local secret management
  • Documentation

    • Add end-to-end k8s README with quickstart, verification, and production guidance
  • Tests

    • Added manifest validation tests to verify structure, security posture, and networking
  • Chores

    • Ignore local k8s secrets in .gitignore

b-mackenzie-alexander and others added 2 commits April 8, 2026 19:10
Adds /k8s with Kustomize-based manifests for all Honcho services,
addressing the k8s deployment gap documented in plastic-labs#514:

  - postgres: StatefulSet (pgvector/pgvector:pg15, 10 Gi PVC) with
    headless Service, differentiated readiness (exec pg_isready) and
    liveness (tcpSocket) probes
  - redis: StatefulSet (redis:8.2, 2 Gi PVC) with headless Service,
    differentiated probes
  - honcho-api: Deployment (HPA-managed replicas), ClusterIP Service,
    HPA (1–5 replicas at 70% CPU), PDB (maxUnavailable: 1); httpGet
    readiness + tcpSocket liveness
  - honcho-deriver: Deployment with no HTTP probe — the deriver is a
    queue worker; health managed by the Deployment restart policy

Security posture:
  - NetworkPolicies: default-deny ingress + explicit allow rules;
    postgres and redis reachable only from API and deriver pods
  - Pod security: runAsNonRoot, readOnlyRootFilesystem,
    allowPrivilegeEscalation: false, capabilities.drop: ALL,
    seccompProfile: RuntimeDefault, automountServiceAccountToken: false
  - Init containers (busybox wait-for-deps) run as UID/GID 65534
  - Secrets (DB URI, JWT secret, LLM keys) in a gitignored Secret;
    non-sensitive config in a ConfigMap

README covers prerequisites, per-distro image loading (kind, k3s,
Docker Desktop), secrets setup, deploy, verify, useful commands,
autoscaling, networking, and production notes.

References plastic-labs#514

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds tests/test_k8s_manifests.py — a pytest suite that renders the
Kustomize output with kubectl kustomize and asserts structural
correctness, security posture, and architectural invariants without
requiring a running cluster.

Run with:
  pytest tests/test_k8s_manifests.py --noconftest -p no:xdist \
    --override-ini="addopts=" -v

31 tests covering: namespace isolation, all 4 NetworkPolicies with
exact source/port assertions, headless Services, HPA exact contract
(minReplicas=1, maxReplicas=5, CPU=70%), PDB maxUnavailable semantics,
deriver has no HTTP probes, API uses differentiated probe types,
pod/container security contexts, init containers run as nobody,
no service account tokens, ConfigMap contains no secret keys, DB URI
sourced from Secret, and all containers have resource limits.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Contributor

coderabbitai Bot commented Apr 8, 2026

Walkthrough

This PR introduces complete Kubernetes infrastructure for Honcho deployment: namespace, Kustomize entrypoint, ConfigMap/Secret templates, NetworkPolicies, API and deriver Deployments, Postgres/Redis StatefulSets, autoscaling/PDB, manifest tests, and supporting docs and examples.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Kustomize entry & namespace: `k8s/kustomization.yaml`, `k8s/namespace.yaml` | Add Kustomize entrypoint composing all honcho resources and a dedicated honcho Namespace. Secrets are excluded from kustomize and gitignored. |
| Configuration & Secrets: `k8s/configmap.yaml`, `k8s/secrets.yaml.example`, `.gitignore` | Add honcho-config ConfigMap, a honcho-secrets example template, and a gitignore rule to omit an exported k8s/secrets.yaml. |
| API workload: `k8s/api/deployment.yaml`, `k8s/api/service.yaml`, `k8s/api/hpa.yaml`, `k8s/api/pdb.yaml` | Introduce honcho-api Deployment (no replicas set), hardened security context, init containers waiting for Postgres/Redis, probes, ClusterIP Service, HPA (1–5, CPU 70%), and PDB (maxUnavailable: 1). |
| Deriver workload: `k8s/deriver/deployment.yaml` | Add honcho-deriver Deployment (1 replica) running python -m src.deriver, init containers for dependencies, and security/resource settings aligned with the API. |
| Datastores: `k8s/postgres/statefulset.yaml`, `k8s/postgres/service.yaml`, `k8s/postgres/configmap.yaml`, `k8s/redis/statefulset.yaml`, `k8s/redis/service.yaml` | Provision Postgres StatefulSet (pgvector image) with 10Gi PVC, init SQL ConfigMap, and headless Service; add Redis StatefulSet with 2Gi PVC and headless Service; both include health probes and resource requests/limits. |
| Networking & Security: `k8s/network-policies.yaml` | Namespace-wide default-deny ingress plus scoped allow policies: API (TCP 8000), Postgres (TCP 5432 from API/deriver), Redis (TCP 6379 from API/deriver). |
| Tests & Docs: `tests/test_k8s_manifests.py`, `k8s/README.md` | Add a pytest suite to validate kustomize-rendered manifests (structure, security posture, policy semantics, probes, resource declarations) and a detailed README documenting local-to-cluster workflow, testing, verification, and production notes. |

Sequence Diagram(s)

sequenceDiagram
    participant Kubectl as Kubectl/kustomize
    participant Kustomize as Kustomize (render)
    participant API as honcho-api Pod
    participant Deriver as honcho-deriver Pod
    participant Postgres as Postgres StatefulSet
    participant Redis as Redis StatefulSet

    Kubectl->>Kustomize: render k8s/
    Kustomize-->>Kubectl: YAML manifests

    Kubectl->>API: apply Deployment
    API->>Postgres: initContainer polls TCP 5432
    Postgres-->>API: TCP ACK
    API->>Redis: initContainer polls TCP 6379
    Redis-->>API: TCP ACK
    API->>API: start main container (probes configured)
    API-->>Kubectl: Pod Ready
sequenceDiagram
    participant Client as External Client
    participant NP as NetworkPolicy
    participant API as honcho-api
    participant Deriver as honcho-deriver
    participant PG as postgres
    participant RS as redis

    Client->>NP: Request -> API:8000
    NP->>API: allow-api-ingress (TCP 8000)
    API-->>Client: Respond

    API->>NP: Request -> PG:5432
    NP->>PG: allow-postgres-from-honcho (from API/Deriver)
    PG-->>API: Respond

    Deriver->>NP: Request -> RS:6379
    NP->>RS: allow-redis-from-honcho (from API/Deriver)
    RS-->>Deriver: Respond

    Client->>NP: Direct Request -> PG:5432
    NP-->>Client: Deny (default-deny-ingress)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

  • dr-frmr
  • VVoruganti

Poem

🐰 Hopped in with manifests tidy and bright,

Pods tuck their tails in secure, cozy night.
StatefulSets hum and policies sing,
Honcho wakes up — ready to bring spring!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 68.57%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and concisely summarizes the main change: adding Kubernetes deployment manifests and documentation to the k8s/ directory. It accurately reflects the primary objective without being vague or misleading. |


Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 9

🧹 Nitpick comments (3)
k8s/deriver/deployment.yaml (1)

82-82: Prefer an immutable image reference over latest.

Line 82 uses a mutable tag, which can produce non-deterministic rollouts.

Suggested fix
-          image: honcho:latest
+          image: honcho:<immutable-tag-or-digest>
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@k8s/deriver/deployment.yaml` at line 82, Replace the mutable image tag
"honcho:latest" in the Deployment spec's container image reference with an
immutable reference (a specific version tag or an image digest) so deployments
are deterministic; update the container image field (where image: honcho:latest
is set) to use either honcho:<semver> or honcho@sha256:<digest> and ensure any
CI/image-build output is synchronized to provide that exact tag/digest when
applied.
tests/test_k8s_manifests.py (1)

511-519: Extend the resource assertions to cover ephemeral-storage.

The manifests in this PR set ephemeral-storage requests/limits on workloads, but this helper only guards CPU and memory. A regression there would currently pass the suite.

♻️ Proposed fix
 def _assert_has_resource_limits(container: dict[str, Any], label: str):
     resources = container.get("resources", {})
     assert "limits" in resources, f"{label}: missing resource limits"
     assert "requests" in resources, f"{label}: missing resource requests"
     for kind in ("limits", "requests"):
-        for field in ("memory", "cpu"):
+        for field in ("memory", "cpu", "ephemeral-storage"):
             assert field in resources[kind], (
                 f"{label}: missing resources.{kind}.{field}"
             )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_k8s_manifests.py` around lines 511 - 519, The helper
_assert_has_resource_limits currently checks only "memory" and "cpu"; update it
to also require "ephemeral-storage" in both resources["limits"] and
resources["requests"]. In the function _assert_has_resource_limits(container:
dict[str, Any], label: str) add "ephemeral-storage" to the tuple of fields
iterated (or otherwise assert its presence for both "limits" and "requests") so
the test fails if ephemeral-storage is missing from either limits or requests.
k8s/api/deployment.yaml (1)

95-101: Pin the API image to an immutable tag or digest.

honcho:latest with IfNotPresent makes rollouts non-reproducible and can leave nodes on stale bits after the same tag is rebuilt. Prefer a versioned tag or digest here, then override it via Kustomize for local workflows if needed.
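Such a Kustomize override could look like this (a sketch; the image name matches the manifests, the tag is a placeholder):

```yaml
# kustomization.yaml fragment (illustrative): pin the deployable image
# here rather than hardcoding a mutable tag in each Deployment.
images:
  - name: honcho
    newTag: v1.2.3   # or set `digest: sha256:...` for full immutability
```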

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@k8s/api/deployment.yaml` around lines 95 - 101, The deployment uses a
mutable image tag "honcho:latest" with imagePullPolicy IfNotPresent; update the
container image field in the Deployment to a fixed, immutable identifier (e.g. a
versioned tag like honcho:v2.1.1 or an image digest like honcho@sha256:...) and
retain or set imagePullPolicy appropriately (IfNotPresent is fine for immutable
tags); if you need local overrides, handle environment-specific image
substitution via Kustomize overlays or tooling rather than leaving the manifest
pinned to :latest.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@k8s/api/deployment.yaml`:
- Around line 208-214: The securityContext currently sets runAsNonRoot: true but
the image switches to a named user ("app") created with adduser --system, which
can cause pod rejection; either set an explicit numeric UID via runAsUser in the
deployment's securityContext (e.g., runAsUser: 1001) to match the image, or
change the Dockerfile to create and switch to a numeric user (use adduser --uid
1001 ... and USER 1001) so runAsNonRoot is reliable; update references to
securityContext, runAsNonRoot, runAsUser, and the Dockerfile USER/adduser lines
accordingly.

In `@k8s/api/hpa.yaml`:
- Around line 4-6: The two lines about k3s in the metrics-server comment
conflict; edit the comment block in k8s/api/hpa.yaml to keep one authoritative
statement (preferably "k3s: included by default") and remove the conflicting
"apply metrics-server" instruction so operators only see a single clear
prerequisite for k3s; update the surrounding comment text accordingly to
preserve Docker Desktop and kind guidance.

In `@k8s/deriver/deployment.yaml`:
- Around line 29-33: The init container wait loops using "until nc -z postgres
5432; do" (and the second similar loop at lines 58-62) must be converted to
bounded waits that fail fast: add a timeout mechanism (e.g., a MAX_RETRIES
counter or use the timeout command) that increments each loop iteration and, if
exceeded, prints an error and exits non-zero so the pod fails instead of hanging
in Init; update both occurrences (the loop starting with "until nc -z postgres
5432; do" and the later identical wait loop) to implement this behavior and
return a non-zero exit when the timeout is reached.

In `@k8s/postgres/statefulset.yaml`:
- Around line 19-21: The Postgres StatefulSet currently allows automatic
mounting of the service-account token; add automountServiceAccountToken: false
to the pod spec to disable it. Locate the StatefulSet pod template (under
spec.template.spec) that contains the containers list with name: postgres and
add the field automountServiceAccountToken: false at that same spec level so the
Postgres pods do not receive Kubernetes API credentials.

In `@k8s/README.md`:
- Line 140: The sentence "Kustomize applies all resources in dependency order."
overstates kubectl apply -k guarantees; change that line to state that Kustomize
assembles and applies resources but does not guarantee dependency ordering, and
suggest concrete alternatives such as using kubectl apply -k with --wait or
kubectl wait (or readiness probes/InitContainers/helm hooks) to ensure pods come
up in the desired order; update the text that follows "Watch pods come up:" to
reflect these recommendations.

In `@k8s/redis/statefulset.yaml`:
- Around line 19-23: The Pod spec for the Redis StatefulSet currently leaves
automounting the service account token enabled; update the Pod spec (under
spec.template.spec) to explicitly disable it by setting
automountServiceAccountToken to false so the Redis pod (container name: redis,
image: redis:8.2) does not mount API credentials. Locate the StatefulSet pod
template (spec.template.spec) that contains the containers array and add the
automountServiceAccountToken: false field at that same spec level to ensure
tokens are not mounted.

In `@k8s/secrets.yaml.example`:
- Around line 25-27: The comment incorrectly says the hostname 'postgres'
resolves to a ClusterIP Service while the manifest actually creates a headless
Service (clusterIP: None); update the comment above DB_CONNECTION_URI to reflect
that 'postgres' resolves via the headless Service DNS (direct pod endpoints) or
simply state "resolves to the headless Service 'postgres' (clusterIP: None)" so
it correctly matches the Service configuration and expected DNS behavior.
- Around line 35-39: The documentation is inconsistent: optional secret keys
LLM_ANTHROPIC_API_KEY, LLM_GEMINI_API_KEY, and LLM_OPENAI_API_KEY are shown as
empty strings (which still inject empty env vars) while the text says "unset
keys will simply not be passed." Update k8s/secrets.yaml.example to either
remove/comment out those key lines (or use YAML null) so unset keys are truly
omitted, and change the explanatory sentence to instruct users to remove or
comment the LLM_ANTHROPIC_API_KEY / LLM_GEMINI_API_KEY / LLM_OPENAI_API_KEY
entries to avoid injecting empty environment variables.

In `@tests/test_k8s_manifests.py`:
- Around line 27-37: The subprocess.run call that invokes kubectl kustomize
should include a timeout and explicitly handle subprocess.TimeoutExpired; update
the call in tests/test_k8s_manifests.py (the code around subprocess.run(...)
that assigns result) to pass a sensible timeout (e.g., timeout=30) and add an
except subprocess.TimeoutExpired block that fails the test with
pytest.fail("kubectl kustomize timed out") (or similar), while still keeping the
existing FileNotFoundError handling and the final assert on result.returncode.

---

Nitpick comments:
In `@k8s/api/deployment.yaml`:
- Around line 95-101: The deployment uses a mutable image tag "honcho:latest"
with imagePullPolicy IfNotPresent; update the container image field in the
Deployment to a fixed, immutable identifier (e.g. a versioned tag like
honcho:v2.1.1 or an image digest like honcho@sha256:...) and retain or set
imagePullPolicy appropriately (IfNotPresent is fine for immutable tags); if you
need local overrides, handle environment-specific image substitution via
Kustomize overlays or tooling rather than leaving the manifest pinned to
:latest.

In `@k8s/deriver/deployment.yaml`:
- Line 82: Replace the mutable image tag "honcho:latest" in the Deployment
spec's container image reference with an immutable reference (a specific version
tag or an image digest) so deployments are deterministic; update the container
image field (where image: honcho:latest is set) to use either honcho:<semver> or
honcho@sha256:<digest> and ensure any CI/image-build output is synchronized to
provide that exact tag/digest when applied.

In `@tests/test_k8s_manifests.py`:
- Around line 511-519: The helper _assert_has_resource_limits currently checks
only "memory" and "cpu"; update it to also require "ephemeral-storage" in both
resources["limits"] and resources["requests"]. In the function
_assert_has_resource_limits(container: dict[str, Any], label: str) add
"ephemeral-storage" to the tuple of fields iterated (or otherwise assert its
presence for both "limits" and "requests") so the test fails if
ephemeral-storage is missing from either limits or requests.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 4cf8d485-f70d-43f1-b5ac-d954c5a51838

📥 Commits

Reviewing files that changed from the base of the PR and between 5b6bd59 and d0e24c2.

📒 Files selected for processing (18)
  • .gitignore
  • k8s/README.md
  • k8s/api/deployment.yaml
  • k8s/api/hpa.yaml
  • k8s/api/pdb.yaml
  • k8s/api/service.yaml
  • k8s/configmap.yaml
  • k8s/deriver/deployment.yaml
  • k8s/kustomization.yaml
  • k8s/namespace.yaml
  • k8s/network-policies.yaml
  • k8s/postgres/configmap.yaml
  • k8s/postgres/service.yaml
  • k8s/postgres/statefulset.yaml
  • k8s/redis/service.yaml
  • k8s/redis/statefulset.yaml
  • k8s/secrets.yaml.example
  • tests/test_k8s_manifests.py

Comment thread k8s/api/deployment.yaml
Comment on lines +208 to +214
securityContext:
  # The Dockerfile creates a non-root 'app' user via adduser --system
  # (UID in the 100–999 range). runAsNonRoot enforces this without
  # hardcoding the UID, which varies by image build.
  # Note: kube-score recommends UID > 10000; achieving that requires
  # a Dockerfile change (adduser --uid 10001 app).
  runAsNonRoot: true
Contributor


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

fd -i 'dockerfile' | while read -r f; do
  echo "== $f =="
  nl -ba "$f" | rg -n '\b(FROM|USER|adduser|useradd)\b'
  echo
done

Repository: plastic-labs/honcho

Length of output: 137


🏁 Script executed:

fd -i 'dockerfile' --type f

Repository: plastic-labs/honcho

Length of output: 73


🏁 Script executed:

cat -n k8s/api/deployment.yaml | sed -n '208,214p'

Repository: plastic-labs/honcho

Length of output: 529


🏁 Script executed:

# Find Dockerfile and extract USER/adduser lines more carefully
find . -iname 'dockerfile' -o -iname 'dockerfile.*' | head -20

Repository: plastic-labs/honcho

Length of output: 75


🏁 Script executed:

cat -n Dockerfile

Repository: plastic-labs/honcho

Length of output: 2368


Either set runAsUser in the security context or change the Dockerfile to use a numeric USER.

The Dockerfile creates the app user via adduser --system (line 39) and switches to it with USER app (line 50). However, runAsNonRoot: true is only guaranteed safe if the image's final USER is numeric. Non-numeric user names can cause Kubernetes to reject the pod in certain configurations. Set an explicit runAsUser here, or update the Dockerfile to use USER 1001 (or another numeric UID > 1000) instead of USER app.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@k8s/api/deployment.yaml` around lines 208 - 214, The securityContext
currently sets runAsNonRoot: true but the image switches to a named user ("app")
created with adduser --system, which can cause pod rejection; either set an
explicit numeric UID via runAsUser in the deployment's securityContext (e.g.,
runAsUser: 1001) to match the image, or change the Dockerfile to create and
switch to a numeric user (use adduser --uid 1001 ... and USER 1001) so
runAsNonRoot is reliable; update references to securityContext, runAsNonRoot,
runAsUser, and the Dockerfile USER/adduser lines accordingly.

Comment thread k8s/api/hpa.yaml Outdated
Comment thread k8s/deriver/deployment.yaml
Comment thread k8s/postgres/statefulset.yaml
Comment thread k8s/README.md Outdated
Comment thread k8s/redis/statefulset.yaml
Comment thread k8s/secrets.yaml.example Outdated
Comment thread k8s/secrets.yaml.example Outdated
Comment thread tests/test_k8s_manifests.py
Manifests:
- hpa.yaml: remove conflicting k3s comment; k3s bundles metrics-server
  by default, only kind needs a manual install
- api/deployment.yaml, deriver/deployment.yaml: replace unbounded
  init-container wait loops with bounded retries (60 × 2 s = 120 s
  max); exit non-zero if the dependency is still unreachable so the pod
  fails fast instead of hanging in Init indefinitely
- postgres/statefulset.yaml, redis/statefulset.yaml: add
  automountServiceAccountToken: false — these pods don't call the k8s API
- secrets.yaml.example: correct "ClusterIP Service" → "headless Service
  (clusterIP: None)"; comment out empty LLM key lines so omitted keys
  are truly absent rather than injected as empty strings

README:
- Correct the claim that "Kustomize applies resources in dependency
  order" — Kustomize does not guarantee ordering; startup ordering is
  handled by the init containers in the API and Deriver Deployments

Tests:
- manifests fixture: add timeout=30 to subprocess.run; catch
  subprocess.TimeoutExpired with pytest.fail
- _assert_has_resource_limits: add ephemeral-storage to the checked
  fields (all containers already set it; the test now enforces it)
- test_deployments_no_service_account_token: renamed to
  test_all_workloads_no_service_account_token and extended to cover
  postgres and redis StatefulSets

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
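The bounded init-container wait this commit describes could be sketched as a small POSIX-sh helper (a generic illustration; retry counts, delays, and the `nc` target in the comment are placeholders, not the manifest's exact script):

```shell
#!/bin/sh
# Bounded dependency wait: poll a command until it succeeds, but give up
# and exit non-zero after max_retries attempts so the pod fails fast
# instead of hanging in Init indefinitely.
wait_for() {
  cmd=$1
  max_retries=${2:-60}
  delay=${3:-2}
  i=0
  until eval "$cmd"; do
    i=$((i + 1))
    if [ "$i" -ge "$max_retries" ]; then
      echo "dependency still unreachable after $((max_retries * delay))s" >&2
      return 1
    fi
    sleep "$delay"
  done
}

# Example use inside an init container (placeholder host/port):
#   wait_for "nc -z postgres 5432" 60 2
```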
Contributor

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (2)
tests/test_k8s_manifests.py (1)

45-56: Consider returning None instead of asserting in helper function.

_by_kind_name uses assert to enforce exactly one match, which raises AssertionError if zero or multiple matches are found. While this works for the current tests, returning None for zero matches would allow tests to use more descriptive assertions with custom messages.

However, this is a minor style preference—the current implementation is functionally correct and the assertion messages are clear.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_k8s_manifests.py` around lines 45 - 56, The helper _by_kind_name
currently asserts exactly one match; change it to return None when no matches
are found (so callers/tests can make explicit assertions/messages) and keep a
clear error for multiple matches (raise a ValueError or AssertionError with the
existing message like f"Expected exactly 1 {kind}/{name}, got {len(matches)}").
Update the function _by_kind_name to check len(matches) and return matches[0]
for one match, return None for zero, and raise for >1 so callers can handle the
None case and write descriptive assertions.
k8s/api/deployment.yaml (1)

220-233: Documented trade-off: Consider setting explicit runAsUser for stricter enforcement.

The inline comments (lines 221-225) document that runAsNonRoot: true is used without hardcoding a UID because the Dockerfile creates the app user with adduser --system (UID in 100-999 range). This works but has caveats:

  1. Some admission controllers may reject pods where the image's final USER is non-numeric
  2. kube-score recommends UID > 10000 to avoid conflicts with host system users

For stricter compliance, consider updating the Dockerfile to use a fixed UID (e.g., adduser --uid 10001 app) and then setting runAsUser: 10001 here.

The current approach is functional and the trade-off is documented, so this is informational rather than blocking.
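The stricter variant the comment suggests would pair a fixed high UID in the image with an explicit runAsUser in the manifest; a sketch of the Dockerfile side (illustrative, not the repo's actual file):

```dockerfile
# Create the app user with a deterministic UID above 10000 so the
# manifest can set runAsUser: 10001 and runAsNonRoot can be verified
# numerically by the kubelet.
RUN adduser --system --uid 10001 app
USER 10001
```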

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@k8s/api/deployment.yaml` around lines 220 - 233, Update the image and
manifest to enforce a fixed non-root UID: modify the Dockerfile to create the
app user with a deterministic high UID (e.g., use adduser --uid 10001 app) and
then set securityContext.runAsUser: 10001 in the Deployment (keep runAsNonRoot:
true as an extra guard); this ensures admission controllers that require numeric
USERs and kube-score recommendations (UID > 10000) are satisfied while retaining
the existing allowPrivilegeEscalation/readOnlyRootFilesystem/capabilities
settings.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@k8s/api/deployment.yaml`:
- Around line 220-233: Update the image and manifest to enforce a fixed non-root
UID: modify the Dockerfile to create the app user with a deterministic high UID
(e.g., use adduser --uid 10001 app) and then set securityContext.runAsUser:
10001 in the Deployment (keep runAsNonRoot: true as an extra guard); this
ensures admission controllers that require numeric USERs and kube-score
recommendations (UID > 10000) are satisfied while retaining the existing
allowPrivilegeEscalation/readOnlyRootFilesystem/capabilities settings.

In `@tests/test_k8s_manifests.py`:
- Around line 45-56: The helper _by_kind_name currently asserts exactly one
match; change it to return None when no matches are found (so callers/tests can
make explicit assertions/messages) and keep a clear error for multiple matches
(raise a ValueError or AssertionError with the existing message like f"Expected
exactly 1 {kind}/{name}, got {len(matches)}"). Update the function _by_kind_name
to check len(matches) and return matches[0] for one match, return None for zero,
and raise for >1 so callers can handle the None case and write descriptive
assertions.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b0a5abcb-4d4b-4e89-b727-4ea4284b9a73

📥 Commits

Reviewing files that changed from the base of the PR and between d0e24c2 and 39f9126.

📒 Files selected for processing (8)
  • k8s/README.md
  • k8s/api/deployment.yaml
  • k8s/api/hpa.yaml
  • k8s/deriver/deployment.yaml
  • k8s/postgres/statefulset.yaml
  • k8s/redis/statefulset.yaml
  • k8s/secrets.yaml.example
  • tests/test_k8s_manifests.py
✅ Files skipped from review due to trivial changes (2)
  • k8s/api/hpa.yaml
  • k8s/secrets.yaml.example
🚧 Files skipped from review as they are similar to previous changes (1)
  • k8s/deriver/deployment.yaml

