18 commits
1f9c816 feat(opal-server): gated /internal git-fetcher cache stats endpoint | dshoen619, Jun 23, 2026
c176ce9 test(git-leak): add OPAL git leak/resilience test bed | dshoen619, Jun 23, 2026
bd49676 test(git-leak): add GiteaAdmin and make_repo_unreachable helpers | dshoen619, Jun 23, 2026
afeb969 test(git-leak): correct postgres-bounce framing (passes on master) | dshoen619, Jun 23, 2026
92353f6 style(git-leak): apply black/isort/docformatter (pre-commit) | dshoen619, Jun 23, 2026
db54ae6 test: scope root pytest collection to packages/ (exclude git-leak bed) | dshoen619, Jun 23, 2026
6f9a089 test(git-leak): address Copilot review feedback | dshoen619, Jun 23, 2026
1b23ac0 test(git-leak): isolate scopes per test and fix false repeat-sync gate | dshoen619, Jun 23, 2026
6046a10 test(git-leak): make the regression gates trustworthy (address PR rev… | dshoen619, Jun 24, 2026
f810db8 style(git-leak): apply black/isort/docformatter (pre-commit) | dshoen619, Jun 24, 2026
82ff33a test(git-leak): tighten stat polling and pin test-bed images (PR review) | dshoen619, Jun 24, 2026
d719c34 Merge branch 'master' into david/per-15155-pr1-git-leakresilience-tes… | dshoen619, Jun 28, 2026
75ad43a test(git-leak): isolate offline-hang healthy probe to a never-cloned … | dshoen619, Jun 28, 2026
15f3cfe test(git-leak): harden harness teardown and tighten assertions (PR re… | dshoen619, Jun 28, 2026
a502f2e Merge branch 'master' into david/per-15155-pr1-git-leakresilience-tes… | dshoen619, Jun 30, 2026
8e24cb0 test(git-leak): make PR4/PR5 tests genuine gates (address PR review) | dshoen619, Jul 1, 2026
6f002fd test(git-leak): address PR review round 3 (fixture robustness + cleanup) | dshoen619, Jul 1, 2026
5aae85c test(git-leak): harden gates per multi-agent review (H1 + M1-M6) | dshoen619, Jul 1, 2026
3 changes: 3 additions & 0 deletions .gitignore
@@ -137,3 +137,6 @@ dmypy.json
*.iml

.DS_Store

# Private Claude Code working artifacts (plans/specs) — never commit
.claude/
55 changes: 55 additions & 0 deletions app-tests/git-leak/README.md
@@ -0,0 +1,55 @@
# OPAL git-leak / resilience test bed

Regression tests for the four issues fixed by PR2–PR5: memory leak,
offline-repo hang, slow serial boot, and broadcaster no-reconnect. Not all of
them fail on `master`; see "Expected on master" below.

## Stack
- `opal_server` (2 workers, scopes on, Postgres broadcaster, built from `docker/Dockerfile`)
- `redis`, `postgres`, `gitea` (+ one-shot `gitea-admin` and `seed` sidecars)

Only `opal_server` (`:7002`) and `gitea` (`:13000` on the host) are published;
Postgres is internal to the compose network.

## Helpers (`helpers.py`)
- `OpalServerClient`: drives the OPAL server over HTTP (`stats`, `put_scope`, `delete_scope`, `refresh_all`).
- `GiteaAdmin`: host-side Gitea admin client (`list_repos`, `repo_exists`,
`create_repo`, `delete_repo`); also exposed as the `gitea_admin` pytest fixture.
- `make_repo_unreachable(name)`: returns a git URL on a routable-but-dead host in TEST-NET-1 (192.0.2.0/24), so fetches hang instead of failing fast; used by the offline-repo test.
- `bounce_postgres(down_seconds)`: stops Postgres, keeps it down for `down_seconds`, then starts it again to simulate a broadcaster outage.
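The last two helpers could look roughly like this. It is a minimal sketch, assuming the harness runs from `app-tests/git-leak` so `docker compose` picks up this directory's compose file; the URL layout (`/opaladmin/<name>.git`), the default host `192.0.2.1`, and the injectable `run` parameter are hypothetical, and the real `helpers.py` may differ.

```python
import ipaddress
import subprocess
import time
from typing import Callable, List, Optional

# TEST-NET-1 (RFC 5737) is reserved for documentation, so no real host should answer there.
TEST_NET_1 = ipaddress.ip_network("192.0.2.0/24")

Runner = Callable[[List[str]], object]


def make_repo_unreachable(name: str, host: str = "192.0.2.1") -> str:
    """Hypothetical: build a git URL for `name` on a TEST-NET-1 host."""
    if ipaddress.ip_address(host) not in TEST_NET_1:
        raise ValueError(f"{host} is not in {TEST_NET_1}")
    return f"http://{host}/opaladmin/{name}.git"


def _run_compose(args: List[str]) -> object:
    # check=True turns a failed docker command into CalledProcessError.
    return subprocess.run(args, check=True)


def bounce_postgres(down_seconds: float, run: Optional[Runner] = None) -> None:
    """Hypothetical: stop the postgres service, wait, then start it again."""
    run = run or _run_compose
    run(["docker", "compose", "stop", "postgres"])
    try:
        time.sleep(down_seconds)
    finally:
        # Always bring Postgres back, even if the wait is interrupted.
        run(["docker", "compose", "start", "postgres"])
```

Injecting `run` lets a unit test record the compose commands without Docker, e.g. `bounce_postgres(0, run=calls.append)`.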

## Run
```bash
cd app-tests/git-leak
python -m pytest -v --boot-scopes=50 # full set
python -m pytest test_leak.py -v --boot-scopes=20 # just the leak gates
```
Useful flags: `--boot-scopes=N` (number of repos to seed and boot; default 50),
`--keep-stack` (skip teardown), and the env var `BOOT_TARGET_SECONDS=120`
(tightens the boot gate).
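The boot gate's budget could be resolved roughly like this: the env var tightens the limit, otherwise a loose default keeps the test passing on `master`. A sketch only; the 600-second default and the `boot_budget`/`check_boot` names are assumptions, not values taken from the harness.

```python
import os
from typing import Mapping, Optional

# Hypothetical loose default so the boot gate passes unless explicitly tightened.
DEFAULT_BOOT_BUDGET_S = 600.0


def boot_budget(env: Optional[Mapping[str, str]] = None) -> float:
    """Return the boot-time budget in seconds, overridden by BOOT_TARGET_SECONDS."""
    env = os.environ if env is None else env
    raw = env.get("BOOT_TARGET_SECONDS")
    return float(raw) if raw else DEFAULT_BOOT_BUDGET_S


def check_boot(elapsed_s: float, env: Optional[Mapping[str, str]] = None) -> None:
    """Fail the gate if the measured boot time exceeds the budget."""
    budget = boot_budget(env)
    if elapsed_s > budget:
        raise AssertionError(f"boot took {elapsed_s:.1f}s, budget is {budget:.1f}s")
```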

## Expected on master
The churn leak test (`test_churn_releases_caches`) and the offline-repo test
FAIL on master: they target unfixed bugs and serve as the regression gates for
PR2/PR3. The boot test passes by default and fails only when
`BOOT_TARGET_SECONDS` is set low (PR4's gate).

Two tests are guards that PASS on master rather than reproducing a current
failure:
- `test_repeat_sync_does_not_grow` — clone paths are keyed by the repo URL, so
re-syncing identical scopes reuses cache entries and does not grow them. It
guards against a regression that would make repeat sync allocate per-sync.
(The unbounded-growth-then-no-purge leak is the *churn* test's job.)
- `test_server_recovers_after_postgres_bounce` — when the broadcaster drops, the
affected worker is respawned by gunicorn while the sibling keeps serving, so
the HTTP surface recovers. Guards that property against regression.
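A toy model shows why repeat sync should not grow the caches while churn must release them. The SHA-256 keying, the `git_sources` directory, and the `CloneCache` class are illustrative assumptions, not `GitPolicyFetcher`'s actual code: equal repo URLs map to one entry however many scopes or syncs reference them.

```python
import hashlib
from pathlib import Path
from typing import Dict, Set


def clone_path(base_dir: Path, repo_url: str) -> Path:
    # Hypothetical keying: hash the URL so equal URLs always share one directory.
    digest = hashlib.sha256(repo_url.encode("utf-8")).hexdigest()
    return base_dir / "git_sources" / digest


class CloneCache:
    """Toy per-process cache: one entry per clone path, tracking which scopes use it."""

    def __init__(self, base_dir: Path) -> None:
        self.base_dir = base_dir
        self.users: Dict[Path, Set[str]] = {}

    def sync(self, scope_id: str, repo_url: str) -> Path:
        path = clone_path(self.base_dir, repo_url)
        # Re-syncing the same URL reuses the existing entry instead of adding one.
        self.users.setdefault(path, set()).add(scope_id)
        return path

    def delete_scope(self, scope_id: str) -> None:
        # Release entries that no scope references any more (what the churn test expects).
        for path in list(self.users):
            self.users[path].discard(scope_id)
            if not self.users[path]:
                del self.users[path]
```

The shared entry is also why the `opal` fixture deletes each test's scopes: a leftover scope keeps the entry alive for any later test that uses the same seeded repo.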

## Requires
Docker with Docker Compose v2, plus a host Python environment with `pytest`,
`pytest-timeout`, `requests`, and `GitPython` installed.
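A quick preflight check for these requirements, as a sketch: it assumes `docker` is on `PATH` and checks the import names of the Python packages (`GitPython` imports as `git`, `pytest-timeout` as `pytest_timeout`). It does not verify the Compose version.

```python
import importlib.util
import shutil
from typing import List

# Import names of the host-side packages the harness needs.
REQUIRED_MODULES = ["pytest", "pytest_timeout", "requests", "git"]


def missing_requirements() -> List[str]:
    """Return the names of required tools or modules that are not available."""
    missing = []
    if shutil.which("docker") is None:
        missing.append("docker")
    for module in REQUIRED_MODULES:
        # find_spec locates a top-level module without importing it.
        if importlib.util.find_spec(module) is None:
            missing.append(module)
    return missing


if __name__ == "__main__":
    problems = missing_requirements()
    print("missing: " + ", ".join(problems) if problems else "all requirements found")
```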

## Notes
- Auth is disabled in the stack: `OPAL_AUTH_PUBLIC_KEY` is left unset so the JWT
verifier is disabled and the harness can call scope routes without minting JWTs.
Local test bed only; never a production setting.
- The server runs 2 uvicorn workers with a Postgres broadcaster, mirroring a
realistic multi-worker deployment. The `GitPolicyFetcher` caches read by the
`/internal/git-fetcher-cache-stats` endpoint are per-process, so each stats
response reflects only the worker that served it; the harness polls with
generous timeouts to let the leader worker converge.
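Because each stats response reflects a single worker, assertions are safer behind a generic polling helper like the one below. It is a sketch only: `wait_for` and its defaults are hypothetical, not the harness's actual API.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def wait_for(
    probe: Callable[[], T],
    accept: Callable[[T], bool],
    timeout_s: float = 120.0,
    interval_s: float = 2.0,
) -> T:
    """Call `probe` until `accept` returns True for its result or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        value = probe()
        if accept(value):
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout_s}s; last value: {value!r}")
        # Back off before re-polling; a later request may land on the other worker.
        time.sleep(interval_s)
```

Using `time.monotonic()` keeps the deadline immune to wall-clock adjustments on the host.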
63 changes: 63 additions & 0 deletions app-tests/git-leak/conftest.py
@@ -0,0 +1,63 @@
import os
import shutil

import pytest
from helpers import GiteaAdmin, OpalServerClient, compose


def pytest_addoption(parser):
    parser.addoption(
        "--boot-scopes",
        action="store",
        default="50",
        help="number of repos to seed/boot (default 50)",
    )
    parser.addoption(
        "--keep-stack",
        action="store_true",
        default=False,
        help="do not tear the compose stack down after the run",
    )


@pytest.fixture(scope="session")
def repo_count(request) -> int:
    return int(request.config.getoption("--boot-scopes"))


@pytest.fixture(scope="session")
def stack(request, repo_count):
    # Defense-in-depth: this docker-compose suite is already excluded from the
    # repo's default `pytest` run via `testpaths = packages` in pytest.ini, so
    # the unit-test CI matrix never collects it. If it is ever collected in an
    # environment without docker, skip cleanly instead of erroring.
    if shutil.which("docker") is None:
        pytest.skip("docker (compose) is required for the git-leak test bed")
    os.environ["REPO_COUNT"] = str(repo_count)
    # build + start infra; seed runs to completion then exits
    compose("up", "-d", "--build")
    # block until seeding sidecar has finished creating repos
    compose("wait", "seed")
    client = OpalServerClient()
    client.wait_healthy()
    yield client
    if not request.config.getoption("--keep-stack"):
        compose("down", "-v")


@pytest.fixture()
def opal(stack) -> OpalServerClient:
    # The compose stack is session-scoped (one server for the whole run), but
    # scopes must not leak between tests: clone paths are keyed by repo URL, so
    # a scope left behind by one test shares a cache entry with any later test
    # using the same seeded repo and would pollute its drain assertions. Delete
    # whatever this test created on teardown to keep each test isolated.
    stack._created_scopes.clear()
    yield stack
    stack.cleanup_created_scopes()


@pytest.fixture(scope="session")
def gitea_admin(stack) -> GiteaAdmin:
    """Host-side Gitea admin client (depends on `stack` so Gitea is up)."""
    return GiteaAdmin()
106 changes: 106 additions & 0 deletions app-tests/git-leak/docker-compose.yml
@@ -0,0 +1,106 @@
name: opal-git-leak-test

services:
  redis:
    image: redis:7-alpine
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 2s
      timeout: 3s
      retries: 30

  postgres:
    image: postgres:16-alpine
    environment:
      POSTGRES_USER: opal
      POSTGRES_PASSWORD: opal
      POSTGRES_DB: opal
    # not published to the host: only opal_server reaches it over the compose
    # network, and bounce_postgres() uses `docker compose stop/start`. Publishing
    # 5432 would collide with any Postgres already running on the host.
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U opal"]
      interval: 2s
      timeout: 3s
      retries: 30

  gitea:
    image: gitea/gitea:1.21
    environment:
      GITEA__security__INSTALL_LOCK: "true"
      GITEA__server__ROOT_URL: "http://gitea:3000/"
      GITEA__database__DB_TYPE: "sqlite3"
    # published on 13000 (not 3000) for the host-side GiteaAdmin helper; the
    # uncommon port avoids the usual :3000 clash. opal_server and the seed
    # sidecar still reach it over the compose network via http://gitea:3000.
    ports:
      - "13000:3000"
    volumes:
      - gitea-data:/data
    healthcheck:
      test: ["CMD-SHELL", "wget -qO- http://localhost:3000/api/v1/version || exit 1"]
      interval: 3s
      timeout: 5s
      retries: 40

  gitea-admin:
    # creates the admin user once gitea is healthy
    image: gitea/gitea:1.21
    depends_on:
      gitea:
        condition: service_healthy
    user: git
    entrypoint: ["/bin/sh", "-c"]
    command:
      - >
        gitea admin user create --username opaladmin --password opaladmin
        --email admin@example.com --admin --must-change-password=false
        --config /data/gitea/conf/app.ini || true
    volumes:
      - gitea-data:/data
    restart: "no"

  seed:
    build: ./seed
    depends_on:
      gitea:
        condition: service_healthy
      gitea-admin:
        condition: service_completed_successfully
    environment:
      GITEA_URL: "http://gitea:3000"
      GITEA_ADMIN_USER: "opaladmin"
      GITEA_ADMIN_PASSWORD: "opaladmin"
      REPO_COUNT: "${REPO_COUNT:-50}"
    volumes:
      - seed-output:/seed-output
    restart: "no"

  opal_server:
    build:
      context: ../..
      dockerfile: docker/Dockerfile
      target: server
    environment:
      UVICORN_NUM_WORKERS: "2"
      OPAL_SCOPES: "1"
      OPAL_REDIS_URL: "redis://redis:6379"
      OPAL_BROADCAST_URI: "postgres://opal:opal@postgres:5432/opal"
      OPAL_BASE_DIR: "/opal"
      OPAL_POLICY_REFRESH_INTERVAL: "0"
      OPAL_DEBUG_INTERNAL_STATS: "1"
      # OPAL_AUTH_PUBLIC_KEY is intentionally left unset: with no public key the
      # JWT verifier is disabled, so the harness can call scope routes without
      # minting JWTs. Local test bed only; never a production setting.
      OPAL_LOG_FORMAT_INCLUDE_PID: "true"
    ports:
      - "7002:7002"
    depends_on:
      redis:
        condition: service_healthy
      postgres:
        condition: service_healthy

volumes:
  gitea-data:
  seed-output: