From 339d84b8a0b3257f907061ceacbd39dc8ed8a5cf Mon Sep 17 00:00:00 2001
From: Claude <noreply@anthropic.com>
Date: Fri, 8 May 2026 00:32:49 +0000
Subject: [PATCH 1/2] =?UTF-8?q?docs(roadmap):=20add=20Tier=202=20plans=20?=
 =?UTF-8?q?=E2=80=94=20IPC=20hardening=20+=20malloc-injection=20harness?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Two self-contained implementation plans for the natural follow-ups to
PRs #6 (TLS OCSP stapling) and #8 (cert/script IPC retain/fd leak fix):

* roadmap/plan-ipc-hardening.md — consolidate the path-join, send-failure,
  and mp-pool retain audits that were declined as out-of-scope on PR #6
  and PR #8.  Concrete site list, command bag, ~2 days estimate.

* roadmap/plan-malloc-injection.md — LD_PRELOAD fault-injection harness
  so the leaks fixed by PR #6 / PR #8 stop being review-only verified.
  Three first-consumer tests cover both PRs' leak paths.  Design notes
  on shim layout, pytest fixture, CI workflow, ~5–6 day estimate.

* roadmap/README.md — index so the folder is discoverable in isolation.

These docs only; no source changes.  Each plan is self-contained so a
fresh session can pick one up and execute without prior context.
---
 roadmap/README.md                |  17 ++
 roadmap/plan-ipc-hardening.md    | 253 +++++++++++++++++++++++++++
 roadmap/plan-malloc-injection.md | 283 +++++++++++++++++++++++++++++++
 3 files changed, 553 insertions(+)
 create mode 100644 roadmap/README.md
 create mode 100644 roadmap/plan-ipc-hardening.md
 create mode 100644 roadmap/plan-malloc-injection.md

diff --git a/roadmap/README.md b/roadmap/README.md
new file mode 100644
index 000000000..8d369882a
--- /dev/null
+++ b/roadmap/README.md
@@ -0,0 +1,17 @@
+# Roadmap & implementation plans
+
+Detailed plans for upcoming work. Each `plan-*.md` is self-contained: a
+new session can pick one up and execute it without prior context.
+
+## Current
+
+- [`plan-ipc-hardening.md`](plan-ipc-hardening.md) — finish the IPC layer
+  cleanup pass started by PRs #6 and #8 (path-join helper, audit of
+  `(void) nxt_port_socket_write(...)` sites, leaf-name validation).
+- [`plan-malloc-injection.md`](plan-malloc-injection.md) — `LD_PRELOAD`
+  fault-injection harness so the leak fixes from PRs #6 and #8 are
+  regression-fenced.
+
+These two plans are designed to be tackled as separate PRs; the malloc
+harness is a natural follow-up to the IPC hardening pass and would
+backfill regression coverage for both PRs after the fact.
diff --git a/roadmap/plan-ipc-hardening.md b/roadmap/plan-ipc-hardening.md
new file mode 100644
index 000000000..79812c940
--- /dev/null
+++ b/roadmap/plan-ipc-hardening.md
@@ -0,0 +1,253 @@
+# Plan: IPC layer hardening pass
+
+## Context
+
+PRs #6 (TLS OCSP stapling) and #8 (mp-pool retain & fd/buffer leaks in cert/script
+IPC) closed the most urgent leaks but left several siblings open. Reviewers
+(Gemini code-assist + the user-supplied audit plan that drove #8) flagged a
+broader pattern: `(void) nxt_port_socket_write(...)` discarding return values
+across the IPC layer, plus implicit "ends-with-`/`" path-join assumptions in the
+cert/script/OCSP store handlers.
+
+This PR consolidates those follow-ups into one cohesive pass so the codebase
+doesn't carry split conventions across cert/script/OCSP/conf/socket reply paths.
+
+## Goal
+
+When this lands:
+
+- Every main-process IPC reply that ships an fd via `NXT_PORT_MSG_CLOSE_FD`
+  closes the fd explicitly on `nxt_port_socket_write() != NXT_OK`.
+- Every main-process IPC reply that ships a `nxt_buf_t *b` runs the buf's
+  `completion_handler` manually on send failure.
+- Every router-side `nxt_port_socket_write(...)` that retains the temp_conf
+  mp pool retains *after* the send succeeds (matches PR #8 pattern; matches
+  the existing-correct `nxt_router_access_log_reopen` shape).
+- The cert/script/OCSP store path-join uses a small helper that tolerates
+  `rt->{certs,scripts}.start` not ending in `/`.
+
+## Scope
+
+### Sender-side mp-pool retain (matches PR #8 finding 1)
+
+PR #8 fixed `nxt_cert_store_get` and `nxt_script_store_get`. PR #6 fixed the
+OCSP twin `nxt_cert_store_get_ocsp`. Audit the remaining sites:
+
+```bash
+git grep -n 'nxt_mp_retain' src/
+```
+
+Known sites at master (re-verify before editing):
+
+- [x] `src/nxt_cert.c:1117` `nxt_cert_store_get`        — fixed in PR #8
+- [x] `src/nxt_script.c:483` `nxt_script_store_get`     — fixed in PR #8
+- [x] `src/nxt_router_access_log.c:579` reopen path      — already correct upstream
+- [ ] (post-PR #6) `src/nxt_cert.c` `nxt_cert_store_get_ocsp` — fixed in PR #6 (`52c9b54`)
+- [ ] **Audit:** any other site that pairs `nxt_buf_mem_alloc(mp, ...)` →
+      `nxt_mp_retain(mp)` → `b->completion_handler = nxt_xxx_buf_completion`.
+      Likely none, but confirm.
+
+### Receiver-side fd-close-on-send-failure (matches PR #8 finding 4)
+
+PR #8 fixed cert/script/socket/access-log handlers. Remaining sites in the
+main process that send fds with `NXT_PORT_MSG_CLOSE_FD` and discard the
+return value:
+
+```bash
+git grep -n 'NXT_PORT_MSG_CLOSE_FD' src/
+git grep -n '(void) nxt_port_socket_write' src/
+```
+
+Known candidates (re-verify line numbers before edit; numbers from the audit
+done during PR #8):
+
+- [x] `src/nxt_cert.c:1227` `nxt_cert_store_get_handler`          — fixed in PR #8
+- [x] `src/nxt_script.c:587` `nxt_script_store_get_handler`        — fixed in PR #8
+- [x] `src/nxt_main_process.c:1156` `nxt_main_port_socket_handler` — fixed in PR #8
+- [x] `src/nxt_main_process.c:1724` `nxt_main_port_access_log_handler` — fixed in PR #8
+- [ ] (post-PR #6) `src/nxt_cert.c` `nxt_cert_store_get_ocsp_handler` — fixed in PR #6
+- [ ] **Audit:** anything else that builds `type = ... | NXT_PORT_MSG_CLOSE_FD`
+      and sends via `nxt_port_socket_write`. Probably nothing left, but verify.
+
+### Buffer-completion-on-send-failure (matches PR #8 reviewer finding 4 detail)
+
+Anywhere `nxt_port_socket_write(..., b)` is called with a non-NULL buf whose
+`completion_handler` releases an mp retain (or any other resource), and the
+return value is discarded — invoke the handler manually on the failure path.
+
+Known candidate after the PR #8 fixes:
+
+- [x] `src/nxt_main_process.c` `nxt_main_port_socket_handler` `out` buffer
+      — fixed in PR #8.
+
+Audit:
+
+```bash
+git grep -nE '\(void\) *nxt_port_socket_write|nxt_port_socket_write\([^)]*,\s*b\b'
+```
+
+Sites mentioned during PR #8 review as low-impact shutdown leaks:
+
+- `src/nxt_runtime.c:511` `nxt_runtime_stop_app_processes()` cascade
+- `src/nxt_runtime.c:533` `nxt_runtime_stop_all_processes()` cascade
+- `src/nxt_application.c:716` `nxt_proto_quit_children()` cascade
+
+Classify each:
+
+1. No fd, no buffer with refcount-bearing completion → safe to ignore (note
+   in commit message).
+2. Fd with `CLOSE_FD` → close fd on `!= NXT_OK`.
+3. Buffer with refcount-bearing completion → run completion on `!= NXT_OK`.
+4. Both → do both, completion first (matches `nxt_port_error_handler`
+   ordering; see PR #8 commit `5a9f37d`).
+
+### Path-join helper (gemini PR #6 finding 3, deferred)
+
+Today every `*_store_*_handler` open does:
+
+```c
+file.name = nxt_malloc(rt->certs.length + name.length + 1);
+p = nxt_cpymem(file.name, rt->certs.start, rt->certs.length);
+p = nxt_cpymem(p, name.start, name.length + 1);
+```
+
+Three implicit assumptions:
+
+1. `rt->{certs,scripts}.start` ends with `/`.
+2. `name.start` is null-terminated and doesn't contain `/` or `..`.
+3. Caller-provided `name.length` is the byte count *excluding* the NUL.
+
+Today's invariant comes from `nxt_runtime_state_directory()`, which is fine
+but undocumented at the call sites. Suggested helper next to `nxt_runtime`:
+
+```c
+/*
+ * Build "<dir>/<name><suffix>" into a freshly nxt_malloc'd buffer.
+ * dir is taken from rt->certs / rt->scripts and may or may not have a
+ * trailing slash.  name is taken from a port message and is treated as
+ * opaque bytes (no '/' / '..' rejection here; callers must validate).
+ * suffix may be empty.  Returns NULL on alloc failure.
+ */
+nxt_file_name_t *nxt_runtime_resolve_store_path(const nxt_str_t *dir,
+    const nxt_str_t *name, const nxt_str_t *suffix);
+```
+
+Concrete callers to convert (these are the ones whose path construction the
+gemini comment flagged):
+
+- `nxt_cert_store_get_handler`        (`src/nxt_cert.c`)
+- `nxt_cert_store_get_ocsp_handler`   (`src/nxt_cert.c`, post-PR #6)
+- `nxt_script_store_get_handler`      (`src/nxt_script.c`)
+- `nxt_cert_store_delete_handler`     (`src/nxt_cert.c`) — uses the same shape
+
+Not required:
+
+- `nxt_main_port_access_log_handler`  — `path` comes from msg buf, not
+  joined to a base dir; keep as-is.
+- `nxt_main_port_conf_store_handler`  — uses `rt->conf` / `rt->conf_tmp`
+  directly without joining, no leaf-name from IPC.
+
+### Cert-name validation (security adjacency)
+
+While touching the cert/script handlers, also validate that the IPC-supplied
+`name` doesn't contain path separators or escape sequences. This is implicit
+today — the controller validates cert/script names before storing — but
+defense-in-depth is cheap once the helper exists. Add a `nxt_name_safe()`
+predicate alongside the path helper:
+
+```c
+/*
+ * True if name is a pure leaf: ASCII alnum + ['-_.'], no '/', no NUL,
+ * no leading dot.  Used as a defensive check at the trust boundary
+ * between the controller (validated input) and the main process
+ * (privileged file operations).
+ */
+nxt_bool_t nxt_name_is_safe_leaf(const nxt_str_t *name);
+```
+
+Reject in handlers; alert; reply with `NXT_PORT_MSG_RPC_ERROR`.
+
+## Out of scope (call out explicitly)
+
+- The `nxt_mp_retain` audit in **non**-cert/script paths (router request
+  pipeline, etc.). Those mp pools have different lifetime contracts and
+  should be a separate pass.
+- IPC layer **structural** changes (unifying the reply pattern into a single
+  helper). Tempting but expands the diff and risks behavioral drift; defer.
+- Cleaning up `nxt_runtime.c:511`/`:533` shutdown cascades unless they
+  classify into category 2/3/4 above. Most likely category 1 (purely
+  best-effort process-exit signaling).
+
+## Suggested commit shape
+
+One commit, one diff, single PR. Title:
+
+```
+fix(port): finish cert/script/OCSP IPC hardening pass
+```
+
+Body summarizes the four bullets above with the exact site list.
+
+## Test plan
+
+- [ ] Build clean (`./configure --openssl && ./configure python && make -j`).
+- [ ] `pytest test/test_tls.py test/test_tls_sni.py test/test_tls_ocsp.py`
+      — TLS reload paths exercise cert_store_get sender + receiver.
+- [ ] `pytest test/test_configuration.py test/test_access_log.py`
+      — listener / log paths exercise main_process handlers.
+- [ ] `pytest test/test_njs_modules.py` (if present) — script_store paths.
+- [ ] Manual: `nxt_router_access_log_reopen()` triggered via `SIGUSR1`.
+- [ ] Manual: cert delete via `DELETE /certificates/<name>` exercises
+      `nxt_cert_store_delete_handler` if its path-join was changed.
+
+The leak paths themselves still require fault injection to exercise
+deterministically. See `roadmap/plan-malloc-injection.md` — the natural
+companion PR — for the harness.
+
+## Effort estimate
+
+- Sender-side mp-pool audit: 2 hours (mostly grep + verify, likely zero
+  remaining sites).
+- Receiver-side fd/buffer audit + edits: 4 hours.
+- Path-join helper + 4 call-site conversions: 4 hours.
+- Name-safety predicate + 4 reject-tests: 3 hours.
+- Tests + CHANGES + commit message: 2 hours.
+
+Total: **~2 days** for one focused session.
+
+## Forwarding
+
+Both leak shapes are pre-existing in upstream `freeunitorg/freeunit`. Once
+this lands and is exercised here, the same diff should be filed as an
+upstream PR — same content, retitled, no FreeUnit-specific framing in the
+body. (Same posture as PR #6 and PR #8 from this branch.)
+
+---
+
+## Quick-reference command bag
+
+For the next session, here are the exact commands to start from:
+
+```bash
+# 1. Set up branch.
+git fetch origin master
+git checkout -b claude/ipc-hardening-pass origin/master
+
+# 2. Map the work surface.
+git grep -n 'nxt_mp_retain'                       src/
+git grep -n 'NXT_PORT_MSG_CLOSE_FD'               src/
+git grep -nE '\(void\) *nxt_port_socket_write'    src/
+git grep -n 'rt->certs\|rt->scripts'              src/
+
+# 3. Sanity-check the existing fixes that landed in PR #6 / PR #8 are
+#    already on master before adding the new helper:
+git log --oneline --grep='cert/script' origin/master
+git log --oneline --grep='OCSP'        origin/master
+
+# 4. After implementing, the regression set:
+./configure --openssl
+./configure python --config=python3-config
+make -j$(nproc)
+pytest test/test_tls.py test/test_tls_sni.py test/test_tls_ocsp.py \
+       test/test_configuration.py test/test_access_log.py -q
+```
diff --git a/roadmap/plan-malloc-injection.md b/roadmap/plan-malloc-injection.md
new file mode 100644
index 000000000..5474bb921
--- /dev/null
+++ b/roadmap/plan-malloc-injection.md
@@ -0,0 +1,283 @@
+# Plan: malloc-failure injection test harness
+
+## Context
+
+Both PR #6 (TLS OCSP stapling) and PR #8 (cert/script IPC leak fixes) closed
+leaks that are reachable only when `malloc()` (or `nxt_port_msg_alloc()`,
+which is a `malloc` wrapper) returns `NULL`. Today there's no way to drive
+that path in CI, so the fixes are review-verified but not regression-fenced.
+The next time someone refactors the cert/script/OCSP IPC the leak can come
+back silently.
+
+This PR adds a small allocator fault-injection facility specifically for
+pytest, with the OCSP and cert-store reply paths as the first consumers.
+
+## Goal
+
+A pytest fixture that lets a test say:
+
+```python
+@pytest.mark.malloc_inject
+def test_cert_get_handler_send_failure(malloc_inject):
+    """If nxt_port_msg_alloc fails inside chk_insert during the cert reply,
+    the cert fd in the main process must not leak."""
+
+    fd_before = count_main_process_fds()
+
+    with malloc_inject(fail_when='nxt_port_msg_alloc',
+                       call_count=2,
+                       then='succeed'):
+        # Trigger a cert-store-get that drives the receiver into the
+        # failing send path.  Use the existing /certificates upload +
+        # listener add flow.
+        upload_cert_bundle(name='leak_test')
+        apply_listener({'tls': {'certificate': 'leak_test'}})
+
+    fd_after = count_main_process_fds()
+    assert fd_after == fd_before, "main-process fd leaked on send failure"
+```
+
+Out of scope for this PR: actually building any fault-injection backend
+beyond what's needed for the cert/script/OCSP tests. Keep it minimal,
+extensible later.
+
+## Design choices
+
+### Backend: `LD_PRELOAD` shim, not `malloc_hook`, not allocator patches
+
+Three options were considered:
+
+| Option | Pros | Cons |
+|---|---|---|
+| `LD_PRELOAD=malloc_inject.so` | Zero source changes; works with system Unit binary; isolated per-test | Linux-only; needs careful pthread/glibc interaction |
+| glibc `__malloc_hook` (deprecated) | Simple API | Removed from glibc 2.34+ |
+| Wrap `nxt_malloc` in `src/nxt_malloc.{c,h}` with a build-time toggle | Cross-platform; deterministic | Touches the codebase; tests would need a special build |
+
+**Pick `LD_PRELOAD`.** Linux-only is acceptable for an internal test
+harness (Unit's CI is Linux already). It keeps the Unit binary unchanged
+and matches what most production-grade fault-injection libraries
+(`libfiu`, `libfaketime`, `libfailmalloc`) do.
+
+### Counter semantics: per-symbol, per-test
+
+The shim exports a control interface via two environment variables and a
+named-pipe / unix-socket, so a pytest harness can reconfigure injection
+between test phases:
+
+```
+MALLOC_INJECT_TARGETS    "malloc:5,nxt_port_msg_alloc:2"
+MALLOC_INJECT_CONTROL    "/tmp/malloc-inject-<pid>.sock"
+```
+
+Format: `<symbol>:<call-number-to-fail>` — fail the Nth call to that symbol
+for this process. Subsequent calls succeed normally. Tests run one targeted
+failure per assertion.
+
+If `MALLOC_INJECT_TARGETS` is unset, the shim is a no-op.
+
+### Symbol coverage
+
+For these PRs only two need wrappers; design extensibility but ship
+narrowly:
+
+- `malloc` — covers `nxt_port_msg_alloc` (which calls `nxt_malloc` → `malloc`)
+  and most of `nxt_mp_alloc`'s lazy-region allocations.
+- `mmap` — covers `nxt_port_mmap_*` paths if they get added later.
+- `posix_memalign` — covers aligned allocations in `nxt_buf_mem_alloc` etc.
+
+For PR scope: ship only `malloc` + `posix_memalign`. The rest can come as
+needed.
+
+### Per-symbol vs per-call-site
+
+Failing "the second call to malloc anywhere in the process" is too coarse
+— Unit calls malloc thousands of times per request. The shim should
+support targeting via a stack-walk filter:
+
+```
+MALLOC_INJECT_TARGETS  "malloc@nxt_port_msg_alloc:1"
+```
+
+Implementation: when targeted-mode is active, walk one frame up the
+stack with `__builtin_return_address(1)` and resolve via `dladdr()`; if
+the symbol matches, count and possibly fail. This costs ~1 µs per call
+in injection mode, zero when disabled (the shim early-returns when
+`MALLOC_INJECT_TARGETS` is empty).
+
+## Layout
+
+```
+tools/
+└── malloc_inject/
+    ├── README.md
+    ├── Makefile
+    ├── malloc_inject.c          # the LD_PRELOAD shim
+    └── malloc_inject_test.c     # standalone unit test for the shim itself
+
+test/
+├── conftest.py                  # add the malloc_inject fixture
+└── test_cert_store_inject.py    # first consumer; tests the PR #8 + PR #6 leaks
+```
+
+The shim builds with the regular Unit toolchain:
+
+```
+$(CC) -shared -fPIC -ldl -o build/malloc_inject.so \
+    tools/malloc_inject/malloc_inject.c
+```
+
+`Makefile` integration: add a `make malloc-inject` target that's not part
+of the default build. CI invokes it explicitly before running the
+`test_*_inject.py` tests.
+
+## Pytest integration
+
+`test/conftest.py` gets:
+
+```python
+@pytest.fixture
+def malloc_inject(unit, tmp_path):
+    """Yield a context manager that activates allocator fault injection
+    against the running Unit instance.  Requires the Unit binary to have
+    been started under LD_PRELOAD=build/malloc_inject.so (handled by the
+    `unit` fixture when the test is decorated with @pytest.mark.malloc_inject).
+    """
+    sock = tmp_path / 'malloc-inject.sock'
+
+    @contextmanager
+    def _activate(fail_when, call_count, then='succeed'):
+        spec = f'{fail_when}:{call_count}'
+        _send_to_shim(unit.pid, spec)
+        try:
+            yield
+        finally:
+            _send_to_shim(unit.pid, 'reset')
+
+    yield _activate
+```
+
+The `unit` fixture's `start()` path checks for the `malloc_inject` marker
+on the requesting test and adds `LD_PRELOAD=$BUILD/malloc_inject.so` to
+the spawned daemon's env if present. Tests without the marker run
+unchanged.
+
+## First test consumers
+
+Three concrete tests to ship in this PR:
+
+### 1. `test_cert_send_failure_no_fd_leak`
+
+Force `malloc` to fail at the second call inside `nxt_port_msg_alloc`,
+then upload a cert and apply a listener that references it. Assert that
+`/proc/$main_pid/fd` does not grow.
+
+### 2. `test_script_send_failure_no_mp_leak`
+
+Same shape but for `nxt_script_store_get`'s sender side. The leak is mp-
+pool retain rather than fd, so the assertion is "router process VSZ does
+not grow by more than the new app's actual config size after the failed
+apply."
+
+### 3. `test_ocsp_send_failure_no_fd_leak`
+
+OCSP twin of test 1 — drive `nxt_cert_store_get_ocsp` past the
+`malloc`-fail point in `nxt_port_msg_alloc`, assert main-process fd
+count is stable after multiple retries.
+
+These three tests together fence both PR #6 and PR #8.
+
+## CI integration
+
+```yaml
+# .github/workflows/malloc-inject.yml  (new)
+name: malloc-inject
+
+on: [pull_request, push]
+
+jobs:
+  inject:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - run: ./configure --openssl
+      - run: ./configure python --config=python3-config
+      - run: make -j$(nproc)
+      - run: make malloc-inject
+      - run: pytest test/test_cert_store_inject.py -v
+```
+
+Doesn't replace the regular `test` matrix — runs separately so a shim
+regression is visible as its own red check.
+
+## Effort estimate
+
+- Shim implementation (`malloc_inject.c`, control protocol, build): 2 days.
+- Unit test for the shim itself (no Unit involved, just calls `malloc`
+  in a contrived program and asserts the right call fails): 0.5 day.
+- Pytest fixture + `unit` fixture wiring: 1 day.
+- Three first-consumer tests: 1 day.
+- CI workflow: 0.5 day.
+- Documentation in `tools/malloc_inject/README.md`: 0.5 day.
+
+Total: **~5–6 days** (one focused week).
+
+## Risks / things to think about up front
+
+- **`LD_PRELOAD` and setuid:** Unit's main process drops privs, but doesn't
+  setuid in the test environment. Shim should still gracefully no-op if it
+  detects setuid (defensive).
+- **Thread safety:** the call counter and stack-walk filter both need to be
+  thread-safe. Use `__thread` for per-thread call counts where it makes
+  sense, atomic compare-exchange for global counters.
+- **Symbol collision:** shim must `dlsym(RTLD_NEXT, ...)` to forward
+  non-failing calls. The first call to the wrapper has to bootstrap the
+  dlsym lookup without itself calling malloc — use a static buffer for the
+  bootstrap path (this is the standard `LD_PRELOAD` malloc-shim trick).
+- **glibc tcmalloc/jemalloc swap:** if Unit ever links to a non-glibc
+  malloc, the `LD_PRELOAD` approach still works but the symbol filter
+  changes. Document this in the shim's README.
+
+## Forwarding
+
+This harness is a pure test-infrastructure addition; it doesn't touch any
+production code path. Worth forwarding to `freeunitorg/freeunit` because
+the leaks it tests are upstream leaks. Same posture as the prior PRs.
+
+---
+
+## Quick-reference command bag
+
+```bash
+# 1. Set up branch.
+git fetch origin master
+git checkout -b claude/malloc-inject-harness origin/master
+
+# 2. Skeleton.
+mkdir -p tools/malloc_inject
+$EDITOR tools/malloc_inject/malloc_inject.c
+
+# 3. Build standalone:
+make -C tools/malloc_inject
+# or after Makefile integration:
+make malloc-inject
+
+# 4. Smoke the shim against any process:
+LD_PRELOAD=./build/malloc_inject.so \
+MALLOC_INJECT_TARGETS='malloc:1' \
+ls /
+
+# 5. Run the new test set:
+pytest test/test_cert_store_inject.py -v
+```
+
+## Suggested follow-on uses (out of scope for this PR)
+
+Once the harness exists, the same pattern fences:
+
+- The cert-rotation flow under heavy reconfigure churn.
+- `nxt_router_access_log_reopen` retry on `nxt_port_socket_write` failure.
+- `nxt_main_port_modules_handler` (modules discovery) reply path.
+- Any future code that does the `nxt_port_socket_write(..., b)` pattern.
+
+These should be added incrementally as features touching those paths
+land — not as one big rollout.

From f79e64f14e8a1bc50a73e4e048bdadd2b366be81 Mon Sep 17 00:00:00 2001
From: Claude <noreply@anthropic.com>
Date: Fri, 8 May 2026 01:07:53 +0000
Subject: [PATCH 2/2] docs(roadmap): address Gemini review + cross-link
 security audit

Gemini code-assist (PR #9) flagged four issues; all fixed:

- plan-ipc-hardening.md classify-step #4 had completion-first ordering;
  nxt_port_error_handler at src/nxt_port_socket.c:1361 actually closes
  the fd before queueing completion handlers. Reordered to FD-first
  with the canonical line citation.
- plan-malloc-injection.md symbol-coverage list contradicted the design
  intent. Clarified that nxt_port_msg_alloc and other Unit-level helpers
  are reached via the stack-walk filter, not via direct wrappers; and
  bumped mmap from "future" to v1 ship so audit V11 can use it.
- plan-malloc-injection.md stack-walk used a fixed __builtin_return_address(1).
  Replaced with a configurable-depth walk (default 8, per-target /N
  override), with rationale that nxt_port_msg_alloc -> nxt_malloc ->
  malloc puts the logical caller two frames up, not one. Added
  pre-resolution of target symbols at shim init to dodge dladdr() locks.
- plan-malloc-injection.md fixture snippet had an unused tmp_path/sock
  variable and a control-socket-path mismatch. Dropped the unused arg
  and documented the /tmp/malloc-inject-<pid>.sock convention plus the
  unit-fixture's responsibility to export MALLOC_INJECT_CONTROL.

Cross-link to the wider security audit
(gist andypost/e04a4a642e168de2b8435a593f03b84b):

- README.md gets a "See also" pointing at the audit and explaining
  these plans sit outside the audit's PR-A..PR-I tracker (they're
  follow-ups to the audit's "Known/Already-Fixed" precedent, PR #56).
- plan-ipc-hardening.md "Out of scope" now calls out audit slot PR-E
  (general FD-lifetime hygiene) so reviewers don't ask why it isn't
  rolled in.
- plan-malloc-injection.md "Suggested follow-on uses" lists audit V11
  (compression mmap FD leak) as the natural second consumer once the
  mmap wrapper ships.

No source code touched.
---
 roadmap/README.md                | 11 ++++++++
 roadmap/plan-ipc-hardening.md    | 10 +++++--
 roadmap/plan-malloc-injection.md | 48 +++++++++++++++++++++++++-------
 3 files changed, 57 insertions(+), 12 deletions(-)

diff --git a/roadmap/README.md b/roadmap/README.md
index 8d369882a..590f953ae 100644
--- a/roadmap/README.md
+++ b/roadmap/README.md
@@ -15,3 +15,14 @@ new session can pick one up and execute it without prior context.
 These two plans are designed to be tackled as separate PRs; the malloc
 harness is a natural follow-up to the IPC hardening pass and would
 backfill regression coverage for both PRs after the fact.
+
+## See also
+
+The repo-wide security audit
+([gist `andypost/e04a4a642e168de2b8435a593f03b84b`](https://gist.github.com/andypost/e04a4a642e168de2b8435a593f03b84b))
+catalogues 45+ findings across 14 vectors and slots them into PRs
+PR-A through PR-I. The plans in this folder sit **outside** that
+tracker — they're follow-on cleanup of PR #56's precedent (see the
+audit appendix "Known/Already-Fixed"). The malloc-injection harness is
+also useful for fencing audit findings that need allocator failures to
+trigger (e.g. V11 — compression mmap FD leak).
diff --git a/roadmap/plan-ipc-hardening.md b/roadmap/plan-ipc-hardening.md
index 79812c940..75123ae8f 100644
--- a/roadmap/plan-ipc-hardening.md
+++ b/roadmap/plan-ipc-hardening.md
@@ -98,8 +98,9 @@ Classify each:
    in commit message).
 2. Fd with `CLOSE_FD` → close fd on `!= NXT_OK`.
 3. Buffer with refcount-bearing completion → run completion on `!= NXT_OK`.
-4. Both → do both, completion first (matches `nxt_port_error_handler`
-   ordering; see PR #8 commit `5a9f37d`).
+4. Both → do both, FD first (matches `nxt_port_error_handler` ordering
+   at `src/nxt_port_socket.c:1361` — `nxt_port_msg_close_fd(msg)` runs
+   before each buf's `completion_handler` is queued).
 
 ### Path-join helper (gemini PR #6 finding 3, deferred)
 
@@ -177,6 +178,11 @@ Reject in handlers; alert; reply with `NXT_PORT_MSG_RPC_ERROR`.
 - Cleaning up `nxt_runtime.c:511`/`:533` shutdown cascades unless they
   classify into category 2/3/4 above. Most likely category 1 (purely
   best-effort process-exit signaling).
+- General FD-lifetime hygiene across the rest of the codebase (audit
+  slot PR-E: accept-CLOEXEC, pipe-CLOEXEC, compression mmap FD leak,
+  plain `accept()` without CLOEXEC, etc.). The scope overlaps
+  conceptually but the bugs are pre-existing and unrelated to the IPC
+  reply-failure path; track and ship separately.
 
 ## Suggested commit shape
 
diff --git a/roadmap/plan-malloc-injection.md b/roadmap/plan-malloc-injection.md
index 5474bb921..2a24ccc40 100644
--- a/roadmap/plan-malloc-injection.md
+++ b/roadmap/plan-malloc-injection.md
@@ -85,8 +85,16 @@ narrowly:
 - `mmap` — covers `nxt_port_mmap_*` paths if they get added later.
 - `posix_memalign` — covers aligned allocations in `nxt_buf_mem_alloc` etc.
 
-For PR scope: ship only `malloc` + `posix_memalign`. The rest can come as
-needed.
+For PR scope: ship `malloc`, `posix_memalign`, and `mmap` wrappers. The
+`mmap` wrapper costs almost nothing extra at the shim layer and lets the
+harness fence audit V11 (compression mmap FD leak) as a natural second
+consumer.
+
+Direct wrappers for `nxt_port_msg_alloc` and similar Unit-level helpers
+are intentionally **not** shipped — the stack-walk filter (next
+subsection) lets tests target a logical site like
+`malloc@nxt_port_msg_alloc:1` by walking up from `malloc` itself, so
+each Unit-level helper does not need its own wrapper.
 
 ### Per-symbol vs per-call-site
 
@@ -98,11 +106,24 @@ support targeting via a stack-walk filter:
 MALLOC_INJECT_TARGETS  "malloc@nxt_port_msg_alloc:1"
 ```
 
-Implementation: when targeted-mode is active, walk one frame up the
-stack with `__builtin_return_address(1)` and resolve via `dladdr()`; if
-the symbol matches, count and possibly fail. This costs ~1 µs per call
-in injection mode, zero when disabled (the shim early-returns when
-`MALLOC_INJECT_TARGETS` is empty).
+Implementation: when targeted-mode is active, walk up the stack via
+`__builtin_return_address(N)` for `N = 1..MAX_DEPTH` (default 8) and
+resolve each frame via `dladdr()`; if any frame's symbol matches the
+targeted caller, count and possibly fail. Single-frame lookup is too
+shallow for Unit's allocator stack — e.g.
+`nxt_port_msg_alloc → nxt_malloc → malloc` puts the logical caller two
+frames up, and `nxt_mp_alloc → nxt_malloc → malloc` puts it two frames
+up too. Depth is configurable per target via a `/N` suffix:
+`malloc@nxt_port_msg_alloc/4:1` caps the walk at 4 frames for that
+target, useful for keeping the cost bounded on hot allocations.
+
+Pre-resolve target symbols at shim init via one-time `dlsym()` so the
+per-call hot path only does `dladdr()` (or, better, an address-range
+compare against pre-resolved symbol bounds). This avoids `dladdr()`'s
+per-call internal lock on glibc versions where it isn't fully reentrant.
+
+Cost: a few µs per call in injection mode, zero when disabled (the shim
+early-returns when `MALLOC_INJECT_TARGETS` is empty).
 
 ## Layout
 
@@ -136,14 +157,18 @@ of the default build. CI invokes it explicitly before running the
 
 ```python
 @pytest.fixture
-def malloc_inject(unit, tmp_path):
+def malloc_inject(unit):
     """Yield a context manager that activates allocator fault injection
     against the running Unit instance.  Requires the Unit binary to have
     been started under LD_PRELOAD=build/malloc_inject.so (handled by the
     `unit` fixture when the test is decorated with @pytest.mark.malloc_inject).
-    """
-    sock = tmp_path / 'malloc-inject.sock'
 
+    The shim's control socket path is `/tmp/malloc-inject-<pid>.sock` by
+    convention; `unit.pid` is set after the daemon starts.  The `unit`
+    fixture is responsible for exporting `MALLOC_INJECT_CONTROL` to the
+    daemon's environment so the path matches what `_send_to_shim()` here
+    expects.
+    """
     @contextmanager
     def _activate(fail_when, call_count, then='succeed'):
         spec = f'{fail_when}:{call_count}'
@@ -277,6 +302,9 @@ Once the harness exists, the same pattern fences:
 - The cert-rotation flow under heavy reconfigure churn.
 - `nxt_router_access_log_reopen` retry on `nxt_port_socket_write` failure.
 - `nxt_main_port_modules_handler` (modules discovery) reply path.
+- **Audit V11** — `nxt_http_compression.c:280` FD leak when `mmap()` of
+  the source file fails. Natural second-consumer test for the `mmap`
+  wrapper shipped in v1.
 - Any future code that does the `nxt_port_socket_write(..., b)` pattern.
 
 These should be added incrementally as features touching those paths