docs(roadmap): plans for IPC hardening pass + malloc-injection harness by andypost · Pull Request #9 · andypost/unit

andypost · 2026-05-08T00:33:21Z

Summary

Two self-contained implementation plans for the natural Tier 2 follow-ups to PRs #6 (TLS OCSP stapling) and #8 (cert/script IPC leak fix). Docs only — no source changes. Designed so a fresh session can pick either one up and execute without prior context from the PR #6 / #8 review threads.

Files

`roadmap/plan-ipc-hardening.md` — IPC layer cleanup pass

Consolidates the items declined as out-of-scope on PR #6 and PR #8:

- Sender-side `nxt_mp_retain` audit: verify no remaining sites repeat the pre-fix shape.
- Receiver-side fd-close-on-send-failure audit: every `(void) nxt_port_socket_write(..., file.fd, ...)` site that ships `NXT_PORT_MSG_CLOSE_FD`.
- Buffer-completion-on-send-failure audit: every `(void) nxt_port_socket_write(..., b)` site whose `b->completion_handler` releases a refcount-bearing resource.
- Path-join helper for cert/script/OCSP store handlers (Gemini finding 3 on PR #6, declined). Defends against the implicit "ends with `/`" assumption on `rt->{certs,scripts}.start`.
- Leaf-name validation at the controller→main-process trust boundary, as defense in depth.

Includes the exact site lists from the PR #6 / PR #8 audits, scope-out callouts, test plan, and a quick-reference command bag for the next session. ~2 days estimated.
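The path-join and leaf-name items in the list above can be modelled in a few lines. This is only a sketch of the intended behaviour, written in Python for brevity; the real helper would live in Unit's C code, and the names `is_valid_leaf` and `join_store_path` are hypothetical, not taken from the plan.

```python
def is_valid_leaf(name: str) -> bool:
    """Accept only a single path component.

    Rejects empty names, '.' and '..', and anything containing a
    path separator or a NUL byte.
    """
    if not name or name in (".", ".."):
        return False
    return "/" not in name and "\0" not in name


def join_store_path(base: str, leaf: str) -> str:
    """Join a store directory and a leaf name with exactly one '/'.

    Works whether or not `base` already ends with '/', which is the
    implicit assumption the plan wants to stop relying on.
    """
    if not base:
        raise ValueError("empty store directory")
    if not is_valid_leaf(leaf):
        raise ValueError(f"invalid leaf name: {leaf!r}")
    return base.rstrip("/") + "/" + leaf


if __name__ == "__main__":
    # Both spellings of the directory produce the same path.
    print(join_store_path("/var/lib/unit/certs", "bundle"))
    print(join_store_path("/var/lib/unit/certs/", "bundle"))
```

Validating the leaf before joining keeps the two checks in one place, so a name such as `../x` is refused before any path is built.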

`roadmap/plan-malloc-injection.md` — fault-injection harness

LD_PRELOAD shim + pytest fixture so the leaks fixed by PR #6 and PR #8 are regression-fenced (currently they're review-verified only because the trigger requires malloc() to fail, which CI can't drive).

Design notes cover:

- Why LD_PRELOAD and not `__malloc_hook` (long deprecated, and removed from the glibc API in 2.34) or build-time wrappers.
- Per-symbol + per-call-site targeting via a stack-walk filter (`MALLOC_INJECT_TARGETS=malloc@nxt_port_msg_alloc:1`).
- File layout (`tools/malloc_inject/`), pytest fixture wiring, and a separate CI workflow.
- Three first-consumer tests covering both PRs' leak paths exactly.
- Risk callouts (setuid, thread safety, dlsym bootstrap, glibc-vs-jemalloc).

~5–6 days estimated. Designed to be incremental — once shipped, future PRs can add new fault-injection consumers without re-litigating the harness design.
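To make the `MALLOC_INJECT_TARGETS=malloc@nxt_port_msg_alloc:1` example concrete, here is a hedged sketch of how such a spec might be parsed and applied. Two things are assumptions not stated in this description: that the trailing number means "fail the Nth matching call", and that several targets may be comma-separated. The `Target` class and `parse_targets` function are hypothetical names, and the actual shim would be C.

```python
from dataclasses import dataclass


@dataclass
class Target:
    symbol: str     # wrapped allocator, e.g. "malloc"
    call_site: str  # function that must appear on the call stack
    nth: int        # ASSUMPTION: fail the nth matching call (1-based)
    seen: int = 0   # matching calls observed so far

    def should_fail(self, symbol, stack):
        """Return True only on the nth call that matches this target."""
        if symbol != self.symbol or self.call_site not in stack:
            return False
        self.seen += 1
        return self.seen == self.nth


def parse_targets(spec):
    """Parse 'symbol@call_site:count[,symbol@call_site:count...]'."""
    targets = []
    for item in filter(None, (s.strip() for s in spec.split(","))):
        symbol, at, rest = item.partition("@")
        call_site, colon, count = rest.partition(":")
        ok = symbol and at and call_site and colon and count.isdigit()
        if not ok or int(count) < 1:
            raise ValueError(f"bad target spec: {item!r}")
        targets.append(Target(symbol, call_site, int(count)))
    return targets


if __name__ == "__main__":
    (target,) = parse_targets("malloc@nxt_port_msg_alloc:1")
    stack = ["nxt_malloc", "nxt_port_msg_alloc"]  # illustrative frames
    print(target.should_fail("malloc", stack))  # first matching call
    print(target.should_fail("malloc", stack))  # later calls pass through
```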

`roadmap/README.md`

Index so the folder is discoverable in isolation.

Suggested execution order

1. PR #8 (fix(port): plug mp-pool retain and fd/buffer leaks in IPC reply paths) merges → forwards upstream to freeunitorg/freeunit.
2. PR #6 (Add TLS OCSP stapling support with certificate store integration) rebases → merges → forwards upstream.
3. plan-ipc-hardening PR opens against post-merge master (so the audit operates on canonical line numbers).
4. plan-malloc-injection PR opens and backfills regression coverage for the leak fixes plus the new IPC-hardening fixes.

Each plan PR can be picked up by a fresh Claude session via /loop or invoked directly; both prompts are self-contained.

Test plan

- Both files render correctly as markdown.
- No source code touched (`git diff origin/master --stat` shows only `roadmap/` additions).
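The "only `roadmap/` additions" check could also be automated. The sketch below is one possible way, not part of the PR: it uses `git diff --name-only` rather than the `--stat` form quoted above, because one path per line is easier to filter. The function names are hypothetical.

```python
import subprocess


def outside_roadmap(paths):
    """Return the changed paths that are not under roadmap/."""
    return [p for p in paths if p and not p.startswith("roadmap/")]


def changed_paths(base="origin/master"):
    """List paths that differ from `base`, one per line from git."""
    result = subprocess.run(
        ["git", "diff", "--name-only", base],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.splitlines()


if __name__ == "__main__":
    # Run from inside the repository checkout.
    stray = outside_roadmap(changed_paths())
    if stray:
        raise SystemExit("non-roadmap changes: " + ", ".join(stray))
    print("docs-only change confirmed")
```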

Out of scope

Actually implementing either plan. That's the next session's job.

Generated by Claude Code

…rness

Two self-contained implementation plans for the natural follow-ups to PRs #6 (TLS OCSP stapling) and #8 (cert/script IPC retain/fd leak fix):

* roadmap/plan-ipc-hardening.md: consolidate the path-join, send-failure, and mp-pool retain audits that were declined as out-of-scope on PR #6 and PR #8. Concrete site list, command bag, ~2 days estimate.
* roadmap/plan-malloc-injection.md: LD_PRELOAD fault-injection harness so the leaks fixed by PR #6 / PR #8 stop being review-only verified. Three first-consumer tests cover both PRs' leak paths. Design notes on shim layout, pytest fixture, CI workflow, ~5–6 day estimate.
* roadmap/README.md: index so the folder is discoverable in isolation.

These are docs only; no source changes. Each plan is self-contained so a fresh session can pick one up and execute without prior context.

gemini-code-assist

Code Review

This pull request introduces a roadmap directory with detailed implementation plans for IPC layer hardening and a malloc failure injection test harness. The IPC plan aims to prevent resource leaks during communication failures and improve path resolution safety. The test harness plan outlines an LD_PRELOAD-based shim for simulating allocation failures. Review feedback identifies a need to align the IPC error handling order with existing logic, resolve contradictions in the injection shim's symbol coverage, and address technical concerns regarding stack-walking fragility and socket path consistency in the test fixture.

gemini-code-assist · 2026-05-08T00:35:39Z

+4. Both → do both, completion first (matches `nxt_port_error_handler`
+   ordering; see PR #8 commit `5a9f37d`).


The plan suggests running the completion handler before closing the file descriptor to match nxt_port_error_handler. However, in the current implementation of nxt_port_error_handler in src/nxt_port_socket.c (lines 1361-1371), nxt_port_msg_close_fd(msg) is called before the completion handlers are added to the work queue. The plan should be updated to maintain consistency with the existing error handling logic.

Suggested change

-4. Both → do both, completion first (matches `nxt_port_error_handler`
-   ordering; see PR #8 commit `5a9f37d`).
+4. Both → do both, FD first (matches `nxt_port_error_handler`
+   ordering; see `src/nxt_port_socket.c:1361`).

gemini-code-assist · 2026-05-08T00:35:39Z

+For PR scope: ship only `malloc` + `posix_memalign`. The rest can come as
+needed.


There is a contradiction between the "Symbol coverage" section and the examples provided earlier in the document. The coverage section states that only malloc and posix_memalign will be shipped, but the examples in the "Goal" and "Counter semantics" sections use nxt_port_msg_alloc as a target symbol. To support direct targeting of nxt_port_msg_alloc without complex stack walking on every malloc call, it should be included in the wrapped symbols.

Suggested change

-For PR scope: ship only `malloc` + `posix_memalign`. The rest can come as
-needed.
+For PR scope: ship `malloc`, `posix_memalign`, and `nxt_port_msg_alloc` wrappers.
+The rest can come as needed.

gemini-code-assist · 2026-05-08T00:35:39Z

+stack with `__builtin_return_address(1)` and resolve via `dladdr()`; if
+the symbol matches, count and possibly fail. This costs ~1 µs per call


Using a fixed frame offset of 1 with __builtin_return_address is likely too fragile for Unit's architecture. Most allocations in the codebase are wrapped through multiple layers (e.g., nxt_mp_alloc -> nxt_malloc -> malloc). A frame index of 1 will often point to an internal Unit wrapper rather than the logical call site intended for targeting. The shim should ideally walk the stack to find the target symbol or support a configurable search depth.
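The fragility described in this comment can be shown with a small model. It is a Python illustration only, since the shim itself would be C. The frame list is made up to mirror the `nxt_port_msg_alloc -> nxt_malloc -> malloc` chain mentioned later in this thread, the default depth of 8 is taken from the follow-up commit message, and `find_target` is a hypothetical name.

```python
def find_target(frames, targets, max_depth=8):
    """Search the nearest `max_depth` frames for a targeted function.

    `frames` holds resolved function names, innermost caller first,
    as the malloc wrapper would see them. Returns the first match,
    or None if no target appears within the search depth.
    """
    for name in frames[:max_depth]:
        if name in targets:
            return name
    return None


if __name__ == "__main__":
    # Illustrative stack as seen from inside the malloc wrapper.
    frames = ["nxt_malloc", "nxt_port_msg_alloc", "nxt_port_socket_write"]
    targets = {"nxt_port_msg_alloc"}

    # A fixed single-frame lookup only sees the internal wrapper.
    print(find_target(frames, targets, max_depth=1))
    # A bounded walk reaches the logical call site.
    print(find_target(frames, targets))
```

Bounding the walk keeps the per-call cost predictable while still tolerating a few layers of allocator wrappers.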

gemini-code-assist · 2026-05-08T00:35:39Z

+    been started under LD_PRELOAD=build/malloc_inject.so (handled by the
+    `unit` fixture when the test is decorated with @pytest.mark.malloc_inject).
+    """
+    sock = tmp_path / 'malloc-inject.sock'


The sock variable is defined in the fixture snippet but is not used in the subsequent _activate function or _send_to_shim call. Additionally, if the shim expects a control socket at /tmp/malloc-inject-<pid>.sock as stated in the design, using tmp_path (which is a per-test directory) might cause a mismatch unless the MALLOC_INJECT_CONTROL environment variable is also dynamically updated in the unit fixture.
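One way to avoid the mismatch raised here is to have the fixture compute the control-socket path once and export it to the child process. The sketch below assumes the shim reads `MALLOC_INJECT_CONTROL` as this comment suggests; the helper names are hypothetical, and the shim path is the one quoted in the snippet above.

```python
import os


def default_control_path(pid):
    """Control-socket path following the /tmp/malloc-inject-<pid>.sock convention."""
    return f"/tmp/malloc-inject-{pid}.sock"


def shim_environment(shim_path, control_path, base_env=None):
    """Return a copy of the environment with the shim preloaded.

    ASSUMPTION: the shim honours MALLOC_INJECT_CONTROL for its socket path.
    An existing LD_PRELOAD value is preserved after the shim entry.
    """
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_PRELOAD")
    env["LD_PRELOAD"] = f"{shim_path}:{existing}" if existing else shim_path
    env["MALLOC_INJECT_CONTROL"] = control_path
    return env


if __name__ == "__main__":
    env = shim_environment(
        "build/malloc_inject.so",
        default_control_path(os.getpid()),
        base_env={"PATH": "/usr/bin"},
    )
    print(env["LD_PRELOAD"], env["MALLOC_INJECT_CONTROL"])
```

Passing the same `control_path` to both the launcher and the test-side client keeps the two ends in agreement regardless of which directory convention is chosen.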

andypost · 2026-05-08T00:47:29Z

https://gist.github.com/andypost/e04a4a642e168de2b8435a593f03b84b

Gemini code-assist (PR #9) flagged four issues; all fixed:

- plan-ipc-hardening.md classify-step #4 had completion-first ordering; nxt_port_error_handler at src/nxt_port_socket.c:1361 actually closes the fd before queueing completion handlers. Reordered to FD-first with the canonical line citation.
- plan-malloc-injection.md symbol-coverage list contradicted the design intent. Clarified that nxt_port_msg_alloc and other Unit-level helpers are reached via the stack-walk filter, not via direct wrappers; and bumped mmap from "future" to v1 ship so audit V11 can use it.
- plan-malloc-injection.md stack-walk used a fixed __builtin_return_address(1). Replaced with a configurable-depth walk (default 8, per-target /N override), with rationale that nxt_port_msg_alloc -> nxt_malloc -> malloc puts the logical caller two frames up, not one. Added pre-resolution of target symbols at shim init to dodge dladdr() locks.
- plan-malloc-injection.md fixture snippet had an unused tmp_path/sock variable and a control-socket-path mismatch. Dropped the unused arg and documented the /tmp/malloc-inject-<pid>.sock convention plus the unit-fixture's responsibility to export MALLOC_INJECT_CONTROL.

Cross-link to the wider security audit (gist andypost/e04a4a642e168de2b8435a593f03b84b):

- README.md gets a "See also" pointing at the audit and explaining these plans sit outside the audit's PR-A..PR-I tracker (they're follow-ups to the audit's "Known/Already-Fixed" precedent, PR nginx#56).
- plan-ipc-hardening.md "Out of scope" now calls out audit slot PR-E (general FD-lifetime hygiene) so reviewers don't ask why it isn't rolled in.
- plan-malloc-injection.md "Suggested follow-on uses" lists audit V11 (compression mmap FD leak) as the natural second consumer once the mmap wrapper ships.

No source code touched.

gemini-code-assist Bot reviewed May 8, 2026


andypost mentioned this pull request May 8, 2026

fix(port): plug mp-pool retain and fd/buffer leaks in IPC reply paths freeunitorg/freeunit#56

Merged

3 tasks

This was referenced May 8, 2026

fix(conn): route to error_handler on partial-write-then-error (P3) #16

Draft

fix(port): plug NULL-deref in port read handlers under buf-alloc OOM #19

Draft

andypost wants to merge 2 commits into master from claude/plan-ipc-hardening-tier2
