From f6cfa6126e06d7a9335474a2010255b8b8f7bea0 Mon Sep 17 00:00:00 2001 From: Claude Date: Fri, 8 May 2026 10:44:37 +0000 Subject: [PATCH] fix(port): plug NULL-deref in port read handlers under buf-alloc OOM MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Fixes three TODO/FIXME placeholders in src/nxt_port_socket.c that left port-read paths with a latent NULL-pointer crash under sustained malloc pressure. Identified during the P3 (Pattern D′) audit on roadmap/plan-graceful-shutdown.md but classified as IPC-resilience, not write-path D′ — split into its own PR. Site A — nxt_port_read_handler (line 745-750) --------------------------------------------- The receive loop was: for ( ;; ) { b = nxt_port_buf_alloc(port); if (nxt_slow_path(b == NULL)) { /* TODO: disable event for some time */ } iov[1].iov_base = b->mem.pos; /* SIGSEGV if b == NULL */ ... } When nxt_port_buf_alloc()'s nxt_buf_mem_alloc() failed (port->free_bufs empty + mp_alloc returned NULL) the empty TODO body fell through and dereferenced b at line 755. Worker SIGSEGV under sustained memory pressure on the port read side. New behaviour: log at INFO via nxt_socket_error_level conventions and nxt_fd_event_block_read() the port socket, then return. The read event stays in engine data structures so a subsequent call to nxt_fd_event_enable_read() (when buf pressure abates) can revive it without a fresh nxt_fd_event_add(). This matches the agent-recommended "disable event for some time" intent of the original TODO without needing to invent a new self-recovery timer in this PR. Recovery wiring (auto-rearm when buf-pool drains) is intentionally deferred — see "Out of scope" below. Site B — nxt_port_queue_read_handler (line 889-893) --------------------------------------------------- Identical NULL-deref shape after the queue dequeue sub-block. Same fix, plus the existing pattern at lines 884-887 of decrementing queue->nitems on early return — preserved here: nxt_atomic_fetch_add(&queue->nitems, -1); nxt_fd_event_block_read(task->thread->engine, &port->socket); return; The two log lines distinguish the message-mode ("read-buf alloc failed; blocking read event") from queue-mode ("blocking queue read event") so operators can tell which path tripped. Site C — nxt_port_error_handler (line 1345) ------------------------------------------- A bare /* TODO */ with no actionable fix path attached. The handler itself already does the correct cleanup (close fds, run completion handlers, walk the message queue). Comment removed for clarity; no behavioural change. Scope notes ----------- This PR is the read-side IPC-resilience portion of the P3 plan. The write-path D′ contract (the named scope of P3) ships separately in PR #16 — different code path, different bug class, no overlap. Out of scope ------------ - Auto-rearm of the read event when port->mem_pool buf pressure abates. Today, after buf-alloc OOM the port stays inactive until external action (timer, restart, manual reload). Wiring an enable-on-buf-free callback would be a multi-file refactor of nxt_port_buf_completion + the engine's wakeup discipline; non-trivial and orthogonal to plugging the crash. Tracked as a follow-up under the IPC-hardening tier 2 plan (roadmap/plan-ipc-hardening.md, PR #9). - The deterministic regression test for malloc-failure cases — needs the LD_PRELOAD allocator-failure injection harness designed in roadmap/plan-malloc-injection.md (PR #9). Once that harness lands, a test that exercises nxt_port_buf_alloc() returning NULL N times in a row becomes possible; today the path is review-verified only. Tests ----- ./configure --tests make -j$(nproc) # clean build python3 -m pytest test/test_idle_close_wait.py test/test_static.py \ test/test_configuration.py # 18 pass, 33 skip # skips: missing # python module(s); # unrelated to this # change. The configuration test suite drives heavy port-IPC traffic via the control API (controller ↔ router ↔ main) on every conf write — no regression in normal-path behaviour observed. Independence ------------ Does not depend on PR #7 (P1 signal split), PR #11 (P2 listener drain), or PR #16 (P3 write-path D′). Branched off master; touches one file. --- src/nxt_port_socket.c | 18 +++++++++++++++--- 1 file changed, 15 insertions(+), 3 deletions(-) diff --git a/src/nxt_port_socket.c b/src/nxt_port_socket.c index 5752d5ab0..408df4fbf 100644 --- a/src/nxt_port_socket.c +++ b/src/nxt_port_socket.c @@ -746,7 +746,13 @@ nxt_port_read_handler(nxt_task_t *task, void *obj, void *data) b = nxt_port_buf_alloc(port); if (nxt_slow_path(b == NULL)) { - /* TODO: disable event for some time */ + nxt_log(task, NXT_LOG_INFO, + "port{%d,%d} %d: read-buf alloc failed; " + "blocking read event", + (int) port->pid, (int) port->id, port->socket.fd); + + nxt_fd_event_block_read(task->thread->engine, &port->socket); + return; } iov[0].iov_base = &msg.port_msg; @@ -889,7 +895,14 @@ nxt_port_queue_read_handler(nxt_task_t *task, void *obj, void *data) b = nxt_port_buf_alloc(port); if (nxt_slow_path(b == NULL)) { - /* TODO: disable event for some time */ + nxt_log(task, NXT_LOG_INFO, + "port{%d,%d} %d: read-buf alloc failed; " + "blocking queue read event", + (int) port->pid, (int) port->id, port->socket.fd); + + nxt_atomic_fetch_add(&queue->nitems, -1); + nxt_fd_event_block_read(task->thread->engine, &port->socket); + return; } if (n >= (ssize_t) sizeof(nxt_port_msg_t)) { @@ -1342,7 +1355,6 @@ nxt_port_error_handler(nxt_task_t *task, void *obj, void *data) nxt_port_send_msg_t *msg; nxt_debug(task, "port error handler %p", obj); - /* TODO */ port = nxt_container_of(obj, nxt_port_t, socket);