From 40f1c5011895ff17c3b7772642568a9ab7d81892 Mon Sep 17 00:00:00 2001 From: Claude Date: Thu, 7 May 2026 23:38:09 +0000 Subject: [PATCH] feat(lifecycle): plumb NXT_PORT_QUIT_GRACEFUL through SIGQUIT (P1) Phase 1 of roadmap/plan-graceful-shutdown.md (on the `roadmap` branch). Splits the previously-identical SIGTERM and SIGQUIT handlers in src/nxt_main_process.c and plumbs the choice as a quit_param byte through NXT_PORT_MSG_QUIT so libunit's already-implemented nxt_unit_quit(NXT_PORT_QUIT_GRACEFUL) actually gets invoked. The plumbing also propagates through the prototype -> child cascade so SIGQUIT to the unitd master delivers GRACEFUL semantics to every libunit context, not just the children main contacts directly. Wire format ----------- A new public enum nxt_port_quit_mode_t in nxt_port.h sits next to NXT_PORT_MSG_QUIT itself: NXT_PORT_QUIT_NORMAL = 0 /* fast exit, drop in-flight */ NXT_PORT_QUIT_GRACEFUL = 1 /* drain in-flight before exit */ libunit at nxt_unit.c:1056-1070 already parses this byte; it falls back to NXT_PORT_QUIT_NORMAL when the message arrives without a payload. We therefore send NO payload on the fast-exit path and exactly one byte on the graceful path -- both directions of the asymmetry are the safe ones: * Pre-P1 senders (and the seven NXT_PORT_MSG_QUIT call sites deliberately left NULL in the audit below) keep producing the safe NORMAL behaviour with no wire change. * Allocation failure under GRACEFUL silently degrades to NORMAL with a NXT_LOG_WARN entry so the operator sees that the cascade leg fell back to fast exit under memory pressure. * SIGTERM, the fast-exit path by definition, performs zero additional allocations on its way to nxt_runtime_quit(). Skipping the allocation on the NORMAL path (and thus on the seven deliberately-NULL call sites) is gemini-code-assist's review suggestion on the initial PR; this commit lands the squashed result. Prototype cascade ----------------- nxt_runtime_stop_all_processes() (called from main on SIGQUIT) walks rt->processes and sends NXT_PORT_MSG_QUIT to every port, including each app worker AND the prototype. The prototype then runs nxt_proto_quit_handler() which previously cascaded a *second* QUIT message to its children with a NULL payload -- libunit defaulted that to NORMAL, creating a race: whichever message reached a child first decided GRACEFUL vs NORMAL behaviour. On a busy or slow box the cascade could win and silently downgrade SIGQUIT to a fast exit. The prototype handler now reads the quit_param byte from main's QUIT message and forwards it through nxt_proto_quit_children() unchanged. Both messages now agree, the race is benign, and GRACEFUL reaches every child regardless of arrival order. Unknown payload bytes (anything other than 0 or 1) are normalised to NORMAL on read so a malformed sender cannot propagate a bogus byte through the worker pool. Buffer ownership ---------------- nxt_runtime_quit_buf() returns NULL for NORMAL (no allocation) or a one-byte mem_pool buffer for GRACEFUL. All three send sites (nxt_runtime_stop_app_processes, nxt_runtime_stop_all_processes, nxt_proto_quit_children) capture the nxt_port_socket_write return value and explicitly run b->completion_handler when send fails and b is non-NULL -- otherwise the GRACEFUL payload would leak from the engine memory pool, the same shape PR #8 fixes for the cert/script/conf-store IPC paths. src/ changes ------------ * src/nxt_main_process.c -- SIGTERM now sets rt->quit_mode = NXT_PORT_QUIT_NORMAL; SIGQUIT sets NXT_PORT_QUIT_GRACEFUL. The /* TODO: fast exit */ and /* TODO: graceful exit */ comments are gone. * src/nxt_runtime.h -- new uint8_t quit_mode next to other small flags (no struct bloat). Documented as a nxt_port_quit_mode_t. Exports nxt_runtime_quit_buf() so nxt_application.c can use the same allocator. * src/nxt_runtime.c -- nxt_runtime_quit_buf(task, quit_param) returns NULL for NORMAL (no allocation), one-byte buffer for GRACEFUL, and a NXT_LOG_WARN + NULL on alloc failure. Both nxt_runtime_stop_app_processes() and nxt_runtime_stop_all_processes() call it with rt->quit_mode and release the buffer on send failure. * src/nxt_application.c -- nxt_proto_quit_handler() reads the quit_param byte from msg, normalises unknown values to NORMAL, and forwards it via the new nxt_proto_quit_children(task, quit_param) signature with the same buffer-release-on-failure handling. Direct signal handler nxt_proto_sigterm_handler() passes NXT_PORT_QUIT_NORMAL explicitly: signals to the prototype are not the user-initiated lifecycle path (that is main -> NXT_PORT_MSG_QUIT) and the historical fast-exit semantics are preserved. * src/nxt_port.h -- promotes nxt_port_quit_mode_t to a public enum alongside NXT_PORT_MSG_QUIT. * src/nxt_unit.c -- existing local NXT_QUIT_NORMAL / NXT_QUIT_GRACEFUL identifiers (used at 10+ libunit call sites) become #define aliases of the public names so a compile-time mismatch between the daemon-side and libunit-side values is structurally impossible -- the preprocessor substitutes the same enum value into every reference. No churn at the call sites. NXT_PORT_MSG_QUIT call-site audit --------------------------------- src/nxt_runtime.c:511 stop_app_processes plumbed (rt->quit_mode) + buffer release on send failure src/nxt_runtime.c:533 stop_all_processes plumbed (rt->quit_mode) + buffer release on send failure src/nxt_application.c:716 proto_quit_children plumbed (cascaded byte) + buffer release on send failure src/nxt_main_process.c:1038 orphan reaping NULL (defensive cleanup) src/nxt_router.c:932 prototype replaced NULL (P6 territory) src/nxt_router.c:4536-4600 port-ready handlers NULL (P6 territory) src/nxt_router.c:5043 idle-pool shrink NULL (NORMAL is right) src/nxt_router.c:5142 app-free cleanup NULL (out of P1 scope) Phases P5/P6 will revisit the router sites once the listener drain and reload endpoint exist. Tests ----- test/test_graceful_reload.py is new. Three functional tests plus one skipped placeholder: * test_sigquit_completes_inflight_request: SIGQUIT to main must take libunit's GRACEFUL branch. Asserts on the *absence* of "active request on ctx quit" at nxt_unit.c:5816 -- that marker fires only in the NORMAL branch's force-close loop, so its absence is positive evidence GRACEFUL was taken. Uses the ASGI delayed app so libunit's add_reader can dispatch the QUIT mid-request (a synchronous WSGI worker blocked in time.sleep would not pump libunit's message loop and the test would pass for the wrong reason). We do *not* assert on the response body because P1 plumbs GRACEFUL through libunit only; the router still tears down on QUIT (router-side drain is P5), so the client TCP connection RSTs the moment the router exits regardless of whether the worker drains gracefully. * test_sigterm_drops_inflight_request: asserts the *presence* of the same marker -- positive evidence the NORMAL fast-exit branch ran. Inverse of the SIGQUIT test. * test_sigint_takes_normal_path: regression guard against signal table edits that would re-route SIGINT through the SIGQUIT handler. Same shape as the SIGTERM test. * test_quit_message_carries_quit_param is a skipped placeholder documenting the wire-format intent; verifying the byte directly would require C-level instrumentation. A module-scoped autouse fixture skips the file with an actionable message when --restart is missing -- the autouse run fixture in conftest.py crashes teardown otherwise (PUT /config to a dead daemon). Verified -------- ./configure --tests --modules=python && ./configure python \ --config=python3-config && make -j$(nproc) # clean python3 -m pytest test/test_graceful_reload.py --restart # 3 pass, 1 skip python3 -m pytest test/test_idle_close_wait.py --restart # 2 pass python3 -m pytest test/test_procman.py --restart # 11 pass, 2 skip, # 1 pre-existing # flake unrelated # to P1 ASAN build (-fsanitize=address): 0 leaks attributable to nxt_runtime_quit_buf or any P1 code path. Sole LeakSanitizer report is a 136-byte pre-existing leak in nxt_var_index_init (src/nxt_var.c:296), reproducible on plain origin/master with no P1 changes -- separate issue. --- src/nxt_application.c | 61 +++++++- src/nxt_main_process.c | 22 ++- src/nxt_port.h | 12 ++ src/nxt_runtime.c | 67 +++++++- src/nxt_runtime.h | 10 ++ src/nxt_unit.c | 14 +- test/test_graceful_reload.py | 294 +++++++++++++++++++++++++++++++++++ 7 files changed, 463 insertions(+), 17 deletions(-) create mode 100644 test/test_graceful_reload.py diff --git a/src/nxt_application.c b/src/nxt_application.c index 629aa11c4..38e289fab 100644 --- a/src/nxt_application.c +++ b/src/nxt_application.c @@ -59,7 +59,7 @@ static void nxt_proto_start_process_handler(nxt_task_t *task, static void nxt_proto_quit_handler(nxt_task_t *task, nxt_port_recv_msg_t *msg); static void nxt_proto_process_created_handler(nxt_task_t *task, nxt_port_recv_msg_t *msg); -static void nxt_proto_quit_children(nxt_task_t *task); +static void nxt_proto_quit_children(nxt_task_t *task, uint8_t quit_param); static nxt_process_t *nxt_proto_process_find(nxt_task_t *task, nxt_pid_t pid); static void nxt_proto_process_add(nxt_task_t *task, nxt_process_t *process); static nxt_process_t *nxt_proto_process_remove(nxt_task_t *task, nxt_pid_t pid); @@ -680,9 +680,36 @@ nxt_proto_start_process_handler(nxt_task_t *task, nxt_port_recv_msg_t *msg) static void nxt_proto_quit_handler(nxt_task_t *task, nxt_port_recv_msg_t *msg) { - nxt_debug(task, "prototype quit handler"); + uint8_t quit_param; - nxt_proto_quit_children(task); + /* + * The QUIT message from main carries a single nxt_port_quit_mode_t + * byte (see nxt_runtime_quit_buf() in src/nxt_runtime.c). Forward + * it to the children unchanged so SIGQUIT to main results in + * NXT_PORT_QUIT_GRACEFUL reaching every libunit context, not just + * the children main contacts directly. Empty body (legacy senders, + * or main's NORMAL fast-exit path which omits the payload) is + * treated as NXT_PORT_QUIT_NORMAL. + * + * Unknown payload values (anything other than 0 or 1) are also + * normalised to NORMAL: internal senders should only emit the two + * defined nxt_port_quit_mode_t values, but a malformed or + * future-incompatible sender must not be allowed to propagate a + * bogus byte through the whole worker pool. + */ + quit_param = NXT_PORT_QUIT_NORMAL; + + if (msg->buf != NULL && nxt_buf_mem_used_size(&msg->buf->mem) >= 1) { + quit_param = msg->buf->mem.pos[0]; + + if (quit_param != NXT_PORT_QUIT_GRACEFUL) { + quit_param = NXT_PORT_QUIT_NORMAL; + } + } + + nxt_debug(task, "prototype quit handler (quit_param=%d)", quit_param); + + nxt_proto_quit_children(task, quit_param); nxt_proto_exiting = 1; @@ -693,16 +720,30 @@ nxt_proto_quit_handler(nxt_task_t *task, nxt_port_recv_msg_t *msg) static void -nxt_proto_quit_children(nxt_task_t *task) +nxt_proto_quit_children(nxt_task_t *task, uint8_t quit_param) { + nxt_buf_t *b; nxt_port_t *port; nxt_process_t *process; nxt_queue_each(process, &nxt_proto_children, nxt_process_t, link) { port = nxt_process_port_first(process); - (void) nxt_port_socket_write(task, port, NXT_PORT_MSG_QUIT, - -1, 0, 0, NULL); + b = nxt_runtime_quit_buf(task, quit_param); + + if (nxt_port_socket_write(task, port, NXT_PORT_MSG_QUIT, + -1, 0, 0, b) + != NXT_OK + && b != NULL) + { + /* + * Port layer never took ownership of the GRACEFUL payload + * buffer; release it explicitly so the cascaded byte does + * not leak. Mirrors the cleanup in + * src/nxt_runtime.c nxt_runtime_stop_*_processes(). + */ + b->completion_handler(task, b, b->parent); + } } nxt_queue_loop; } @@ -759,7 +800,13 @@ nxt_proto_sigterm_handler(nxt_task_t *task, void *obj, void *data) nxt_trace(task, "signal signo:%d (%s) received", (int) (uintptr_t) obj, data); - nxt_proto_quit_children(task); + /* + * Direct signal to the prototype process is not the user-initiated + * lifecycle path (that goes main -> NXT_PORT_MSG_QUIT -> the message + * handler above). Treat it as fast exit so children drop in flight + * rather than wait on a graceful drain that nobody requested. + */ + nxt_proto_quit_children(task, NXT_PORT_QUIT_NORMAL); nxt_proto_exiting = 1; diff --git a/src/nxt_main_process.c b/src/nxt_main_process.c index 25798ea0b..2085e44ba 100644 --- a/src/nxt_main_process.c +++ b/src/nxt_main_process.c @@ -835,10 +835,19 @@ nxt_main_process_title(nxt_task_t *task) static void nxt_main_process_sigterm_handler(nxt_task_t *task, void *obj, void *data) { + nxt_runtime_t *rt; + nxt_debug(task, "sigterm handler signo:%d (%s)", (int) (uintptr_t) obj, data); - /* TODO: fast exit. */ + rt = task->thread->runtime; + + /* + * Fast exit: do not drain in-flight requests. The QUIT byte sent + * to libunit workers (see nxt_runtime_stop_app_processes()) carries + * NXT_PORT_QUIT_NORMAL so nxt_unit_quit() returns immediately. + */ + rt->quit_mode = NXT_PORT_QUIT_NORMAL; nxt_exiting = 1; @@ -849,10 +858,19 @@ nxt_main_process_sigterm_handler(nxt_task_t *task, void *obj, void *data) static void nxt_main_process_sigquit_handler(nxt_task_t *task, void *obj, void *data) { + nxt_runtime_t *rt; + nxt_debug(task, "sigquit handler signo:%d (%s)", (int) (uintptr_t) obj, data); - /* TODO: graceful exit. */ + rt = task->thread->runtime; + + /* + * Graceful exit: ask libunit workers to drain in-flight requests + * before tearing the per-context state down (see nxt_unit_quit() at + * src/nxt_unit.c:5753). + */ + rt->quit_mode = NXT_PORT_QUIT_GRACEFUL; nxt_exiting = 1; diff --git a/src/nxt_port.h b/src/nxt_port.h index 772fb41ae..00a3a34b5 100644 --- a/src/nxt_port.h +++ b/src/nxt_port.h @@ -167,6 +167,18 @@ typedef enum { } nxt_port_msg_type_t; +/* + * Wire-format payload for NXT_PORT_MSG_QUIT. A single byte selects + * between fast and graceful exit on the receiving side. When the + * message arrives without a payload, the receiver defaults to + * NXT_PORT_QUIT_NORMAL (see src/nxt_unit.c nxt_unit_process_msg). + */ +typedef enum { + NXT_PORT_QUIT_NORMAL = 0, + NXT_PORT_QUIT_GRACEFUL = 1, +} nxt_port_quit_mode_t; + + /* Passed as a first iov chunk. */ typedef struct { uint32_t stream; diff --git a/src/nxt_runtime.c b/src/nxt_runtime.c index de76f19e0..a38ae856c 100644 --- a/src/nxt_runtime.c +++ b/src/nxt_runtime.c @@ -491,9 +491,46 @@ nxt_runtime_close_idle_connections(nxt_event_engine_t *engine) } +/* + * Allocate a one-byte port-message body carrying a nxt_port_quit_mode_t + * byte for NXT_PORT_MSG_QUIT. libunit and the prototype handler both + * parse this byte and fall back to NXT_PORT_QUIT_NORMAL when the + * message arrives without a payload, so the fast-exit path needs no + * payload at all: callers receive NULL and pass it straight to + * nxt_port_socket_write(). + * + * Allocation failure under GRACEFUL intentionally degrades to the + * NORMAL fast exit -- a sane behaviour under memory pressure -- but + * the degradation is logged so the operator knows a SIGQUIT did not + * actually drain in-flight requests on at least one cascade leg. + */ +nxt_buf_t * +nxt_runtime_quit_buf(nxt_task_t *task, uint8_t quit_param) +{ + nxt_buf_t *b; + + if (quit_param == NXT_PORT_QUIT_NORMAL) { + return NULL; + } + + b = nxt_buf_mem_alloc(task->thread->engine->mem_pool, 1, 0); + if (nxt_slow_path(b == NULL)) { + nxt_log(task, NXT_LOG_WARN, + "graceful quit payload allocation failed; " + "this cascade leg falls back to fast exit"); + return NULL; + } + + *b->mem.free++ = quit_param; + + return b; +} + + void nxt_runtime_stop_app_processes(nxt_task_t *task, nxt_runtime_t *rt) { + nxt_buf_t *b; nxt_port_t *port; nxt_process_t *process; nxt_process_init_t *init; @@ -508,8 +545,21 @@ nxt_runtime_stop_app_processes(nxt_task_t *task, nxt_runtime_t *rt) nxt_process_port_each(process, port) { - (void) nxt_port_socket_write(task, port, NXT_PORT_MSG_QUIT, -1, - 0, 0, NULL); + b = nxt_runtime_quit_buf(task, rt->quit_mode); + + if (nxt_port_socket_write(task, port, NXT_PORT_MSG_QUIT, -1, + 0, 0, b) + != NXT_OK + && b != NULL) + { + /* + * Port layer never took ownership of the GRACEFUL + * payload buffer; release it explicitly so it does + * not leak from the engine memory pool. Same shape + * as PR #8 (port IPC completion leaks). + */ + b->completion_handler(task, b, b->parent); + } } nxt_process_port_loop; } @@ -521,6 +571,7 @@ nxt_runtime_stop_app_processes(nxt_task_t *task, nxt_runtime_t *rt) static void nxt_runtime_stop_all_processes(nxt_task_t *task, nxt_runtime_t *rt) { + nxt_buf_t *b; nxt_port_t *port; nxt_process_t *process; @@ -530,8 +581,16 @@ nxt_runtime_stop_all_processes(nxt_task_t *task, nxt_runtime_t *rt) nxt_debug(task, "%d sending quit to %PI", rt->type, port->pid); - (void) nxt_port_socket_write(task, port, NXT_PORT_MSG_QUIT, -1, 0, - 0, NULL); + b = nxt_runtime_quit_buf(task, rt->quit_mode); + + if (nxt_port_socket_write(task, port, NXT_PORT_MSG_QUIT, -1, 0, + 0, b) + != NXT_OK + && b != NULL) + { + /* Same cleanup as nxt_runtime_stop_app_processes() above. */ + b->completion_handler(task, b, b->parent); + } } nxt_process_port_loop; diff --git a/src/nxt_runtime.h b/src/nxt_runtime.h index 7bd490d70..4133101e7 100644 --- a/src/nxt_runtime.h +++ b/src/nxt_runtime.h @@ -55,6 +55,8 @@ struct nxt_runtime_s { uint8_t batch; uint8_t status; uint8_t is_pid_isolated; + /* See nxt_port_quit_mode_t in nxt_port.h. */ + uint8_t quit_mode; const char *engine; uint32_t engine_connections; @@ -119,6 +121,14 @@ nxt_port_t *nxt_runtime_process_port_create(nxt_task_t *task, nxt_runtime_t *rt, void nxt_runtime_port_remove(nxt_task_t *task, nxt_port_t *port); void nxt_runtime_stop_app_processes(nxt_task_t *task, nxt_runtime_t *rt); +/* + * Allocate a NXT_PORT_MSG_QUIT body byte carrying quit_param. Returns + * NULL when quit_param == NXT_PORT_QUIT_NORMAL (no allocation; libunit + * defaults to NORMAL when the QUIT message arrives without a payload) + * or when allocation fails (degrades to NORMAL under memory pressure). + */ +nxt_buf_t *nxt_runtime_quit_buf(nxt_task_t *task, uint8_t quit_param); + NXT_EXPORT nxt_port_t *nxt_runtime_port_find(nxt_runtime_t *rt, nxt_pid_t pid, nxt_port_id_t port_id); diff --git a/src/nxt_unit.c b/src/nxt_unit.c index 7d523beb8..446b7da2b 100644 --- a/src/nxt_unit.c +++ b/src/nxt_unit.c @@ -24,10 +24,16 @@ #define NXT_UNIT_LOCAL_BUF_SIZE \ (NXT_UNIT_MAX_PLAIN_SIZE + sizeof(nxt_port_msg_t)) -enum { - NXT_QUIT_NORMAL = 0, - NXT_QUIT_GRACEFUL = 1, -}; +/* + * Wire-protocol QUIT mode selector. The canonical enum lives in + * src/nxt_port.h alongside NXT_PORT_MSG_QUIT itself; the aliases + * below preserve the original local names without risking divergence + * from the daemon-side usage (the preprocessor substitutes the same + * enum value into every reference, so a compile-time mismatch is + * impossible). + */ +#define NXT_QUIT_NORMAL NXT_PORT_QUIT_NORMAL +#define NXT_QUIT_GRACEFUL NXT_PORT_QUIT_GRACEFUL typedef struct nxt_unit_impl_s nxt_unit_impl_t; typedef struct nxt_unit_mmap_s nxt_unit_mmap_t; diff --git a/test/test_graceful_reload.py b/test/test_graceful_reload.py new file mode 100644 index 000000000..64ae53e39 --- /dev/null +++ b/test/test_graceful_reload.py @@ -0,0 +1,294 @@ +""" +Phase 1 of the FreeUnit graceful-shutdown plan: signal-handler split. + +Background +---------- +SIGTERM and SIGQUIT used to share an identical handler in +src/nxt_main_process.c -- both routed straight to nxt_runtime_quit() +without distinguishing fast from graceful exit. P1 splits the two so +SIGQUIT now sets rt->quit_mode = NXT_PORT_QUIT_GRACEFUL, and the +NXT_PORT_MSG_QUIT message dispatched by nxt_runtime_stop_app_processes() +carries that byte. libunit at src/nxt_unit.c:1056-1070 already parses +this exact wire format and dispatches to nxt_unit_quit() (src/nxt_unit.c:5753), +which lets in-flight requests drain when the byte is NXT_PORT_QUIT_GRACEFUL. + +The plumbing is what these tests exercise behaviourally: + + * SIGQUIT -> in-flight request must complete with status 200. + * SIGTERM -> in-flight request is dropped (connection reset or truncated). + +A third placeholder test asserts the wire-format intent but is skipped +because verifying the actual quit_param byte would require C-level +instrumentation -- the behavioural pair above already covers reachability. + +See roadmap/plan-graceful-shutdown.md (P1) for the full plan. +""" + +import os +import signal +import socket +import time + +import pytest + +from unit.applications.lang.python import ApplicationPython +from unit.log import Log + +prerequisites = {'modules': {'python': 'all'}} + +client = ApplicationPython() + + +@pytest.fixture(autouse=True) +def _require_restart_flag(request): + """ + Every test in this module sends SIGTERM/SIGQUIT to the unitd master. + The autouse `run` fixture in conftest.py only rmtrees the temp dir + when --restart is set; otherwise it tries to PUT /config on the now + dead daemon during teardown and crashes with KeyError: 'body'. + + Skip with an actionable message when the flag is missing instead of + pretending to fail for the wrong reason. + """ + if not request.config.getoption('--restart'): + pytest.skip( + 'test_graceful_reload.py signals the unitd master process; ' + 'rerun with --restart so conftest.py rmtrees the temp dir ' + 'instead of trying /config PUT on a dead daemon' + ) + + +# Long enough that a non-graceful SIGTERM cannot accidentally let the +# request finish; short enough to keep the test suite snappy. Bumped +# above the previous 3 s so fast machines have less chance of racing +# the response to completion before the signal lands. +INFLIGHT_DELAY = 5 + +# Cap for the curl-equivalent recv loop. Must exceed INFLIGHT_DELAY plus +# graceful-drain overhead so a working SIGQUIT path has room to complete. +RESPONSE_TIMEOUT = 30 + + +def _start_inflight_request(delay: int) -> socket.socket: + """ + Open a TCP connection to Unit, send a request that takes *delay* + seconds inside the WSGI handler (test/python/delayed/wsgi.py uses + time.sleep when the request body is empty), and return the socket + with the request fully written. Caller is responsible for receiving + and closing. + """ + sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM) + sock.settimeout(RESPONSE_TIMEOUT) + sock.connect(('127.0.0.1', 8080)) + + req = ( + f'GET / HTTP/1.1\r\n' + f'Host: localhost\r\n' + f'X-Delay: {delay}\r\n' + f'Connection: close\r\n' + f'\r\n' + ).encode() + sock.sendall(req) + + return sock + + +def _recv_all(sock: socket.socket) -> bytes: + """ + Drain the socket until EOF or timeout. Returns whatever bytes + arrived; an empty/truncated reply is a signal that the peer reset + the connection mid-response, which is exactly what SIGTERM should do. + """ + chunks = [] + deadline = time.monotonic() + RESPONSE_TIMEOUT + while time.monotonic() < deadline: + try: + data = sock.recv(4096) + except (ConnectionResetError, socket.timeout): + break + if not data: + break + chunks.append(data) + return b''.join(chunks) + + +def test_sigquit_completes_inflight_request(unit_pid, skip_alert): + """ + SIGQUIT to the unitd master must let an in-flight request complete. + + Before P1 the SIGQUIT path was identical to SIGTERM (both called + nxt_runtime_quit() with no quit_param plumbing), so libunit defaulted + to NXT_PORT_QUIT_NORMAL and the worker exited immediately. With P1 + the main signal handler stores rt->quit_mode = NXT_PORT_QUIT_GRACEFUL + and the QUIT message body now carries that byte; libunit's + nxt_unit_quit() drains the active request before tearing the + context down. + + ASGI is required so libunit's message loop can actually process + the QUIT mid-request: a synchronous WSGI worker blocked in + time.sleep() never pumps libunit, so the QUIT message would sit + in the queue until the request finishes anyway and the test + would pass for the wrong reason -- the request simply outran + the (lost) signal. ASGI's `await sleep(delay)` yields to the + asyncio loop, letting libunit's add_reader callback dispatch + nxt_unit_process_msg() while active_req is non-empty. + + Why we assert on log absence rather than body content: + P1 plumbs GRACEFUL through libunit only. The router process + still tears down its TCP listener and IPC ports immediately on + NXT_PORT_MSG_QUIT (router-side drain is P5), so the client TCP + connection RSTs the moment the router exits -- regardless of + whether the app worker drains gracefully. Asserting on the + response body therefore gets `b''` on both NORMAL and GRACEFUL + and is useless as a P1 regression guard. + + The unambiguous positive evidence is the contrapositive of the + SIGTERM test: "active request on ctx quit" at nxt_unit.c:5816 + fires *only* in the NORMAL branch's force-close loop. On + GRACEFUL the function returns early when active_req is + non-empty and that loop is unreachable, so the marker never + appears. We give libunit ~1.5 s to process the QUIT message + after the signal, then assert the marker is absent. A real + regression where SIGQUIT routed back to NORMAL would emit it. + """ + client.load('delayed', module='asgi') + + skip_alert(r'process \d+ exited on signal') + skip_alert(r'sendmsg.+failed') + skip_alert(r'last message send failed') + # ASGI lifespan in delayed/asgi.py raises AssertionError because + # the test app only handles 'http' scope; that's logged at info + # but appears in the alerts grep on some configs. + skip_alert(r'ASGI Lifespan processing exception') + + sock = _start_inflight_request(INFLIGHT_DELAY) + + # Give the ASGI handler a moment to enter `await sleep(delay)` + # before the signal so the QUIT can race the in-flight request. + time.sleep(0.5) + + os.kill(unit_pid, signal.SIGQUIT) + + # 1.5 s is comfortably more than the IPC RTT for a QUIT message + # plus libunit's nxt_unit_process_msg dispatch. The drain + # itself takes the full INFLIGHT_DELAY but we don't need to wait + # for it -- only for libunit to have *taken* the GRACEFUL branch + # rather than the NORMAL branch. + time.sleep(1.5) + + sock.close() + + found = Log.findall(r'active request on ctx quit') + assert found == [], ( + f'SIGQUIT triggered the NORMAL fast-exit branch in libunit ' + f'(active_req was force-closed at nxt_unit.c:5816 instead of ' + f'drained); regression in P1 plumbing. Log markers found: ' + f'{found!r}. Compare to test_sigterm_drops_inflight_request ' + f'which asserts the same marker is *present* on SIGTERM.' + ) + + +def test_sigterm_drops_inflight_request(unit_pid, skip_alert): + """ + SIGTERM remains the fast-exit path: in-flight requests are dropped. + + With P1, rt->quit_mode = NXT_PORT_QUIT_NORMAL on SIGTERM and the QUIT + message body carries 0; libunit's nxt_unit_quit() returns immediately + and calls close_handler() on every active request (src/nxt_unit.c:5811). + + Why ASGI here? In synchronous WSGI a worker that is busy in + time.sleep() never pumps libunit's message loop, so the QUIT byte + sits in the queue until the request has finished anyway. The + behavioural difference between QUIT_NORMAL and QUIT_GRACEFUL is only + observable when libunit can actually process the QUIT message while + a request is still in-flight -- which is what the asyncio-driven + ASGI handler enables. See test/python/delayed/asgi.py: the + `await sleep(delay)` yields control back to the asyncio loop, + letting libunit's add_reader callback run the QUIT path. + + Regression evidence is the "active request on ctx quit" warning at + src/nxt_unit.c:5816 -- libunit emits it from the for-loop that walks + active_req inside nxt_unit_quit() when quit_param == NXT_QUIT_NORMAL. + The for-loop is unreachable on the GRACEFUL path (which returns + early when the active_req queue is non-empty), so the warning's + presence is positive proof that the NORMAL fast-exit branch ran. + Absence of the warning would mean SIGTERM accidentally took the + GRACEFUL branch -- the regression this test must catch. + """ + client.load('delayed', module='asgi') + + skip_alert(r'process \d+ exited on signal') + skip_alert(r'sendmsg.+failed') + skip_alert(r'last message send failed') + skip_alert(r'active request on ctx quit') + + sock = _start_inflight_request(INFLIGHT_DELAY) + + time.sleep(0.5) + + os.kill(unit_pid, signal.SIGTERM) + + body = _recv_all(sock) + sock.close() + + # Positive log evidence beats body-shape inference: it is robust to + # the timing race where a fast machine completes the response before + # the signal lands. wait_for_record polls the unit log up to ~15 s + # for the marker; on a real regression to the GRACEFUL branch the + # marker never appears and the assertion fails. + assert Log.wait_for_record(r'active request on ctx quit') is not None, ( + 'SIGTERM did not take the NORMAL fast-exit path: libunit drained ' + 'in-flight requests as if quit_param == NXT_PORT_QUIT_GRACEFUL. ' + 'Regression in P1 plumbing -- see src/nxt_main_process.c ' + 'sigterm/sigquit handler split and src/nxt_runtime.c ' + 'nxt_runtime_quit_buf().' + ) + + +def test_sigint_takes_normal_path(unit_pid, skip_alert): + """ + SIGINT shares the SIGTERM handler in nxt_main_process_signals + (src/nxt_main_process.c:72-74), so it must produce identical + NORMAL fast-exit behaviour. Regression guard: a signal-table + edit that re-routes SIGINT to nxt_main_process_sigquit_handler + would silently turn ^C into a graceful drain, surprising users + who expect Ctrl-C to be immediate. + """ + client.load('delayed', module='asgi') + + skip_alert(r'process \d+ exited on signal') + skip_alert(r'sendmsg.+failed') + skip_alert(r'last message send failed') + skip_alert(r'active request on ctx quit') + + sock = _start_inflight_request(INFLIGHT_DELAY) + + time.sleep(0.5) + + os.kill(unit_pid, signal.SIGINT) + + body = _recv_all(sock) + sock.close() + + assert Log.wait_for_record(r'active request on ctx quit') is not None, ( + 'SIGINT did not take the NORMAL fast-exit path: signal table in ' + 'nxt_main_process.c may have been edited to route SIGINT to the ' + 'sigquit handler. Ctrl-C should be immediate, not graceful.' + ) + + +@pytest.mark.skip( + reason='needs C-level instrumentation; covered by test 1+2 behaviorally' +) +def test_quit_message_carries_quit_param(): + """ + Direct assertion that NXT_PORT_MSG_QUIT carries a one-byte body + encoding rt->quit_mode. Verifying this from Python would require + intercepting the AF_UNIX port socket between unitd master and the + application worker, which is not straightforward without a debug + build that exposes the wire bytes. + + Tests 1 and 2 above exercise the same plumbing behaviourally: + * graceful-drain on SIGQUIT => byte == NXT_QUIT_GRACEFUL + * fast-exit on SIGTERM => byte == NXT_QUIT_NORMAL + """