Skip to content

net,tls: port Node.js net/tls compatibility tests and fix the gaps they surface — half-open/reset/write semantics, server TLSSocket wrap, session/keylog, SNICallback/ALPNCallback, pfx, OpenSSL error shapes, addCACert, local binding (+305 tests)#31155

Open
cirospaciari wants to merge 1 commit into
claude/upgrade-nodejs-26-v2from
claude/port-node-net-tls-tests-2

Conversation

@cirospaciari

@cirospaciari cirospaciari commented May 21, 2026

Copy link
Copy Markdown
Member

Brings node:net and node:tls compatibility in line with Node by porting upstream tests verbatim and fixing the native gaps they expose.

parallel sequential total
net 141/148 (95.3%) 10/12 151/160 (94.4%)
tls 150/215 (69.8%) 4/4 154/219 (70.3%)

305 verbatim upstream tests added.

Behavior changes

  • Half-open semantics: socket.end() now half-closes (FIN) instead of full-closing, so a server's response after a client end() is delivered before close.
  • Reset/write semantics: socket.resetAndDestroy(), server.close({ resetConnections }), write-after-end / write-after-destroy errors, and the post-write callback contract match Node.
  • Close-time read errors: a reset delivered with the close destroys the socket with Node's read ECONNRESET shape when an error listener is attached and a clean EOF has not already been delivered; a codeless close error that carries an errno derives its code from it. A reset that lands after the exchange already ended cleanly stays a graceful close.
  • Local binding: net.connect({ localAddress, localPort }) binds before connecting on every connect path including deferred DNS resolution.
  • The 'session' and 'keylog' events, end-to-end through the native handler-slot chain; the --tls-keylog flag.
  • OpenSSL error decomposition: handshake/context-creation failures carry Node's code/library/function/reason properties (ERR_SSL_<REASON>, ERR_OSSL_<LIB>_<REASON>).
  • secureContext.context.addCACert() extends the full default trust set (bundled roots, NODE_EXTRA_CA_CERTS, system CAs when enabled) on the context's own store, and chain verification then uses that store; tls.createSecureContext() returns a context that owns its SSL_CTX exclusively so a CA appended to one cannot affect another (the internal connect/listen paths keep the per-digest cache).
  • tls.setDefaultCACertificates() applies on every secure-context construction path (plain tls.connect(), addContext, setSecureContext), not just the public createSecureContext().
  • The SNICallback and ALPNCallback server dispatches, resolved per connection with Node's semantics; tlsSocket.setKeyCert(); ERR_TLS_ALPN_CALLBACK_WITH_PROTOCOLS.
  • The OpenSSL cipher-list selector grammar (PSK+HIGH, !aNULL, @SECLEVEL, …) is accepted/rejected the way BoringSSL evaluates it; a mixed EC/RSA multi-identity configuration is rejected with Node's decomposed KEY_TYPE_MISMATCH.
  • The pfx option: PKCS#12 blobs (single or array, per-entry passphrases) are parsed into key/cert with Node's error messages; CAs embedded in the bundle extend the trust set (they are not treated as an explicit ca replacement).
  • Server-side TLSSocket wrapping of an accepted socket and tls.connect({ socket: duplex }) over a generic Duplex, including connecting parity and synchronous teardown of the wrapped duplex on destroy.
  • Memory safety: a deferred-close protocol so a destroy issued from inside an SNI/ALPN/keylog callback cannot free the SSL out from under BoringSSL; the per-loop BIO state is snapshotted/restored across in-handshake JS so cross-socket I/O cannot misroute the in-flight handshake; the GC-rooting of the detailed peer-certificate chain.
  • net perf_hooks observer, options.handle, pause()/unref() accounting, SocketAddress/BlockList parity, the autoSelectFamily flag plumbing.

Known limitations / follow-ups

  • An asynchronous SNICallback now suspends the handshake (BoringSSL select-certificate retry) until its callback resolves - the upstream test-tls-sni-option.js passes; synchronous callbacks behave as before.
  • Two linked follow-ups around mid-handshake teardown: (1) a server connection raw-closed (RST) while its handshake is suspended leaves the JS connection count stale until the socket is GC'd - server.close() waits longer than it should in that edge case; (2) fixing it by dispatching the handshake failure from the close path requires first aligning tlsHandshakeError's no-listener behavior with Node's silent-destroy semantics, otherwise routine client-initiated mid-handshake teardowns (h2 connection management) surface as spurious ECONNRESET errors.
  • An SNICallback that reports an error, returns something that is not a SecureContext, or throws aborts the handshake (synchronously or asynchronously): the connection is dropped without a TLS alert and the server emits 'tlsClientError' with the callback's error, matching Node.
  • setKeyCert() from inside ALPNCallback is too late under BoringSSL's TLS 1.3 (the credential is already chosen); calling it from SNICallback works.
  • tls.DEFAULT_MIN/MAX_VERSION are now honored at context-construction time (assignment through the module exports works the way Node's does), and a TLS socket that ends first keeps reading the peer's in-flight data - BoringSSL has no TLS half-close, so end() defers the close_notify (flushing pending session tickets first) and half-closes at the TCP level instead.
  • test-net-perf_hooks.js is intermittently divergent on Ubuntu 25.04 (dual-stack localhost resolution).
  • A response written to a peer that has already gone away used to be retried as would-block on Linux, hanging the FIN-terminated-response teardown tests there; node:net writes now go through an opt-in fatal-send-error path (us_socket_write_check_error) that fails the pending write and closes the socket instead.
  • The fetch/h2 suites on Windows still see teardown resets surfaced between tests; under investigation alongside the write-error work.
  • An http2 session with no 'error' listener swallows ECONNRESET transport teardown noise (destroying the session quietly) instead of crashing the process the way Node's EventEmitter contract would; all other unobserved errors still surface.

Fixes #28638
Fixes #28641
Fixes #26418
Fixes #20642


Origin (consolidated from #31148)

What

Ports 11 net/tls tests from the Node.js test suite into test/js/node/test/parallel/ (verbatim) and fixes the runtime divergences they surfaced.

Tests added (verbatim from upstream Node)

net: test-net-pause-resume-connecting, test-net-server-keepalive, test-net-server-nodelay, test-net-socket-setnodelay, test-net-connect-memleak
tls: test-tls-basic-validations, test-tls-buffersize, test-tls-client-reject, test-tls-net-socket-keepalive, test-tls-secure-session, test-tls-connect-memleak

Fixes

net.Socket / net.Server

  • Accepted server sockets inherit the server's keepAlive, noDelay, allowHalfOpen and highWaterMark; the Socket constructor honors options.handle.
  • The native socket's half-open flag follows the Duplex's allowHalfOpen (default false closes on peer FIN so 'close' fires; true keeps the writable side open), matching Node/libuv.
  • pause() is honored while a socket is still connecting.
  • setNoDelay() forwards to the handle; Server coerces noDelay with Boolean() like keepAlive.
  • _write defers its callback to the next tick only for TLS sockets (the SSL engine batches, so bufferSize/writableLength reflect the queued bytes); plain TCP completes synchronously so a tight write() loop backpressures at the kernel rather than the JS highWaterMark.

tls

  • validateSecureContextOptions (ciphers / passphrase / ecdhCurve / min-max version / ticketKeys / sessionTimeout) and a simplified convertALPNProtocols.
  • TLS clients emit the 'session' event.

Errors

  • validateBuffer reports "must be an instance of Buffer, TypedArray, or DataView" to match Node. Two existing tests that asserted the old wording are re-synced to upstream.

Comments that mirror Node behavior link the corresponding upstream source.

Testing

Validated locally with the debug build before pushing:

  • All 11 ported tests pass 20× with no flakes and are byte-identical to upstream (flaky candidates were dropped).
  • Full test-{net,http,tls}-* parallel sweep: 0 failures.
  • test/js/node/http/node-http.test.ts (incl. HTTP server security tests): 78 pass, 0 fail.

Notes

The strictly Node-correct half-close _final (shutdown() rather than $end()) is left for a follow-up: it exposes a uWS TLS-handshake edge case where a successful handshake immediately followed by a close_notify is reported as ECONNRESET. Until that's addressed, _final uses $end().


This PR consolidates the original test-porting branch (claude/port-node-net-tls-tests, #31148) and the follow-up branch (claude/port-node-net-tls-tests-2); both branches now point to the same squashed content.

@robobun

robobun commented May 21, 2026

Copy link
Copy Markdown
Collaborator
Updated 3:31 PM PT - Jun 10th, 2026

@cirospaciari, your commit ef5a776 has 1 failures in Build #61822 (All Failures):


🧪   To try this PR locally:

bunx bun-pr 31155

That installs a local version of the PR into your bun-31155 executable, so you can run:

bun-31155 --bun

@github-actions

Copy link
Copy Markdown
Contributor

Found 4 issues this PR may fix:

  1. Error response not sent to client when request body read fails due to client abort #28638 - socket.end() after partial POST body now half-closes (FIN) instead of full-closing, so the server's error response can still be read by the client
  2. http.createServer destroys socket instead of sending 431 response #28641 - socket.end("431...") in clientError handler now sends FIN after writing instead of immediately destroying, letting the error response reach the client
  3. net.createServer socket data events not emitted when used with ssh2 forwardOut #26418 - Half-close keeps the readable side open so bidirectional piped data flows correctly instead of being killed by a premature full close
  4. Cluster module freezes on worker disconnect when using unshared TCP servers #20642 - Half-close allows the remote end to see FIN and emit end, unblocking the disconnect() call that was never reached

If this is helpful, copy the block below into the PR description to auto-close these issues on merge.

Fixes #28638
Fixes #28641
Fixes #26418
Fixes #20642

🤖 Generated with Claude Code

Comment thread src/js/node/tls.ts
@cirospaciari cirospaciari force-pushed the claude/port-node-net-tls-tests-2 branch from 9302345 to f36f956 Compare May 21, 2026 02:39
Comment thread src/js/node/net.ts
Comment thread test/js/node/test/parallel/test-net-pingpong.js
Comment thread src/js/node/tls.ts Outdated
Comment thread src/js/node/tls.ts Outdated
Comment thread src/http/ssl_config.rs
Comment thread src/js/node/tls.ts Outdated
Comment thread src/js/node/net.ts Outdated
Comment thread src/js/node/net.ts Outdated
@cirospaciari cirospaciari changed the title net,tls: half-close Socket.end() (FIN) instead of a full close net,tls: Node.js compatibility — TLS protocol versions, listen/reset/connect fixes, +30 tests May 21, 2026
Comment thread test/js/node/test/parallel/test-net-server-listen-path.js
Comment thread test/js/node/test/parallel/test-tls-client-reject-12.js

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional findings (outside current diff — PR may have been updated during review):

  • 🟡 src/js/node/tls.ts:671-673 — Node only validates SNICallback for server-side TLSSockets (the validateFunction call in _tls_wrap.js is inside the if (options.isServer) block — SNI callbacks are inherently server-only), but this check runs unconditionally, so tls.connect({ ..., SNICallback: {} }) or new tls.TLSSocket(sock, { SNICallback: 42 }) throws ERR_INVALID_ARG_TYPE in Bun while Node silently ignores it. One-token fix since this.isServer is set just above: if (this.isServer && options.SNICallback != null). Very narrow trigger (junk server-only option on a client socket), so just a nit.

    Extended reasoning...

    What the bug is

    The TLSSocket constructor (src/js/node/tls.ts:671-673) now does:

    this.isServer = !!options.isServer;
    
    if (options.SNICallback != null) {
      validateFunction(options.SNICallback, "options.SNICallback");
    }

    This validates SNICallback regardless of whether the socket is server- or client-side. In Node.js, the equivalent check in TLSSocket.prototype._init (lib/_tls_wrap.js / lib/internal/tls/wrap.js) lives inside the if (options.isServer && options.SNICallback && ...) block — SNI callbacks are conceptually server-only (they let a server pick a certificate based on the client's SNI extension), so Node only type-checks the option when wrapping a server socket and silently ignores it on clients.

    Code path that triggers it

    const tls = require('tls');
    const net = require('net');
    // Either of these throws ERR_INVALID_ARG_TYPE in Bun, no-op in Node:
    new tls.TLSSocket(new net.Socket(), { SNICallback: {} });
    tls.connect({ port: 443, host: 'example.com', SNICallback: 42 });

    For tls.connect, the options object flows into new TLSSocket(options) with no isServer key, so this.isServer is false, but the unconditional check on the next line still runs validateFunction({}, ...) and throws.

    Why existing tests don't catch it

    The newly-ported test-tls-snicallback-error.js (verbatim from upstream) only exercises tls.createServer({ SNICallback }) and new tls.TLSSocket(sock, { isServer: true, SNICallback }) — both server-side — which is itself evidence that Node only validates on the server path. The Server constructor in this PR does its own SNICallback check (tls.ts:911-913), so the test passes without ever reaching the over-broad TLSSocket check on a client socket.

    Step-by-step proof

    Step Node.js Bun (this PR)
    new tls.TLSSocket(sock, { SNICallback: 42 }) (no isServer) _init: options.isServer is falsy → SNICallback branch skipped → no throw ctor: this.isServer = false; next line: options.SNICallback != nullvalidateFunction(42, ...)throws ERR_INVALID_ARG_TYPE
    new tls.TLSSocket(sock, { isServer: true, SNICallback: 42 }) enters isServer branch → validateFunction(42, ...) → throws validateFunction(42, ...) → throws ✅ (matches)

    Impact

    Minor Node-compat divergence in new code from this PR. The trigger is implausible in real code — SNICallback is documented as a server-only option, so passing a non-function value to a client socket is essentially garbage input; Bun is just stricter than Node here. No correct program is affected. The inline comment says "matches Node's TLSSocket init" but it's slightly stricter than Node. Filed as a nit since the fix is one token and the check is being added specifically for Node compat.

    Fix

    Gate on this.isServer, which is set on the line immediately above:

    if (this.isServer && options.SNICallback != null) {
      validateFunction(options.SNICallback, "options.SNICallback");
    }

@cirospaciari cirospaciari force-pushed the claude/port-node-net-tls-tests-2 branch from 04333c4 to 0a25656 Compare May 21, 2026 18:05
Comment thread src/js/node/net.ts
Comment thread src/js/node/tls.ts
Comment thread src/http/ssl_config.rs
Comment thread test/js/node/test/parallel/test-net-pingpong.js
Comment thread src/js/node/net.ts
Comment thread src/js/node/tls.ts
Comment thread src/js/node/tls.ts Outdated
Comment thread src/js/node/net.ts Outdated
@cirospaciari cirospaciari changed the title net,tls: Node.js compatibility — TLS protocol versions, listen/reset/connect fixes, +30 tests net,tls: Node.js compatibility — TLS versions, setDefaultCACertificates, listen/reset/connect fixes, +42 tests May 21, 2026
@cirospaciari cirospaciari requested a review from alii as a code owner May 21, 2026 20:13

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional findings (outside current diff — PR may have been updated during review):

  • 🟡 src/js/node/tls.ts:651-653tls.setDefaultCACertificates([]) makes getCACertificates('default') return [] correctly, but actual TLS connections without an explicit ca still trust the full bundled root store — the injected ca: [] collapses to ca: undefined at the native layer (handle_file_array() returns None for empty arrays, SSLConfig.rs:338-340), so the per-SSL client attach at openssl.c:677-681 falls back to us_get_shared_default_ca_store(). In Node an empty array clears the defaults so such connections fail verification; here the user requests zero-trust and silently gets full-trust. Worth listing as a known limitation alongside the other deferred setDefaultCACertificates items (a real fix needs a native 'empty store' sentinel); the ported -precedence-empty.js/-basic.js tests don't exercise this path.

    Extended reasoning...

    What the bug is

    The new tls.setDefaultCACertificates() is implemented as a JS-side override: _defaultCACertificatesOverride is stored, getCACertificates('default') returns it, and createSecureContext() injects it as options.ca when no explicit ca is given (tls.ts:651-653). For a non-empty override this works (modulo Bun's pre-existing additive-CA semantics). For an empty override it does not: ca: [] is silently dropped on the way into the native SSL_CTX, so the connection ends up using the full bundled Mozilla root store + NODE_EXTRA_CA_CERTS, exactly as if setDefaultCACertificates had never been called. Node's documented behavior is that setDefaultCACertificates([]) clears the defaults so subsequent connections without their own ca fail certificate verification.

    Code path

    1. tls.setDefaultCACertificates([]) → the validation loop runs zero times → _defaultCACertificatesOverride = [].
    2. tls.connect({...}) (no ca) → TLSSocket ctor → createSecureContext(options). _defaultCACertificatesOverride !== undefined && options.ca == null is true → options = { ...options, ca: [] }.
    3. newNativeSecureContextNativeSecureContext.intern → bindgen → SSLConfig::from_jshandle_file for cahandle_file_array():
      // SSLConfig.rs:338-340
      if elements.is_empty() {
          return Ok(None);
      }
      So result.ca = Noneidentical to ca: undefined.
    4. as_usockets()ctx_opts.ca = NULL, ca_count = 0.
    5. us_ssl_ctx_build_raw (openssl.c:543): options.ca && options.ca_count > 0 is false. requestCert is set on the connect-time tls object (net.ts:1006), not on the createSecureContext options that built this SSL_CTX, so options.request_cert (openssl.c:561) is also false. All three branches skipped → SSL_CTX has verify_mode == SSL_VERIFY_NONE and no cert store configured.
    6. us_internal_ssl_attach (openssl.c:677-681), client path:
      if (SSL_CTX_get_verify_mode(ctx) == SSL_VERIFY_NONE) {
        SSL_set_verify(ssl, SSL_VERIFY_PEER, us_verify_callback);
        X509_STORE *roots = us_get_shared_default_ca_store();
        if (roots) SSL_set0_verify_cert_store(ssl, roots);
      }
      The per-socket override installs the shared bundled-roots store (Mozilla bundle + NODE_EXTRA_CA_CERTS, see root_certs.cpp). The connection trusts everything Bun trusts by default.

    Why the ported tests don't catch it

    • test-tls-set-default-ca-certificates-precedence-empty.js sets defaults to [] but then passes a per-connection ca: [fakeStartcomCert], which makes options.ca == null false at tls.ts:651 — the override-injection branch is never taken; the test only proves per-connection ca still works.
    • test-tls-set-default-ca-certificates-basic.js / -array-buffer.js only round-trip through getCACertificates('default'), which reads the JS-side _defaultCACertificatesOverride array directly and never builds a native context or opens a connection.

    No test in this PR connects without an explicit ca after setDefaultCACertificates([]) and asserts a verification failure.

    Step-by-step proof

    const tls = require('tls');
    tls.setDefaultCACertificates([]);
    console.log(tls.getCACertificates('default')); // [] ✓ — JS surface correct
    tls.connect(443, 'example.com', { servername: 'example.com' }, () => {
      console.log('connected, authorized =', this.authorized);
    });
    Step Node.js Bun (this PR)
    getCACertificates('default') [] []
    native trust store for the connection empty X509_STORE us_get_shared_default_ca_store() (full bundle) ❌
    handshake against a public-CA-signed server fails verification (UNABLE_TO_GET_ISSUER_CERT etc.) succeeds, authorized === true

    The SSL_CTX content-hash digest for {ca: []} and {ca: undefined} is also identical (both feed ca = None into content_hash()), so the interned context is literally the same object as the no-CA default.

    Impact / why nit

    The failure mode is silently security-relevant — a user who calls setDefaultCACertificates([]) is explicitly asking for zero default trust and gets full default trust instead, with no error or warning. That said:

    • setDefaultCACertificates is brand-new in this PR (not a regression).
    • The implementation comment at tls.ts:1285-1287 already acknowledges "Bun has no equivalent native store override, so keep a JS-side override" — the JS-side ca-injection approach inherently can't express "empty store" because of pre-existing native behavior (handle_file_array's empty→None and the per-SSL client fallback at openssl.c:677-681 are both unchanged by this PR).
    • The non-empty override has a related pre-existing limitation: openssl.c:547 starts from us_get_default_ca_store() even when ca is supplied, so setDefaultCACertificates([myCA]) trusts bundled+myCA rather than only myCA. The empty case is the starkest instance of a broader "override doesn't restrict trust" gap.
    • A proper fix requires native changes (sentinel for "install an empty X509_STORE", or removing the per-socket shared-store fallback when ca was explicitly empty) that are out of scope for this compat PR.

    Filed as nit: worth flagging in the PR description's deferred-items list and/or adding a TODO at the override-injection site, since the PR description currently says "setDefaultCACertificates() implemented" without this caveat.

    Fix direction

    Either:

    1. Special-case the empty override in createSecureContext to pass a sentinel that the native layer interprets as "install an empty X509_STORE and set VERIFY_PEER" (so openssl.c:677 doesn't fall back to the shared store), or
    2. Thread an explicit "override default roots" flag through to us_internal_ssl_attach so it skips us_get_shared_default_ca_store() when an override (even empty) is active.

    Both require native changes; until then, documenting it as a known limitation is the honest option.

  • 🟡 src/js/node/net.ts:2403-2407 — The new comment says "a valid port takes precedence over path", but the implementation never clears path after validating port — so listen({ port: 0, path: '/tmp/sock' }) still falls into kRealListen's if (path) branch (net.ts:2543) and listens on the unix socket, ignoring port. The path-wins behavior is pre-existing (not a regression), but since this block is rewritten under "Match Node's listen() option normalization" and the ported test-net-server-listen-options.js only exercises { port: -1, path } (which throws in validatePort before reaching kRealListen), it'd be worth either adding path = undefined; after port = port | 0 to actually match Node, or dropping the precedence clause from the comment.

    Extended reasoning...

    What the issue is

    The rewritten options-object branch of Server.prototype.listen() adds, at net.ts:2403-2407:

    if (typeof port === "number" || typeof port === "string") {
      // validatePort coerces "0" -> 0 and throws ERR_SOCKET_BAD_PORT for
      // out-of-range/non-numeric values; a valid port takes precedence over path.
      validatePort(port, "options.port");
      port = port | 0;
    } else if (isPipeName(path)) {

    The comment (and the block's header at line 2397, "Match Node's listen() option normalization") states that a valid port takes precedence over path. In Node that is true — lib/net.js returns from the port branch (calling listenInCluster with no pipe) before ever reaching the path branch. In Bun, however, path is read at line 2379 and never cleared in the port branch. Both port and path are then passed through listenInCluster(..., path, ...) (line 2515) to kRealListen, which checks if (path) first (line 2543) and calls Bun.listen({ unix: path }), ignoring port entirely. So in practice path takes precedence over port, the opposite of what the new comment claims.

    Step-by-step proof

    For net.createServer().listen({ port: 0, path: '/tmp/sock' }):

    Step Line State
    1 2379 path = '/tmp/sock'
    2 2380 port = 0
    3 2399 port === undefined? no; port === null? no → skip
    4 2403 typeof port === 'number'true, enter port branch
    5 2406 validatePort(0) → ok
    6 2407 port = 0 | 00; path still '/tmp/sock'
    7 2515 listenInCluster(..., port=0, ..., path='/tmp/sock', ...)
    8 2543 kRealListen: if (path)trueBun.listen({ unix: '/tmp/sock' })

    Result: listens on the unix socket. In Node the same call listens on TCP port 0 (random port).

    Why existing tests don't catch it

    The newly-ported test-net-server-listen-options.js does include a port-vs-path precedence assertion:

    // In listen(options, cb), port takes precedence over path
    assert.throws(() => {
      net.createServer().listen({ port: -1, path: common.PIPE }, common.mustNotCall());
    }, assertPort());

    But port: -1 throws ERR_SOCKET_BAD_PORT inside validatePort at line 2406, before kRealListen is reached — so it only proves that an invalid port is validated ahead of path, not that a valid port wins at listen time. There is no test for { port: <valid>, path: <valid> }.

    Why this is a nit, not a regression

    The underlying path-wins behavior predates this PR. The old code's if (!Number.isSafeInteger(port) || port < 0) guard was also false for port = 0, leaving path set, and the same kRealListen if (path) check applied. So no user-observable behavior changes here. What is new is:

    • The comment at line 2405 explicitly claiming Node's precedence, which the implementation doesn't honor.
    • The block header at line 2397 ("Match Node's listen() option normalization") implying parity that isn't there for this case.
    • The pre-existing doc-comment at line ~2442 ("path Will be ignored if port is specified") was already wrong, but this PR adds a second wrong comment rather than fixing either.

    The trigger is also narrow — passing both port and path to listen() is unusual.

    Fix

    One line in the port branch:

    if (typeof port === "number" || typeof port === "string") {
      validatePort(port, "options.port");
      port = port | 0;
      path = undefined;   // port takes precedence over path (Node returns early here)
    } else if (isPipeName(path)) {

    Or, if matching Node's precedence is out of scope, drop "a valid port takes precedence over path" from the comment so it doesn't misdocument the implementation.

@cirospaciari cirospaciari force-pushed the claude/port-node-net-tls-tests branch from 5ca55ec to f49ffa0 Compare May 21, 2026 21:00

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional findings (outside current diff — PR may have been updated during review):

  • 🟣 src/bun_core/lib.rs:3347-3352 — Pre-existing (already on main via #31116, not introduced by this PR): the musl/Android fallback Bun__captureStackTrace at line 3339 is still pub extern "C" fn (safe) while the glibc/macOS/BSD and Windows definitions are pub unsafe extern "C" fn, so the shared wrapper's unsafe { Bun__captureStackTrace(...) } here triggers the unused_unsafe warning on aarch64-musl / x64-musl / x64-android / aarch64-android / x64-musl-baseline (the robobun annotations at bun_core/lib.rs:3352 in build #56711). It's warn-level only (unused_unsafe is not in the workspace deny list), so it doesn't block the build — just noting since it shows up in this PR's CI output; the one-keyword fix is to mark the fallback unsafe too (with a # Safety doc) so all three cfg variants match.

    Extended reasoning...

    What the issue is

    Bun__captureStackTrace has three cfg-gated definitions in src/bun_core/lib.rs:

    cfg line signature
    glibc / macOS / *BSD ~3233 pub unsafe extern "C" fn
    Windows ~3285 pub unsafe extern "C" fn
    fallback (musl, Android, …) 3339 pub extern "C" fnnot unsafe

    The shared safe wrapper at line 3349-3352 unconditionally wraps the call:

    pub fn capture_stack_trace(begin: usize, addrs: &mut [usize]) -> usize {
        // SAFETY: `addrs.as_mut_ptr()` is writable for `addrs.len()` slots; ...
        unsafe { Bun__captureStackTrace(begin, addrs.as_mut_ptr(), addrs.len()) }
    }

    On targets where the fallback definition is selected (musl-libc Linux, Android — i.e. anything not in the #[cfg(any(...))] list above), Bun__captureStackTrace is a safe function, so wrapping a call to it in unsafe { } triggers rustc's unused_unsafe lint at line 3352.

    Why this is pre-existing, not PR-introduced

    Although the unsafe-marking changes appear in this PR's GitHub diff, that's an artifact of the three-dot diff being computed against a stale merge-base. git log -S shows the unsafe markers + wrapper unsafe { } were introduced in commit 21db6826 ("clippy: 45 deny lints + fix 2735 violations across workspace (#31116)"), and git merge-base --is-ancestor 21db6826 <origin/main> confirms that commit is already on main. A two-dot git diff main..HEAD -- src/bun_core/lib.rs shows no change to Bun__captureStackTrace or capture_stack_trace from this PR. The same warning fires on main builds; robobun's parseAnnotations (scripts/utils.mjs:2662) scrapes both error: and warning: lines from build output regardless of provenance, so it appears in build #56711's annotation list alongside other ambient noise like clang++: argument unused during compilation: '-no-pie'.

    Why it's a warning, not a build failure

    unused_unsafe is warn-by-default in rustc and is not in the PR's [workspace.lints.rust] deny list (which only promotes dead_code, unused_imports, unused_variables, unused_mut, unused_assignments, unused_macros, unreachable_code, unreachable_patterns). There is no #![deny(unused_unsafe)] in bun_core/lib.rs and no -D warnings in scripts/build/rust.ts. So this surfaces as a compiler warning, not a hard error — robobun's "5 failures" count for build #56711 doesn't include it (the actual failures are the three build-cpp musl entries plus the FreeBSD unreachable_pub items).

    Step-by-step proof

    1. Compile bun_core with --target x86_64-unknown-linux-musl (or aarch64-linux-android, etc.).
    2. cfg resolution: target_os = "linux" but target_env = "musl" (not "gnu"), and not macOS/BSD/Windows → the #[cfg(not(any(...)))] fallback at line 3339 is selected.
    3. The fallback is pub extern "C" fn Bun__captureStackTrace(begin: usize, out: *mut usize, cap: usize) -> usize { let _ = (begin, out, cap); 0 } — no unsafe keyword, so it's a safe function (it never dereferences out, so this is technically correct).
    4. capture_stack_trace at line 3352 evaluates unsafe { Bun__captureStackTrace(...) }. The callee is safe → the unsafe block contains no unsafe operations.
    5. rustc emits warning: unnecessary \unsafe` block#[warn(unused_unsafe)]` (default level).
    6. robobun build #56711 lists exactly this: src/bun_core/lib.rs#L3352 - unnecessary \unsafe` blockon 🐧 aarch64-musl, x64-musl, x64-android, aarch64-android, x64-musl-baseline — the precise set of targets that match the fallback's#[cfg(not(...))]` gate.

    Addressing the refutation

    One reviewer argued for refuting on three grounds, two of which I agree with and have folded into the framing above: (a) "warning ≠ build failure" — correct; the original "fails build-rust" claim was overstated, and this comment is filed accordingly. (b) "not from this PR" — correct; verified via git merge-base --is-ancestor and the empty two-dot diff, hence filed as pre-existing. (c) "the fallback being safe is arguably correct" — also true (it ignores out entirely), but that just shifts where the fix goes: either mark the fallback unsafe for signature consistency, or #[allow(unused_unsafe)] the wrapper with a note that one cfg variant is safe. Either resolves the warning; the former keeps the three definitions interchangeable.

    Impact

    None functionally — a warn-level lint on five non-tier-1 targets. Flagged only because it shows up in this PR's CI annotations and is a one-keyword fix; entirely fine to defer to whichever PR next touches the unused_unsafe family in bun_core.

    Fix

    /// # Safety
    /// `out` must be writable for `cap` `usize` slots (or null/`cap == 0`).
    /// This fallback never dereferences `out`; `unsafe` is for signature parity
    /// with the other cfg-gated definitions so the shared wrapper's `unsafe { }`
    /// is not flagged `unused_unsafe` on musl/Android.
    #[unsafe(no_mangle)]
    pub unsafe extern "C" fn Bun__captureStackTrace(begin: usize, out: *mut usize, cap: usize) -> usize {
        let _ = (begin, out, cap);
        0
    }

Comment thread packages/bun-usockets/src/crypto/openssl.c
Comment thread packages/bun-usockets/src/libusockets.h
Comment thread src/js/node/net.ts Outdated
Comment thread src/js/node/tls.ts Outdated
Comment thread src/js/node/tls.ts
@cirospaciari cirospaciari changed the title net,tls: Node.js compatibility — TLS versions, setDefaultCACertificates, listen/reset/connect fixes, +42 tests net,tls: Node.js compatibility — reset/read-error semantics, graceful HTTP/WS client closes, TLS versions, setDefaultCACertificates, +48 tests May 21, 2026
@cirospaciari cirospaciari changed the title net,tls: Node.js compatibility — reset/read-error semantics, graceful HTTP/WS client closes, TLS versions, setDefaultCACertificates, +48 tests net,tls: Node.js compatibility — half-open/reset semantics, graceful HTTP/WS client closes, TLS versions, setDefaultCACertificates, +49 tests May 22, 2026
@cirospaciari cirospaciari changed the title net,tls: Node.js compatibility — half-open/reset semantics, graceful HTTP/WS client closes, TLS versions, setDefaultCACertificates, +49 tests net,tls: Node.js compatibility — half-open/reset/write semantics, graceful HTTP/WS client closes, TLS versions, setDefaultCACertificates, +50 tests May 22, 2026
@cirospaciari cirospaciari changed the title net,tls: Node.js compatibility — half-open/reset/write semantics, graceful HTTP/WS client closes, TLS versions, setDefaultCACertificates, +50 tests net,tls: Node.js compatibility — half-open/reset/write semantics, real connect errors, graceful HTTP/WS client closes, +51 tests May 22, 2026
Comment thread packages/bun-usockets/src/crypto/openssl.c Outdated
Comment thread src/js/node/net.ts
Comment thread src/js/node/net.ts
Comment thread src/runtime/socket/Listener.rs Outdated
Comment thread packages/bun-usockets/src/crypto/openssl.c
Comment thread packages/bun-usockets/src/crypto/openssl.c Outdated
Comment thread packages/bun-usockets/src/crypto/openssl.c Outdated
Comment thread test/js/node/tls/node-tls-server.test.ts
Comment thread src/runtime/socket/Listener.rs Outdated
Comment thread packages/bun-usockets/src/crypto/openssl.c
Comment thread src/runtime/socket/Listener.rs
@cirospaciari cirospaciari force-pushed the claude/port-node-net-tls-tests-2 branch from b4b0da6 to 3598acd Compare June 2, 2026 00:44
Comment thread packages/bun-usockets/src/crypto/openssl.c Outdated
Comment thread src/runtime/socket/socket_body.rs
@cirospaciari cirospaciari force-pushed the claude/port-node-net-tls-tests-2 branch 2 times, most recently from 55c6bc7 to e5129a0 Compare June 4, 2026 23:27
@cirospaciari cirospaciari force-pushed the claude/port-node-net-tls-tests-2 branch from e5129a0 to a23105d Compare June 5, 2026 16:50
Comment thread src/js/node/net.ts
Comment on lines +638 to +641
const err = new TypeError(
`ALPN callback returned a value (${result}) that did not match any of the client's offered protocols (${ArrayPrototypeJoin.$call(protocols, ", ")})`,
) as TypeError & { code?: string };
err.code = "ERR_TLS_ALPN_CALLBACK_INVALID_RESULT";

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 🟡 The new alpnCallback handler hand-rolls new TypeError(...) with .code = 'ERR_TLS_ALPN_CALLBACK_INVALID_RESULT' instead of routing through the centralized $ERR_* machinery — the code is not registered in src/jsc/bindings/ErrorCode.ts, while its sibling ERR_TLS_ALPN_CALLBACK_WITH_PROTOCOLS (used via $ERR_TLS_ALPN_CALLBACK_WITH_PROTOCOLS() at tls.ts:889/1221 in this same PR) IS registered. CLAUDE.md:292 explicitly says "never inline new Error with a hand-assigned .code". Same applies to the (err as any).code = 'ERR_OUT_OF_RANGE' added in convertProtocols at tls.ts:1556. Zero runtime impact (the error reaches user code with the right shape), but worth registering for consistency with the sibling code in the same feature.

Extended reasoning...

What the issue is

The new ServerHandlers.alpnCallback handler at src/js/node/net.ts:638-641 constructs the invalid-result error by hand:

const err = new TypeError(
  `ALPN callback returned a value (${result}) that did not match any of the client's offered protocols (...)`,
) as TypeError & { code?: string };
err.code = "ERR_TLS_ALPN_CALLBACK_INVALID_RESULT";

…instead of routing through the centralized $ERR_* machinery. A grep of src/ confirms ERR_TLS_ALPN_CALLBACK_INVALID_RESULT appears only in net.ts — it is not registered in src/jsc/bindings/ErrorCode.ts, not in ErrorCode.cpp, not in ErrorCode.rs, and there is no $ERR_TLS_ALPN_CALLBACK_INVALID_RESULT declaration in builtins.d.ts.

CLAUDE.md:292 states the convention explicitly: "Route user-facing JS errors through the centralized ErrorCode machinery (src/jsc/bindings/ErrorCode.ts, $ERR_*) — never inline new Error with a hand-assigned .code."

The asymmetry within this same feature

The sibling error code in the same ALPNCallback feature — ERR_TLS_ALPN_CALLBACK_WITH_PROTOCOLSis registered (ErrorCode.ts:263, with a dedicated case in ErrorCode.cpp:2510-2511) and is used via $ERR_TLS_ALPN_CALLBACK_WITH_PROTOCOLS() in this same PR at tls.ts:889 and tls.ts:1221. So one of the two ALPN error codes this PR adds goes through the centralized machinery and the other is hand-rolled, which is the inconsistency worth flagging.

The same pattern applies to the convertProtocols change at tls.ts:1554-1557 (also added by this PR):

const err = new RangeError(`The byte length of the protocol at index ${i} exceeds the maximum length. ...`);
(err as any).code = "ERR_OUT_OF_RANGE";
throw err;

ERR_OUT_OF_RANGE is already registered, but the existing $ERR_OUT_OF_RANGE helper formats its message as The value of "{name}" is out of range. It must be {range}. Received {value}, which doesn't quite match Node's specific RangeError text here (The byte length of the protocol at index N exceeds the maximum length...). So this secondary case is weaker — the hand-roll may be intentional to preserve Node's exact message — and is mentioned only for completeness.

Why this is a nit, not a bug

There is zero observable runtime difference: the hand-rolled error has the correct constructor (TypeError), the correct .name ('TypeError'), the correct .code string, and the correct Node-matching message. err instanceof TypeError is true. Anything that checks err.code === 'ERR_TLS_ALPN_CALLBACK_INVALID_RESULT' works identically. The centralized machinery would produce the same shape; the only difference is registry completeness and codebase consistency.

The codebase already has ~16 hand-rolled .code assignments across src/js/node (some justified for dynamic codes like ERR_OSSL_*/ERR_SSL_* that this PR also adds and which legitimately can't be pre-registered), so this is not unprecedented — but the author has fixed this exact class of nit three times already in this PR's review cycle (resolved comments 3278460562 for ERR_CRYPTO_CUSTOM_ENGINE_NOT_SUPPORTED, and 3278566864 → 3283092833 for ERR_TLS_INVALID_PROTOCOL_METHOD), demonstrating it's actionable here too.

Step-by-step proof

  1. tls.createServer({ ALPNCallback: () => 'h3' }) accepts a connection whose ClientHello offers ALPN ['h2', 'http/1.1'].
  2. ServerHandlers.alpnCallback runs the user's callback → returns 'h3'.
  3. !ArrayPrototypeIncludes.$call(['h2','http/1.1'], 'h3')true → enters the branch at net.ts:634.
  4. new TypeError(...) constructed; err.code = 'ERR_TLS_ALPN_CALLBACK_INVALID_RESULT' assigned.
  5. Stashed on self[kALPNError]; handler returns undefined → handshake refused.
  6. ServerHandlers.handshake reads self[kALPNError], emits it as 'tlsClientError'.
  7. The user's listener receives { name: 'TypeError', code: 'ERR_TLS_ALPN_CALLBACK_INVALID_RESULT', message: 'ALPN callback returned a value (h3) ...' }identical to what $ERR_TLS_ALPN_CALLBACK_INVALID_RESULT(result, protocols) would produce if registered.

Fix

Register the code in src/jsc/bindings/ErrorCode.ts alongside its sibling:

["ERR_TLS_ALPN_CALLBACK_INVALID_RESULT", TypeError],

…add a dedicated case in ErrorCode.cpp's jsFunctionMakeErrorWithCode that formats the message from two arguments (the returned value and the protocols list), regenerate ErrorCode.rs/builtins.d.ts, and replace lines 638-641 with if (self) self[kALPNError] = $ERR_TLS_ALPN_CALLBACK_INVALID_RESULT(result, ArrayPrototypeJoin.$call(protocols, ', '));. The convertProtocols ERR_OUT_OF_RANGE case can stay as-is if matching Node's exact message wording is preferred over routing through $ERR_OUT_OF_RANGE.

Comment thread src/js/node/tls.ts
Comment on lines +1263 to +1264
validateSecureContextOptions(options);
options = processPfxOptions(options);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 🟡 Adding validateSecureContextOptions(options) at line 1263 makes the pre-existing if (passphrase && typeof passphrase !== "string") throw ... check at lines 1347-1349 dead code: validateSecureContextOptions already does validateString(passphrase, "options.passphrase") for any non-nullish value, and processPfxOptions does not modify options.passphrase. Same flavor of refactor leftover as the dead validateInt32 import / dead early secureContext assignment already removed (resolved 3295620217). Nit — the second check can simply be deleted.

Extended reasoning...

What the issue is

Adding validateSecureContextOptions(options) at the top of Server.prototype.setSecureContext (tls.ts:1263) makes the pre-existing passphrase type check at tls.ts:1347-1349 dead code:

let passphrase = options.passphrase;
if (passphrase && typeof passphrase !== "string") {
  throw $ERR_INVALID_ARG_TYPE("options.passphrase", "string", passphrase);
}

validateSecureContextOptions (tls.ts:~367, also new in this PR) already validates passphrase:

if (passphrase !== undefined && passphrase !== null) validateString(passphrase, "options.passphrase");

validateString throws ERR_INVALID_ARG_TYPE for any non-string value. So by the time execution reaches line 1346, options.passphrase is guaranteed to be undefined, null, or a string — all of which make passphrase && typeof passphrase !== "string" evaluate to false.

Why nothing between the two reaches it

processPfxOptions(options) at line 1264 (the only call between the two) reads out.passphrase as a fallback for per-entry pfx passphrases but never reassigns it — verified by grep for out\.passphrase\s*= (no matches). So options.passphrase is unchanged from the value validateSecureContextOptions already approved.

Step-by-step proof

For tls.createServer({ passphrase: 42 }).setSecureContext({ passphrase: 42 }):

Step Result
Line 1263: validateSecureContextOptions(options) passphrase = 42, neither undefined nor null → validateString(42, "options.passphrase")throws ERR_INVALID_ARG_TYPE
Line 1347 never reached

For { passphrase: "secret" }: validateString passes; line 1347 sees "secret" && typeof "secret" !== "string""secret" && falsefalse → no throw.
For { passphrase: null } or { passphrase: undefined }: validateSecureContextOptions skips validation; line 1347 sees null && ... / undefined && ... → falsy → no throw.

So for every possible input, line 1347's condition is either unreachable (validateSecureContextOptions threw first) or false. The branch can never fire.

Same pattern as already-fixed cleanup

This is the same flavor of dead-code-made-dead-by-this-PR's-additions as resolved comment 3295620217 (the dead validateInt32 import and the dead early this[ksecureContext] = options.secureContext assignment), which the author accepted and removed in cb5d525. The PR's diff also shows the identical pre-existing passphrase check was removed from the InternalSecureContext constructor (the - line if (options.passphrase != null && typeof options.passphrase !== "string") throw new TypeError(...)) for exactly this reason — the author cleaned it up there but missed the parallel one in setSecureContext. CLAUDE.md: 'Delete dead code in the same PR that makes it dead.'

Impact / fix

Zero runtime impact — purely redundant. Filed as a nit. Delete lines 1347-1349, keeping this.passphrase = passphrase:

let passphrase = options.passphrase;
this.passphrase = passphrase;

Comment on lines +101 to +104
let pfx_bytes: &[u8] = if let Some(ab) = args[0].as_array_buffer(global) {
// SAFETY: the ArrayBuffer view is alive for the duration of the
// call (the argument is rooted by the call frame).
unsafe { core::slice::from_raw_parts(ab.ptr, ab.len) }

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 nit: from_raw_parts(ab.ptr, ab.len) bypasses the codebase's canonical ab.byte_slice() accessor — ab.ptr is null for a detached (or zero-length) ArrayBuffer, and from_raw_parts requires a non-null pointer even for len=0, so this is library-UB. No observable misbehavior today (the next line's is_empty() check throws without dereferencing), but every other as_array_buffer caller in the codebase uses byte_slice(), which guards is_detached() and also reads byte_len instead of len (the element count). One-token fix: ab.byte_slice().

Extended reasoning...

What the issue is

SecureContext::parse_pkcs12 (src/runtime/api/bun/SecureContext.rs:101-104, new in this PR) reads the pfx buffer as:

let pfx_bytes: &[u8] = if let Some(ab) = args[0].as_array_buffer(global) {
    // SAFETY: the ArrayBuffer view is alive for the duration of the
    // call (the argument is rooted by the call frame).
    unsafe { core::slice::from_raw_parts(ab.ptr, ab.len) }
} else { ... };

This bypasses the codebase's canonical accessor ArrayBuffer::byte_slice() (array_buffer.rs:572-578), which explicitly guards the null-pointer case:

pub fn byte_slice(&self) -> &[u8] {
    if self.is_detached() {
        return &[];
    }
    // SAFETY: ptr is non-null (checked above) ...
    unsafe { core::slice::from_raw_parts(self.ptr, self.byte_len) }
}

is_detached() is literally self.ptr.is_null() (array_buffer.rs:142-144), so the codebase acknowledges that ab.ptr can be null. JSC__JSValue__asArrayBuffer (bindings.cpp) sets out->ptr = view->vector() and returns true even for a detached view, where vector() is null.

Why this is library-UB

Rust's core::slice::from_raw_parts documents: "data must be non-null and aligned even for zero-length slices." Forming a slice from (null, 0) violates the validity invariant of &[u8] — Miri would flag it, and the compiler is permitted to assume the pointer is non-null.

There's a secondary issue: the code uses ab.len (the element count) instead of ab.byte_len. For a Buffer/Uint8Array these are equal, so it doesn't matter in practice for pfx, but byte_slice() reads the correct field.

Step-by-step proof

const buf = new Uint8Array(0);
// or: const ab = new ArrayBuffer(8); structuredClone(ab, {transfer:[ab]}); new Uint8Array(ab);
tls.createSecureContext({ pfx: buf });
Step Value Result
processPfxOptions (tls.ts) entry.buf = buf passed straight to NativeSecureContext.parsePkcs12(buf, undefined)
args[0].as_array_buffer(global) Some(ab) with ab.ptr = null, ab.len = 0 (detached/empty views still return Some)
from_raw_parts(null, 0) library-UB — slice validity invariant violated
pfx_bytes.is_empty() reads len only (= 0), never the pointer true → throws "PFX certificate argument is mandatory"

So in current rustc/LLVM the null pointer is never dereferenced and the user gets a clean error — no machine-level UB manifests. Only the slice's validity invariant is violated.

Why nit

  • No observable misbehavior: the very next line throws on is_empty() without touching the pointer; LLVM does not exploit from_raw_parts(null, 0) in a way that affects this.
  • Degenerate input: passing an empty/detached buffer as pfx is meaningless and fails with a clear error either way.
  • Convention deviation: every other as_array_buffer caller in the codebase (PostgresRequest.rs, internal_jsc.rs, etc.) uses byte_slice(); this is the only one that goes ab.ptr directly.

Fix

One token — use the canonical accessor:

let pfx_bytes: &[u8] = if let Some(ab) = args[0].as_array_buffer(global) {
    ab.byte_slice()
} else {
    pfx_string = args[0].to_slice(global)?;
    pfx_string.slice()
};

This guards is_detached(), returns &[] for the null case (which the existing is_empty() check then handles), and reads byte_len instead of len.

Comment thread src/uws/lib.rs
Comment on lines +1159 to +1167

// The SSL_do_handshake/SSL_read calls above may have parked
// new-session tickets / keylog lines (BoringSSL surfaces them
// mid-read, where dispatching JS could free the SSL out from
// under the caller). The stack has unwound here, so hand them
// to the owner - same ordering as the C path's
// ssl_flush_pending_session: handshake/data callbacks first,
// then sessions.
Self::flush_pending_events(this, &mut buffer);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 🟡 a8653be moved ssl_flush_pending_session() to run before each us_dispatch_data() in openssl.c so a 'data' handler that destroys the socket can't drop a parked TLS 1.3 session ticket — but the companion SSLWrapper path (TLS-over-duplex / Windows named pipes, added in the immediately-preceding commit ad98889) still calls flush_pending_events() after the handle_reading() loop. The comment at lines 1164-1166 ('same ordering as the C path … data callbacks first, then sessions') is now stale too — a8653be inverted that ordering. Mirror the three flush-before-dispatch sites into handle_reading() (or at minimum update the comment); same flavor of 'fix applied to one of two paths' follow-up the author has accepted ~10 times on this PR.

Extended reasoning...

What the gap is

HEAD commit a8653be ('tls: flush parked sessions before dispatching application data') changed openssl.c's us_internal_ssl_on_data to call ssl_flush_pending_session() / ssl_flush_pending_keylog() before each us_dispatch_data() (at three sites: the SSL_ERROR_ZERO_RETURN branch, the buffer-full mid-loop dispatch, and the loop-exit dispatch). Its commit message and the inline comment explain why: 'the data dispatch may run JS that closes the socket (an agent with keepAlive off destroys it as soon as the response completes) — the tail flush below never runs then and the parked session would be dropped. Node's NewSessionCallback runs before data, so the session is never lost there.'

The companion SSLWrapper path — used for tls.connect({ socket: <generic Duplex> }) and Windows TLS-over-named-pipe, whose session/keylog support was added in the immediately-preceding commit ad98889 — was not updated. handle_traffic() (src/uws/lib.rs:1143-1168) runs the handle_reading() loop, which dispatches data via trigger_data_callback() at lines 1050/1075/1087, and only after that loop completes calls flush_pending_events() at line 1167.

Code path that loses the session

Each of the three trigger_data_callback() sites in handle_reading() is immediately followed by a check that returns when the data callback tore the wrapper down:

Self::r(this).trigger_data_callback(&buffer[0..read]);
// The data callback may have closed the connection
if Self::r(this).ssl.is_none() || Self::r(this).flags.closed_notified() {
    return false;
}

When that returns false, handle_traffic()'s while-loop exits and reaches flush_pending_events() at line 1167 — but its first check is let Some(ssl) = Self::r(this).ssl else { return }; (line 1180), so it returns immediately with the parked session still queued on the (now-freed) SSL*. SSL_free runs the us_ssl_pending_session_free ex_data destructor, which frees the queue, and 'session' never fires.

So tls.connect({ socket: duplex }) with a 'data' handler that destroys the socket on response completion drops the TLS 1.3 session ticket, while a direct TCP tls.connect() no longer does after a8653be — exactly the asymmetry the HEAD commit fixed for one half.

The stale comment

The inline comment at lines 1164-1166 says:

'same ordering as the C path's ssl_flush_pending_session: handshake/data callbacks first, then sessions'

That was accurate when ad98889 added it, but a8653be inverted the C path's ordering to sessions-first, data-second. git show a8653be0 --stat confirms only packages/bun-usockets/src/crypto/openssl.c and a test file were touched — src/uws/lib.rs was not.

Step-by-step proof

tls.connect({ socket: duplexPair()[0], rejectUnauthorized: false }) against a TLS 1.3 server, with a 'data' listener that calls socket.destroy() once the response completes:

Step What happens
1 TCP segment arrives carrying NewSessionTicket + application data; SSLWrapper feeds it into handle_traffic()handle_reading()
2 SSL_read processes the NST; us_ssl_new_session_cb parks the serialized session on the SSL via us_ssl_pending_session_idx
3 SSL_read returns the application data; line 1087 trigger_data_callback(&buffer[0..read]) → JS 'data' handler runs
4 The 'data' handler calls socket.destroy()UpgradedDuplex::deinitSSLWrapper::deinitself.ssl = None and SSL_free
5 Line 1090: Self::r(this).ssl.is_none()truehandle_reading() returns false
6 handle_traffic() while-loop exits → line 1167 flush_pending_events()
7 Line 1180: let Some(ssl) = Self::r(this).ssl else { return };ssl is None → returns; parked session never delivered

In Node (and in Bun's C path after a8653be), the session is delivered at step 2½ — before the data — so step 4 cannot drop it.

Impact / why nit

  • Narrow surface: only TLS-over-generic-Duplex (tls.connect({ socket: duplex })) and Windows TLS-over-named-pipe; the direct-TCP path (the vast majority) is fixed by a8653be. The CONNECT-tunnel example in a8653be's own commit message uses upgradeTLS over a native net.Socket fd, which is the C path.
  • Performance-only failure mode: a dropped session ticket means resumption is silently skipped → full handshake on the next connect, not data loss or a crash.
  • Incomplete-new-feature, not regression: before this PR (specifically before ad98889, one commit ago) the SSLWrapper path had no 'session' event at all.
  • Same flavor of 'fix applied to one of two paths' follow-up the author has accepted ~10 times on this PR (e.g. resolved 3284512960 → 3291660583 for SocketEmitEndNT vs SocketHandlers2.close, resolved 3299444680 covering one of three perf_hooks paths).

The stale comment at lines 1164-1166 alone is worth fixing — the author has accepted ~5 stale-comment-after-fix nits on this PR (resolved 3291568392, 3285120525, 3296022042).

Fix

Mirror a8653be's three flush-before-dispatch sites into handle_reading() — call flush_pending_events() (or a per-site session/keylog pop) immediately before each trigger_data_callback() at lines 1050/1075/1087, with the same ssl.is_none() re-check afterward. At minimum, update the comment at lines 1164-1166 to reflect that the C path now flushes sessions before data, and note that this path is a known follow-up.

Comment on lines +158 to +162
let message = match reason {
"key" => "Unable to load private key from PFX data",
"cert" => "Unable to load certificate from PFX data",
_ => "Unable to load PFX certificate",
};

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 The C helper us_ssl_parse_pkcs12 distinguishes a wrong passphrase / MAC verify failure with *err_reason = "mac" (openssl.c:1040), but this match only handles "key" and "cert" explicitly — "mac" falls through to the generic 'Unable to load PFX certificate', so tls.createSecureContext({ pfx, passphrase: 'wrong' }) doesn't tell the user the passphrase was wrong (Node surfaces the OpenSSL-decomposed mac verify failure / INCORRECT_PASSWORD). The C side also ERR_clear_error()s right after setting the tag, so the tag is the only signal and it's discarded. One-arm addition, e.g. "mac" => "PFX MAC verification failed - is the passphrase correct?".

Extended reasoning...

What the gap is

The new us_ssl_parse_pkcs12 helper (openssl.c) sets *err_reason to one of four static tags, documented in its own header comment in libusockets.h: *"sets err_reason to a static tag: "parse" (not PKCS#12), "mac" (bad passphrase / corrupt), "key" (no private key), "cert" (no certificate)". The "mac" tag is set specifically when PKCS12_parse() fails (openssl.c:1039-1042) — almost always because the passphrase is wrong (PKCS12_R_MAC_VERIFY_FAILURE under OpenSSL, INCORRECT_PASSWORD under BoringSSL).

But the Rust consumer at SecureContext::parse_pkcs12 (SecureContext.rs:158-162) only matches "key" and "cert" explicitly:

let message = match reason {
    "key" => "Unable to load private key from PFX data",
    "cert" => "Unable to load certificate from PFX data",
    _ => "Unable to load PFX certificate",
};

So both "mac" (wrong passphrase) and "parse" (not a PKCS#12 blob at all) fall through to the catch-all 'Unable to load PFX certificate', even though the C side went out of its way to distinguish them.

Why the tag is the only signal

The C helper does ERR_clear_error() immediately after setting *err_reason = "mac" (openssl.c:1041), so the BoringSSL error queue is empty by the time control returns to Rust — the tag is the only signal that this was a passphrase failure, and the Rust side discards it.

What Node does

Node's SecureContext::LoadPKCS12 (crypto_context.cc) calls ThrowCryptoError(env, ERR_get_error(), "Unable to load PFX certificate") for a PKCS12_parse failure. Because Node doesn't clear the error queue first, ThrowCryptoError decomposes the queued PKCS12 error and reports the specific reason — so tls.createSecureContext({ pfx, passphrase: 'wrong' }) throws with a message containing mac verify failure (OpenSSL) or INCORRECT_PASSWORD (BoringSSL), plus the decomposed library/reason/code properties. The PR description claims pfx is parsed "with Node's error messages", and for the wrong-passphrase case it isn't.

Step-by-step proof

const tls = require('tls'), fs = require('fs');
tls.createSecureContext({
  pfx: fs.readFileSync('agent1.pfx'),
  passphrase: 'wrong-passphrase',
});
Step What happens
processPfxOptions (tls.ts) → NativeSecureContext.parsePkcs12(buf, 'wrong-passphrase') enters native
us_ssl_parse_pkcs12: d2i_PKCS12_bio succeeds (valid DER) p12 non-null
PKCS12_parse(p12, "wrong-passphrase", ...) fails — BoringSSL pushes INCORRECT_PASSWORD onto the error queue
openssl.c:1040: *err_reason = "mac" tag set
openssl.c:1041: ERR_clear_error() error queue wiped — INCORRECT_PASSWORD gone
Returns 0 → SecureContext.rs:149 if ok == 0 reads reason = "mac"
SecureContext.rs:158-162 match "mac""key", ≠ "cert" → catch-all _
Throws 'Unable to load PFX certificate' — same message as garbage input
Node.js (same input) Error: ... mac verify failure / INCORRECT_PASSWORD with err.reason/err.code set

A wrong passphrase is the most common real-world pfx error (typo, wrong env var, expired secret), so collapsing it into the same message as "this isn't even a PKCS#12 blob" is the least useful place to lose information.

Why CI doesn't catch it

The newly-ported test-tls-invalid-pfx.js only exercises the "key" case (uses cert-without-key.pfx'Unable to load private key from PFX data'); it never tests a wrong passphrase, so CI is green.

Impact / why nit

This is incomplete-new-feature, not a regression: the pfx option didn't exist before this PR. The error is still thrown, the user can still tell the pfx load failed; only the message is less actionable than it could be — and less actionable than what the C helper's own contract enables. Same flavor of error-message-shape gap as the ~30 already accepted on this PR (missing errno/syscall, formatListenError properties, etc.), and the C side specifically created the "mac" tag for a purpose the Rust side doesn't consume.

Fix

One arm:

let message = match reason {
    "key" => "Unable to load private key from PFX data",
    "cert" => "Unable to load certificate from PFX data",
    "mac" => "PFX MAC verification failed - is the passphrase correct?",
    _ => "Unable to load PFX certificate",
};

(Matching Node exactly would also need the C side to preserve the BoringSSL error instead of ERR_clear_error()ing it, so the Rust side could route through err_to_js for the full code/library/reason decomposition — but the one-arm message addition is the minimal improvement.)

@cirospaciari cirospaciari force-pushed the claude/port-node-net-tls-tests-2 branch 2 times, most recently from a0fa094 to 0123e91 Compare June 10, 2026 00:18
@cirospaciari cirospaciari changed the base branch from main to claude/upgrade-nodejs-26-v2 June 10, 2026 21:46
@cirospaciari cirospaciari force-pushed the claude/port-node-net-tls-tests-2 branch 3 times, most recently from ef5a776 to 9f68962 Compare June 10, 2026 22:45
@cirospaciari cirospaciari force-pushed the claude/upgrade-nodejs-26-v2 branch from a54090a to 3917481 Compare June 10, 2026 22:45
@cirospaciari cirospaciari force-pushed the claude/port-node-net-tls-tests-2 branch from 9f68962 to af336b9 Compare June 11, 2026 00:12
…ey surface [build images]

Half-open/reset/write semantics, server TLSSocket wrap, session/keylog,
SNICallback/ALPNCallback, pfx, OpenSSL error shapes, addCACert, local
binding (+305 tests).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

2 participants