Unauthenticated RCE via GRAPH_COMPUTE buffer=0 bypass in llama.cpp RPC backend

Summary

The RPC backend's deserialize_tensor() skips all bounds validation when a tensor's buffer field is 0. An unauthenticated attacker can read and write arbitrary process memory via crafted GRAPH_COMPUTE messages. Combined with the pointer leaks from ALLOC_BUFFER and BUFFER_GET_BASE, this yields a full ASLR bypass and remote code execution. No authentication is required; TCP access to the RPC server port is enough.

Details

In ggml/src/ggml-rpc/ggml-rpc.cpp, deserialize_tensor() (called during GRAPH_COMPUTE) only validates the data pointer when buffer != nullptr:

// Line 1177-1180: buffer=0 -> nullptr, validation skipped
result->buffer = reinterpret_cast<ggml_backend_buffer_t>(tensor->buffer);
if (result->buffer && buffers.find(result->buffer) == buffers.end()) {
    result->buffer = nullptr;
}

// Line 1182-1189: bounds check ONLY runs when buffer != nullptr
if (result->buffer) {
    // ... GGML_ASSERT checks on data pointer ...
}

// Line 1196: data pointer set unconditionally from attacker input
result->data = reinterpret_cast<void *>(tensor->data);

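When buffer=0, the attacker-controlled data pointer goes straight to the ggml compute kernels (e.g. ggml_compute_forward_dup does a memcpy using tensor->data). Pointing a source tensor at an arbitrary address gives an arbitrary read; pointing a destination tensor at one gives an arbitrary write.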

The SET_TENSOR bounds check (the RPC-04 fix) does not protect this path. GRAPH_COMPUTE uses graph_compute() -> create_node() -> deserialize_tensor(), a completely separate code path.
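One way to close this gap, sketched here under stated assumptions rather than taken from any actual patch, is to treat a missing buffer as a reason to reject a non-null data pointer instead of a reason to skip validation. The types mock_buffer and wire_tensor and the function tensor_data_is_valid are hypothetical stand-ins, not the real ggml structures, and the sketch assumes that a tensor without a validated buffer never legitimately carries a data pointer.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_set>

// Hypothetical stand-in for a server-side buffer (NOT the real ggml type).
struct mock_buffer {
    std::uintptr_t base;  // start address of the buffer's data area
    std::size_t    size;  // size of the data area in bytes
};

// Hypothetical stand-in for the fields a client sends over the wire.
struct wire_tensor {
    std::uint64_t buffer;  // buffer handle (attacker-controlled)
    std::uint64_t data;    // data pointer (attacker-controlled)
    std::uint64_t nbytes;  // number of bytes the tensor claims to occupy
};

// Returns true only if the tensor's data range may be handed to compute kernels.
// Assumption: a tensor with no validated buffer must have a null data pointer.
bool tensor_data_is_valid(const wire_tensor & t,
                          const std::unordered_set<const mock_buffer *> & known) {
    const auto * buf =
        reinterpret_cast<const mock_buffer *>(static_cast<std::uintptr_t>(t.buffer));

    // Membership is checked before any dereference of the client-supplied handle.
    if (buf == nullptr || known.find(buf) == known.end()) {
        // No validated buffer: accept only a null data pointer.
        return t.data == 0;
    }

    // Overflow-safe range check: [data, data + nbytes) must lie inside the buffer.
    if (t.data < buf->base) {
        return false;
    }
    const std::uint64_t offset = t.data - buf->base;
    if (offset > buf->size) {
        return false;
    }
    return t.nbytes <= buf->size - offset;
}
```

Because the check lives in one function, every command path that deserializes a tensor could share it, including the graph_compute() -> create_node() -> deserialize_tensor() path described above. Whether the real graph format ever needs a buffer-less tensor with a non-null data pointer is not something this report establishes.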

Exploitation chain:

1. ASLR bypass: ALLOC_BUFFER returns a raw heap pointer; BUFFER_GET_BASE returns the data area pointer. Raw addresses are used as handles with no indirection table.
2. Arbitrary read: GRAPH_COMPUTE with buffer=0 tensors + OP_CPY copies from any address into a readable buffer.
3. Code pointer leak: Read the buffer struct's iface function pointers to get the libggml-base.so base address.
4. libc resolution: Read free@GOT.PLT from libggml-base.so, then compute the system() address via offset arithmetic.
5. Function pointer overwrite: Overwrite a victim buffer struct's iface.clear with system() and place the command string at offset 0 of the struct.
6. Trigger: BUFFER_CLEAR calls buffer->iface.clear(buffer, 0), which becomes system("attacker_command").
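The first step of the chain depends on raw addresses doubling as protocol handles. A minimal sketch of the alternative, an indirection table that hands out opaque IDs, might look like the following. handle_table is a hypothetical class, not part of ggml, and a real change would also need tensor data to be expressed as offsets into a buffer rather than as absolute pointers.

```cpp
#include <cstdint>
#include <unordered_map>

// Hypothetical handle table: clients see small integer IDs, never addresses.
template <typename T>
class handle_table {
public:
    // Registers an object and returns a fresh non-zero handle for it.
    std::uint64_t insert(T * obj) {
        const std::uint64_t h = next_++;
        map_.emplace(h, obj);
        return h;
    }

    // Resolves a client-supplied handle; unknown handles yield nullptr.
    T * lookup(std::uint64_t h) const {
        const auto it = map_.find(h);
        return it == map_.end() ? nullptr : it->second;
    }

    // Removes a handle; returns false if it was not registered.
    bool erase(std::uint64_t h) {
        return map_.erase(h) != 0;
    }

private:
    std::uint64_t next_ = 1;  // 0 is reserved to mean "no buffer"
    std::unordered_map<std::uint64_t, T *> map_;
};
```

With this shape, a value returned to the client reveals nothing about heap layout, and a forged handle resolves to nullptr instead of being cast to a pointer. This addresses only the pointer leak; the missing bounds check on tensor->data is a separate problem.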

PoC

Tested 2026-02-07 against rpc-server in Docker (Ubuntu 24.04, aarch64, commit e0c93af2a03f5c53d052dfaefd86c06ed3784646).

# Build (pinned commit for reproducible offsets)
cd exploits/
docker build -t rpc-debug -f Dockerfile.rpc-pinned .
docker run -d --name rpc-debug -p 50052:50052 rpc-debug

# Extract offsets (needed if rebuilding with different commit/distro)
docker exec rpc-debug /extract_offsets.sh

# Run full RCE chain
python3 rpc_graph_compute_poc.py 127.0.0.1 50052

# Verify
docker exec rpc-debug cat /tmp/pwned
# uid=0(root) gid=0(root) groups=0(root)

PoC output:

Phase 1 - Pointer Leak:
  Buffer A struct: 0x0000aaab1ddc4f70
  Buffer A data:   0x0000aaab1ddc7800

Phase 2 - Arbitrary Copy (buffer=0):
  Buffer B after: b'EXPLOITED_BY_BUFFER_ZERO_BYPASS!'
  [!!!] MATCH - buffer=0 arbitrary copy CONFIRMED

Phase 3 - Function Pointer Leak:
  clear = 0x0000ffffaac0ece4 (libggml-base.so)

Phase 4 - Resolve system():
  system() = 0x0000ffff9f4bcec4

Phase 5 - RCE:
  [!!!] Server responded - command executed successfully!

The server crashes (SIGBUS, exit 135) shortly after command execution during cleanup of the corrupted buffer struct. The command runs successfully and the BUFFER_CLEAR response is sent back before the crash. An attacker could avoid the crash by restoring the original iface struct via a second GRAPH_COMPUTE copy before triggering.

Impact

Remote code execution. Any unauthenticated TCP client that can reach the RPC server port (default 50052) gets arbitrary command execution as the server process user (often root in Docker).

Defeats ASLR via protocol-level pointer leaks
Defeats NX via function pointer hijacking (no shellcode needed)
Stack canaries and RELRO are irrelevant since this is a heap struct overwrite

Attack scenarios:

Lateral movement in corporate/cloud networks: scan for port 50052 on GPU servers
Internet exposure via Docker -p 50052:50052 or --host 0.0.0.0
Local privilege escalation from any process on the same host

The RPC backend must be explicitly enabled (-DGGML_RPC=ON) and defaults to localhost, but the intended use case is distributed inference across machines, so network exposure is expected.

Relation to prior CVEs

This vulnerability shares its root cause with two previously patched CVEs:

CVE-2024-42478 (GHSA-xcr4-8r59-4fh2): OOB read via GET_TENSOR - patched by adding bounds validation in the GET_TENSOR handler.
CVE-2024-42479 (GHSA-g84r-q9f6-3frf): OOB read/write via SET_TENSOR - patched by adding a bounds check (marked with the comment "// sanitize tensor->data") in set_tensor() (lines 1227-1237).

All three share the same underlying bug: deserialize_tensor() does not validate tensor->data when buffer=0. The 2024 patches added bounds checks in the GET_TENSOR and SET_TENSOR command handlers, but the GRAPH_COMPUTE command takes a separate code path (graph_compute() -> create_node() -> deserialize_tensor()) that was never patched. The buffer=0 bypass has been present since the initial RPC implementation (PR #6829, merged 2024-05-14).

This variant is more severe than the prior two because GRAPH_COMPUTE processes multiple tensors with arbitrary operations, giving the attacker both read and write primitives in a single request. The prior CVEs only gave read (GET_TENSOR) or write (SET_TENSOR) individually.

Disclosure context

This was reported to CERT/CC on 2026-02-08 as VU#748698. CERT/CC closed the case, so this report is being filed directly with the maintainers.

The project's SECURITY.md explicitly excludes RPC from vulnerability scope, stating: "Do not use the RPC backend" and "none of the topics under 'Using llama.cpp securely' are considered vulnerabilities." However, a full-chain unauthenticated RCE with a working exploit against a network service that ships in the official build system warrants maintainer awareness regardless of scope policy. The RPC backend is used in production for distributed inference across machines.

Weaknesses

Improper Restriction of Operations within the Bounds of a Memory Buffer
