Summary
The RPC backend's deserialize_tensor() skips all bounds validation when a tensor's buffer field is 0. An unauthenticated attacker can read and write arbitrary process memory via crafted GRAPH_COMPUTE messages. Combined with pointer leaks from ALLOC_BUFFER/BUFFER_GET_BASE, this gives a full ASLR bypass and remote code execution. No authentication is required; TCP access to the RPC server port is sufficient.
Details
In ggml/src/ggml-rpc/ggml-rpc.cpp, deserialize_tensor() (called during GRAPH_COMPUTE) only validates the data pointer when buffer != nullptr:
```cpp
// Line 1177-1180: buffer=0 -> nullptr, validation skipped
result->buffer = reinterpret_cast<ggml_backend_buffer_t>(tensor->buffer);
if (result->buffer && buffers.find(result->buffer) == buffers.end()) {
    result->buffer = nullptr;
}
// Line 1182-1189: bounds check ONLY runs when buffer != nullptr
if (result->buffer) {
    // ... GGML_ASSERT checks on data pointer ...
}
// Line 1196: data pointer set unconditionally from attacker input
result->data = reinterpret_cast<void *>(tensor->data);
```
When buffer=0, the attacker-controlled data pointer goes straight to ggml compute kernels (e.g. ggml_compute_forward_dup does memcpy using tensor->data). This gives arbitrary read/write.
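The sketch below isolates the decision made in the excerpt above. It contrasts that logic with one possible hardened policy. Everything in it is a hypothetical simplification, not the real ggml-rpc types or the upstream fix: the wire_tensor and registered_buffer structs, the buffer_registry map, and the three functions.

- The vulnerable model treats a zero or unknown buffer handle as "no buffer" and then skips the range check, as the excerpt does.
- The hardened model refuses a data pointer that is not backed by a registered buffer.
- When a buffer is registered, the hardened model runs a range check that never computes data + nbytes, so it cannot overflow.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical, simplified view of a tensor as received over the wire.
struct wire_tensor {
    uint64_t buffer;  // remote buffer handle (0 = "no buffer")
    uint64_t data;    // remote data address
    uint64_t nbytes;  // size of the tensor's data
};

// Hypothetical record of a buffer the server actually allocated.
struct registered_buffer {
    uint64_t base;
    uint64_t size;
};

using buffer_registry = std::map<uint64_t, registered_buffer>;

// True if [data, data + nbytes) lies inside [base, base + size).
// data + nbytes is never formed, so huge values cannot wrap around.
bool range_ok(uint64_t data, uint64_t nbytes, const registered_buffer & b) {
    if (data < b.base) return false;
    const uint64_t offset = data - b.base;
    if (offset > b.size) return false;
    return nbytes <= b.size - offset;
}

// Models the excerpt: a zero or unknown handle becomes "no buffer",
// and the range check is then skipped entirely.
bool vulnerable_accepts(const wire_tensor & t, const buffer_registry & reg) {
    auto it = reg.find(t.buffer);
    if (t.buffer == 0 || it == reg.end()) {
        return true;  // data pointer flows through unchecked
    }
    return range_ok(t.data, t.nbytes, it->second);
}

// One possible hardened policy (an assumption, not the upstream fix):
// a tensor without a buffer may not carry a data pointer, and unknown
// handles are rejected instead of being silently downgraded.
bool hardened_accepts(const wire_tensor & t, const buffer_registry & reg) {
    if (t.buffer == 0) {
        return t.data == 0;
    }
    auto it = reg.find(t.buffer);
    if (it == reg.end()) {
        return false;
    }
    return range_ok(t.data, t.nbytes, it->second);
}
```

Under this model, a tensor with buffer=0 and an arbitrary data address is accepted by vulnerable_accepts() and rejected by hardened_accepts().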
The SET_TENSOR bounds check (the RPC-04 fix) does not protect this path. GRAPH_COMPUTE uses graph_compute() -> create_node() -> deserialize_tensor(), a completely separate code path.
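GRAPH_COMPUTE reaches deserialize_tensor() through its own path. One way to close this class of bug on that path is to validate every tensor the graph references, outputs and sources alike, before any kernel runs. If any tensor fails, the whole request is rejected.

The sketch below is a minimal illustration under assumed names, not the actual graph_compute()/create_node() code. It assumes a hypothetical graph_node layout with four source slots, a graph_tensor record, and a caller-supplied per-tensor check.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified tensor record carried in a GRAPH_COMPUTE request.
struct graph_tensor {
    uint64_t id;
    uint64_t buffer;
    uint64_t data;
    uint64_t nbytes;
};

// Hypothetical node layout: output tensor id plus fixed source slots,
// where 0 marks an unused slot.
struct graph_node {
    uint64_t tensor_id;
    std::array<uint64_t, 4> src_ids;
};

using tensor_table = std::unordered_map<uint64_t, graph_tensor>;
using tensor_check = std::function<bool(const graph_tensor &)>;

// Returns true only if every tensor referenced by any node (output or
// source) exists in the request and passes `check`. The caller executes
// nothing unless the whole graph is accepted.
bool graph_is_safe(const std::vector<graph_node> & nodes,
                   const tensor_table & tensors,
                   const tensor_check & check) {
    auto valid = [&](uint64_t id) {
        auto it = tensors.find(id);
        return it != tensors.end() && check(it->second);
    };
    for (const auto & node : nodes) {
        if (!valid(node.tensor_id)) {
            return false;
        }
        for (uint64_t src : node.src_ids) {
            if (src != 0 && !valid(src)) {
                return false;
            }
        }
    }
    return true;
}
```

Passing the per-tensor check in as a parameter keeps the graph walk separate from the bounds policy. A check like the one used by set_tensor() could then be applied uniformly rather than per command handler.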
Exploitation chain:
- ASLR bypass: ALLOC_BUFFER returns a raw heap pointer, and BUFFER_GET_BASE returns the data area pointer. Raw addresses are used as handles with no indirection table.
- Arbitrary read: a GRAPH_COMPUTE request with buffer=0 tensors and OP_CPY copies from any address into a readable buffer.
- Code pointer leak: reading the buffer struct's iface function pointers gives the libggml-base.so base address.
- libc resolution: read free@GOT.PLT from libggml-base.so and compute the system() address via offset arithmetic.
- Function pointer overwrite: overwrite a victim buffer struct's iface.clear with system() and place the command string at offset 0 of the struct.
- Trigger: BUFFER_CLEAR calls buffer->iface.clear(buffer, 0), which becomes system("attacker_command").
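The first step of the chain works because the server hands out raw heap addresses as buffer handles. One hardening option is a server-side table that maps opaque ids to buffers, so that no heap address leaves the process. This assumes the protocol can carry an opaque 64-bit id in place of a pointer.

The handle_table class below is a hypothetical sketch, not part of ggml-rpc. It does not address BUFFER_GET_BASE, which would also need to stop returning a raw address (for example, by having clients address tensors as offsets into a buffer).

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <unordered_map>

// Hypothetical: maps opaque, randomly chosen ids to server-side objects.
// Note: std::mt19937_64 is not a CSPRNG; the main property here is that
// ids are not addresses, so they leak nothing about the heap layout.
template <typename T>
class handle_table {
public:
    // Registers `obj` and returns a fresh, non-zero, unused id.
    uint64_t add(T * obj) {
        uint64_t id;
        do {
            id = rng_();
        } while (id == 0 || map_.count(id) != 0);
        map_.emplace(id, obj);
        return id;
    }

    // Returns the object for `id`, or nullptr if the id was never issued
    // or has been released.
    T * lookup(uint64_t id) const {
        auto it = map_.find(id);
        return it == map_.end() ? nullptr : it->second;
    }

    // Forgets `id`; returns false if it was not registered.
    bool release(uint64_t id) { return map_.erase(id) == 1; }

private:
    std::mt19937_64 rng_{std::random_device{}()};
    std::unordered_map<uint64_t, T *> map_;
};
```

Lookups of ids that were never issued return nullptr. A stale or guessed handle therefore fails closed instead of being dereferenced as a pointer.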
PoC
Tested 2026-02-07 against rpc-server in Docker (Ubuntu 24.04, aarch64, commit e0c93af2a03f5c53d052dfaefd86c06ed3784646).
```bash
# Build (pinned commit for reproducible offsets)
cd exploits/
docker build -t rpc-debug -f Dockerfile.rpc-pinned .
docker run -d --name rpc-debug -p 50052:50052 rpc-debug
# Extract offsets (needed if rebuilding with different commit/distro)
docker exec rpc-debug /extract_offsets.sh
# Run full RCE chain
python3 rpc_graph_compute_poc.py 127.0.0.1 50052
# Verify
docker exec rpc-debug cat /tmp/pwned
# uid=0(root) gid=0(root) groups=0(root)
```
PoC output:
```text
Phase 1 - Pointer Leak:
  Buffer A struct: 0x0000aaab1ddc4f70
  Buffer A data:   0x0000aaab1ddc7800
Phase 2 - Arbitrary Copy (buffer=0):
  Buffer B after: b'EXPLOITED_BY_BUFFER_ZERO_BYPASS!'
  [!!!] MATCH - buffer=0 arbitrary copy CONFIRMED
Phase 3 - Function Pointer Leak:
  clear = 0x0000ffffaac0ece4 (libggml-base.so)
Phase 4 - Resolve system():
  system() = 0x0000ffff9f4bcec4
Phase 5 - RCE:
  [!!!] Server responded - command executed successfully!
```
The server crashes (SIGBUS, exit 135) shortly after command execution during cleanup of the corrupted buffer struct. The command runs successfully and the BUFFER_CLEAR response is sent back before the crash. An attacker could avoid the crash by restoring the original iface struct via a second GRAPH_COMPUTE copy before triggering.
Impact
Remote code execution. Any unauthenticated TCP client that can reach the RPC server port (default 50052) gets arbitrary command execution as the server process user (often root in Docker).
- Defeats ASLR via protocol-level pointer leaks
- Defeats NX via function pointer hijacking (no shellcode needed)
- Stack canaries and RELRO do not help: the overwrite targets a heap struct, and the GOT is only read, never written
Attack scenarios:
- Lateral movement in corporate/cloud networks: scan for port 50052 on GPU servers
- Internet exposure via Docker -p 50052:50052 or --host 0.0.0.0
- Local privilege escalation from any process on the same host
The RPC backend must be explicitly enabled (-DGGML_RPC=ON), and the server binds to localhost by default. However, the intended use case is distributed inference across machines, so network exposure is expected.
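Whether the server is reachable over the network comes down to the address it binds. A deployment wrapper or the server itself could refuse non-loopback addresses unless the operator explicitly opts in.

The sketch below uses the POSIX inet_pton() call to recognise numeric loopback addresses. The is_loopback() and bind_allowed() functions and the allow_remote flag are hypothetical; they are not existing rpc-server options.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

#include <cassert>
#include <cstring>
#include <string>

// True if `host` is a numeric IPv4 address in 127.0.0.0/8 or the IPv6
// loopback address ::1. Hostnames (e.g. "localhost") are not resolved
// and are treated as non-loopback, which errs on the side of refusing.
bool is_loopback(const std::string & host) {
    in_addr v4{};
    if (inet_pton(AF_INET, host.c_str(), &v4) == 1) {
        // s_addr is in network byte order; the first octet is the top byte.
        return (ntohl(v4.s_addr) >> 24) == 127;
    }
    in6_addr v6{};
    if (inet_pton(AF_INET6, host.c_str(), &v6) == 1) {
        return std::memcmp(&v6, &in6addr_loopback, sizeof(v6)) == 0;
    }
    return false;
}

// Hypothetical policy: binding a non-loopback address requires an
// explicit opt-in from the operator.
bool bind_allowed(const std::string & host, bool allow_remote) {
    return allow_remote || is_loopback(host);
}
```

With this guard, a bind to --host 0.0.0.0 would only proceed after a deliberate opt-in. It would not be reachable through a copied command line alone.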
Relation to prior CVEs
This vulnerability shares the same root cause as two previously patched CVEs:
- CVE-2024-42478 (GHSA-xcr4-8r59-4fh2): OOB read via GET_TENSOR, patched by adding bounds validation in the GET_TENSOR handler.
- CVE-2024-42479 (GHSA-g84r-q9f6-3frf): OOB read/write via SET_TENSOR, patched by adding a "// sanitize tensor->data" bounds check in set_tensor() (lines 1227-1237).
All three share the same underlying bug: deserialize_tensor() does not validate tensor->data when buffer=0. The 2024 patches added bounds checks in the GET_TENSOR and SET_TENSOR command handlers, but the GRAPH_COMPUTE command takes a separate code path (graph_compute() -> create_node() -> deserialize_tensor()) that was never patched. The buffer=0 bypass has been present since the initial RPC implementation (PR #6829, merged 2024-05-14).
This variant is more severe than the prior two because GRAPH_COMPUTE processes multiple tensors with arbitrary operations, giving the attacker both read and write primitives in a single request. The prior CVEs only gave read (GET_TENSOR) or write (SET_TENSOR) individually.
Disclosure context
This was reported to CERT/CC on 2026-02-08 as VU#748698. CERT/CC closed the case, so this report is being filed directly with the maintainers.
The project's SECURITY.md explicitly excludes RPC from vulnerability scope, stating: "Do not use the RPC backend" and "none of the topics under 'Using llama.cpp securely' are considered vulnerabilities." However, a full-chain unauthenticated RCE with a working exploit against a network service that ships in the official build system warrants maintainer awareness regardless of scope policy. The RPC backend is used in production for distributed inference across machines.