Summary
An integer overflow vulnerability in the `ggml_nbytes` function allows an attacker to bypass memory validation by crafting a GGUF file with specific tensor dimensions. The overflow causes `ggml_nbytes` to return a far smaller size than the tensor actually requires (e.g., ~4 MB instead of exabytes), leading to a heap-based buffer overflow when the application subsequently processes the tensor. This allows potential Remote Code Execution (RCE) via memory corruption.
Details
The vulnerability exists in `ggml/src/ggml.c` within the `ggml_nbytes` function, which calculates the storage size of a tensor:
```c
// ggml/src/ggml.c
size_t ggml_nbytes(const struct ggml_tensor * tensor) {
    // ...
    size_t nbytes;
    const size_t blck_size = ggml_blck_size(tensor->type);
    if (blck_size == 1) {
        nbytes = ggml_type_size(tensor->type);
        for (int i = 0; i < GGML_MAX_DIMS; ++i) {
            nbytes += (tensor->ne[i] - 1)*tensor->nb[i]; // <--- VULNERABILITY
        }
    }
    // ...
    return nbytes;
}
```
The multiplication `(tensor->ne[i] - 1) * tensor->nb[i]` mixes the signed `int64_t` element count with the unsigned `size_t` stride; the 64-bit product can silently wrap modulo `2^64`, so `nbytes` can end up far smaller than the tensor's true size.
Additionally, the GGUF loader in `gguf.cpp` does not adequately validate that the claimed byte size matches the logical dimensions, nor does it check whether the total logical size exceeds `SIZE_MAX` bytes; it only checks that the element count fits in `INT64_MAX`.
PoC
To reproduce this vulnerability, create a GGUF file containing a tensor with the following properties:
- Type: `GGML_TYPE_F32` (4 bytes per element)
- Dimensions: `ne = [1024, 1024, 4398046511105, 1]`
- Note: `4398046511105` is `2^42 + 1`.
Calculation Breakdown:
- Strides (`nb`) are calculated naturally:
  - `nb[0] = 4`
  - `nb[1] = 4 * 1024 = 4096`
  - `nb[2] = 4096 * 1024 = 4,194,304 (2^22)`
- `ggml_nbytes` loop, `i=2` term: `(ne[2] - 1) * nb[2] = (2^42) * (2^22) = 2^64 = 0` (due to overflow wrapping on 64-bit systems).
- Result: The calculated size (`nbytes`) reflects only the small dimensions (0 and 1), resulting in a ~4 MB allocation.
- Real Size: The tensor logically contains ~2^62 elements, requiring exabytes of memory.
- Crash: Accessing this tensor triggers a heap buffer overflow.
Impact
- Vulnerability Type: Heap-based Buffer Overflow
- Impact: Remote Code Execution (RCE), Denial of Service (DoS).
- Affected Components: Applications using `llama.cpp` to load untrusted GGUF models.
- Attack Vector: An attacker provides a malicious GGUF file which, when loaded, triggers the overflow.
Suggested Fix
- Checked Arithmetic: Use checked arithmetic functions (e.g., `__builtin_mul_overflow`) in `ggml_nbytes` and in the stride calculations to detect overflows.
- Input Validation: Implement strict size validation in `gguf.cpp` when parsing tensor information. Verify that the total size in bytes (not just the element count) fits within addressable memory limits (e.g., `SIZE_MAX`) before accepting the tensor.
```c
// Example validation logic
if (total_elements > SIZE_MAX / ggml_type_size(type)) {
    return false; // the byte-size multiplication would overflow
}
```