Heap Buffer Overflow via Integer Overflow in GGUF Tensor Parsing

High
ggerganov published GHSA-96jg-mvhq-q7q7 Mar 18, 2026

Package

ggml-org/llama.cpp

Affected versions

< b7437

Patched versions

b7824

Description

Summary

An integer overflow in the ggml_nbytes function allows an attacker to bypass memory validation by crafting a GGUF file with specific tensor dimensions. The overflow causes ggml_nbytes to return a far smaller size than the tensor requires (e.g., about 4 MiB instead of roughly 16 EiB), which leads to a heap-based buffer overflow when the application subsequently processes the tensor. The resulting memory corruption can potentially be escalated to Remote Code Execution (RCE).

Details

The vulnerability exists in ggml/src/ggml.c within the ggml_nbytes function, which calculates the storage size of a tensor:

// ggml/src/ggml.c
size_t ggml_nbytes(const struct ggml_tensor * tensor) {
    // ...
    size_t nbytes;
    const size_t blck_size = ggml_blck_size(tensor->type);
    if (blck_size == 1) {
        nbytes = ggml_type_size(tensor->type);
        for (int i = 0; i < GGML_MAX_DIMS; ++i) {
            nbytes += (tensor->ne[i] - 1)*tensor->nb[i]; // <--- VULNERABILITY
        }
    }
    // ...
    return nbytes;
}

In the expression (tensor->ne[i] - 1)*tensor->nb[i], the int64_t operand ne[i] - 1 is converted to size_t before the multiplication, so the product is computed in unsigned arithmetic and silently wraps around when it does not fit in size_t.
Additionally, the GGUF loader in gguf.cpp does not adequately validate that the claimed byte size matches the logical dimensions, and it does not check whether the total logical size exceeds SIZE_MAX bytes. It only checks that the element count fits in INT64_MAX.

PoC

To reproduce this vulnerability, create a GGUF file containing a tensor with the following properties:

  • Type: GGML_TYPE_F32 (4 bytes)
  • Dimensions: ne = [1024, 1024, 4398046511105, 1]
    • Note: 4398046511105 is 2^42 + 1.

Calculation Breakdown:

  1. Strides (nb) are calculated naturally:
    • nb[0] = 4
    • nb[1] = 4 * 1024 = 4096
    • nb[2] = 4096 * 1024 = 4,194,304 (2^22)
  2. ggml_nbytes Loop:
    • i=2 term: (ne[2] - 1) * nb[2]
    • = (2^42) * (2^22) = 2^64
    • = 0 (the product wraps modulo 2^64 when size_t is 64 bits wide).
  3. Result: The calculated size (nbytes) reflects only dimensions 0 and 1: 4 + 1023 * 4 + 1023 * 4096 = 4,194,304 bytes (4 MiB), so only about 4 MiB is allocated.
  4. Real Size: The tensor logically contains 2^62 + 2^20 elements, which is more than 2^64 bytes (about 16 EiB).
  5. Crash: Accessing this tensor beyond the 4 MiB allocation triggers a heap buffer overflow.
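
The arithmetic in the breakdown above can be checked with a small standalone program. This is only a sketch: nbytes_unchecked and MAX_DIMS are hypothetical stand-ins for the blck_size == 1 branch of ggml_nbytes and for GGML_MAX_DIMS, strides are derived as nb[i] = nb[i-1] * ne[i-1], and uint64_t is used in place of size_t so the 64-bit wraparound is reproduced regardless of platform.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_DIMS 4 /* assumption: stands in for GGML_MAX_DIMS */

/* Hypothetical model of the unchecked size computation (not the real ggml code).
 * All arithmetic is unsigned 64-bit, so overflow wraps modulo 2^64. */
static uint64_t nbytes_unchecked(const int64_t ne[MAX_DIMS], uint64_t type_size) {
    uint64_t nb[MAX_DIMS];

    /* Strides: nb[0] is the element size, each later stride scales by the
     * previous dimension. These products can wrap as well. */
    nb[0] = type_size;
    for (int i = 1; i < MAX_DIMS; ++i) {
        nb[i] = nb[i - 1] * (uint64_t) ne[i - 1];
    }

    /* Same shape as the loop in the advisory: no overflow detection. */
    uint64_t nbytes = type_size;
    for (int i = 0; i < MAX_DIMS; ++i) {
        nbytes += (uint64_t) (ne[i] - 1) * nb[i];
    }
    return nbytes;
}

/* Self-check using the PoC dimensions: 4398046511105 is 2^42 + 1. */
static void poc_check(void) {
    const int64_t ne[MAX_DIMS] = {1024, 1024, 4398046511105LL, 1};
    /* The i=2 term is 2^42 * 2^22 = 2^64, which wraps to 0,
     * leaving only the 4 MiB contributed by dimensions 0 and 1. */
    assert(nbytes_unchecked(ne, 4) == 4194304u);
}
```

Calling poc_check() passes silently, which confirms that the computed size collapses to 4,194,304 bytes for these dimensions.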

Impact

  • Vulnerability Type: Heap-based Buffer Overflow
  • Impact: Remote Code Execution (RCE), Denial of Service (DoS).
  • Affected Components: Applications using llama.cpp to load untrusted GGUF models.
  • Attack Vector: An attacker provides a malicious GGUF file which, when loaded, triggers the overflow.

Suggested Fix

  1. Checked Arithmetic: Use checked arithmetic functions (e.g., __builtin_mul_overflow) in ggml_nbytes and stride calculations to detect overflows.
  2. Input Validation: Implement strict size validation in gguf.cpp when parsing tensor information. Verify that the total size in bytes (not just element count) fits within addressable memory limits (e.g., SIZE_MAX) before accepting the tensor.
// Example validation logic
if (total_elements > SIZE_MAX / ggml_type_size(type)) {
    return false; // Overflow would occur
}
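
As a rough illustration of the first suggestion, the size computation could be written with the GCC/Clang overflow builtins. This is a sketch under stated assumptions, not the patch that was actually applied: tensor_nbytes_checked and MAX_DIMS are hypothetical names, only the blck_size == 1 case is modeled, and non-positive dimensions are simply rejected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_DIMS 4 /* assumption: stands in for GGML_MAX_DIMS */

/* Hypothetical checked variant of the size computation.
 * Returns false if a dimension is non-positive or if any intermediate
 * value does not fit in size_t; on success stores the size in *out. */
static bool tensor_nbytes_checked(const int64_t ne[MAX_DIMS], size_t type_size, size_t *out) {
    size_t stride = type_size; /* plays the role of nb[i] */
    size_t nbytes = type_size;

    for (int i = 0; i < MAX_DIMS; ++i) {
        if (ne[i] <= 0) {
            return false;
        }
        size_t term;
        /* The builtins compute the exact result and report whether it
         * fits in the destination type, even with mixed operand types. */
        if (__builtin_mul_overflow(ne[i] - 1, stride, &term) ||
            __builtin_add_overflow(nbytes, term, &nbytes)) {
            return false;
        }
        /* Advance to the next stride; it is not needed after the last dimension. */
        if (i + 1 < MAX_DIMS && __builtin_mul_overflow(stride, ne[i], &stride)) {
            return false;
        }
    }
    *out = nbytes;
    return true;
}

/* Usage: the PoC dimensions are rejected, a benign tensor is accepted. */
static void checked_usage(void) {
    const int64_t bad[MAX_DIMS]  = {1024, 1024, 4398046511105LL, 1};
    const int64_t good[MAX_DIMS] = {1024, 1024, 1, 1};
    size_t n = 0;
    assert(!tensor_nbytes_checked(bad, 4, &n));
    assert(tensor_nbytes_checked(good, 4, &n) && n == 4194304u);
}
```

Returning a success flag rather than a sentinel size forces the caller to handle the failure path explicitly. A loader would still need to compare the accepted size against the file length.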

Severity

High

CVSS v3 base metrics

Attack vector: Local
Attack complexity: Low
Privileges required: None
User interaction: Required
Scope: Unchanged
Confidentiality: High
Integrity: High
Availability: High

CVSS:3.1/AV:L/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:H

CVE ID

CVE-2026-33298

Weaknesses

Heap-based Buffer Overflow

A heap overflow condition is a buffer overflow, where the buffer that can be overwritten is allocated in the heap portion of memory, generally meaning that the buffer was allocated using a routine such as malloc().

Integer Overflow or Wraparound

The product performs a calculation that can produce an integer overflow or wraparound when the logic assumes that the resulting value will always be larger than the original value. This occurs when an integer value is incremented to a value that is too large to store in the associated representation. When this occurs, the value may become a very small or negative number.

Credits