Skip to content

Store static edge weight as a primitive double field instead of a boxed Double in the attribute array. #278

Description

@brianckeegan

Summary

For the default configuration (static, Double-typed weight), EdgeImpl stores the weight as a boxed Double in the generic Object[] attributes. Every edge therefore allocates a heap Double, and every getWeight() does an array read + instanceof + unbox. Measured overhead is ~28 bytes per edge versus ~8 for a primitive field — about 190 MB of extra heap per 10M weighted edges. Since weight is both the most frequently read edge attribute and the one carried by essentially every weighted graph, a primitive double weight field for the static case is a sizeable memory and GC win, and it directly serves the "low memory footprint" goal in the README.

Environment

  • org.gephi:graphstore 0.8.7-SNAPSHOT, commit 5028d21
  • Build target JDK 17; measured on OpenJDK 21, compressed oops, default object alignment

Where

The weight is written into the attribute array in the EdgeImpl constructor (src/main/java/org/gephi/graph/impl/EdgeImpl.java:60):

if (graphStore == null || graphStore.configuration.getEdgeWeightType().equals(Double.class)) {
    this.attributes.setAttribute(GraphStoreConfiguration.EDGE_WEIGHT_INDEX, weight); // autobox to Double
}

and read back, unboxed, in getWeight() (EdgeImpl.java:79):

Object weightObject = attributes.getAttribute(GraphStoreConfiguration.EDGE_WEIGHT_INDEX);
if (weightObject instanceof Double) {
    return (Double) weightObject;   // unbox
}

EDGE_WEIGHT_INDEX is a fixed slot in the Object[] (GraphStoreConfiguration.java:75), so the value lives as a boxed Double object referenced from that array.

Why it matters

Weight is read in the inner loop of weighted degree, weighted PageRank, modularity, force-directed layout, and weighted shortest paths. The boxing costs on two axes:

  • Memory / GC: one Double object per edge, plus the array reference slot.
  • CPU: an Object[] indirection, a type check, and an unbox on every read (on top of the synchronization tracked separately in the monitor issue).

Measured memory (10M elements)

Storage Heap Per element
Object[] of boxed Double (current) ~267 MB ~28 bytes
double[] equivalent ~76 MB ~8 bytes

A primitive double field folds into the EdgeImpl object itself (~8 bytes, no separate allocation), saving on the order of 190 MB per 10M weighted edges in this configuration. Absolute numbers shift with JVM flags, but the ~3–4× ratio holds.

Measurement harness
public class Mem {
    static final int N = 10_000_000;
    static long used(){ Runtime r = Runtime.getRuntime();
        for (int i=0;i<4;i++){ System.gc(); try{Thread.sleep(80);}catch(Exception e){} }
        return r.totalMemory() - r.freeMemory(); }
    public static void main(String[] a){
        long base = used();
        Object[] boxed = new Object[N];
        for (int i=0;i<N;i++) boxed[i] = Double.valueOf(i*0.5);
        long afterBoxed = used();
        double[] prim = new double[N];
        for (int i=0;i<N;i++) prim[i] = i*0.5;
        long afterPrim = used();
        long mb = 1024*1024;
        System.out.printf("boxed Double: %d MB (%.1f B/elem)%n", (afterBoxed-base)/mb, (afterBoxed-base)/(double)N);
        System.out.printf("double[]    : %d MB (%.1f B/elem)%n", (afterPrim-afterBoxed)/mb, (afterPrim-afterBoxed)/(double)N);
        if (boxed.length+prim.length < 0) System.out.println("x");
    }
}
// java -Xmx4g Mem.java

Proposed direction

Add a primitive protected double weight field to EdgeImpl as the source of truth for the static, Double-typed weight case, and route the weight-column accessors to it:

  • getWeight() / setWeight(double) read and write the field directly.
  • The generic column accessors for the weight column (getAttribute(weightColumn), serialization in Serialization, and any value-index lookups) bridge to the field so the column abstraction stays consistent — i.e. the weight remains a first-class column externally, but its value is backed by a primitive internally.
  • Fall back to the existing attributes-array path only when the weight is dynamic (temporal TimeMap) or configured to a non-Double type via configuration.getEdgeWeightType().

This also composes with the separate attribute-monitor issue — both touch getWeight(), and together they make the static-weight read fully lock-free and allocation-free.

Scope / risk

  • API-visible behavior unchanged; weight stays a readable/writable column with the same semantics.
  • Serialization format: needs a compatible read/write path for the weight (can keep the wire format identical by serializing the field where the boxed value used to go).
  • Dynamic-weight and custom-weight-type configurations are unaffected (they keep the array path).

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Fields

    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions