Summary
For the default configuration (static, Double-typed weight), EdgeImpl stores the weight as a boxed Double in the generic Object[] attributes. Every edge therefore allocates a heap Double, and every getWeight() does an array read + instanceof + unbox. Measured overhead is ~28 bytes per edge versus ~8 for a primitive field — about 190 MB of extra heap per 10M weighted edges. Since weight is both the most frequently read edge attribute and the one carried by essentially every weighted graph, a primitive double weight field for the static case is a sizeable memory and GC win, and it directly serves the "low memory footprint" goal in the README.
Environment
org.gephi:graphstore 0.8.7-SNAPSHOT, commit 5028d21
- Build target JDK 17; measured on OpenJDK 21, compressed oops, default object alignment
Where
The weight is written into the attribute array in the EdgeImpl constructor (src/main/java/org/gephi/graph/impl/EdgeImpl.java:60):
if (graphStore == null || graphStore.configuration.getEdgeWeightType().equals(Double.class)) {
this.attributes.setAttribute(GraphStoreConfiguration.EDGE_WEIGHT_INDEX, weight); // autobox to Double
}
and read back, unboxed, in getWeight() (EdgeImpl.java:79):
Object weightObject = attributes.getAttribute(GraphStoreConfiguration.EDGE_WEIGHT_INDEX);
if (weightObject instanceof Double) {
return (Double) weightObject; // unbox
}
EDGE_WEIGHT_INDEX is a fixed slot in the Object[] (GraphStoreConfiguration.java:75), so the value lives as a boxed Double object referenced from that array.
Why it matters
Weight is read in the inner loop of weighted degree, weighted PageRank, modularity, force-directed layout, and weighted shortest paths. The boxing costs on two axes:
- Memory / GC: one
Double object per edge, plus the array reference slot.
- CPU: an
Object[] indirection, a type check, and an unbox on every read (on top of the synchronization tracked separately in the monitor issue).
Measured memory (10M elements)
| Storage |
Heap |
Per element |
Object[] of boxed Double (current) |
~267 MB |
~28 bytes |
double[] equivalent |
~76 MB |
~8 bytes |
A primitive double field folds into the EdgeImpl object itself (~8 bytes, no separate allocation), saving on the order of 190 MB per 10M weighted edges in this configuration. Absolute numbers shift with JVM flags, but the ~3–4× ratio holds.
Measurement harness
public class Mem {
static final int N = 10_000_000;
static long used(){ Runtime r = Runtime.getRuntime();
for (int i=0;i<4;i++){ System.gc(); try{Thread.sleep(80);}catch(Exception e){} }
return r.totalMemory() - r.freeMemory(); }
public static void main(String[] a){
long base = used();
Object[] boxed = new Object[N];
for (int i=0;i<N;i++) boxed[i] = Double.valueOf(i*0.5);
long afterBoxed = used();
double[] prim = new double[N];
for (int i=0;i<N;i++) prim[i] = i*0.5;
long afterPrim = used();
long mb = 1024*1024;
System.out.printf("boxed Double: %d MB (%.1f B/elem)%n", (afterBoxed-base)/mb, (afterBoxed-base)/(double)N);
System.out.printf("double[] : %d MB (%.1f B/elem)%n", (afterPrim-afterBoxed)/mb, (afterPrim-afterBoxed)/(double)N);
if (boxed.length+prim.length < 0) System.out.println("x");
}
}
// java -Xmx4g Mem.java
Proposed direction
Add a primitive protected double weight field to EdgeImpl as the source of truth for the static, Double-typed weight case, and route the weight-column accessors to it:
getWeight() / setWeight(double) read and write the field directly.
- The generic column accessors for the weight column (
getAttribute(weightColumn), serialization in Serialization, and any value-index lookups) bridge to the field so the column abstraction stays consistent — i.e. the weight remains a first-class column externally, but its value is backed by a primitive internally.
- Fall back to the existing
attributes-array path only when the weight is dynamic (temporal TimeMap) or configured to a non-Double type via configuration.getEdgeWeightType().
This also composes with the separate attribute-monitor issue — both touch getWeight(), and together they make the static-weight read fully lock-free and allocation-free.
Scope / risk
- API-visible behavior unchanged; weight stays a readable/writable column with the same semantics.
- Serialization format: needs a compatible read/write path for the weight (can keep the wire format identical by serializing the field where the boxed value used to go).
- Dynamic-weight and custom-weight-type configurations are unaffected (they keep the array path).
Summary
For the default configuration (static,
Double-typed weight),EdgeImplstores the weight as a boxedDoublein the genericObject[] attributes. Every edge therefore allocates a heapDouble, and everygetWeight()does an array read +instanceof+ unbox. Measured overhead is ~28 bytes per edge versus ~8 for a primitive field — about 190 MB of extra heap per 10M weighted edges. Since weight is both the most frequently read edge attribute and the one carried by essentially every weighted graph, a primitivedouble weightfield for the static case is a sizeable memory and GC win, and it directly serves the "low memory footprint" goal in the README.Environment
org.gephi:graphstore0.8.7-SNAPSHOT, commit5028d21Where
The weight is written into the attribute array in the
EdgeImplconstructor (src/main/java/org/gephi/graph/impl/EdgeImpl.java:60):and read back, unboxed, in
getWeight()(EdgeImpl.java:79):EDGE_WEIGHT_INDEXis a fixed slot in theObject[](GraphStoreConfiguration.java:75), so the value lives as a boxedDoubleobject referenced from that array.Why it matters
Weight is read in the inner loop of weighted degree, weighted PageRank, modularity, force-directed layout, and weighted shortest paths. The boxing costs on two axes:
Doubleobject per edge, plus the array reference slot.Object[]indirection, a type check, and an unbox on every read (on top of the synchronization tracked separately in the monitor issue).Measured memory (10M elements)
Object[]of boxedDouble(current)double[]equivalentA primitive
doublefield folds into theEdgeImplobject itself (~8 bytes, no separate allocation), saving on the order of 190 MB per 10M weighted edges in this configuration. Absolute numbers shift with JVM flags, but the ~3–4× ratio holds.Measurement harness
Proposed direction
Add a primitive
protected double weightfield toEdgeImplas the source of truth for the static,Double-typed weight case, and route the weight-column accessors to it:getWeight()/setWeight(double)read and write the field directly.getAttribute(weightColumn), serialization inSerialization, and any value-index lookups) bridge to the field so the column abstraction stays consistent — i.e. the weight remains a first-class column externally, but its value is backed by a primitive internally.attributes-array path only when the weight is dynamic (temporalTimeMap) or configured to a non-Doubletype viaconfiguration.getEdgeWeightType().This also composes with the separate attribute-monitor issue — both touch
getWeight(), and together they make the static-weight read fully lock-free and allocation-free.Scope / risk