Skip to content

Per-read synchronized(this) on the attribute path dominates attribute/weight read cost #277

Description

@brianckeegan

Summary

Every attribute read in AttributesImpl enters a per-element monitor, and EdgeImpl.getWeight() adds a second monitor on top of it. Because getWeight() and getAttribute() are the hottest read paths in the library (weighted degree, weighted PageRank, modularity, layout, shortest paths), this synchronization is paid millions of times per analytics pass. In an isolated microbenchmark, removing the monitor cuts attribute-read time by ~3.5×. The lock exists only to guard a rare array-resize on column addition, which can be made safe without locking readers.

This is a single, self-contained change with what looks like a large upside and low risk, so I wanted to raise it before opening a PR.

Environment

  • org.gephi:graphstore 0.8.7-SNAPSHOT, commit 5028d21
  • Build target JDK 17; measured on OpenJDK 21
  • Relevant because biased locking is disabled (JDK 15) / removed (JDK 18), so uncontended synchronized is no longer near-free

Where

AttributesImpl.getAttribute(int) (src/main/java/org/gephi/graph/impl/AttributesImpl.java:70):

public Object getAttribute(int index) {
    Object res = null;
    synchronized (this) {                 // entered on every read
        if (index < attributes.length) res = attributes[index];
    }
    return res;
}

EdgeImpl.getWeight() (EdgeImpl.java:79) then wraps that already-synchronized call in a second monitor on a different object (the edge), so a single weight read enters two monitors:

public double getWeight() {
    synchronized (this) {                 // monitor #1: the EdgeImpl
        Object weightObject = attributes.getAttribute(EDGE_WEIGHT_INDEX); // monitor #2: the AttributesImpl
        if (weightObject instanceof Double) {
            return (Double) weightObject;
        } else {
            return getWeight(graphStore.getView());
        }
    }
}

The only writer that mutates shared state under this monitor is the resize in setAttribute(int, Object) (AttributesImpl.java:124), which swaps the attributes array reference when a newly added column's index exceeds the current array length. Column additions are rare and already happen under the graph write lock; readers are paying a monitor on every access to protect against an event that essentially never occurs during an algorithm run.

Why it matters

getWeight() is read in the inner loop of most weighted algorithms, and getAttribute() underlies every column read (sizes, partitions, rankings, filters). The cost is incurred even single-threaded, since the monitor is unconditional.

Evidence

Isolated microbenchmark: 5M elements, sum the weight across all of them per pass, warmed up. Single-file source mode (java Bench.java) on OpenJDK 21, ParallelGC. Not JMH, so treat as directional — but the effect size is large and the mechanism is well understood.

Variant ms / 5M reads
synchronized + boxed Double (current shape) ~116–136
boxed, no synchronized ~32
primitive field, but keep synchronized ~101–104
primitive field, no synchronized ~27

The monitor alone accounts for roughly 75 ms of the 116 ms; the rest is boxing (tracked separately). Removing the monitor is a ~3.5× improvement on attribute reads in isolation, before counting the second monitor in getWeight() that the harness does not even model.

Benchmark harness
public class Bench {
    static final int N = 5_000_000, REPS = 6;
    static final class E {
        final Object[] attrs; final double prim;
        E(double w){ attrs = new Object[]{ Double.valueOf(w) }; prim = w; }
        double syncBoxed(){ Object r; synchronized(this){ r = attrs[0]; } return (Double) r; }
        double boxedNoSync(){ return (Double) attrs[0]; }
        double primSync(){ double v; synchronized(this){ v = prim; } return v; }
        double prim(){ return prim; }
    }
    public static void main(String[] a){
        E[] e = new E[N];
        for (int i = 0; i < N; i++) e[i] = new E(i * 0.5);
        double sink = 0;
        for (int r = 0; r < REPS; r++){
            long t0 = System.nanoTime(); double s0 = 0; for (int i=0;i<N;i++) s0 += e[i].syncBoxed();
            long t1 = System.nanoTime(); double s1 = 0; for (int i=0;i<N;i++) s1 += e[i].boxedNoSync();
            long t2 = System.nanoTime(); double s2 = 0; for (int i=0;i<N;i++) s2 += e[i].primSync();
            long t3 = System.nanoTime(); double s3 = 0; for (int i=0;i<N;i++) s3 += e[i].prim();
            long t4 = System.nanoTime();
            sink += s0+s1+s2+s3;
            if (r >= REPS-2)
                System.out.printf("syncBoxed %.1f | boxedNoSync %.1f | primSync %.1f | prim %.1f (ms)%n",
                    (t1-t0)/1e6,(t2-t1)/1e6,(t3-t2)/1e6,(t4-t3)/1e6);
        }
        if (sink == Double.NaN) System.out.println(sink);
    }
}

Proposed fix

Make the read path lock-free by publishing the array through a volatile reference and reading it once into a local; keep writes serialized (under the existing monitor or the graph write lock).

protected volatile Object[] attributes;

public Object getAttribute(int index) {
    Object[] a = attributes;                 // single volatile read
    return index < a.length ? a[index] : null;
}

And drop the redundant outer monitor in EdgeImpl.getWeight() so the static-weight path is fully lock-free.

Why this is safe

A resize copies the existing values into a strictly larger array and then swaps the reference. With a volatile field, a reader observes either the old array or the new one — never a torn reference — and the value at index is valid in both (the new array is a superset copy). This is the same read/write consistency the graph's ReentrantReadWriteLock already governs at the structural level; the per-element monitor was only ever protecting the reference swap, which volatile publication handles.

Writers (setAttribute and the time-attribute variants) keep their synchronization to serialize concurrent array swaps among themselves.

Scope / risk

  • Touches AttributesImpl reads and the resize swap, plus the getWeight() wrapper. Behavior is unchanged for single-threaded callers; concurrent callers gain lock-free reads with identical consistency guarantees.
  • Existing tests around concurrent attribute access should cover this; happy to add a stress test that adds a column while readers iterate.

Related findings from the same review (separate concerns; I can file individually if useful): edge weight stored as a boxed Double in the generic Object[] rather than a primitive field, and the primary adjacency index using Long2ObjectOpenCustomHashMap<int[]> (one int[] allocation per edge) where a primitive-valued map would do when parallel edges are disabled. Both are mainly memory/GC wins on large graphs.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Fields

    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions