Per-read synchronized(this) on the attribute path dominates attribute/weight read cost

## Summary

Every attribute read in `AttributesImpl` enters a per-element monitor, and `EdgeImpl.getWeight()` adds a second monitor on top of it. Because `getWeight()` and `getAttribute()` are the hottest read paths in the library (weighted degree, weighted PageRank, modularity, layout, shortest paths), this synchronization is paid millions of times per analytics pass. In an isolated microbenchmark, removing the monitor cuts attribute-read time by ~3.5×. The lock exists only to guard a rare array-resize on column addition, which can be made safe without locking readers.

This is a single, self-contained change with what looks like a large upside and low risk, so I wanted to raise it before opening a PR.

## Environment

- `org.gephi:graphstore` `0.8.7-SNAPSHOT`, commit `5028d21`
- Build target JDK 17; measured on OpenJDK 21
- Relevant because biased locking is disabled (JDK 15) / removed (JDK 18), so uncontended `synchronized` is no longer near-free

## Where

`AttributesImpl.getAttribute(int)` (`src/main/java/org/gephi/graph/impl/AttributesImpl.java:70`):

```java
public Object getAttribute(int index) {
    Object res = null;
    synchronized (this) {                 // entered on every read
        if (index < attributes.length) res = attributes[index];
    }
    return res;
}
```

`EdgeImpl.getWeight()` (`EdgeImpl.java:79`) then wraps that already-synchronized call in a *second* monitor on a different object (the edge), so a single weight read enters two monitors:

```java
public double getWeight() {
    synchronized (this) {                 // monitor #1: the EdgeImpl
        Object weightObject = attributes.getAttribute(EDGE_WEIGHT_INDEX); // monitor #2: the AttributesImpl
        if (weightObject instanceof Double) {
            return (Double) weightObject;
        } else {
            return getWeight(graphStore.getView());
        }
    }
}
```

The only writer that mutates shared state under this monitor is the resize in `setAttribute(int, Object)` (`AttributesImpl.java:124`), which swaps the `attributes` array reference when a newly added column's index exceeds the current array length. Column additions are rare and already happen under the graph write lock; readers are paying a monitor on every access to protect against an event that essentially never occurs during an algorithm run.

## Why it matters

`getWeight()` is read in the inner loop of most weighted algorithms, and `getAttribute()` underlies every column read (sizes, partitions, rankings, filters). The cost is incurred even single-threaded, since the monitor is unconditional.

## Evidence

Isolated microbenchmark: 5M elements, sum the weight across all of them per pass, warmed up. Single-file source mode (`java Bench.java`) on OpenJDK 21, ParallelGC. Not JMH, so treat as directional — but the effect size is large and the mechanism is well understood.

| Variant | ms / 5M reads |
|---|---|
| `synchronized` + boxed `Double` (current shape) | ~116–136 |
| boxed, no `synchronized` | ~32 |
| primitive field, but keep `synchronized` | ~101–104 |
| primitive field, no `synchronized` | ~27 |

The monitor alone accounts for roughly 75 ms of the 116 ms; the rest is boxing (tracked separately). Removing the monitor is a ~3.5× improvement on attribute reads in isolation, before counting the second monitor in `getWeight()` that the harness does not even model.

<details>
<summary>Benchmark harness</summary>

```java
public class Bench {
    static final int N = 5_000_000, REPS = 6;
    static final class E {
        final Object[] attrs; final double prim;
        E(double w){ attrs = new Object[]{ Double.valueOf(w) }; prim = w; }
        double syncBoxed(){ Object r; synchronized(this){ r = attrs[0]; } return (Double) r; }
        double boxedNoSync(){ return (Double) attrs[0]; }
        double primSync(){ double v; synchronized(this){ v = prim; } return v; }
        double prim(){ return prim; }
    }
    public static void main(String[] a){
        E[] e = new E[N];
        for (int i = 0; i < N; i++) e[i] = new E(i * 0.5);
        double sink = 0;
        for (int r = 0; r < REPS; r++){
            long t0 = System.nanoTime(); double s0 = 0; for (int i=0;i<N;i++) s0 += e[i].syncBoxed();
            long t1 = System.nanoTime(); double s1 = 0; for (int i=0;i<N;i++) s1 += e[i].boxedNoSync();
            long t2 = System.nanoTime(); double s2 = 0; for (int i=0;i<N;i++) s2 += e[i].primSync();
            long t3 = System.nanoTime(); double s3 = 0; for (int i=0;i<N;i++) s3 += e[i].prim();
            long t4 = System.nanoTime();
            sink += s0+s1+s2+s3;
            if (r >= REPS-2)
                System.out.printf("syncBoxed %.1f | boxedNoSync %.1f | primSync %.1f | prim %.1f (ms)%n",
                    (t1-t0)/1e6,(t2-t1)/1e6,(t3-t2)/1e6,(t4-t3)/1e6);
        }
        if (sink == Double.NaN) System.out.println(sink);
    }
}
```
</details>

## Proposed fix

Make the read path lock-free by publishing the array through a `volatile` reference and reading it once into a local; keep writes serialized (under the existing monitor or the graph write lock).

```java
protected volatile Object[] attributes;

public Object getAttribute(int index) {
    Object[] a = attributes;                 // single volatile read
    return index < a.length ? a[index] : null;
}
```

And drop the redundant outer monitor in `EdgeImpl.getWeight()` so the static-weight path is fully lock-free.

### Why this is safe

A resize copies the existing values into a strictly larger array and then swaps the reference. With a `volatile` field, a reader observes either the old array or the new one — never a torn reference — and the value at `index` is valid in both (the new array is a superset copy). This is the same read/write consistency the graph's `ReentrantReadWriteLock` already governs at the structural level; the per-element monitor was only ever protecting the reference swap, which `volatile` publication handles.

Writers (`setAttribute` and the time-attribute variants) keep their synchronization to serialize concurrent array swaps among themselves.

## Scope / risk

- Touches `AttributesImpl` reads and the resize swap, plus the `getWeight()` wrapper. Behavior is unchanged for single-threaded callers; concurrent callers gain lock-free reads with identical consistency guarantees.
- Existing tests around concurrent attribute access should cover this; happy to add a stress test that adds a column while readers iterate.

---

*Related findings from the same review (separate concerns; I can file individually if useful): edge weight stored as a boxed `Double` in the generic `Object[]` rather than a primitive field, and the primary adjacency index using `Long2ObjectOpenCustomHashMap<int[]>` (one `int[]` allocation per edge) where a primitive-valued map would do when parallel edges are disabled. Both are mainly memory/GC wins on large graphs.*


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Per-read synchronized(this) on the attribute path dominates attribute/weight read cost #277

Summary

Environment

Where

Why it matters

Evidence

Proposed fix

Why this is safe

Scope / risk

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Variant	ms / 5M reads
`synchronized` + boxed `Double` (current shape)	~116–136
boxed, no `synchronized`	~32
primitive field, but keep `synchronized`	~101–104
primitive field, no `synchronized`	~27

Uh oh!

Per-read synchronized(this) on the attribute path dominates attribute/weight read cost #277

Description

Summary

Environment

Where

Why it matters

Evidence

Proposed fix

Why this is safe

Scope / risk

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions