perf: scatter groupby-sum terms directly instead of unstacking by FBumann · Pull Request #793 · PyPSA/linopy

FBumann · 2026-06-29T07:58:56Z

Note

The following content was generated by AI.

What this does

The fast path of LinearExpression.groupby(...).sum() previously did
ds.unstack(group_dim, fill_value=...) followed by a stack. That
materializes 2–3 intermediate copies of the padded result
(n_groups × max_group_size × nterm) and routes through pandas
MultiIndex machinery sized by the number of elements.

This change factorizes the groups and scatters coeffs/vars directly
into preallocated padded result arrays; constants are group-summed with
np.add.at. Peak memory drops to input + result (the minimum for the
padded layout), and the grouping itself gets considerably faster. The
result is unchanged: same dims, coords, term ordering and padding.

The unstack-based implementation is kept as _sum_by_unstack and is
still used for chunked (dask-backed) data, which cannot be scattered into
numpy arrays. NaN group labels now raise an informative ValueError
instead of failing inside unstack.

Notes

Self-contained: touches only linopy/expressions.py and adds tests in
test/test_linear_expression.py (124 new lines).
_sum_by_unstack retains the current master names_to_drop logic
(drop every coordinate aligned to group_dim), so the slow path keeps
the existing behavior.

Verification

pytest test/test_linear_expression.py → 309 passed
groupby/sum subset (-k "group or sum or scatter or unstack") → 73 passed
ruff check, ruff format --check, mypy linopy/expressions.py → clean

The fast path of LinearExpression.groupby(...).sum() used ds.unstack(group_dim, fill_value=...) followed by a stack, which materializes 2-3 intermediate copies of the padded result (n_groups x max_group_size x nterm) and goes through pandas MultiIndex machinery sized by the number of elements. Instead, factorize the groups and scatter coeffs/vars directly into the preallocated padded result arrays; constants are group-summed with np.add.at. Peak memory drops to input + result (the minimum for the padded layout) and the grouping itself gets considerably faster. The result is unchanged: same dims, coords, term ordering and padding. The unstack-based implementation is kept as _sum_by_unstack and still used for chunked (dask-backed) data, which cannot be scattered into numpy arrays. NaN group labels now raise an informative ValueError instead of failing inside unstack. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

codspeed-hq · 2026-06-29T08:02:05Z

Merging this PR will improve performance by ×2.1

⚠️

Different runtime environments detected

Some benchmarks with significant performance changes were compared across different runtime environments,
which may affect the accuracy of the results.

Open the report in CodSpeed to investigate

⚡ 10 improved benchmarks
✅ 128 untouched benchmarks
⏩ 138 skipped benchmarks¹

Performance Changes

	Mode	Benchmark	`BASE`	`HEAD`	Efficiency
⚡	Memory	`test_to_lp[nodal_balance-severity=100]`	17.9 MB	6 MB	×3
⚡	Memory	`test_to_lp[nodal_balance-severity=50]`	9.2 MB	3.1 MB	×3
⚡	Memory	`test_to_lp[nodal_balance-severity=0]`	385.3 KB	135.3 KB	×2.8
⚡	Memory	`test_build[nodal_balance-severity=100]`	32 MB	12.8 MB	×2.5
⚡	Memory	`test_build[nodal_balance-severity=50]`	16.8 MB	7 MB	×2.4
⚡	Memory	`test_to_solver[highs-nodal_balance-severity=100]`	24.9 MB	13.3 MB	+87.47%
⚡	Memory	`test_to_solver[gurobi-nodal_balance-severity=100]`	25.1 MB	13.5 MB	+86.1%
⚡	Memory	`test_to_solver[highs-nodal_balance-severity=50]`	12.9 MB	7.1 MB	+81.68%
⚡	Memory	`test_to_solver[gurobi-nodal_balance-severity=50]`	13.1 MB	7.3 MB	+79.32%
⚡	Memory	`test_build[nodal_balance-severity=0]`	1.4 MB	1.2 MB	+19.65%

Tip

Curious why this is faster? Comment @codspeedbot explain why this is faster on this PR, or directly use the CodSpeed MCP with your agent.

_{Comparing fluxopt:perf/groupby-sum-scatter-upstream (81d7c23) with master (fe798b1)}

138 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports. ↩

Add a test for grouping over an empty group dimension, which the scatter fast path handles cleanly but the unstack fallback cannot. Trim comments that duplicated the helper docstrings.

FabianHofmann

@FBumann this is super strong. I added release notes and cleaned up some comments. feel free to merge.

coroa

Cool, need to wrap my head around it

coroa · 2026-06-30T11:59:23Z

+        scattered into preallocated numpy arrays) and no coordinates tied to
+        the grouped dimension besides its own index. Everything else falls


i don't understand the second condition. What means no coordinates tied to the grouped dimension besides its own index?

ie. can we please improve the docstring

My thinking is, that if the second condition does basically not happen, then we could wrap _can_sum_by_scatter into an xr.apply_ufunc, similar to https://github.com/PyPSA/atlite/blob/966380825089e5f913eb3dd45141abbff4fc075d/atlite/gis.py#L1077-L1089 , and get rid of the groupby path entirely.

coroa · 2026-06-30T14:34:51Z

Alternatively, we might think about deprecating the dask layer.

FBumann · 2026-06-30T16:38:39Z

As im working on the v1 convention and the multiindex stuff, this might even become obsolete indirection. I need to think about it

Relax the groupby-sum scatter gate to a pure numpy/dask check: auxiliary coordinates on the grouped dimension no longer force the slow unstack path. Summing over groups collapses that dimension, so both kernels drop every coordinate tied to it — the scatter result is identical, just cheaper. The unstack kernel now serves only chunked (dask) data, and a debug log records when that fallback is taken. Inline the now-trivial predicate into the dispatch and consolidate the kernel tests into a TestGroupbySumScatterKernel class: a one-line case table over a shared fixture, with added coverage for combined structures, auxiliary coords, and a MultiIndex grouped dimension. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

FBumann and others added 3 commits June 29, 2026 12:57

Merge branch 'master' into perf/groupby-sum-scatter-upstream

34d1332

test: cover empty group dim in scatter groupby-sum

7f3edea

Add a test for grouping over an empty group dimension, which the scatter fast path handles cleanly but the unstack fallback cannot. Trim comments that duplicated the helper docstrings.

docs: add release note for scatter groupby-sum

7598180

FabianHofmann force-pushed the perf/groupby-sum-scatter-upstream branch from d5d1c5a to 7598180 Compare June 30, 2026 11:56

FabianHofmann approved these changes Jun 30, 2026

View reviewed changes

coroa reviewed Jun 30, 2026

View reviewed changes

FBumann mentioned this pull request Jun 30, 2026

perf(groupby): unify scatter kernel over numpy and dask via apply_ufunc #802

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

perf: scatter groupby-sum terms directly instead of unstacking#793

perf: scatter groupby-sum terms directly instead of unstacking#793
FBumann wants to merge 5 commits into
PyPSA:masterfrom
fluxopt:perf/groupby-sum-scatter-upstream

FBumann commented Jun 29, 2026

Uh oh!

codspeed-hq Bot commented Jun 29, 2026 •

edited

Loading

Uh oh!

FabianHofmann left a comment

Uh oh!

coroa left a comment

Uh oh!

coroa Jun 30, 2026

Uh oh!

coroa Jun 30, 2026

Uh oh!

coroa Jun 30, 2026

Uh oh!

coroa commented Jun 30, 2026

Uh oh!

FBumann commented Jun 30, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

		scattered into preallocated numpy arrays) and no coordinates tied to
		the grouped dimension besides its own index. Everything else falls

Uh oh!

Conversation

FBumann commented Jun 29, 2026

What this does

Notes

Uh oh!

codspeed-hq Bot commented Jun 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Merging this PR will improve performance by ×2.1

Performance Changes

Footnotes

Uh oh!

FabianHofmann left a comment

Choose a reason for hiding this comment

Uh oh!

coroa left a comment

Choose a reason for hiding this comment

Uh oh!

coroa Jun 30, 2026

Choose a reason for hiding this comment

Uh oh!

coroa Jun 30, 2026

Choose a reason for hiding this comment

Uh oh!

coroa Jun 30, 2026

Choose a reason for hiding this comment

Uh oh!

coroa commented Jun 30, 2026

Uh oh!

FBumann commented Jun 30, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

codspeed-hq Bot commented Jun 29, 2026 •

edited

Loading