Add merge_networks (closes #221) by dotsdl · Pull Request #507 · OpenFreeEnergy/alchemiscale

dotsdl · 2026-06-08T23:20:53Z

Summary

Revives the work originally proposed in #433 (now unreachable since the contributing fork was deleted) and finishes it.

Adds Neo4jStore.merge_networks for combining multiple existing AlchemicalNetwork\s into a new AlchemicalNetwork in a target Scope, with completed and errored Task\s (and their ProtocolDAGResultRef\s) cloned into the new scope so previously-computed results do not need to be re-run. (Originally authored by @ianmkenney; preserved here as-is, then refactored — see commits.)
Adds the user-facing surface:
- POST /networks/merge on AlchemiscaleAPI, which validates the destination Scope and each source network's Scope against the caller's token before delegating to the store.
- AlchemiscaleClient.merge_networks(networks, name, scope), with client-side guards for wildcard scopes, empty input, and non-AlchemicalNetwork ScopedKey\s.
Refactors the database-layer implementation to use KeyedChain.decode_subchains (gufe ≥1.8.0), which replaces a hand-rolled kc_to_gufe helper plus manual chain-slicing loop, and lifts the inline TransformationData dataclass to a private module-level _TransformationData for readability.
Bumps the gufe pin to >=1.8.0 in devtools/conda-envs/test.yml and devtools/conda-envs/alchemiscale-server.yml (the only env files that actually exercise this code path).
Adds integration tests at both the API and client layers covering: happy-path edge union, scope-based authorization (bad destination and bad source), and full preservation of complete/error Task\s with their ProtocolDAGResultRef\s — including a reachability check via get_network_tasks(merged_sk).

Closes #221.

Notes for reviewers

The store-level merge_networks body is unchanged in behavior from the original PR; the diff against Allow server-side copying and merging of AlchemicalNetworks #433's last revision is the decode_subchains refactor plus the lift of TransformationData.
The get_network_tasks assertion in test_merge_networks_preserves_tasks_and_results exercises the standard (:AlchemicalNetwork)-[:DEPENDS_ON]->(:Transformation)<-[:PERFORMS]-(:Task) traversal on the merged network. If it fails, that indicates a missing PERFORMS edge in _TransformationData.to_subgraph — flagged here for follow-up rather than fixed in this PR, since the original implementation has the same behavior.
subchain_cache is preserved with cleaner contents (KeyedChain.from_gufe(transformation)) but its necessity is an open question tied to the PERFORMS follow-up above.

Test plan

pytest -n auto -v alchemiscale/tests/integration/storage/test_statestore.py -k merge_networks
pytest -n auto -v alchemiscale/tests/integration/interface/test_api.py -k merge_networks
pytest -n auto -v alchemiscale/tests/integration/interface/client/test_client.py -k merge_networks
black --check --diff alchemiscale
mypy alchemiscale
Confirm whether test_merge_networks_preserves_tasks_and_results reveals the suspected missing PERFORMS wiring; open a follow-up if so.

🤖 Generated with Claude Code

Networks are merged into a new network under a new scope with a given name. Task and ProtocolDAGResultRef support is not implemented here.

…increase

There is hardly any difference in performance.

Add the client + API surface for the merge_networks database operation introduced in this branch. Users can now combine multiple existing AlchemicalNetworks into a single new AlchemicalNetwork through AlchemiscaleClient, with completed and errored Tasks (and their ProtocolDAGResultRefs) carried over so previously-computed results do not need to be re-run. - POST /networks/merge endpoint validates the destination Scope and each source network's Scope against the caller's token, then defers to Neo4jStore.merge_networks. - AlchemiscaleClient.merge_networks wraps the endpoint and guards against wildcard Scopes, empty input, and non-AlchemicalNetwork ScopedKeys client-side. - Integration tests cover the happy path, scope authorization (both bad destination and bad source), and that Tasks + PDRRs in complete/error state are cloned into the new scope and reachable through get_network_tasks. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Replace the inline kc_to_gufe helper and manual chain-slicing loop with a call to KeyedChain.decode_subchains, paired with a list comprehension that captures the original database GufeKeys in the same order. The decode_subchains helper (gufe >=1.8.0) reuses a shared tokenizable_map across yields, so common dependencies in a source network are decoded only once. Lift the previously nested TransformationData dataclass to a private module-level _TransformationData so merge_networks reads top-to-bottom without chasing an inline class definition, and so the helper can be referenced by future tests in isolation. The dropped key_decode_dependencies import goes with kc_to_gufe. subchain_cache is preserved with the same shape but now populated via KeyedChain.from_gufe(transformation); whether the cache (and the unconnected tf_node it feeds) is needed at all is a separate question to revisit once the PERFORMS-wiring question is settled. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

merge_networks now uses KeyedChain.decode_subchains, which was added in gufe 1.8.0. Bump the pin in: - devtools/conda-envs/test.yml (was =1.7.1) so the integration tests exercise the new code path. - devtools/conda-envs/alchemiscale-server.yml (was =1.6.1) so server deployments running merge_networks pick up the required gufe API. Client and compute env files are left alone since neither imports the new code path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Code review revealed that cloned Tasks on the merged network were written but never wired back to their Transformations, leaving every PERFORMS-based traversal (get_network_tasks, get_task_transformation, set_tasks_status, etc.) blind to them. The new test_merge_networks_preserves_tasks_and_results check via get_network_tasks asserts this exact thing and would have failed in CI. Changes ------- - Add the PERFORMS edge in _TransformationData.to_subgraph so cloned Tasks are reachable from the standard (:AlchemicalNetwork)-[:DEPENDS_ON]->(:Transformation)<-[:PERFORMS]-(:Task) traversal. The pre-existing tf_node construction is now meaningful, resolving the open question about subchain_cache. - Add a `state` parameter to Neo4jStore.merge_networks, /networks/merge, and AlchemiscaleClient.merge_networks, bringing merge_networks to feature parity with assemble_network / create_network. Defaults to NetworkStateEnum.active. - Document the "cloned Tasks are not actioned to the merged TaskHub" semantics in the store-level and client docstrings, including the remediation path (call action_tasks after merging). - Replace the O(N^2) `transformation_data.index(transformation)` dedup with a `dict[GufeKey, _TransformationData]` keyed on transformation.key, avoiding repeated GufeTokenizable equality checks on large networks. - Improve the missing-ScopedKey error to collect and report all missing source SKs in a single ValueError, rather than only the one that first triggered KeyError. - Add a chain-order contract comment at the zip(database_keys, transformations) site pointing at the decode_subchains yield-order guarantee. - Make /networks/merge an `async def` endpoint, matching the style of the neighboring create_network endpoint. - Add a parametrized client test (state in {active, inactive}) that exercises the new state parameter end-to-end. - Drop the unused network_tyk2 fixture from the two API-level authorization tests. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

dotsdl · 2026-06-08T23:43:19Z

Thanks for the careful review. Pushed d0bf4ec addressing the feedback.

Blocking

User story: Send a single simulation to F@H #1 Missing PERFORMS edge — fixed in _TransformationData.to_subgraph. The cloned Task is now wired to tf_node with a PERFORMS relationship in the same iteration that creates it. This also retires the open question about subchain_cache: tf_node is now meaningful as the PERFORMS anchor, so the cache stays.
[user story] Retrospective analysis of alchemical decisions #2 Cloned tasks not actioned to the new TaskHub — documented as intentional in both the store-level docstring and the client docstring, with the remediation path spelled out (call action_tasks against the merged network's TaskHub after the merge if you want errored tasks back in the queue). Holding behavior change for a follow-up.

Should-fix

Alternate methods of calculating free energies on GROMACS other than nonequilibrium methods #3 O(N²) dedup — switched to dict[GufeKey, _TransformationData] keyed on transformation.key. Insertion order is preserved by the dict itself; transformation_data.values() replaces the prior list in downstream iteration.
[user story] absolute binding free energies on a large scale #4 state parameter — threaded through Neo4jStore.merge_networks → POST /networks/merge → AlchemiscaleClient.merge_networks, defaulting to NetworkStateEnum.active. Added a parametrized client test test_merge_networks_respects_state over {active, inactive}.

Nits picked up

/networks/merge is now async def for consistency with create_network.
Improved the KeyError handler to collect all missing source ScopedKeys before raising, instead of reporting only the first failure.
Added a one-line comment at the zip(database_keys, transformations) site pointing at decode_subchains's yield-order guarantee.
Dropped the unused network_tyk2 fixture from test_merge_networks_bad_scope and test_merge_networks_bad_source_scope.

Deferred

update_task_trees staticmethod → module-level helper: skipping for this PR to keep the diff focused. Easy follow-up.
Stray f"..." strings without placeholders in test_statestore.py: these are in the original contributor's code; cleaning them up here would muddy the diff. Happy to do it in a follow-up cosmetic PR.

Sanity check the reviewer asked for

Static analysis confirms the PERFORMS edge is now emitted inside the per-task loop with the correct direction (Task → Transformation) and scope properties. Will report back once I've run pytest -n auto alchemiscale/tests/integration -k merge_networks in an env with grolt.

merge_networks now depends on KeyedChain.decode_subchains (gufe 1.8.0). Pin all five env files to an exact 1.8.0 release, matching the repo's convention of pinning gufe exactly rather than with a lower bound. Touches the test, server, compute, client, and docs envs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

ianmkenney and others added 23 commits September 16, 2025 11:57

Add unimplemented merge_networks

1484de4

Make assemble_network chainable

40cb0ab

Partial implementation of merge_networks

30e7dd3

Networks are merged into a new network under a new scope with a given name. Task and ProtocolDAGResultRef support is not implemented here.

Merge remote-tracking branch 'origin/main' into an_merge

fc8656e

Make _validate_extends_tasks and create_tasks chainable

865a46f

Use Subgraphs for task (+PDRR) and network assembly

fb28a59

Update task trees with one query and cache subchains for performance …

5032295

…increase

Remove need for methods to be chainable

d4f3dda

Change confusing, but beneign clashing names

b9a527e

Merge directly into an_subgraph

96b527e

There is hardly any difference in performance.

Only create tasks that were complete

8f3b17f

Merge remote-tracking branch 'origin/main' into an_merge

349e05f

Avoid soft failure for setting tasks to complete/error

f566d4f

Also include error task ProtocolDAGResultRefs

1d8dfde

Add comments to merge_networks

f19985f

Merge remote-tracking branch 'origin/main' into an_merge

c6edcd4

Typo fix and formatting

9374b9b

Merge branch 'main' into an_merge

6b98cb0

Merge branch 'main' into an_merge

eb3c532

dotsdl and others added 2 commits June 8, 2026 18:07

Merge branch 'main' into an_merge

beaf68c

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add merge_networks (closes #221)#507

Add merge_networks (closes #221)#507
dotsdl wants to merge 25 commits into
mainfrom
an_merge

dotsdl commented Jun 8, 2026 •

edited

Loading

Uh oh!

dotsdl commented Jun 8, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

dotsdl commented Jun 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Notes for reviewers

Test plan

Uh oh!

dotsdl commented Jun 8, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

dotsdl commented Jun 8, 2026 •

edited

Loading