Skip to content

Add merge_networks (closes #221)#507

Open
dotsdl wants to merge 25 commits into
mainfrom
an_merge
Open

Add merge_networks (closes #221)#507
dotsdl wants to merge 25 commits into
mainfrom
an_merge

Conversation

@dotsdl

@dotsdl dotsdl commented Jun 8, 2026

Copy link
Copy Markdown
Member

Summary

Revives the work originally proposed in #433 (now unreachable since the contributing fork was deleted) and finishes it.

  • Adds Neo4jStore.merge_networks for combining multiple existing AlchemicalNetwork\s into a new AlchemicalNetwork in a target Scope, with completed and errored Task\s (and their ProtocolDAGResultRef\s) cloned into the new scope so previously-computed results do not need to be re-run. (Originally authored by @ianmkenney; preserved here as-is, then refactored — see commits.)
  • Adds the user-facing surface:
    • POST /networks/merge on AlchemiscaleAPI, which validates the destination Scope and each source network's Scope against the caller's token before delegating to the store.
    • AlchemiscaleClient.merge_networks(networks, name, scope), with client-side guards for wildcard scopes, empty input, and non-AlchemicalNetwork ScopedKey\s.
  • Refactors the database-layer implementation to use KeyedChain.decode_subchains (gufe ≥1.8.0), which replaces a hand-rolled kc_to_gufe helper plus manual chain-slicing loop, and lifts the inline TransformationData dataclass to a private module-level _TransformationData for readability.
  • Bumps the gufe pin to >=1.8.0 in devtools/conda-envs/test.yml and devtools/conda-envs/alchemiscale-server.yml (the only env files that actually exercise this code path).
  • Adds integration tests at both the API and client layers covering: happy-path edge union, scope-based authorization (bad destination and bad source), and full preservation of complete/error Task\s with their ProtocolDAGResultRef\s — including a reachability check via get_network_tasks(merged_sk).

Closes #221.

Notes for reviewers

  • The store-level merge_networks body is unchanged in behavior from the original PR; the diff against Allow server-side copying and merging of AlchemicalNetworks #433's last revision is the decode_subchains refactor plus the lift of TransformationData.
  • The get_network_tasks assertion in test_merge_networks_preserves_tasks_and_results exercises the standard (:AlchemicalNetwork)-[:DEPENDS_ON]->(:Transformation)<-[:PERFORMS]-(:Task) traversal on the merged network. If it fails, that indicates a missing PERFORMS edge in _TransformationData.to_subgraph — flagged here for follow-up rather than fixed in this PR, since the original implementation has the same behavior.
  • subchain_cache is preserved with cleaner contents (KeyedChain.from_gufe(transformation)) but its necessity is an open question tied to the PERFORMS follow-up above.

Test plan

  • pytest -n auto -v alchemiscale/tests/integration/storage/test_statestore.py -k merge_networks
  • pytest -n auto -v alchemiscale/tests/integration/interface/test_api.py -k merge_networks
  • pytest -n auto -v alchemiscale/tests/integration/interface/client/test_client.py -k merge_networks
  • black --check --diff alchemiscale
  • mypy alchemiscale
  • Confirm whether test_merge_networks_preserves_tasks_and_results reveals the suspected missing PERFORMS wiring; open a follow-up if so.

🤖 Generated with Claude Code

ianmkenney and others added 23 commits September 16, 2025 11:57
Networks are merged into a new network under a new scope with a given
name. Task and ProtocolDAGResultRef support is not implemented here.
There is hardly any difference in performance.
Add the client + API surface for the merge_networks database operation
introduced in this branch. Users can now combine multiple existing
AlchemicalNetworks into a single new AlchemicalNetwork through
AlchemiscaleClient, with completed and errored Tasks (and their
ProtocolDAGResultRefs) carried over so previously-computed results do
not need to be re-run.

- POST /networks/merge endpoint validates the destination Scope and
  each source network's Scope against the caller's token, then defers
  to Neo4jStore.merge_networks.
- AlchemiscaleClient.merge_networks wraps the endpoint and guards
  against wildcard Scopes, empty input, and non-AlchemicalNetwork
  ScopedKeys client-side.
- Integration tests cover the happy path, scope authorization (both
  bad destination and bad source), and that Tasks + PDRRs in
  complete/error state are cloned into the new scope and reachable
  through get_network_tasks.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the inline kc_to_gufe helper and manual chain-slicing loop with
a call to KeyedChain.decode_subchains, paired with a list comprehension
that captures the original database GufeKeys in the same order. The
decode_subchains helper (gufe >=1.8.0) reuses a shared tokenizable_map
across yields, so common dependencies in a source network are decoded
only once.

Lift the previously nested TransformationData dataclass to a private
module-level _TransformationData so merge_networks reads top-to-bottom
without chasing an inline class definition, and so the helper can be
referenced by future tests in isolation.

The dropped key_decode_dependencies import goes with kc_to_gufe.

subchain_cache is preserved with the same shape but now populated via
KeyedChain.from_gufe(transformation); whether the cache (and the
unconnected tf_node it feeds) is needed at all is a separate question
to revisit once the PERFORMS-wiring question is settled.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
merge_networks now uses KeyedChain.decode_subchains, which was added
in gufe 1.8.0. Bump the pin in:

- devtools/conda-envs/test.yml (was =1.7.1) so the integration tests
  exercise the new code path.
- devtools/conda-envs/alchemiscale-server.yml (was =1.6.1) so server
  deployments running merge_networks pick up the required gufe API.

Client and compute env files are left alone since neither imports the
new code path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Code review revealed that cloned Tasks on the merged network were
written but never wired back to their Transformations, leaving every
PERFORMS-based traversal (get_network_tasks, get_task_transformation,
set_tasks_status, etc.) blind to them. The new
test_merge_networks_preserves_tasks_and_results check via
get_network_tasks asserts this exact thing and would have failed in
CI.

Changes
-------

- Add the PERFORMS edge in _TransformationData.to_subgraph so cloned
  Tasks are reachable from the standard
  (:AlchemicalNetwork)-[:DEPENDS_ON]->(:Transformation)<-[:PERFORMS]-(:Task)
  traversal. The pre-existing tf_node construction is now meaningful,
  resolving the open question about subchain_cache.

- Add a `state` parameter to Neo4jStore.merge_networks,
  /networks/merge, and AlchemiscaleClient.merge_networks, bringing
  merge_networks to feature parity with assemble_network /
  create_network. Defaults to NetworkStateEnum.active.

- Document the "cloned Tasks are not actioned to the merged TaskHub"
  semantics in the store-level and client docstrings, including the
  remediation path (call action_tasks after merging).

- Replace the O(N^2) `transformation_data.index(transformation)`
  dedup with a `dict[GufeKey, _TransformationData]` keyed on
  transformation.key, avoiding repeated GufeTokenizable equality
  checks on large networks.

- Improve the missing-ScopedKey error to collect and report all
  missing source SKs in a single ValueError, rather than only the
  one that first triggered KeyError.

- Add a chain-order contract comment at the
  zip(database_keys, transformations) site pointing at the
  decode_subchains yield-order guarantee.

- Make /networks/merge an `async def` endpoint, matching the style
  of the neighboring create_network endpoint.

- Add a parametrized client test (state in {active, inactive}) that
  exercises the new state parameter end-to-end.

- Drop the unused network_tyk2 fixture from the two API-level
  authorization tests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@dotsdl

dotsdl commented Jun 8, 2026

Copy link
Copy Markdown
Member Author

Thanks for the careful review. Pushed d0bf4ec addressing the feedback.

Blocking

  • User story: Send a single simulation to F@H #1 Missing PERFORMS edge — fixed in _TransformationData.to_subgraph. The cloned Task is now wired to tf_node with a PERFORMS relationship in the same iteration that creates it. This also retires the open question about subchain_cache: tf_node is now meaningful as the PERFORMS anchor, so the cache stays.
  • [user story] Retrospective analysis of alchemical decisions #2 Cloned tasks not actioned to the new TaskHub — documented as intentional in both the store-level docstring and the client docstring, with the remediation path spelled out (call action_tasks against the merged network's TaskHub after the merge if you want errored tasks back in the queue). Holding behavior change for a follow-up.

Should-fix

Nits picked up

  • /networks/merge is now async def for consistency with create_network.
  • Improved the KeyError handler to collect all missing source ScopedKeys before raising, instead of reporting only the first failure.
  • Added a one-line comment at the zip(database_keys, transformations) site pointing at decode_subchains's yield-order guarantee.
  • Dropped the unused network_tyk2 fixture from test_merge_networks_bad_scope and test_merge_networks_bad_source_scope.

Deferred

  • update_task_trees staticmethod → module-level helper: skipping for this PR to keep the diff focused. Easy follow-up.
  • Stray f"..." strings without placeholders in test_statestore.py: these are in the original contributor's code; cleaning them up here would muddy the diff. Happy to do it in a follow-up cosmetic PR.

Sanity check the reviewer asked for

Static analysis confirms the PERFORMS edge is now emitted inside the per-task loop with the correct direction (TaskTransformation) and scope properties. Will report back once I've run pytest -n auto alchemiscale/tests/integration -k merge_networks in an env with grolt.

dotsdl and others added 2 commits June 8, 2026 18:07
merge_networks now depends on KeyedChain.decode_subchains (gufe 1.8.0).
Pin all five env files to an exact 1.8.0 release, matching the
repo's convention of pinning gufe exactly rather than with a lower
bound. Touches the test, server, compute, client, and docs envs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add AlchemiscaleClient.copy_network, .merge_networks, and .merge_scopes methods

2 participants