Adapt to plumpy's greenback async bridge #7206

Merged: khsrali merged 3 commits into aiidateam:main from khsrali:greenback on Feb 27, 2026
Conversation

@khsrali
Collaborator

@khsrali khsrali commented Feb 6, 2026

Depends on aiidateam/plumpy#332
Alternative to #7188

Background

AiiDA's engine has a fundamental architectural tension: user-facing entry points (engine.run(), engine.submit(), calcfunctions, workfunctions) are synchronous, but the engine internals that drive them — the state machine, the daemon, the runner — are async. Whenever synchronous process code needs to call an async operation (running a nested process, polling a scheduler, performing a transport operation), it has historically called loop.run_until_complete(). This fails when the loop is already running (inside the daemon, inside Jupyter, etc.) with RuntimeError: This event loop is already running.
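
The failure mode can be reproduced with the standard library alone. The following is a minimal sketch, not AiiDA code: `inner` and `outer` are illustrative names, and `outer` plays the role of synchronous process code that finds itself inside an already-running loop.

```python
import asyncio


async def inner():
    return 42


async def outer():
    # We are inside a running loop here, as in the daemon or a Jupyter kernel.
    loop = asyncio.get_running_loop()
    coro = inner()
    try:
        # Re-entering the same loop is rejected by asyncio.
        loop.run_until_complete(coro)
    except RuntimeError as exc:
        coro.close()  # the coroutine was never started; close it explicitly
        return str(exc)
    return "no error"


message = asyncio.run(outer())
print(message)
```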

Until now, nest_asyncio solved this by monkey-patching asyncio internals to allow re-entrant run_until_complete() calls. While convenient, this approach has serious drawbacks:

  • Debugging is nearly impossible: call stacks through nested loops are very difficult to follow
  • Maximum recursion depth: deeply nested process hierarchies require workarounds
  • Unmaintained: nest_asyncio is no longer actively developed
  • Encourages bad practice: making the loop globally re-entrant means any code can nest loops arbitrarily, with no explicit boundary between sync and async contexts

What this PR does

This PR adapts aiida-core to use plumpy's new greenback-based async bridge (plumpy#332), replacing nest_asyncio entirely. The changes are organized in three commits:

Commit 1: Replace nest_asyncio with greenback support

All internal call sites (Runner.run_until_complete, FunctionProcess, WorkChain.run, TransportQueue, AsyncTransport) are updated to use plumpy's run_until_complete and run_with_portal helpers instead of direct loop.run_until_complete() calls. This requires plumpy~=0.26.0 which ships the greenback-based portal utilities.

The bridge in plumpy provides three functions:

# plumpy/greenback_bridge.py
import greenback


def run_until_complete(loop, coro):
    if loop.is_running():
        # Loop already running (daemon, Jupyter): hand the coroutine to the
        # greenlet portal, which must be active for the current task.
        return greenback.await_(coro)
    # Loop idle (CLI invocation): drive it natively.
    return loop.run_until_complete(coro)


async def ensure_portal():
    await greenback.ensure_portal()


def await_(awaitable):
    return greenback.await_(awaitable)

The bridge is the single decision point: if the loop is already running (daemon, Jupyter), it uses greenback.await_() through the greenlet portal. If the loop is idle (CLI invocation), it falls back to native loop.run_until_complete(). No monkey-patching of asyncio internals is involved.

Transport/scheduler portal:
ensure_portal() is called in transport's request_transport context manager, because when a CalcJob polls the scheduler, the update is scheduled as a new asyncio task (via call_later). This new task doesn't inherit the portal from the original execute() call, and scheduler.get_jobs() is sync code that internally needs run_until_complete(). Once the scheduler interface is fully async, this can be removed (see #7222).
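
The task boundary described above can be made visible with plain asyncio. This is a minimal sketch that does not use greenback or AiiDA; `execute` and `poll` are illustrative names only. It shows that work scheduled through `call_later` ends up in a different task than the one that scheduled it, which is why per-task state such as a portal has to be established again there.

```python
import asyncio


async def poll(result):
    # Runs in the task created by the call_later callback.
    result.set_result(asyncio.current_task())


async def execute():
    loop = asyncio.get_running_loop()
    original_task = asyncio.current_task()
    result = loop.create_future()
    # call_later runs a plain callback; here it spawns a brand-new task.
    loop.call_later(0.01, lambda: loop.create_task(poll(result)))
    polling_task = await result
    return original_task, polling_task


original_task, polling_task = asyncio.run(execute())
print(original_task is polling_task)
```

Anything that was set up only for the original task is therefore not available inside `poll` unless it is set up again.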

Jupyter notebook support:
Jupyter kernels have a permanently running event loop, so run_until_complete() can never be called natively. The bridge needs a greenback portal to be active. Since ensure_portal() is async, we can't call it from synchronous user code in a cell. Instead, when load_profile() detects an IPython kernel environment, it patches kernel.do_execute to call await ensure_portal() before each cell execution. This patch activates on the next cell after load_profile(), requiring users to call it in a separate cell before running any AiiDA engine processes — because the cell that installs the patch has already had its own do_execute called without the portal. For non-notebook contexts (CLI, daemon, scripts), no patching occurs.
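
The separate-cell requirement follows from the order of events, which a toy model can illustrate. The sketch below is not the real implementation: `FakeKernel`, `fake_ensure_portal`, and `install_patch` are hypothetical stand-ins for the IPython kernel, `ensure_portal()`, and the patching done by `load_profile()`, and the portal is reduced to a boolean flag.

```python
import asyncio


class FakeKernel:
    """Hypothetical stand-in for an IPython kernel."""

    def __init__(self):
        self.portal_open = False
        self.log = []

    async def do_execute(self, cell):
        # Record whether a portal was open when this cell started.
        self.log.append((cell.__name__, self.portal_open))
        cell(self)
        self.portal_open = False  # toy model: the portal does not outlive the cell


async def fake_ensure_portal(kernel):
    # Stand-in for `await ensure_portal()`.
    kernel.portal_open = True


def install_patch(kernel):
    original = kernel.do_execute

    async def patched(cell):
        await fake_ensure_portal(kernel)  # open the portal before the cell runs
        return await original(cell)

    kernel.do_execute = patched


def load_profile_cell(kernel):
    install_patch(kernel)  # what load_profile() does in this toy model


def engine_cell(kernel):
    pass  # an engine process would be run here


async def session():
    kernel = FakeKernel()
    await kernel.do_execute(load_profile_cell)  # unpatched do_execute already running
    await kernel.do_execute(engine_cell)        # patched do_execute
    return kernel.log


log = asyncio.run(session())
print(log)
```

The cell that installs the patch is itself executed by the unpatched `do_execute`, so only the following cell starts with the portal open.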

Documentation is updated with notebook instructions, and integration tests using nbclient verify notebook workflows across same-cell, separate-cell, and magic-cell scenarios. An nbstripout pre-commit hook is added to keep notebook test fixtures clean.

Commit 2: Replace deprecated asyncio.get_event_loop() with plumpy.get_or_create_event_loop()

Separately from the re-entrancy problem, AiiDA passes the event loop reference throughout the codebase: the runner, the daemon, and the communicator all hold a reference. On Python 3.12+, asyncio.get_event_loop() emits a DeprecationWarning when no loop is running, and may return a different loop object (e.g. after Python creates a fresh one in a new thread context), causing callbacks to be scheduled on the wrong loop.

This commit replaces all asyncio.get_event_loop() calls with plumpy.get_or_create_event_loop(), which consistently returns the same cached loop instance. The set/reset_event_loop_policy() calls are removed from Runner, as the new helper handles loop creation internally. An autouse _reset_runner fixture is added to tests/conftest.py to ensure a clean runner between tests.
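
The idea of a cached loop can be sketched in a few lines. This is an assumption about the general technique, not plumpy's actual implementation: `cached_event_loop` and `_cached_loop` are hypothetical names chosen for illustration.

```python
import asyncio

_cached_loop = None  # hypothetical module-level cache


def cached_event_loop():
    """Return the same loop on every call, creating it on first use."""
    global _cached_loop
    if _cached_loop is None or _cached_loop.is_closed():
        _cached_loop = asyncio.new_event_loop()
    return _cached_loop


async def answer():
    return 42


first = cached_event_loop()
second = cached_event_loop()
same_loop = first is second  # every holder sees the same loop object
result = first.run_until_complete(answer())
first.close()
```

Because every caller receives the same object, callbacks registered by the runner, the daemon, and the communicator all land on one loop.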

This is orthogonal to the greenback migration — even with greenback, AiiDA still needs a stable loop reference.
With this commit, the code base also becomes fully compatible with Python 3.14 (see also aiidateam/plumpy#336).

Commit 3: Bump disk-objectstore to ~=1.5.0

To pick up aiidateam/disk-objectstore#205, which adds no-op close() and flush() methods to PackedObjectReader, CallbackStreamWrapper, and ZlibLikeBaseStreamDecompresser.

This fixes flaky test failures discovered in this PR, where the async changes result in different test ordering (or timing), causing disk-objectstore to pack objects before certain tests read from the repository. When objects are packed (rather than loose), the returned stream is a PackedObjectReader — and in aiida-core, TextIOWrapper is used as a context manager whose __exit__ calls close(), which propagates to the underlying stream. Without close() on packed readers, this raises AttributeError: 'PackedObjectReader' object has no attribute 'close'.

The methods are intentionally no-ops because these readers don't own the underlying file handles they read from. The bump also removes now-unnecessary type: ignore comments on disk-objectstore's callback and stream APIs.
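
The interaction with TextIOWrapper can be illustrated with a toy reader. This is a sketch under stated assumptions, not disk-objectstore code: `SharedHandleReader` is a hypothetical class that reads a slice from a file handle owned by someone else, and its `close()` and `flush()` are deliberately no-ops.

```python
import io


class SharedHandleReader:
    """Hypothetical reader over a file handle that it does not own."""

    closed = False  # the reader itself is never considered closed

    def __init__(self, handle, length):
        self._handle = handle
        self._remaining = length

    def readable(self):
        return True

    def writable(self):
        return False

    def seekable(self):
        return False

    def read(self, size=-1):
        if size is None or size < 0 or size > self._remaining:
            size = self._remaining
        self._remaining -= size
        return self._handle.read(size)

    def flush(self):
        pass  # nothing buffered on our side

    def close(self):
        pass  # the underlying handle belongs to someone else


shared = io.BytesIO(b"hello world")
# Leaving the `with` block calls close() on the wrapper, which propagates
# to the reader; the no-op keeps the shared handle usable afterwards.
with io.TextIOWrapper(SharedHandleReader(shared, 5), encoding="utf-8") as text:
    content = text.read()

handle_still_open = not shared.closed
```

Without `close()` on the reader, leaving the `with` block is the point where the AttributeError described above would surface.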

How to test

  • All existing tests pass (the bridge is transparent to the test suite)
  • Added notebook integration tests to verify Jupyter support (recommended as nightly due to startup overhead)

@khsrali khsrali marked this pull request as ready for review February 6, 2026 09:43
@khsrali khsrali removed the request for review from unkcpz February 6, 2026 09:43
@codecov

codecov Bot commented Feb 6, 2026

Codecov Report

❌ Patch coverage is 91.83673% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 79.70%. Comparing base (af9d1f6) to head (d7f8825).
⚠️ Report is 3 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/aiida/manage/manager.py | 86.96% | 3 Missing ⚠️ |
| src/aiida/manage/tests/pytest_fixtures.py | 0.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7206      +/-   ##
==========================================
- Coverage   79.70%   79.70%   -0.00%     
==========================================
  Files         565      565              
  Lines       43836    43867      +31     
==========================================
+ Hits        34936    34959      +23     
- Misses       8900     8908       +8     


@khsrali khsrali force-pushed the greenback branch 3 times, most recently from d31ac38 to 61aa092 Compare February 12, 2026 17:34
Collaborator

@agoscinski agoscinski left a comment

Integration tests are typically in a separate tests directory since they are not unit tests. A separate tests directory, however, would require adapting all the GH workflows, which is a bit overkill. I would go for a directory tests/integration/notebook to separate it from the manager unit tests. My experience with Jupyter notebook tests is that starting each notebook is quite slow, so I would run them nightly.


# NOTE: We need to ensure the portal here only because
# our scheduler has only a sync interface and _get_jobs_from_scheduler is using that
# if we ever provide a fully async scheduler interface then we can remove this here
Collaborator

I don't fully understand. We need to call ensure_portal, when we switch to a different task than the task in execute (because this one has an open portal), and when we require a nested sync->async call. So in scheduler we have such a case? But why do we need to open it in transport then? Aren't the two classes decoupled?

Collaborator Author

So when a CalcJob polls the scheduler, the update gets scheduled as a new asyncio task (via this call_later). This new task doesn't inherit the portal that the original process had from execute(). The main problem is that scheduler.get_jobs() is sync, and internally it calls transport.exec_command_wait(), which in turn uses run_until_complete().

Once the scheduler interface is async as well, we can get rid of the ensure_portal() call here.

@khsrali khsrali force-pushed the greenback branch 2 times, most recently from 7134d55 to b749429 Compare February 17, 2026 10:02
Comment thread src/aiida/manage/manager.py Outdated
Comment on lines +152 to +153
def _install_greenback_portal(self) -> None:
"""Register an IPython input transformer that ensures a portal.
Collaborator

The docstring and name are a bit inconsistent. Something like

Suggested change
- def _install_greenback_portal(self) -> None:
- """Register an IPython input transformer that ensures a portal.
+ def _setup_event_loop_in_ipython_environment(self) -> None:
+ """Setups the event loop in an IPython kernel environment.

The hacks we do only work for IPython, and the "register" docstring is more suited to the _register_portal_transformer function.

Collaborator Author

done

Comment thread src/aiida/manage/manager.py Outdated

When running inside an environment with an already-running event loop
(e.g. a Jupyter notebook kernel), this patches the kernel's
``do_execute`` so that ``await greenback.ensure_portal()`` is called
Collaborator

I would just make it greenback agnostic

Suggested change
- ``do_execute`` so that ``await greenback.ensure_portal()`` is called
+ ``do_execute`` so that before cell execution a portal is opened to switch between tasks

Collaborator Author

done

Comment thread docs/source/tutorials/basic.md Outdated
:::

:::{important}
If you are running this tutorial in a Jupyter notebook, make sure to call `load_profile()` in a **separate cell** before executing any AiiDA engine processes (e.g. calculation functions or work chains).
Collaborator

Suggested change
- If you are running this tutorial in a Jupyter notebook, make sure to call `load_profile()` in a **separate cell** before executing any AiiDA engine processes (e.g. calculation functions or work chains).
+ If you are running this tutorial in a Jupyter notebook, make sure to call `load_profile()` in a **separate cell** before running any AiiDA engine processes (e.g. calculation functions or work chains).

Collaborator Author

done

Collaborator

@agoscinski agoscinski left a comment

We also need to add a section in the howto docs/source/howto/interact.rst
https://aiida.readthedocs.io/projects/aiida-core/en/stable/howto/interact.html#how-to-interact-notebook

@khsrali khsrali force-pushed the greenback branch 2 times, most recently from 5e6383a to 541d6b3 Compare February 24, 2026 17:09
@khsrali khsrali mentioned this pull request Feb 24, 2026
@khsrali khsrali force-pushed the greenback branch 3 times, most recently from 36f17f5 to 7e1d965 Compare February 24, 2026 19:08
@khsrali khsrali marked this pull request as draft February 25, 2026 10:55
@khsrali khsrali marked this pull request as ready for review February 25, 2026 16:17
Comment thread environment.yml
Comment thread pyproject.toml
@khsrali
Collaborator Author

khsrali commented Feb 27, 2026

An issue has been opened on plumpy to improve the error message:
aiidateam/plumpy#337

@khsrali khsrali force-pushed the greenback branch 2 times, most recently from 4e4eee9 to f464198 Compare February 27, 2026 09:21
@khsrali khsrali requested a review from agoscinski February 27, 2026 09:59
@khsrali
Collaborator Author

khsrali commented Feb 27, 2026

@agoscinski do we dare?

Collaborator

@agoscinski agoscinski left a comment

Just one thing, the disk-objectstore version constraint in the pyproject.toml sneaked into the second commit 6cbd336 and not the first one. I ran the current version of the production tests (PR https://github.com/aiidateam/aiida-production-tests/pull/1) and they ran through without any issue or any significant changes in performance.

Comment thread tests/conftest.py
@agoscinski
Collaborator

agoscinski commented Feb 27, 2026

Gantt chart from production test suite 1 (plot below is this branch).
[Figure: gantt_psql_1024cj_8wc]

@khsrali khsrali force-pushed the greenback branch 2 times, most recently from 408307c to e47d673 Compare February 27, 2026 15:17
khsrali and others added 3 commits February 27, 2026 17:21
All internal call sites (Runner.run_until_complete, FunctionProcess,
WorkChain.run, TransportQueue, AsyncTransport) are updated to use plumpy's
run_until_complete and run_with_portal helpers instead of direct
loop.run_until_complete() calls. This requires plumpy~=0.26.0
which ships the greenback-based portal utilities.

The key challenge that we had to solve is that AiiDA's engine is async,
but users interact with it through synchronous entry points.
Outside notebooks this works well via loop.run_until_complete(),
but Jupyter kernels already run an event loop,
making nested run_until_complete() calls impossible. Previously this was
solved with nest_asyncio.

After replacement, IPythonKernel.do_execute is monkey-patched
at profile load time to call ensure_portal() before each cell execution,
establishing the greenback context that synchronous AiiDA calls rely on.
This patch activates on the next cell after load_profile(), requiring users
to call it in a separate cell. For non-notebook contexts (CLI, daemon,
scripts), no patching occurs since there is no running event loop.

Documentation is updated with instructions for running engine processes in
notebooks, including the separate-cell requirement for load_profile().
A nbstripout pre-commit hook is added to keep notebook test fixtures clean.
Integration tests using nbclient verify notebook workflows across
same-cell, separate-cell, and magic-cell scenarios.

Co-Authored-By: Alexander Goscinski <alex.goscinski@posteo.de>
…_event_loop() (aiidateam#7206)

Use plumpy's get_or_create_event_loop() throughout the engine and tests to
avoid DeprecationWarnings on Python 3.12+ where asyncio.get_event_loop()
raises when no loop is running. Remove the set/reset_event_loop_policy()
calls from Runner, as the new helper handles loop creation internally.

Add an autouse _reset_runner fixture in tests/conftest.py to ensure a clean
runner between tests.

Co-Authored-By: Alexander Goscinski <alex.goscinski@posteo.de>
Bump disk-objectstore~=1.5.0 and remove now-unnecessary
type: ignore comments on its callback and stream APIs.

Co-Authored-By: Alexander Goscinski <alex.goscinski@posteo.de>
@khsrali khsrali merged commit e3217c0 into aiidateam:main Feb 27, 2026
24 of 25 checks passed
khsrali added a commit that referenced this pull request Feb 27, 2026
khsrali added a commit that referenced this pull request Feb 27, 2026

.. important::

``load_profile()`` must be called in a **separate cell** before any AiiDA engine processes can be executed.
Collaborator

@khsrali is this a new requirement or has this been the case even before?

Collaborator Author

It's new.
We really could not find a better way to do this.
The thing is that notebooks have their own running event loop. Since nest_asyncio is dropped, the only way for us to use their loop (you can have only one running event loop per thread) is to open a greenback portal.
And that has to happen after the loop has started but before any engine calls.
The most practical place to put this logic was load_profile.
However, there's a technical issue with that: greenback portals are only usable once you are back in an async context. Basically that means we either had to change it to something like await load_profile_async(), which defeats AiiDA's efforts not to expose async syntax to users, or register that call on each cell execution. After much brainstorming we decided to go with the second solution.
The interface remains the same, load_profile(), but greenback portals become usable from the next executed cell. A minimal "backward incompatible" price that we had to pay.

@danielhollas
Collaborator

danielhollas commented Feb 28, 2026 via email
