feat(python): add mutable EventData proxy bindings with lifetime safety #5571

Draft
EliMe5 wants to merge 3 commits into acts-project:main from EliMe5:python-event-data-bindings-clean
@EliMe5 EliMe5 commented Jun 10, 2026

feat(python): add mutable EventData proxy bindings with lifetime safety

Add mutable Python bindings for SpacePointContainer2,
SeedContainer2, and measurement containers, together with lifetime-safe
proxy and iterator handling.

Python proxies currently hold raw pointers into their backing C++
containers. If the container is consumed by a std::unique_ptr<T>
transfer, for example through the whiteboard, the Python wrapper can
remain alive while the C++ object has been moved out. Accessing an
existing proxy then dereferences freed memory.

This PR introduces ProxyTether, a binding-side wrapper that keeps the
owning py::object and revalidates it before proxy access. If the
backing container has been disowned, access now raises ValueError
instead of causing undefined behaviour or a segfault.

Also guard optional SpacePointColumns access so missing columns raise
AttributeError instead of relying on C++ assertions.
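
A minimal pure-Python sketch of the guarded column access, not the actual binding code. `ToySpacePoints` and the column names are hypothetical stand-ins; the only point is that the existence check happens before storage is touched, so a missing column surfaces as `AttributeError`.

```python
class ToySpacePoints:
    """Toy stand-in for a container with optional columns (hypothetical)."""

    def __init__(self, columns):
        # Only the requested columns get backing storage.
        self._columns = {name: [] for name in columns}
        self._size = 0

    def append(self, **values):
        # Fill every existing column; values for absent columns are ignored.
        for name, store in self._columns.items():
            store.append(values.get(name, 0.0))
        self._size += 1
        return self._size - 1

    def read(self, name, index):
        # Guard: verify the column exists before touching its storage.
        if name not in self._columns:
            raise AttributeError(f"column '{name}' was not created")
        return self._columns[name][index]


sp = ToySpacePoints(columns=["x", "y"])
i = sp.append(x=1.0, y=2.0)
value_x = sp.read("x", i)      # column exists
try:
    sp.read("time", i)         # column was never created
    missing_error = None
except AttributeError as exc:
    missing_error = str(exc)
```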

--- END COMMIT MESSAGE ---

Changes

  • Add mutable SpacePointContainer2 / MutableSpacePointProxy2 bindings:

    • createSpacePoint()
    • __getitem__
    • __iter__
    • guarded read/write access for scalar and array columns
  • Add SeedContainer2 / MutableSeedProxy2 bindings:

    • createSeed()
    • assignSpacePointContainer()
    • assignSpacePointIndices()
    • read/write quality and vertexZ
  • Make MeasurementContainer / MeasurementSubset proxy access use tethered proxies.

  • Add checked iterator handling for example flat multimaps.

  • Add py::keep_alive to ConstTrackContainer proxy accessors.

ProxyTether

ProxyTether stores the Python owner object, the proxy value, and a type-erased owner-validity check.

Before each proxy access, the owner is re-cast to the backing C++ type. If pybind11's smart_holder has disowned the object after a unique_ptr<T> transfer, this fails and is translated into a Python ValueError.

This handles both relevant lifetime cases:

  • the ordinary Python GC path, because the owner py::object is retained;
  • the whiteboard/disown path, because the owner is revalidated before access.
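
The revalidation idea can be modelled in pure Python. This is a toy sketch under stated assumptions, not the C++ implementation: `ToyContainer`, `consume`, and `ToyTether` are hypothetical names, and the `disowned` flag stands in for the failing re-cast after a `unique_ptr<T>` transfer.

```python
class ToyContainer:
    """Hypothetical owner; `disowned` mimics a unique_ptr transfer."""

    def __init__(self, values):
        self.values = list(values)
        self.disowned = False


def consume(container):
    """Mimic a C++ function taking std::unique_ptr<T>: the payload leaves."""
    taken, container.values = container.values, None
    container.disowned = True
    return taken


class ToyTether:
    """Keeps the owner, an element index, and a validity check."""

    def __init__(self, owner, index, alive_fn):
        self._owner = owner        # strong reference to the owner
        self._index = index
        self._alive_fn = alive_fn  # type-erased owner-validity check

    def _checked(self):
        # Revalidate the owner before every access.
        if not self._alive_fn(self._owner):
            raise ValueError("backing container has been disowned")
        return self._owner.values

    @property
    def value(self):
        return self._checked()[self._index]

    @value.setter
    def value(self, new):
        self._checked()[self._index] = new


container = ToyContainer([1.0, 2.0, 3.0])
proxy = ToyTether(container, 1, alive_fn=lambda o: not o.disowned)
proxy.value = 20.0
before = proxy.value
moved = consume(container)
try:
    proxy.value
    after = "no error"
except ValueError as exc:
    after = f"ValueError: {exc}"
```

In this model, access after `consume()` fails loudly with `ValueError` instead of reading through a stale reference.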

Tests

Adds a test-only pybind11 module, _acts_core_test_bindings, with helpers that consume SpacePointContainer2 and SeedContainer2 via std::unique_ptr<T>. This reproduces the same disowning behaviour as the whiteboard without depending on acts.examples.

test_event_data.py now covers:

  • mutable space point proxy read/write,
  • all-columns round trip,
  • missing-column AttributeError,
  • seed container/proxy access,
  • fail-loud behaviour after disown for space point and seed proxies,
  • proxy survival after Python GC.

All added tests pass.

Known follow-up work

This PR focuses on proxies and iterators whose own backing container has been consumed/disowned. It does not fully solve separate cross-owner or buffer-lifetime cases, including:

  • sourceLinks sub-iterators whose underlying container is transferred after the iterator is created;
  • SeedContainer2 references to an assigned SpacePointContainer2 if future Python APIs expose dereferencing that relationship;
  • TrackProxy objects kept alive after makeConst() drains the mutable track container;
  • MeasurementSubset views after the original MeasurementContainer is independently transferred;
  • NumPy arrays that outlive move-only backing storage.

Those require separate true-owner or buffer-base lifetime handling and are left for follow-up PRs.
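
For orientation only, the standard library shows what buffer-base lifetime handling looks like in the buffer protocol: a `memoryview` records its exporter in `.obj`, and a `bytearray` refuses to resize while a view is exported. This sketch uses only built-in types and says nothing about how the follow-up PRs would handle NumPy arrays.

```python
storage = bytearray(b"\x01\x02\x03")
view = memoryview(storage)

# The view keeps a reference to the object that owns the memory.
base_is_storage = view.obj is storage

# While the view is exported, the owner may not reallocate its buffer.
try:
    storage.extend(b"\x04")
    resize_while_exported = "allowed"
except BufferError:
    resize_while_exported = "BufferError"

# After the view is released, resizing works again.
view.release()
storage.extend(b"\x04")
resized_len = len(storage)
```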

cc @benjaminhuth @andiwand

@github-actions github-actions Bot added this to the next milestone Jun 10, 2026
@github-actions

Contributor

📊: Physics performance monitoring for 24a3af8

Full contents
🟥 summary not found!

@sonarqubecloud

Contributor

I am unsure whether this complexity is necessary. The idea of the proxies is that the container stays in one place; if that changes, it is the user's responsibility to create new proxies. This is similar to C++ iterators and how they "invalidate" in certain situations.

Since we capture the container with a std::unique_ptr, it should always stay at the same address, and the proxies should be stable after the creation of the container and after the container goes to the whiteboard. I believe during this transition we std::move the container, which will change the container address?

Is it possible to restructure the code so we do not hold on to proxies when this transition happens?

cc @benjaminhuth

Contributor

It might be a similar story to the Python-specific readers and writers. The question is whether we want to buy into an all-the-way Python look-and-feel interface, which requires all these wrappers to avoid segfaults and look nice. cc @paulgessinger

@benjaminhuth benjaminhuth Jun 11, 2026

Member

Hmm, not sure we can solve all issues related to this with pointer stability, as we also have moveToConst, consuming handles, and potential use of proxies after s.run() finishes, when all whiteboards are destroyed.

So in some cases the memory is literally freed already on the C++ side, and we should not try to keep a pointer to it without being attached to the lifecycle...

In general I like the approach; maybe we can think about whether we can reduce complexity a bit in some cases (e.g., iterators).

Author

I agree that in C++, iterator invalidation is standard and the user's responsibility. I added the safety checks because the existing bindings (before this PR) already invite users to work in a Pythonic way, for example by exposing memory arrays to Python users through NumPy. The main danger isn't necessarily a loud segfault, but rather undefined behaviour or use-after-free (UB/UAF). Because Python doesn't have strict scoping or compile-time lifetime checks like C++, a user can easily save a proxy to a variable and later unknowingly read freed memory or uninitialised columns. This can silently corrupt physics data without crashing the program.

Finally, the reason I created a separate ProxyTether.hpp file is to avoid redundant code and to follow the existing utilities pattern. But this is definitely an architectural choice, and I can modify the code accordingly.

@andiwand andiwand marked this pull request as draft June 11, 2026 08:09
@andiwand

Contributor

Drafting for now until we reach a conclusion in the discussion, which takes load off the GitLab CI.

@benjaminhuth benjaminhuth left a comment

Member

Hi, really cool initiative. I think personally it would be cool to have a mechanism like this to avoid segfaults on the python-level that are due to our C++ ownership model.
I think there are still some things to be discussed about the implementation.

/// still being alive. Used for containers whose iteration yields values/pairs
/// whose Python conversion is still registered (e.g. the flat multimaps).
template <typename Container>
struct CheckedIterator {

Member

Hmm maybe we can make the naming consistent with the Tether naming used above...

Member

Not sure if we want to have a C++ file here, I think we can also have other test-related functionality src folders (e.g., for histogram creation)

try {
owner.cast<const Container&>();
return true;
} catch (...) {

Member

catch (...) is generally not great; is it possible to name the exception we want to catch here?


pybind11::object owner;
Proxy proxy;
AliveFn aliveFn;

Member

I wonder if it would be nice to pull the ownerAlive function into the ProxyTether class and provide an additional template parameter Owner? Are there cases where we want another alive function?
Then we also wouldn't need to handle the case of aliveFn == nullptr below.

/// be produced by several different container types (e.g. a measurement proxy
/// returned by both MeasurementContainer and MeasurementSubset).
template <typename Proxy>
struct ProxyTether {

Member

In general I wonder if a proper class with constructor would make sense here to avoid uninitialized objects...

Member

Doesn't pybind11 give you the ability to mark something as "internal reference"?

@benjaminhuth benjaminhuth Jun 11, 2026

Member

I don't think so, because the problem is different:
keep_alive and return_value_policy::reference_internal keep the parent object alive if it goes out of scope while the proxy is still in use.

In our case, we might have the inverse situation:
The python object might still be there, but the underlying C++ memory might be moved away or so.
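
A toy pure-Python illustration of the two situations, with hypothetical names (`ToyOwner`, `KeepAliveProxy`) and a `None` payload standing in for moved-away C++ memory. A strong reference from the proxy covers the first case but says nothing about the second.

```python
import gc
import weakref


class ToyOwner:
    """Hypothetical Python wrapper around some payload."""

    def __init__(self, values):
        self.values = list(values)


class KeepAliveProxy:
    """Holds a strong reference to its owner and nothing else."""

    def __init__(self, owner, index):
        self.owner = owner
        self.index = index


owner = ToyOwner([1.0, 2.0])
watch = weakref.ref(owner)
proxy = KeepAliveProxy(owner, 0)

# Case 1: the Python name goes away; the proxy's reference keeps the owner alive.
del owner
gc.collect()
owner_survives_gc = watch() is not None

# Case 2: the payload is moved out while the wrapper object stays around.
proxy.owner.values = None  # stands in for C++ memory being moved away
wrapper_still_exists = watch() is not None
payload_still_exists = proxy.owner.values is not None
```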

Member

A different approach would be to completely avoid std::move at the Python -> C++ boundary and to always copy. Then lifetime should work...

/// unregistered type). `maker(owner, container, index)` builds the tethered
/// proxy for each element.
template <typename Container>
struct CheckedIndexIterator {

Member

Can this not follow the pattern above of having an inner iterator that does the iteration, and only wrap the creation of the ProxyTether? Maybe I'm missing something, but having an index count here seems odd to me...
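
To make the suggested shape concrete, here is a toy pure-Python sketch: the inner iterator does the stepping, and the wrapper only revalidates the owner and wraps each element. `ToyContainer`, `CheckedIter`, and `wrap` are hypothetical names; whether this shape fits the C++ containers in question is exactly the open point.

```python
class ToyContainer:
    """Hypothetical owner; `disowned` mimics a unique_ptr transfer."""

    def __init__(self, values):
        self.values = list(values)
        self.disowned = False


class CheckedIter:
    """Wraps an inner iterator and revalidates the owner before every step."""

    def __init__(self, owner, wrap):
        self._owner = owner               # keeps the owner alive while iterating
        self._inner = iter(owner.values)  # the inner iterator does the stepping
        self._wrap = wrap                 # builds the tethered item per element

    def __iter__(self):
        return self

    def __next__(self):
        if self._owner.disowned:
            raise ValueError("container was disowned during iteration")
        # StopIteration from the inner iterator ends the loop as usual.
        return self._wrap(self._owner, next(self._inner))


def wrap(owner, element):
    # Stand-in for constructing a tethered proxy.
    return (owner, element)


all_elements = [e for _, e in CheckedIter(ToyContainer([1, 2]), wrap)]

container = ToyContainer([10, 20, 30])
it = CheckedIter(container, wrap)
first = next(it)[1]
container.disowned = True
try:
    next(it)
    outcome = "no error"
except ValueError:
    outcome = "ValueError"
```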

Comment thread Python/Utilities/include/ActsPython/Utilities/ProxyTether.hpp
Comment on lines +67 to +70
Proxy& checked() {
validate();
return proxy;
}

Contributor

for proxies I don't think the non const is used? if we widen the scope of this shell it can be

4 participants