feat(python): add mutable EventData proxy bindings with lifetime safety#5571
feat(python): add mutable EventData proxy bindings with lifetime safety#5571EliMe5 wants to merge 3 commits into
Conversation
📊: Physics performance monitoring for 24a3af8Full contents |
|
There was a problem hiding this comment.
I am unsure if this complexity is necessary. The idea of the proxies is that the container stays in one place, if that changes it is the users responsibility to create new proxies. This is similar to C++ iterators and how they "invalidate" in certain situations.
since we capture the container with a std::unique_ptr it should always stay on the same address and the proxies should be stable after the creation of the container and after the container goes to the whiteboard. I believe during this transition we std::move the container which will change the container address?
is it possible to restructure the code so we do not hold on to proxies when this transition happens?
There was a problem hiding this comment.
it might be a similar story like the python specific readers and writers. the question is if we want to buy into an all-the-way python look and feel interface which requires all these wrappers to avoid segfaults and look nice. cc @paulgessinger
There was a problem hiding this comment.
Hmm not sure we can solve all issues related to this with pointer stability, as we have also moveToConst, consuming handles, and potential use of proxies after s.run() finishes, when all whiteboards are destroyed.
So in some cases the memory is literally freed already on the C++ side, and we should not try to keep a pointer to this without beeing attached to the livecycle...
In general I like the approach, maybe we can think if we can reduce complexity in some cases a bit (e.g, iterators)
There was a problem hiding this comment.
I agree with you that in C++, iterator invalidation is standard and the user's responsibility. But the reason I added the security is that the existing (before this PR) bindings already invited users to use the bindings in a "python-way" with, for example, the memory arrays to Python users through NumPy. The main danger isn't necessarily a loud segfault, but rather UB/UAF. Because Python doesn't have strict scoping or compile-time lifetime checks like C++, a user can easily save a proxy to a variable and later unknowingly read freed memory or uninitialised columns. This can silently corrupt physics data without crashing the program.
Finally, the reason I created a separate ProxyTether.hpp file is to avoid redundant code and to follow the existing utilities pattern. But this is definitely an architectural choice, and I can modify the code accordingly.
|
drafting for now until we have a conclusion on the discussion which takes load off the gitlab ci |
benjaminhuth
left a comment
There was a problem hiding this comment.
Hi, really cool initiative. I think personally it would be cool to have a mechanism like this to avoid segfaults on the python-level that are due to our C++ ownership model.
I think there are still some things to be discussed about the implementation.
| /// still being alive. Used for containers whose iteration yields values/pairs | ||
| /// whose Python conversion is still registered (e.g. the flat multimaps). | ||
| template <typename Container> | ||
| struct CheckedIterator { |
There was a problem hiding this comment.
Hmm maybe we can make the naming consistent with the Tether naming used above...
There was a problem hiding this comment.
Not sure if we want to have a C++ file here, I think we can also have other test-related functionality src folders (e.g., for histogram creation)
| try { | ||
| owner.cast<const Container&>(); | ||
| return true; | ||
| } catch (...) { |
There was a problem hiding this comment.
(...) is generally not great, is it possible to name the exception we want to catch here?
|
|
||
| pybind11::object owner; | ||
| Proxy proxy; | ||
| AliveFn aliveFn; |
There was a problem hiding this comment.
I wonder if it would be nice to pull the ownerAlive function into the ProxyTether class, and provide an additional template parameter Owner? Are there cases we want another alive function?
Then we also wouldn't handle the case of aliveFn == nullptr below.
| /// be produced by several different container types (e.g. a measurement proxy | ||
| /// returned by both MeasurementContainer and MeasurementSubset). | ||
| template <typename Proxy> | ||
| struct ProxyTether { |
There was a problem hiding this comment.
In general I wonder if a proper class with constructor would make sense here to avoid uninitialized objects...
There was a problem hiding this comment.
Doesn't pybind11 give you the ability to mark something as "internal reference"?
There was a problem hiding this comment.
I don't think so, because the problem is different:
keep_alive and return_value_policy::internal_reference keep the parent object alive if it goes out of scope and the proxy is still valid.
In our case, we might have the inverse situation:
The python object might still be there, but the underlying C++ memory might be moved away or so.
There was a problem hiding this comment.
A different approach would be to completely avoid std::move on the python -> c++ boundary and to copy always. then livetime should work...
| /// unregistered type). `maker(owner, container, index)` builds the tethered | ||
| /// proxy for each element. | ||
| template <typename Container> | ||
| struct CheckedIndexIterator { |
There was a problem hiding this comment.
Can this not follow the pattern as above of having a inner iterator that doe the iteration, and only wraps the creation of the ProxyTether? Maybe I'm missing something, but, having an index count here seems odd to me...
| Proxy& checked() { | ||
| validate(); | ||
| return proxy; | ||
| } |
There was a problem hiding this comment.
for proxies I don't think the non const is used? if we widen the scope of this shell it can be



feat(python): add mutable EventData proxy bindings with lifetime safety
Add mutable Python bindings for
SpacePointContainer2,SeedContainer2, and measurement containers, together with lifetime-safeproxy and iterator handling.
Python proxies currently hold raw pointers into their backing C++
containers. If the container is consumed by a
std::unique_ptr<T>transfer, for example through the whiteboard, the Python wrapper can
remain alive while the C++ object has been moved out. Accessing an
existing proxy then dereferences freed memory.
This PR introduces
ProxyTether, a binding-side wrapper that keeps theowning
py::objectand revalidates it before proxy access. If thebacking container has been disowned, access now raises
ValueErrorinstead of causing undefined behaviour or a segfault.
Also guard optional
SpacePointColumnsaccess so missing columns raiseAttributeErrorinstead of relying on C++ assertions.--- END COMMIT MESSAGE ---
Changes
Add mutable
SpacePointContainer2/MutableSpacePointProxy2bindings:createSpacePoint()__getitem____iter__Add
SeedContainer2/MutableSeedProxy2bindings:createSeed()assignSpacePointContainer()assignSpacePointIndices()qualityandvertexZMake
MeasurementContainer/MeasurementSubsetproxy access use tethered proxies.Add checked iterator handling for example flat multimaps.
Add
py::keep_alivetoConstTrackContainerproxy accessors.ProxyTetherProxyTetherstores the Python owner object, the proxy value, and a type-erased owner-validity check.Before each proxy access, the owner is re-cast to the backing C++ type. If pybind11's
smart_holderhas disowned the object after aunique_ptr<T>transfer, this fails and is translated into a PythonValueError.This handles both relevant lifetime cases:
py::objectis retained;Tests
Adds a test-only pybind11 module,
_acts_core_test_bindings, with helpers that consumeSpacePointContainer2andSeedContainer2viastd::unique_ptr<T>. This reproduces the same disowning behaviour as the whiteboard without depending onacts.examples.test_event_data.pynow covers:AttributeError,All added tests pass.
Known follow-up work
This PR focuses on proxies and iterators whose own backing container has been consumed/disowned. It does not fully solve separate cross-owner or buffer-lifetime cases, including:
sourceLinkssub-iterators whose underlying container is transferred after the iterator is created;SeedContainer2references to an assignedSpacePointContainer2if future Python APIs expose dereferencing that relationship;TrackProxyobjects kept alive aftermakeConst()drains the mutable track container;MeasurementSubsetviews after the originalMeasurementContaineris independently transferred;Those require separate true-owner or buffer-base lifetime handling and are left for follow-up PRs.
cc @benjaminhuth @andiwand