Tracing allocations across thread pools #438

@itamarst

A common pattern (in BLAS libraries, numexpr, blosc2, Polars) is to have a thread pool that runs tasks on behalf of the Python thread. Allocations can happen in this thread pool.

From a Python programmer's perspective, what matters is which Python code was responsible for the allocation, especially when (as is the most likely scenario) they're using a third-party library. Unfortunately, since the allocation happens in a different thread, connecting it to the Python code that is causally responsible isn't easy.

At the moment Fil (but not Sciagraph) solves this by setting libraries to single-threaded mode, so the native code runs in the same thread as the Python code. This has a number of problems, from distorting runtime behavior to lack of support for arbitrary libraries, e.g. Polars (and Polars' thread limiting doesn't even seem to work?).

Another option: try to get these libraries to expose the information necessary to track causality across threads, for the benefit of memory profilers (and perhaps performance profilers?). This works specifically because of the thread pool model, where a specific request is sent to the thread pool and a result is sent back.

3rd party library support

A library that wanted to support this would have to do the following:

  1. When a task is submitted to the thread pool, get the current thread id and send it to the thread pool along with the task.
  2. When a thread receives a task, it sets a thread local to that originating thread id.
  3. When a thread finishes a task, it clears the thread local, just in case.
  4. The library exposes a public API function mylibrary_get_responsible_thread_for_current_task(), which returns the thread ID by reading the thread local. The memory profiler can then use it to match the allocation up with the responsible Python thread, which would presumably be waiting on the thread pool.
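
The four steps above can be sketched in Python as follows. This is only an illustration of the protocol, not code from any existing library: a real implementation would live in the library's native code (C, C++, Rust), and apart from `mylibrary_get_responsible_thread_for_current_task()` every name here (`submit`, `_run_task`, `_responsible`, `_pool`) is hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Thread local holding the id of the thread that submitted the current task.
_responsible = threading.local()

# Stand-in for the library's internal thread pool (hypothetical).
_pool = ThreadPoolExecutor(max_workers=4)


def mylibrary_get_responsible_thread_for_current_task():
    """Step 4: return the submitting thread's id, or None outside a task."""
    return getattr(_responsible, "thread_id", None)


def _run_task(origin_thread_id, fn, *args):
    # Step 2: the worker records which thread it is working on behalf of.
    _responsible.thread_id = origin_thread_id
    try:
        return fn(*args)
    finally:
        # Step 3: clear the thread local once the task is done.
        _responsible.thread_id = None


def submit(fn, *args):
    # Step 1: capture the caller's thread id and send it along with the task.
    return _pool.submit(_run_task, threading.get_ident(), fn, *args)
```

A task that calls `mylibrary_get_responsible_thread_for_current_task()` while running in the pool gets back the id of the thread that called `submit()`. Outside of a task the function returns `None`.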

Ideally all libraries would use a consistent concept of thread ID, and this would have to be cross-platform, but it isn't really a lot of code. So it seems feasible to submit it as patches to all the relevant upstream libraries.

Profiler support

When trapping e.g. malloc(), the memory profiler would call that API, and then potentially have to get the callstack for a different thread... which may be tricky, but:

  • Looks doable in Fil, actually: it would just need to maintain a dictionary from thread id -> Callstack instead of just a thread-local Callstack.
  • Sciagraph is already doing horrible "getting callstack from a different thread" stuff so that's fine there.
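
A rough sketch of the "dictionary from thread id -> Callstack" idea, written in Python purely for illustration (Fil's actual data structures are not shown in this issue, so `CallstackRegistry`, `attribute_allocation` and the `get_responsible_thread` parameter are all hypothetical names). Here `get_responsible_thread` stands in for whatever API the library exposes, and a callstack is simply a list of strings.

```python
import threading


class CallstackRegistry:
    """Maps thread id -> that thread's current callstack (hypothetical)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._stacks = {}  # thread id -> list of frame descriptions

    def push(self, frame):
        # Called by the profiler when the current thread enters a function.
        tid = threading.get_ident()
        with self._lock:
            self._stacks.setdefault(tid, []).append(frame)

    def pop(self):
        # Called by the profiler when the current thread leaves a function.
        tid = threading.get_ident()
        with self._lock:
            self._stacks[tid].pop()

    def snapshot(self, thread_id):
        # Unlike a thread local, any thread's callstack can be read from here.
        with self._lock:
            return tuple(self._stacks.get(thread_id, ()))


def attribute_allocation(registry, get_responsible_thread):
    """What a malloc() hook might do: pick the callstack to blame."""
    responsible = get_responsible_thread()
    if responsible is None:
        # Not running a pool task: blame the allocating thread itself.
        responsible = threading.get_ident()
    return registry.snapshot(responsible)
```

The lock is the main cost compared to a plain thread local: every push and pop now touches shared state, so a real profiler would likely want something cheaper than a single global mutex.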

The profiler would also have to know which library is responsible, which implies the need for a mapping from thread id to the library running code on that thread. Sciagraph already has hooks to do this. Perhaps there's another way as well via library support? The library knows which threads it is managing, after all.
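
One possible shape for that mapping, sketched in Python under the assumption that each library tells the profiler about its worker threads (say, by calling a registration hook from each pool thread at startup). This is not an existing Fil or Sciagraph API: `register_worker_thread` and `responsible_thread_for_current_allocation` are hypothetical names.

```python
import threading

# thread id of a pool worker -> that library's "responsible thread" getter
_getter_by_worker = {}
_lock = threading.Lock()


def register_worker_thread(getter):
    """Hypothetical hook: a library calls this from each of its pool threads,
    passing its own get-responsible-thread function."""
    with _lock:
        _getter_by_worker[threading.get_ident()] = getter


def responsible_thread_for_current_allocation():
    """Called by the profiler when it traps an allocation."""
    tid = threading.get_ident()
    with _lock:
        getter = _getter_by_worker.get(tid)
    if getter is None:
        # Not a known pool thread: the allocating thread is responsible.
        return tid
    responsible = getter()
    # The getter may return None when the worker is between tasks.
    return tid if responsible is None else responsible
```

With this design the profiler never needs to guess which library owns a thread: the lookup by thread id selects the right library's getter directly.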
