
gh-149101: Implement PEP 788 #149116

Open

ZeroIntensity wants to merge 35 commits into python:main from ZeroIntensity:pep-788

Conversation

@ZeroIntensity (Member) commented Apr 28, 2026

Hugo has graciously given me permission to backport this if we don't make the May 5th deadline, but let's try to get this done in time!

I will write a full tutorial and migration guide once this is merged; I want to first make sure that this lands before the beta freeze.

@read-the-docs-community (Bot) commented Apr 28, 2026

Documentation build overview

📚 cpython-previews | 🛠️ Build #32504432 | 📁 Comparing e105120 against main (40dc61a)

  🔍 Preview build  

47 files changed · ± 47 modified


@encukou (Member) left a comment

Thanks for adding these!

I'll send notes for Doc/ now; code review coming up.

Comment thread Doc/c-api/interp-lifecycle.rst Outdated
Comment thread Doc/c-api/interp-lifecycle.rst Outdated
Comment thread Doc/c-api/interp-lifecycle.rst Outdated
Comment thread Doc/c-api/interp-lifecycle.rst
Comment thread Doc/c-api/interp-lifecycle.rst Outdated
Comment thread Doc/c-api/threads.rst Outdated
Comment thread Doc/c-api/threads.rst Outdated
Comment thread Doc/c-api/threads.rst Outdated
Comment thread Doc/c-api/threads.rst Outdated
Comment thread Doc/whatsnew/3.15.rst Outdated
ZeroIntensity and others added 5 commits April 29, 2026 08:24
Co-authored-by: Petr Viktorin <encukou@gmail.com>
Co-authored-by: Petr Viktorin <encukou@gmail.com>
Co-authored-by: Petr Viktorin <encukou@gmail.com>
@ZeroIntensity added the 🔨 test-with-buildbots label Apr 29, 2026
@bedevere-bot

🤖 New build scheduled with the buildbot fleet by @ZeroIntensity for commit bc78c10 🤖

Results will be shown at:

https://buildbot.python.org/all/#/grid?branch=refs%2Fpull%2F149116%2Fmerge

If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again.

@bedevere-bot removed the 🔨 test-with-buildbots label Apr 29, 2026
@encukou (Member) commented Apr 30, 2026

for buildbots: The RHEL8 failures aren't relevant. Refleaks are worrying though.

@encukou (Member) commented Apr 30, 2026

Refleaks are worrying though.

Never mind; main currently leaks (#149179).

@encukou (Member) left a comment

Today's part of the review.
(If some comment doesn't make sense, it might be because I didn't read through everything yet.)

Comment thread Doc/c-api/threads.rst Outdated
Comment thread Misc/stable_abi.toml
Comment on lines +2726 to +2727
[function.PyInterpreterGuard_FromView]
added = '3.15'
Member

This is missing from the PEP; I assume that's an oversight. Just update the PEP when you mark it Final, and ask SC to rubber-stamp it.

Comment thread Include/internal/pycore_pystate.h Outdated
Comment thread Modules/_testcapimodule.c Outdated
Comment thread Modules/_testcapimodule.c Outdated
Comment thread Python/pylifecycle.c Outdated
Comment thread Python/pylifecycle.c Outdated
Comment on lines +2161 to +2162
// For debugging purposes, we emit a fatal error if someone
// CTRL^C'ed the process.
Member

Should this be removed now?

Member Author

I think it's nice to have this for users. Otherwise, developers debugging a stuck process will have to pkill every time they mess something up.

Member

Or Ctrl+\ :)
At least limit this to KeyboardInterrupt? I really think this is not a good reason to bring down the entire process.

Member Author

Do we have a built-in mechanism to only catch KeyboardInterrupt?

I really think this is not a good reason to bring down the entire process.

Well, if we simply don't allow this to be interrupted, then developers will just have to kill the process during debugging, which is a little annoying, but if the process is going to be killed anyway, why not make it easier for users to do that?

The alternative is to just stop waiting on guards if a CTRL^C occurs, but I think the process will likely segfault if we do that because threads are having their protection against finalization removed out from under them.

Member

Do we have a built-in mechanism to only catch KeyboardInterrupt?

PyErr_ExceptionMatches(PyExc_KeyboardInterrupt)

Well, if we simply don't allow this to be interrupted, then developers will just have to kill the process during debugging, which is a little annoying, but if the process is going to be killed anyway, why not make it easier for users to do that?

Is the process going to be killed, though? We're finalizing the runtime here, not the process. Embedders can still Py_Initialize afterwards.

Member Author

Is the process going to be killed, though? We're finalizing the runtime here, not the process. Embedders can still Py_Initialize afterwards.

This is for the case in which the developer forgot to call PyInterpreterGuard_Close. The process has to hang indefinitely in that case; if it somehow doesn't, then there's a thread-safety issue, because the interpreter was able to finalize while a guard was active.

Are you concerned about someone accidentally hitting this while they're shutting down the process? As a compromise, I could make it so that CTRL^C will only emit a fatal error if, say, 1 second has passed while waiting on guards. We could also make it only available in certain environments, like with a -X interrupt_interp_guards flag.

Member

Isn't it also for the case where finalization is taking a long time? Perhaps it'll take 2 seconds to safely save some data before shutdown?

Member Author

I added it solely for debugging unclosed guards. Unfortunately, there's not any way to distinguish between a long time and never here.

Comment thread Python/pystate.c Outdated
Comment thread Python/pystate.c Outdated
ZeroIntensity and others added 12 commits April 30, 2026 08:29
Co-authored-by: Petr Viktorin <encukou@gmail.com>
Co-authored-by: Petr Viktorin <encukou@gmail.com>
For some reason, clangd thought it would be a good idea to automatically
apply its ugly formatting style on save. It has since been uninstalled.
But this time, I didn't accidentally reformat all of pylifecycle.c!
Comment thread Python/pystate.c
// it is not necessarily guaranteed that countdown is zero). We use it
// to simply prevent the finalization from constantly spinning and
// atomically reading the countdown.
_PyEvent_Reset(&interp->finalization_guards.done);
Member

I'd prefer if a threading expert checked that _PyEvent_Reset is a safe primitive.

But before pinging one: why is countdown necessary? Could this do with only the R/W lock, with guards as readers and finalization (which sets the Finalized flag) as a writer?

Member Author

Resetting an event isn't a super common thing to do, but it is safe as long as there aren't any waiters (for reference, we do have Event.clear in the threading module).

It's likely possible to do this with only the RW lock, but it would require a lot of refactoring. During finalization, our current setup for "pre-finalization" tasks (guards, non-daemon threads, atexit callbacks, etc) is as follows:

  1. First, we try to run all of them sequentially. So, join is called for each thread, atexit callbacks are executed, and in this case, we wait on the PyEvent for guards.
  2. Then, we make a stop-the-world pause to do an atomic check for the number of each remaining. We know other threads can't create more things during this period.
  3. If any of them is non-zero (meaning that more tasks were created while the world was stopping), then the world is started and the cycle repeats.
  4. Otherwise, the world is kept stopped, and then all non-main threads are deleted and thread state attachment is disallowed before the world is started again.

We need the countdown for step 2. If we don't track the number of guards, we can't know whether they're all cleaned up, so we don't know whether to run another cycle. With the current setup, the alternative without the countdown would involve disallowing the creation of new guards prematurely (before step 1), but that would mean threads can't protect against finalization in perfectly valid scenarios. We can't do it after step 4, because the world is stopped, so threads wouldn't be able to release their guards.

Member Author

Also cc @colesbury for PyEvent

Member

Do we need a _PyRWMutex_TryLock that fails if the lock is held?

Well, guess that's for after the beta :)

Member Author

Do we need a _PyRWMutex_TryLock that fails if the lock is held?

What for? Interrupting the lock acquisition?

Member

With that this could be:

  1. as before
  2. We stop the world and try acquiring the write lock. We know other threads can't create more things during this period.
  3. If acquiring failed (meaning that more tasks were created while the world was stopping), then the world is started and the cycle repeats.
  4. as before

Contributor

This doesn't seem safe to me. Isn't there a race here with make_pre_finalization_calls and try_acquire_interp_guard?

  • T1 checks finalization_guards.countdown > 0
  • T2 decrements, notifies, and resets
  • T1 waits on finalization_guards.done

Member Author

What would be the race there, exactly? It's supposed to wait on finalization_guards.done multiple times.

@ZeroIntensity (Member Author), May 1, 2026

Oh, I misunderstood your comment; I was looking at the finalization_guards.countdown read during the STW pause. I think I see the race now. We need to grab the lock before waiting. Edit: actually, we can just check finalization_guards.countdown > 0 while spinning.

Comment thread Python/pystate.c Outdated
PyThreadState_Swap(to_restore);
}

PyThreadState_Swap(to_restore);
Member

PyThreadState_Swap doesn't need to happen twice, right?
Also, could you assert the result?

Member Author

Good catch, we don't need the PyThreadState_Swap in the branch above.

Also, could you assert the result?

What result am I asserting?

Member

PyThreadState_Swap should return tstate.

Comment thread Python/pystate.c Outdated
Comment on lines +3621 to +3624
if (tstate->ensure.owned_guard != NULL) {
PyInterpreterGuard_Close(tstate->ensure.owned_guard);
tstate->ensure.owned_guard = NULL;
}
Member

This looks fishy: tstate shouldn't be used after PyThreadState_Clear wipes it, and AFIAK owned_guard isn't locked here.

Member Author

Technically, it's safe to access owned_guard as long as the thread state is only cleared, not deleted (because the memory is still valid). It's not really safe to release the guard before we've cleared the thread state because the guard needs to be there to protect finalizers.

Also, what do you mean by "locked"?

Member

Going by the docs, PyThreadState_Clear will “Reset all information in a thread state object.”
I realize we can rely on undocumented behavior here, but still, it would be nice to “take over” owned_guard & delete_on_release (stash them in locals and unset the tstate's owned_guard) while everything is still attached.


what do you mean by "locked"?

I don't see what's preventing another thread's EnsureFromView from relying on the owned_guard just before it's set to NULL.

Member Author

Ah, okay, I see where you're coming from now.

I realize we can rely on undocumented behavior here, but still, it would be nice to “take over” owned_guard & delete_on_release (stash them in locals and unset the tstate's owned_guard) while everything is still attached.

This sounds reasonable, though I'm a little uneasy about setting owned_guard to NULL in PyThreadState_Clear. (I can't think of a practical example where it would be an issue, but something feels a little off about it.)

Would you be comfortable with stashing owned_guard in locals before calling PyThreadState_Clear, but not actually clearing the owned_guard field in PyThreadState_Clear (for now)? We can always add it later (presumably after b1).

I don't see what's preventing another thread's EnsureFromView from relying on the owned_guard just before it's set to NULL.

Another thread shouldn't be touching this owned_guard at all (the field is local to the thread, not to the interpreter). The other thread will have its own guard associated with its own EnsureFromView call.

Member Author

Would you be comfortable with stashing owned_guard in locals before calling PyThreadState_Clear, but not actually clearing the owned_guard field in PyThreadState_Clear (for now)? We can always add it later (presumably after b1).

I got over my fear -- done in 25687af.

Member

Ah, OK.

Would you be comfortable with stashing owned_guard in locals before calling PyThreadState_Clear, but not actually clearing the owned_guard field in PyThreadState_Clear (for now)?

Definitely.
Ideally, close the guard before the PyThreadState_Clear, though :)

@ZeroIntensity (Member Author)

Failing CI is because Ubuntu is down.

@colesbury (Contributor) left a comment

I've only briefly looked through the PR, but my thoughts so far are:

  • This is a big change with plenty of opportunities for subtle bugs and deadlocks
  • It doesn't seem wise to me to rush this into 3.15 at the end of the feature development cycle

Comment thread Python/pystate.c
Comment on lines +3513 to +3514
static const PyThreadState _no_tstate_sentinel = {0};
#define NO_TSTATE_SENTINEL ((PyThreadState *)&_no_tstate_sentinel)
Contributor

PyThreadState objects are large. We shouldn't be creating one here just for a sentinel return value.

Member Author

This was initially an int. @encukou asked me to change it to a PyThreadState for debugging purposes.

How about we use a PyThreadState on debug builds, and an int on release builds?

Comment thread Python/pystate.c
if (attached_tstate != NULL && attached_tstate->interp == interp) {
/* Yay! We already have an attached thread state that matches. */
++attached_tstate->ensure.counter;
return NO_TSTATE_SENTINEL;
@colesbury (Contributor), May 1, 2026

I feel like I'm missing something here. How is this supposed to be used by callers? Sometimes PyThreadState_Ensure will return a completely unusable (but not NULL) value? Like how are callers supposed to know if the PyThreadState* returned from this function is useful?

I didn't see any discussions about NO_TSTATE_SENTINEL in the PEP. I also didn't see it in the discussion forums, but that can be hard to follow/search so I might have missed the post.

Member Author

Like how are callers supposed to know if the PyThreadState* returned from this function is useful?

They aren't supposed to, and that's by design. The PyThreadState * returned by Ensure/EnsureFromView is a handle to the previous thread state, which isn't relevant to the user. If they want to see the thread state that was just attached, they have to use PyThreadState_Get().

I also didn't see it in the discussion forums, but that can be hard to follow/search so I might have missed the post.

I think it was brought up at some point in the C API WG discussion. I do agree that it's a little hacky, but it doesn't seem like a huge deal to me either. Switching to int PyThreadState_Ensure(PyInterpreterGuard *guard, PyThreadState **out_ptr) would complicate API usage, so it seemed like a fair tradeoff.

Contributor

So if I understand correctly, the API has a public return value that is by design useless?

Member

It's in the PEP:

This function will return NULL to indicate a memory allocation failure, and otherwise return a pointer to the thread state that was previously attached (which might have been NULL, in which case an non-NULL sentinel value is returned instead to differentiate between failure – this means that this function will sometimes return an invalid PyThreadState pointer).

Contributor

This function will return NULL to indicate a memory allocation failure, and otherwise return a pointer to the thread state that was previously attached.

I find that text very confusing. This isn't a pointer to a previously attached thread state!

We have a return value that is:

  • Useless to the caller
  • An existing type used in multiple other APIs! So callers should reasonably expect that they interoperate!

Doesn't this stand out as a giant API design red flag to you?

Member

Yes, it's not ideal. The result is something you can only pass to PyThreadState_Release -- or compare with another PyThreadState*.

@ZeroIntensity (Member Author)

It doesn't seem wise to me to rush this into 3.15 at the end of the feature development cycle

Hugo granted an exception to backport this to 3.15 if we don't make the b1 deadline (though I did tell him that if we do go over the deadline, it won't be by that long). Are you suggesting that we defer this to 3.16?

@colesbury (Contributor)

Are you suggesting that we defer this to 3.16?

Yes

@encukou (Member) commented May 1, 2026

You should take that up with the Steering Council, as the PEP was approved for 3.15.

@colesbury (Contributor)

I'm taking it up with the PEP author, who can choose to disagree with my opinion

@ZeroIntensity (Member Author)

I'd like to aim for 3.15, at least for the time being. I think the majority of thread-safety bugs here are going to be rare and won't require large changes to the implementation.

That said, if it turns out that we do find some fundamental flaws here, I am okay with deferring the PEP to 3.16.


4 participants