gh-149101: Implement PEP 788 (#149116)

47 files changed
encukou left a comment:
Thanks for adding these!
I'll send notes for Doc/ now; code review coming up.
Co-authored-by: Petr Viktorin <encukou@gmail.com>
🤖 New build scheduled with the buildbot fleet by @ZeroIntensity for commit bc78c10 🤖 Results will be shown at: https://buildbot.python.org/all/#/grid?branch=refs%2Fpull%2F149116%2Fmerge
If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again.
For buildbots: the RHEL8 failures aren't relevant. Refleaks are worrying, though.
Never mind; main currently leaks (#149179).
encukou left a comment:
Today's part of the review
(If some comment doesn't make sense, it might be because I didn't read through everything yet.)
[function.PyInterpreterGuard_FromView]
added = '3.15'
This is missing from the PEP; I assume that's an oversight. Just update the PEP when you mark it Final, and ask SC to rubber-stamp it.
// For debugging purposes, we emit a fatal error if someone
// CTRL^C'ed the process.
I think it's nice to have this for users. Otherwise, developers debugging a stuck process will have to pkill every time they mess something up.
Or Ctrl+\ :)
At least limit this to KeyboardInterrupt? I really think this is not a good reason to bring down the entire process.
Do we have a built-in mechanism to only catch KeyboardInterrupt?
I really think this is not a good reason to bring down the entire process.
Well, if we simply don't allow this to be interrupted, then developers will just have to kill the process during debugging, which is a little annoying, but if the process is going to be killed anyway, why not make it easier for users to do that?
The alternative is to just stop waiting on guards if a CTRL^C occurs, but I think the process will likely segfault if we do that because threads are having their protection against finalization removed out from under them.
Do we have a built-in mechanism to only catch KeyboardInterrupt?
PyErr_ExceptionMatches(PyExc_KeyboardInterrupt)
Well, if we simply don't allow this to be interrupted, then developers will just have to kill the process during debugging, which is a little annoying, but if the process is going to be killed anyway, why not make it easier for users to do that?
Is the process going to be killed, though? We're finalizing the runtime here, not the process. Embedders can still Py_Initialize afterwards.
Is the process going to be killed, though? We're finalizing the runtime here, not the process. Embedders can still Py_Initialize afterwards.
This is for the case in which the developer forgot to call PyInterpreterGuard_Close. In that case, the process has to hang indefinitely; if it somehow doesn't, then there's a thread-safety issue, because the interpreter was somehow able to finalize while a guard was active.
Are you concerned about someone accidentally hitting this while they're shutting down the process? As a compromise, I could make it so that CTRL^C will only emit a fatal error if, say, 1 second has passed while waiting on guards. We could also make it only available in certain environments, like with a -X interrupt_interp_guards flag.
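The timed compromise could look roughly like this. This is a hedged Python-level sketch of the idea, not the actual C implementation; the function name and grace value are made up:

```python
import threading
import time

def wait_for_guards(done: threading.Event, grace: float = 1.0) -> None:
    """Wait for all guards to close; fail hard only after a grace period."""
    deadline = time.monotonic() + grace
    while not done.wait(timeout=0.05):
        if time.monotonic() >= deadline:
            # Stand-in for the proposed fatal error after ~1 second of
            # waiting on unclosed guards.
            raise RuntimeError("finalization blocked: a guard was left open")
```

With something like this, a Ctrl+C that lands during normal, fast finalization would never hit the fatal path.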
Isn't it also for the case where finalization is taking a long time? Perhaps it'll take 2 seconds to safely save some data before shutdown?
I added it solely for debugging unclosed guards. Unfortunately, there's not any way to distinguish between a long time and never here.
Co-authored-by: Petr Viktorin <encukou@gmail.com>
For some reason, clangd thought it would be a good idea to automatically apply its ugly formatting style on save. It has since been uninstalled.
But this time, I didn't accidentally reformat all of pylifecycle.c!
// it is not necessarily guaranteed that countdown is zero). We use it
// to simply prevent the finalization from constantly spinning and
// atomically reading the countdown.
_PyEvent_Reset(&interp->finalization_guards.done);
I'd prefer if a threading expert checked that _PyEvent_Reset is a safe primitive.
But before pinging one: why is countdown necessary? Could this do with only the R/W lock, with guards as readers and finalization (which sets the Finalized flag) as a writer?
Resetting an event isn't a super common thing to do, but it is safe as long as there aren't any waiters (for reference, we do have Event.clear in the threading module).
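For reference, the Python-level threading.Event behaves the same way; clearing is fine when nothing is waiting (illustrative only, not the C _PyEvent code):

```python
import threading

ev = threading.Event()
ev.set()             # a guard release "notifies" the finalizing thread
assert ev.is_set()
ev.clear()           # safe here: there are no waiters at this point
assert not ev.is_set()
```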
It's likely possible to do this with only the RW lock, but it would require a lot of refactoring. During finalization, our current setup for "pre-finalization" tasks (guards, non-daemon threads, atexit callbacks, etc.) is as follows:

1. First, we try to run all of them sequentially. So, join is called for each thread, atexit callbacks are executed, and in this case, we wait on the PyEvent for guards.
2. Then, we make a stop-the-world pause to do an atomic check for the number of each remaining. We know other threads can't create more things during this period.
3. If any of them is non-zero (meaning that more tasks were created while the world was stopping), then the world is started and the cycle repeats.
4. Otherwise, the world is kept stopped, and then all non-main threads are deleted and thread state attachment is disallowed before the world is started again.

We need the countdown for step 2. If we don't track the number of guards, we can't know whether they're all cleaned up, so we don't know whether to run another cycle. With the current setup, the alternative without the countdown would involve disallowing the creation of new guards prematurely (before step 1), but that would mean threads can't protect against finalization in perfectly valid scenarios. We can't do it after step 4, because the world is stopped, so threads wouldn't be able to release their guards.
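A toy Python model of the countdown/event cycle described above might look like this (names mirror the discussion; this is an illustrative sketch, not the CPython implementation):

```python
import threading

class FinalizationGuards:
    """Toy model: guards bump a countdown; finalization waits for zero."""

    def __init__(self):
        self.lock = threading.Lock()       # stands in for the STW pause
        self.countdown = 0
        self.done = threading.Event()

    def acquire(self):
        with self.lock:
            self.countdown += 1

    def release(self):
        with self.lock:
            self.countdown -= 1
            if self.countdown == 0:
                self.done.set()            # notify the finalizing thread

    def wait_all_closed(self):
        while True:
            with self.lock:
                if self.countdown == 0:    # step 2's atomic check
                    return
                self.done.clear()          # reset before waiting again
            self.done.wait(timeout=0.05)   # step 1's wait; then re-check
```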
Do we need a _PyRWMutex_TryLock that fails if the lock is held?
Well, guess that's for after the beta :)
Do we need a _PyRWMutex_TryLock that fails if the lock is held?
What for? Interrupting the lock acquisition?
With that, this could be:

1. as before
2. We stop the world and try acquiring the write lock. We know other threads can't create more things during this period.
3. If acquiring failed (meaning that more tasks were created while the world was stopping), then the world is started and the cycle repeats.
4. as before
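A Python sketch of that try-acquire variant, with a plain lock standing in for the write side of the hypothetical _PyRWMutex_TryLock (all names here are illustrative):

```python
import threading

# Guards would hold the read side; finalization tries the write side.
# A plain Lock models only the write side, for illustration.
write_side = threading.Lock()

def stw_check() -> bool:
    """Step 2/3 of the cycle: return True if no guard is active."""
    if write_side.acquire(blocking=False):
        # Acquired: no guards were active, so keep the world stopped.
        return True
    # Failed: a guard was created while the world was stopping, so the
    # world is started and the cycle repeats.
    return False
```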
This doesn't seem safe to me. Isn't there a race here with make_pre_finalization_calls and try_acquire_interp_guard?

1. T1 checks finalization_guards.countdown > 0
2. T2 decrements, notifies, and resets
3. T1 waits on finalization_guards.done
What would be the race there, exactly? It's supposed to wait on finalization_guards.done multiple times.
Oh, I misunderstood your comment; I was looking at the finalization_guards.countdown read during the STW pause. I think I see the race now. We need to grab the lock before waiting. Edit: actually, we can just check finalization_guards.countdown > 0 while spinning.
PyThreadState_Swap(to_restore);
}
...
PyThreadState_Swap(to_restore);
PyThreadState_Swap doesn't need to happen twice, right?
Also, could you assert the result?
Good catch, we don't need the PyThreadState_Swap in the branch above.
Also, could you assert the result?
What result am I asserting?
PyThreadState_Swap should return tstate.
if (tstate->ensure.owned_guard != NULL) {
    PyInterpreterGuard_Close(tstate->ensure.owned_guard);
    tstate->ensure.owned_guard = NULL;
}
This looks fishy: tstate shouldn't be used after PyThreadState_Clear wipes it, and AFAIK owned_guard isn't locked here.
Technically, it's safe to access owned_guard as long as the thread state is only cleared, not deleted (because the memory is still valid). It's not really safe to release the guard before we've cleared the thread state because the guard needs to be there to protect finalizers.
Also, what do you mean by "locked"?
Going by the docs, PyThreadState_Clear will “Reset all information in a thread state object.”
I realize we can rely on undocumented behavior here, but still, it would be nice to “take over” owned_guard & delete_on_release (stash them in locals and unset the tstate's owned_guard) while everything is still attached.
what do you mean by "locked"?
I don't see what's preventing another thread's EnsureFromView from relying on the owned_guard just before it's set to NULL.
Ah, okay, I see where you're coming from now.
I realize we can rely on undocumented behavior here, but still, it would be nice to “take over” owned_guard & delete_on_release (stash them in locals and unset the tstate's owned_guard) while everything is still attached.
This sounds reasonable, though I'm a little uneasy about setting owned_guard to NULL in PyThreadState_Clear. (I can't think of a practical example where it would be an issue, but something feels a little off about it.)
Would you be comfortable with stashing owned_guard in locals before calling PyThreadState_Clear, but not actually clearing the owned_guard field in PyThreadState_Clear (for now)? We can always add it later (presumably after b1).
I don't see what's preventing another thread's EnsureFromView from relying on the owned_guard just before it's set to NULL.
Another thread shouldn't be touching this owned_guard at all (the field is local to the thread, not to the interpreter). The other thread will have its own guard associated with its own EnsureFromView call.
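The "take over before clearing" idea can be sketched in Python (toy stand-ins for PyThreadState, the guard, and PyThreadState_Clear; none of this is the actual code):

```python
class Guard:
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

class ThreadState:
    """Toy stand-in for PyThreadState, with only the field under discussion."""
    def __init__(self, guard):
        self.owned_guard = guard

def clear_thread_state(tstate):
    """Stand-in for PyThreadState_Clear: resets all information."""
    tstate.owned_guard = None

def finalize_thread(tstate):
    # Take over the guard while everything is still attached, so the
    # wiped field is never read afterwards.
    guard = tstate.owned_guard
    tstate.owned_guard = None
    clear_thread_state(tstate)   # finalizers still run under `guard`
    if guard is not None:
        guard.close()            # release only after the state is cleared
```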
Would you be comfortable with stashing owned_guard in locals before calling PyThreadState_Clear, but not actually clearing the owned_guard field in PyThreadState_Clear (for now)? We can always add it later (presumably after b1).
I got over my fear -- done in 25687af.
Ah, OK.
Would you be comfortable with stashing owned_guard in locals before calling PyThreadState_Clear, but not actually clearing the owned_guard field in PyThreadState_Clear (for now)?
Definitely.
Ideally, close the guard before the PyThreadState_Clear, though :)
Failing CI is because Ubuntu is down.
colesbury left a comment:
I've only briefly looked through the PR, but my thoughts so far are:
- This is a big change with plenty of opportunities for subtle bugs and deadlocks
- It doesn't seem wise to me to rush this into 3.15 at the end of the feature development cycle
static const PyThreadState _no_tstate_sentinel = {0};
#define NO_TSTATE_SENTINEL ((PyThreadState *)&_no_tstate_sentinel)
PyThreadState objects are large. We shouldn't be creating one here just for a sentinel return value.
if (attached_tstate != NULL && attached_tstate->interp == interp) {
    /* Yay! We already have an attached thread state that matches. */
    ++attached_tstate->ensure.counter;
    return NO_TSTATE_SENTINEL;
I feel like I'm missing something here. How is this supposed to be used by callers? Sometimes PyThreadState_Ensure will return a completely unusable (but not NULL) value? Like how are callers supposed to know if the PyThreadState* returned from this function is useful?
I didn't see any discussions about NO_TSTATE_SENTINEL in the PEP. I also didn't see it in the discussion forums, but that can be hard to follow/search so I might have missed the post.
Like how are callers supposed to know if the PyThreadState* returned from this function is useful?
They aren't supposed to, and that's by design. The PyThreadState * returned by Ensure/EnsureFromView is a handle to the previous thread state, which isn't relevant to the user. If they want to see the thread state that was just attached, they have to use PyThreadState_Get().
I also didn't see it in the discussion forums, but that can be hard to follow/search so I might have missed the post.
I think it was brought up at some point in the C API WG discussion. I do agree that it's a little hacky, but it doesn't seem like a huge deal to me either. Switching to int PyThreadState_Ensure(PyInterpreterGuard *guard, PyThreadState **out_ptr) would complicate API usage, so it seemed like a fair tradeoff.
So if I understand correctly, the API has a public return value that is by design useless?
It's in the PEP:
This function will return NULL to indicate a memory allocation failure, and otherwise return a pointer to the thread state that was previously attached (which might have been NULL, in which case a non-NULL sentinel value is returned instead to differentiate from failure – this means that this function will sometimes return an invalid PyThreadState pointer).
This function will return NULL to indicate a memory allocation failure, and otherwise return a pointer to the thread state that was previously attached.
I find that text very confusing. This isn't a pointer to a previously attached thread state!
We have a return value that is:
- Useless to the caller
- An existing type used in multiple other APIs! So callers should reasonably expect that they interoperate!
Doesn't this stand out as a giant API design red flag to you?
Yes, it's not ideal. The result is something you can only pass to PyThreadState_Release -- or compare with another PyThreadState*.
Hugo granted an exception to backport this to 3.15 if we don't make the b1 deadline (though I did tell him that if we do go over the deadline, it won't be by that long). Are you suggesting that we defer this to 3.16?

Yes.

You should take that up with the Steering Council, as the PEP was approved for 3.15.

I'm taking it up with the PEP author, who can choose to disagree with my opinion.

I'd like to aim for 3.15, at least for the time being. I think the majority of thread-safety bugs here are going to be rare and won't require large changes to the implementation. That said, if it turns out that we do find some fundamental flaws here, I am okay with deferring the PEP to 3.16.
Hugo has graciously given me permission to backport this if we don't make the May 5th deadline, but let's try to get this done in time!
I will write a full tutorial and migration guide once this is merged; I want to first make sure that this lands before the beta freeze.