pal_uring: quiesce minircu before blocking in io_uring worker by smalis-msft · Pull Request #3490 · microsoft/openvmm

smalis-msft · 2026-05-14T16:51:27Z

Summary

Fix recurring rcu_preempt self-detected stall failures in OpenHCL on large isolated VMs (most visible on the x64-windows-amd-snp CI runner, e.g. the memory_validation_debug_very_heavy test) by having io-uring worker threads quiesce the global minircu domain immediately before blocking in io_uring_enter.

Root cause

Every pal_uring threadpool worker (the threads named tp) registers itself as a non-quiesced reader in the global minircu domain the first time it polls a future that enters an RCU read-side critical section — which happens as soon as a worker touches anything in guestmem. After that point the worker stays registered for the lifetime of the thread.

The only production caller of minircu::global().quiesce() is the VTL2 VP loop in openhcl/virt_mshv_vtl/src/processor/mod.rs. Threadpool workers never quiesced, so every guestmem::rcu().synchronize_blocking() writer in openhcl/underhill_mem/src/lib.rs (five hot call sites — page visibility / VTL protection updates) was forced to broadcast membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED) to every CPU.
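The registration behavior described above can be sketched with a toy model. This is an illustration only, not minircu's implementation: `ToyDomain`, `ReaderState`, `register`, `enter`, and `writer_needs_broadcast` are all hypothetical names, and the real crate tracks reader state differently. The sketch only shows why a worker that entered a critical section once, and never quiesced, keeps pushing writers onto the expensive path even while it is idle.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for one thread's reader state (not minircu's real type).
#[derive(Default)]
struct ReaderState {
    /// True from the first critical section until the thread quiesces.
    non_quiesced: AtomicBool,
}

/// Hypothetical toy domain that only tracks registered readers.
#[derive(Default)]
struct ToyDomain {
    readers: Mutex<Vec<Arc<ReaderState>>>,
}

impl ToyDomain {
    /// Registers a new reader; it starts out quiesced.
    fn register(&self) -> Arc<ReaderState> {
        let state = Arc::new(ReaderState::default());
        self.readers.lock().unwrap().push(state.clone());
        state
    }

    /// Models entering a read-side critical section for the first time.
    fn enter(&self, reader: &ReaderState) {
        reader.non_quiesced.store(true, Ordering::SeqCst);
    }

    /// Models the writer's decision: any non-quiesced reader forces the
    /// expensive path (the membarrier broadcast in the real system).
    fn writer_needs_broadcast(&self) -> bool {
        self.readers
            .lock()
            .unwrap()
            .iter()
            .any(|r| r.non_quiesced.load(Ordering::SeqCst))
    }
}

fn main() {
    let domain = ToyDomain::default();
    let worker = domain.register();

    // A worker that has never entered a critical section costs writers nothing.
    assert!(!domain.writer_needs_broadcast());

    // One guest-memory access is enough to flip the worker to non-quiesced.
    domain.enter(&worker);

    // The worker may now be idle, but without a quiesce call the writer
    // still has to take the expensive path.
    assert!(domain.writer_needs_broadcast());
    println!("idle-but-registered reader forces the broadcast path");
}
```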

On a 64-VP SNP CVM with most VPs running VTL0 guest code, the IPI broadcast can stall for tens of seconds waiting for VPs to context-switch, long enough to trigger:

rcu: INFO: rcu_preempt self-detected stall on CPU
...
smp_call_function_many_cond+0x113/0x300
__x64_sys_membarrier+0x27c/0x360

inside one of the tp workers, after which dependent tests eventually hit their nextest timeout. This was observed e.g. on PR #3487 CI in run 25867445619 on x64-windows-amd-snp.

This is essentially the same class of symptom as #2334 (TDX AP-start stall) but with a different trigger.

Fix

minircu's own docs (support/minircu/src/lib.rs) prescribe the fix:

"For best performance, ensure all threads in your process call quiesce when a thread is going to sleep or block."

Call minircu::global().quiesce() in the io-uring worker loop just before submit_and_wait. Once quiesced, the worker is invisible to membarrier broadcasts until it next enters a critical section, at which point ThreadData::enter_slow already emits a fence(SeqCst) to publish the transition — so correctness is preserved. This mirrors the existing quiesce call in the VTL2 VP loop.

Cost: a single TLS load + relaxed atomic store per idle cycle on the worker. No-op for threads that have never entered a critical section.
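The shape of the fix can be sketched as follows. This is a minimal model under stated assumptions, not the actual pal_uring change: `ReaderFlag`, `worker_loop`, and `non_quiesced_after_one_job` are hypothetical, a channel `recv` stands in for the blocking `submit_and_wait` call, and the `idle` channel exists only so the example can observe the worker's state deterministically. The real change calls `minircu::global().quiesce()` at the equivalent point in the loop.

```rust
use std::sync::atomic::{fence, AtomicBool, Ordering};
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::Arc;
use std::thread;

type Job = Box<dyn FnOnce() + Send>;

/// Hypothetical per-thread reader flag (a stand-in, not minircu's real state).
#[derive(Default)]
struct ReaderFlag(AtomicBool);

impl ReaderFlag {
    /// Models entering a critical section: mark non-quiesced, then fence
    /// so the transition is published before any protected reads.
    fn enter(&self) {
        self.0.store(true, Ordering::Relaxed);
        fence(Ordering::SeqCst);
    }

    /// Models quiesce: a single relaxed store.
    fn quiesce(&self) {
        self.0.store(false, Ordering::Relaxed);
    }

    fn is_non_quiesced(&self) -> bool {
        self.0.load(Ordering::SeqCst)
    }
}

fn worker_loop(flag: Arc<ReaderFlag>, jobs: Receiver<Job>, idle: Sender<()>) {
    loop {
        // Quiesce immediately before blocking, as the fix prescribes.
        flag.quiesce();
        // Example-only signal that the worker is about to block.
        let _ = idle.send(());
        // Stand-in for the blocking submit_and_wait call.
        let job = match jobs.recv() {
            Ok(job) => job,
            Err(_) => break, // all senders dropped: shut down
        };
        // Running work re-enters a critical section.
        flag.enter();
        job();
    }
}

/// Runs one job, waits for the worker to go idle again, and reports
/// whether the worker is still marked non-quiesced.
fn non_quiesced_after_one_job() -> bool {
    let flag = Arc::new(ReaderFlag::default());
    let (job_tx, job_rx) = channel::<Job>();
    let (idle_tx, idle_rx) = channel();
    let worker = {
        let flag = flag.clone();
        thread::spawn(move || worker_loop(flag, job_rx, idle_tx))
    };

    idle_rx.recv().unwrap(); // worker is idle at startup
    job_tx.send(Box::new(|| {})).unwrap();
    idle_rx.recv().unwrap(); // worker finished the job and is idle again

    let result = flag.is_non_quiesced();
    drop(job_tx); // lets the worker loop exit
    worker.join().unwrap();
    result
}

fn main() {
    assert!(!non_quiesced_after_one_job());
    println!("worker is quiesced while blocked");
}
```

Placing the quiesce call at the top of the loop, rather than after each job, means it runs once per idle cycle and also covers the first block at startup.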

Every io-uring worker thread registers itself as a non-quiesced minircu reader the first time it polls a future that enters an RCU read-side critical section (e.g. anything touching guestmem). Workers then stay registered for the lifetime of the thread, so every guestmem::rcu().synchronize_blocking() writer in underhill_mem (page visibility / VTL protection changes) is forced to broadcast membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED) to every CPU.

On large isolated VMs (observed on 64-VP SNP) that broadcast IPI can stall long enough waiting for VPs running VTL0 to context-switch that the openhcl kernel logs

rcu: INFO: rcu_preempt self-detected stall on CPU
...
smp_call_function_many_cond+0x113/0x300
__x64_sys_membarrier+0x27c/0x360

from inside one of the 'tp' worker threads, and dependent tests eventually hit their nextest timeout.

minircu's own docs prescribe the fix:

// For best performance, ensure all threads in your process call
// 'quiesce' when a thread is going to sleep or block.

Call minircu::global().quiesce() in the io-uring worker loop immediately before submit_and_wait. Once quiesced, the worker is invisible to membarrier broadcasts until it next enters a critical section, at which point ThreadData::enter_slow already emits a fence(SeqCst) to publish the transition, so correctness is preserved. This mirrors the existing quiesce call in the VTL2 VP loop in virt_mshv_vtl.

github-actions · 2026-05-14T16:55:26Z

⚠️ Unsafe Code Detected

This PR modifies files containing unsafe Rust code. Extra scrutiny is required during review.

For more on why we check whole files, instead of just diffs, check out the Rustonomicon

Copilot

Pull request overview

This PR addresses recurring Linux kernel rcu_preempt self-detected stall warnings seen in OpenHCL on large isolated VMs by ensuring pal_uring io_uring worker threads explicitly quiesce the global minircu domain immediately before blocking in io_uring_enter (via submit_and_wait). This reduces unnecessary process-wide membarrier(PRIVATE_EXPEDITED) broadcasts triggered by synchronize_blocking() writers when idle worker threads remain registered as non-quiesced readers.

Changes:

Quiesce the global minircu domain in the io_uring worker loop just before blocking on submit_and_wait.
Add minircu as a Linux-only dependency of pal_uring (and update Cargo.lock accordingly).

Reviewed changes

Copilot reviewed 2 out of 3 changed files in this pull request and generated no comments.

File	Description
support/pal/pal_uring/src/threadpool.rs	Quiesces the global RCU domain before blocking in the worker loop to avoid unnecessary membarrier broadcasts from writers.
support/pal/pal_uring/Cargo.toml	Adds `minircu` as a `target_os = "linux"` dependency needed for the new quiesce call.
Cargo.lock	Records the new `pal_uring -> minircu` dependency edge.

smalis-msft requested a review from a team as a code owner May 14, 2026 16:51

Copilot AI review requested due to automatic review settings May 14, 2026 16:51

github-actions Bot added the unsafe Related to unsafe code label May 14, 2026

chris-oo approved these changes May 14, 2026

Copilot started reviewing on behalf of smalis-msft May 14, 2026 16:57

Copilot AI reviewed May 14, 2026

smalis-msft wants to merge 1 commit into microsoft:main from smalis-msft:pal_uring-quiesce-rcu
