CI: Retry powershell commands on transient AccessViolationExceptions #3457

Draft
smalis-msft wants to merge 11 commits into microsoft:main from smalis-msft:disable-powershell-logging

Conversation

@smalis-msft smalis-msft commented May 11, 2026

Problem

Petri-based VMM tests intermittently fail on Windows CI runners with powershell.exe crashing during session initialization:

exit code: 0xDEAD
System.AccessViolationException ... at EventLogLogProvider.LogProviderLifecycleEvent

The crash happens in InitialSessionState.Bind_LoadProviders before the user's command is dispatched, so the command has not yet run and retrying it is safe.

Approach

Bake a tightly-scoped retry into PowerShellBuilder:

  • support/powershell_builder gains PowerShellBuilder::output(). It runs the command and retries up to twice on the exact AV signature (exit code 0xDEAD plus stderr containing both System.AccessViolationException and EventLogLogProvider). Any other failure is returned immediately.
  • petri::run_host_ps(builder) is the PowerShell entry point used by all hyperv/powershell.rs cmdlet wrappers. It pipes stdio + logs identically to run_host_cmd but goes through PowerShellBuilder::output(), so retry is automatic for every callsite.
  • petri::run_host_cmd(Command) is unchanged for non-PowerShell tools (hvc.exe, vmgs).
  • flowey_lib_hvlite::run_prep_steps likewise calls .output() on the builder.

The retry signature is precise enough that no false positives are expected, and the retried call is the same Command invocation, so there are no partial side effects to worry about.
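The retry rule above can be sketched roughly as follows. This is an illustration only, not the code in this PR: `CmdOutput`, `is_transient_av`, and `run_with_retry` are hypothetical names, and the attempt is modelled as a closure so the sketch does not depend on spawning powershell.exe.

```rust
/// Minimal stand-in for the parts of `std::process::Output` the check needs.
/// (Hypothetical type for this sketch; not the PR's actual code.)
#[derive(Debug, Clone, PartialEq)]
pub struct CmdOutput {
    pub exit_code: Option<i32>,
    pub stderr: String,
}

/// Exit code reported by the crashing powershell.exe, per the PR description.
const AV_EXIT_CODE: i32 = 0xDEAD;

/// Number of retries allowed after the first attempt ("retry up to twice").
const MAX_RETRIES: usize = 2;

/// True only for the exact crash signature: exit code 0xDEAD and stderr
/// naming both the exception type and the event-log provider.
pub fn is_transient_av(out: &CmdOutput) -> bool {
    out.exit_code == Some(AV_EXIT_CODE)
        && out.stderr.contains("System.AccessViolationException")
        && out.stderr.contains("EventLogLogProvider")
}

/// Runs `attempt` once, then up to `MAX_RETRIES` more times while the result
/// still matches the crash signature. Any other result, success or failure,
/// is returned immediately.
pub fn run_with_retry<F>(mut attempt: F) -> CmdOutput
where
    F: FnMut() -> CmdOutput,
{
    let mut out = attempt();
    for _ in 0..MAX_RETRIES {
        if !is_transient_av(&out) {
            break;
        }
        out = attempt();
    }
    out
}
```

In a real builder the closure would presumably wrap `std::process::Command::output()` and fill `exit_code` from `ExitStatus::code()`. Requiring all three conditions at once is what keeps the retry narrow: an ordinary cmdlet failure that happens to mention an access violation, or a 0xDEAD exit with unrelated stderr, is returned to the caller on the first attempt.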

Alternatives considered

  • pwsh.exe (PowerShell 7): blocked — runner images don't have it preinstalled.
  • Disabling Windows PowerShell engine event logging via registry (EnableEventLogging=0): verified ineffective in CI run 25811224088; LogProviderLifecycleEvent fires regardless of that switch.
  • Installing PS7 on runners: rejected (don't want a new image dep).

Copilot AI review requested due to automatic review settings May 11, 2026 21:34
@smalis-msft smalis-msft requested review from a team as code owners May 11, 2026 21:34
@smalis-msft smalis-msft added the release-ci-required Add to a PR to trigger PR gates in release mode label May 11, 2026

Copilot AI left a comment

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds a Flowey CI step to disable Windows PowerShell engine event logging on Windows runners to mitigate transient powershell.exe startup crashes that cause test flakiness.

Changes:

  • Introduces a new Flowey node to harden PowerShell event logging via registry settings.
  • Wires the hardening node into the Windows test job’s pre-run dependencies.
  • Adds explicit CI workflow steps to run the hardening node before starting test services.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| flowey/flowey_lib_hvlite/src/lib.rs | Exposes the new hardening module from the crate. |
| flowey/flowey_lib_hvlite/src/harden_powershell_event_log.rs | Implements the Windows runner hardening step (registry + event source check). |
| flowey/flowey_lib_hvlite/src/_jobs/consume_and_test_nextest_vmm_tests_archive.rs | Ensures hardening runs before tests on Windows. |
| ci-flowey/openvmm-pr.yaml | Adds pipeline steps to run the hardening node before starting test services. |
| .github/workflows/openvmm-pr.yaml | Adds GitHub Actions steps to run the hardening node in multiple Windows jobs. |
| .github/workflows/openvmm-pr-release.yaml | Adds the hardening step to Windows release PR workflow jobs. |
| .github/workflows/openvmm-ci.yaml | Adds the hardening step to Windows CI workflow jobs. |

Comment thread flowey/flowey_lib_hvlite/src/harden_powershell_event_log.rs Outdated
Comment thread flowey/flowey_lib_hvlite/src/harden_powershell_event_log.rs Outdated
Comment thread flowey/flowey_lib_hvlite/src/harden_powershell_event_log.rs Outdated
Comment thread flowey/flowey_lib_hvlite/src/harden_powershell_event_log.rs Outdated
Comment thread flowey/flowey_lib_hvlite/src/harden_powershell_event_log.rs Outdated
Comment thread .github/workflows/openvmm-pr.yaml Outdated
Copilot AI review requested due to automatic review settings May 13, 2026 14:06

Copilot AI left a comment

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

Comment thread flowey/flowey_lib_hvlite/src/harden_powershell_event_log.rs Outdated
Copilot AI review requested due to automatic review settings May 13, 2026 14:16

Copilot AI left a comment

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated no new comments.

Copilot AI review requested due to automatic review settings May 13, 2026 18:14
@smalis-msft smalis-msft changed the title from "CI: Disable powershell engine event logging before running tests" to "CI: Add retries to powershell commands when early AccessViolationExceptions occur" May 13, 2026

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 4 changed files in this pull request and generated 4 comments.

Comment thread support/powershell_builder/src/lib.rs Outdated
Comment thread support/powershell_builder/src/lib.rs Outdated
Comment thread support/powershell_builder/src/lib.rs Outdated
Comment thread petri/src/lib.rs Outdated
@smalis-msft smalis-msft changed the title from "CI: Add retries to powershell commands when early AccessViolationExceptions occur" to "WIP CI: Add retries to powershell commands when early AccessViolationExceptions occur" May 13, 2026
@smalis-msft smalis-msft changed the title from "WIP CI: Add retries to powershell commands when early AccessViolationExceptions occur" to "CI: Retry powershell commands on transient EventLog AccessViolationException" May 13, 2026
Copilot AI review requested due to automatic review settings May 13, 2026 18:58

Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 6 changed files in this pull request and generated 1 comment.

Comment thread support/powershell_builder/src/lib.rs
@smalis-msft smalis-msft changed the title from "CI: Retry powershell commands on transient EventLog AccessViolationException" to "CI: Retry powershell commands on transient AccessViolationExceptions" May 13, 2026
@smalis-msft smalis-msft marked this pull request as draft May 14, 2026 20:30
