Reduce blame test flakiness: increase hang dump timeout to 10s #15590
nohwnd merged 1 commit into microsoft:main
HangDumpOnTimeout used 3s and HangDumpChildProcesses used 5s, both too tight for CI, where process startup can be slow. Increase both to 10s, which is about the safe minimum to avoid flakiness. This fixes the HangDumpChildProcesses failure seen across multiple PRs.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
Increases blame hang-dump timeouts in acceptance tests to reduce CI flakiness caused by slow process startup on hosted agents.
Changes:
- Increased `HangDumpOnTimeout` blame `TestTimeout` from 3s to 10s.
- Increased `HangDumpChildProcesses` blame `TestTimeout` from 5s to 10s.
- Added inline comments warning against reducing the timeout due to flakiness.
    // Don't reduce this, 10s is about the safe minimum to not have flakiness.
    arguments = string.Concat(arguments, $@" /Blame:""CollectHangDump;HangDumpType=mini;TestTimeout=10s"" /Diag:{TempDirectory.Path}/log.txt");
The new timeout value (10s) and rationale are embedded inline, and the same comment/value are duplicated across tests. To make future adjustments safer and consistent, consider introducing a shared constant (e.g., `const string SafeHangDumpTimeout = "10s";` or a `TimeSpan`) and reusing it in both places, with a single comment explaining why that value is required.
    // Don't reduce this, 10s is about the safe minimum to not have flakiness.
    arguments = string.Concat(arguments, $@" /Blame:""CollectHangDump;HangDumpType=mini;TestTimeout=10s""");
Same duplication as above: the TestTimeout=10s string and warning comment are repeated. Recommend factoring into a shared constant/helper to avoid drift if one test is updated and the other is not.
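A minimal sketch of what the suggested refactoring could look like. It is not part of the merged change. The class name `BlameTestDefaults` and the helper `HangDumpBlameArgument` are hypothetical; only the constant name `SafeHangDumpTimeout` comes from the review comment above.

```csharp
using System;

// Hypothetical holder for shared blame test settings (not in the merged PR).
public static class BlameTestDefaults
{
    // Don't reduce this, 10s is about the safe minimum to not have flakiness.
    // Constant name taken from the review suggestion.
    public const string SafeHangDumpTimeout = "10s";

    // Hypothetical helper: builds the /Blame argument both tests currently
    // spell out inline, so the timeout lives in exactly one place.
    public static string HangDumpBlameArgument(string timeout = SafeHangDumpTimeout)
        => $@" /Blame:""CollectHangDump;HangDumpType=mini;TestTimeout={timeout}""";
}
```

Under these assumptions, each test would append the argument with `arguments = string.Concat(arguments, BlameTestDefaults.HangDumpBlameArgument());`, and `HangDumpOnTimeout` would still add its own `/Diag:` switch afterwards.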
/azp run
Problem
`HangDumpOnTimeout` (3s timeout) and `HangDumpChildProcesses` (5s timeout) are flaky in CI because process startup can be slow on hosted agents. `HangDumpChildProcesses` is currently failing across 6 open PRs.
Fix
Increase both timeouts to 10s, the safe minimum to avoid flakiness without making tests unnecessarily slow.
Failing PRs this unblocks
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>