Skip to content

[vs18.5] Fix race condition in LoggingService.ProcessLoggingEvent after shutdo…#13497

Open
SimaTian wants to merge 1 commit intovs18.5from
backport/vs18.5/fix-logging-race-condition
Open

[vs18.5] Fix race condition in LoggingService.ProcessLoggingEvent after shutdo…#13497
SimaTian wants to merge 1 commit intovs18.5from
backport/vs18.5/fix-logging-race-condition

Conversation

@SimaTian
Copy link
Copy Markdown
Member

@SimaTian SimaTian commented Apr 7, 2026

…wn (#13450)

Summary

Fix a race condition crash (access violation 0xC0000005 / NullReferenceException) in LoggingService.ProcessLoggingEvent when the method is called after the logging service has been shut down.

Problem

ProcessLoggingEvent directly accesses _eventQueue, _dequeueEvent, and _enqueueEvent without any shutdown guard. During ShutdownComponent(), CleanLoggingEventProcessing() disposes and nullifies all of these fields. If an external callback (e.g., Process.Exited from a ProjectCachePlugin) attempts to log a message after shutdown has completed, the null field access causes an unrecoverable crash.

Reproduction Scenario

  1. A build runs with a ProjectCachePlugin
  2. The build completes and LoggingService.ShutdownComponent() is called
  3. The cache plugin's external process exits, triggering Process.OnExited on a ThreadPool thread
  4. The callback logs "Project cache service process exited" via LoggingServiceToPluginLoggerAdapter.LogMessage
  5. This calls ProcessLoggingEvent() which accesses _eventQueue.Countnull reference crash

Debugger Evidence

_serviceState = Shutdown
_eventQueue = null
_dequeueEvent = null
_enqueueEvent = null
_emptyQueueEvent = null
_loggingEventProcessingCancellation = null

buildEvent.Message = "Project cache service process exited"
Call: System.Diagnostics.Process.OnExited -> ProjectCachePlugin.dll -> LoggingServiceToPluginLoggerAdapter.LogMessage

Root Cause

ProcessLoggingEvent lacked two protections that already exist elsewhere in the codebase:

  1. Shutdown state guard - Other public methods (RegisterLogger, RegisterDistributedLogger, InitializeNodeLoggers, InitializeComponent) all check _serviceState != LoggingServiceState.Shutdown. ProcessLoggingEvent did not.

  2. Local field capture - The LoggingEventProc queue pump thread already captures _eventQueue, _dequeueEvent, _emptyQueueEvent, and _enqueueEvent into local variables to prevent races with CleanLoggingEventProcessing(). ProcessLoggingEvent accessed the fields directly.

Fix

Three layers of defense, all following patterns already established in the same file:

  1. Early return when _serviceState == LoggingServiceState.Shutdown
  • silently drops the event, since the service is no longer operational.
  1. Capture fields to locals before use - prevents null dereference if CleanLoggingEventProcessing() runs concurrently between the state check and the field reads.
  2. Null-check the captured locals - second defense against the TOCTOU race between step 1 and step 2.

Risk

Low. The fix only adds early-exit guards - no behavioral change for the normal (non-shutdown) code path. The patterns used are already established in the same file.

Fixes #

Context

Changes Made

Testing

Notes

…wn (#13450)

## Summary

Fix a race condition crash (access violation 0xC0000005 /
NullReferenceException) in `LoggingService.ProcessLoggingEvent` when the
method is called after the logging service has been shut down.

## Problem

`ProcessLoggingEvent` directly accesses `_eventQueue`, `_dequeueEvent`,
and `_enqueueEvent` without any shutdown guard. During
`ShutdownComponent()`, `CleanLoggingEventProcessing()` disposes and
nullifies all of these fields. If an external callback (e.g.,
`Process.Exited` from a `ProjectCachePlugin`) attempts to log a message
after shutdown has completed, the null field access causes an
unrecoverable crash.

### Reproduction Scenario

1. A build runs with a `ProjectCachePlugin`
2. The build completes and `LoggingService.ShutdownComponent()` is
called
3. The cache plugin's external process exits, triggering
`Process.OnExited` on a ThreadPool thread
4. The callback logs "Project cache service process exited" via
`LoggingServiceToPluginLoggerAdapter.LogMessage`
5. This calls `ProcessLoggingEvent()` which accesses `_eventQueue.Count`
→ **null reference crash**

### Debugger Evidence

```
_serviceState = Shutdown
_eventQueue = null
_dequeueEvent = null
_enqueueEvent = null
_emptyQueueEvent = null
_loggingEventProcessingCancellation = null

buildEvent.Message = "Project cache service process exited"
Call: System.Diagnostics.Process.OnExited -> ProjectCachePlugin.dll -> LoggingServiceToPluginLoggerAdapter.LogMessage
```

## Root Cause

`ProcessLoggingEvent` lacked two protections that already exist
elsewhere in the codebase:

1. **Shutdown state guard** - Other public methods (`RegisterLogger`,
`RegisterDistributedLogger`, `InitializeNodeLoggers`,
`InitializeComponent`) all check `_serviceState !=
LoggingServiceState.Shutdown`. `ProcessLoggingEvent` did not.

2. **Local field capture** - The `LoggingEventProc` queue pump thread
already captures `_eventQueue`, `_dequeueEvent`, `_emptyQueueEvent`, and
`_enqueueEvent` into local variables to prevent races with
`CleanLoggingEventProcessing()`. `ProcessLoggingEvent` accessed the
fields directly.

## Fix

Three layers of defense, all following patterns already established in
the same file:

1. **Early return** when `_serviceState == LoggingServiceState.Shutdown`
- silently drops the event, since the service is no longer operational.
2. **Capture fields to locals** before use - prevents null dereference
if `CleanLoggingEventProcessing()` runs concurrently between the state
check and the field reads.
3. **Null-check the captured locals** - second defense against the
TOCTOU race between step 1 and step 2.

## Risk

**Low.** The fix only adds early-exit guards - no behavioral change for
the normal (non-shutdown) code path. The patterns used are already
established in the same file.
Copilot AI review requested due to automatic review settings April 7, 2026 11:45
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR hardens LoggingService.ProcessLoggingEvent against a shutdown-time race where internal queue fields are disposed/nullified, preventing crashes from late-arriving log events (for example from plugin callbacks running on thread pool threads after the build completes).

Changes:

  • Add a shutdown guard and local-field capture in ProcessLoggingEvent to avoid null dereferences and handle disposed wait handles safely.
  • Add unit tests covering post-shutdown calls and concurrent shutdown/logging scenarios.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
src/Build/BackEnd/Components/Logging/LoggingService.cs Adds shutdown/race defenses to async event enqueueing to prevent post-shutdown crashes.
src/Build.UnitTests/BackEnd/LoggingService_Tests.cs Adds regression tests for logging after shutdown and during concurrent shutdown.

Comment on lines +1093 to +1096
LoggingService loggingService = (LoggingService)LoggingService.CreateLoggingService(LoggerMode.Asynchronous, 1);
((IBuildComponent)loggingService).InitializeComponent(mockHost);
loggingService.RegisterLogger(new ConsoleLogger());

Copy link

Copilot AI Apr 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new tests register ConsoleLogger, which writes to stdout and can introduce noisy/flaky test output. Prefer using MockLogger (optionally constructed with ITestOutputHelper) so assertions can be made against captured events without depending on console IO.

Copilot uses AI. Check for mistakes.
Comment on lines +1164 to +1178
startSignal.WaitOne();
for (int i = 0; i < 100; i++)
{
try
{
BuildMessageEventArgs msg = new BuildMessageEventArgs($"Message {i}", null, null, MessageImportance.Low);
loggingService.ProcessLoggingEvent(msg);
}
catch (Exception ex)
{
Interlocked.CompareExchange(ref caughtException, ex, null);
break;
}
}
});
Copy link

Copilot AI Apr 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logThread is created as a foreground Thread and waits on startSignal.WaitOne() with no timeout. If the test fails before startSignal.Set() (or if the thread blocks unexpectedly), it can leave a live foreground thread and hang the test host process. Set IsBackground = true and/or use a timed wait with an assertion so the thread can’t outlive the test.

Suggested change
startSignal.WaitOne();
for (int i = 0; i < 100; i++)
{
try
{
BuildMessageEventArgs msg = new BuildMessageEventArgs($"Message {i}", null, null, MessageImportance.Low);
loggingService.ProcessLoggingEvent(msg);
}
catch (Exception ex)
{
Interlocked.CompareExchange(ref caughtException, ex, null);
break;
}
}
});
try
{
bool signaled = startSignal.WaitOne(TimeSpan.FromSeconds(10));
signaled.ShouldBeTrue("Logging thread did not receive the start signal within the allotted time.");
for (int i = 0; i < 100; i++)
{
try
{
BuildMessageEventArgs msg = new BuildMessageEventArgs($"Message {i}", null, null, MessageImportance.Low);
loggingService.ProcessLoggingEvent(msg);
}
catch (Exception ex)
{
Interlocked.CompareExchange(ref caughtException, ex, null);
break;
}
}
}
catch (Exception ex)
{
Interlocked.CompareExchange(ref caughtException, ex, null);
}
})
{
IsBackground = true
};

Copilot uses AI. Check for mistakes.
Comment on lines +1078 to +1087
#region ProcessLoggingEvent After Shutdown Tests

/// <summary>
/// Verify that ProcessLoggingEvent does not crash when called after the
/// LoggingService has been shut down (asynchronous mode). This reproduces
/// the race condition where an external callback (e.g., Process.Exited)
/// tries to log after ShutdownComponent has nullified internal state.
/// </summary>
[Fact]
public void ProcessLoggingEventAfterShutdown_Asynchronous_DoesNotThrow()
Copy link

Copilot AI Apr 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These added tests don't use TestEnvironment / ITestOutputHelper patterns that are used broadly in this test suite to ensure deterministic cleanup and capture diagnostic output in CI. Consider updating the test class to accept ITestOutputHelper and creating a TestEnvironment in the new tests so failures include useful logs and any test state is reliably reverted.

Copilot uses AI. Check for mistakes.
@JanProvaznik JanProvaznik changed the title Fix race condition in LoggingService.ProcessLoggingEvent after shutdo… [vs18.5] Fix race condition in LoggingService.ProcessLoggingEvent after shutdo… Apr 7, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants