Skip to content

feat: _ANR detector and Sentry reporting Minidump files#8811

Open
NickKhalow wants to merge 44 commits into
devfrom
feat/anr-detection
Open

feat: _ANR detector and Sentry reporting Minidump files#8811
NickKhalow wants to merge 44 commits into
devfrom
feat/anr-detection

Conversation

@NickKhalow
Copy link
Copy Markdown
Contributor

@NickKhalow NickKhalow commented May 19, 2026

Pull Request Description

What does this PR change?

Features

Key components

  • DclAnrIntegration — Sentry ISdkIntegration that creates and registers the watchdog
  • DclAnrWatchDog (abstract) — Base watchdog with pause/resume awareness, configurable timeout, and minidump collection on report
  • DclAnrWatchDogMultiThreaded — Desktop implementation: a background thread monitors main-thread heartbeats via a coroutine tick counter
  • DclAnrWatchDogSingleThreaded — WebGL fallback: coroutine-based elapsed-time check (single-threaded environment)
  • DclApplicationNotRespondingException — Custom exception type with Sentry mechanism metadata; on Windows, carries the dump file path for attachment
  • ThreadsDumpUtility — Windows-only utility that:
    • Calls the native MiniDumpWriteDump Win32 API (via MiniDumpNative) to generate a .dmp file — no external binaries required
    • Opens a process handle with PROCESS_QUERY_INFORMATION | PROCESS_VM_READ via ProcessInfoNative, with proper IDisposable handle cleanup
    • Polls for file readiness with a 5s timeout to handle async write completion
    • Compresses the dump into a ZIP archive for smaller Sentry attachment payloads
    • Caches Application.persistentDataPath and Application.streamingAssetsPath on the main thread ([RuntimeInitializeOnLoadMethod]) since Unity APIs are unavailable from background threads
    • Provides a Unity Editor menu item (Tools/ProcDump/Dump Current) for quick testing during development
  • MiniDumpNative — Static class wrapping the Dbghelp.dll MiniDumpWriteDump P/Invoke with thread-info, handle-data, and unloaded-modules dump flags
  • ProcessInfoNative / ProcessHandle — Win32 process handle management with IDisposable for safe cleanup
  • DclProcesses.ExecuteBlocking / dcl_start_process_blocking — New native C function for blocking process execution (waits for child exit and returns exit code), available on both Windows (_spawnvp _P_WAIT) and macOS (posix_spawnp + waitpid)

Debug chat commands (registered in DynamicWorldContainer)

  • /anr-simulate [ms] — Freezes the main thread for the specified duration (default 10s) to trigger ANR detection
  • /anr-dump — Manually collects and archives a process dump to the app directory (Windows only); runs on a background thread via ExecuteOnThreadPoolScope

Sentry diagnostics & CI

  • Sentry status loggingDiagnosticInfoUtils now logs whether Sentry is enabled, plus its environment, release, and DSN status at startup
  • enable-sentry PR label — The build-unitycloud.yml workflow now checks for an enable-sentry label on PR builds, allowing Sentry to be activated for specific PRs without requiring a manual workflow input
  • script-debugging PR label / script_debugging workflow input — Enables AllowDebugging build option (forces a Development build). Can be toggled via the script-debugging label on PR builds or the script_debugging input for workflow_dispatch / workflow_call triggers
  • Application arguments formatting — Log output for app args now uses separator lines for better readability
  • S3 upload URL reporting — New CI step prints the artifact upload URL to the job summary for easy access
  • Command error logging — Chat command execution errors are now reported via ReportHub.LogError in addition to the user-facing error message

Native code changes

  • dcl_processes.c / .h — Added dcl_start_process_blocking() for synchronous child process execution; also fixed a CloseHandle leak in get_process_name on Windows (handle was closed after the buffer was freed)
  • DCLProcesses.dll / .exp / .lib — Rebuilt native binaries with the new blocking export
  • DclProcessesNativeMethods.cs — Added managed P/Invoke binding for dcl_start_process_blocking

The built-in Sentry ANR integration is explicitly disabled (DisableAnrIntegration()) to avoid duplicate detection.


Test Instructions

Steps (standard run):

metaforge explorer run 8811

Expected result:

  • Explorer launches and runs normally without false-positive ANR reports

Steps (fresh account):

metaforge account create --clear
metaforge explorer run 8811

Expected result:

  • Same as above — no spurious ANR reports during normal operation

Prerequisites

  • Windows build for minidump verification (macOS dump collection not yet supported)
  • Sentry dashboard access to verify ANR event format and dump attachment

Test Steps

  1. Launch the explorer and navigate normally — no ANR events should fire
  2. Use the /anr-simulate chat command (or /anr-simulate 8000 for 8s) to freeze the main thread and trigger ANR detection
  3. Verify Sentry receives a DclApplicationNotRespondingException event with a .dmp file attachment (ZIP-compressed)
  4. Use the /anr-dump chat command to manually collect a dump and confirm the file paths are printed
  5. In Unity Editor (Windows), use Tools > ProcDump > Dump Current to verify the editor integration works
  6. Confirm the built-in Sentry ANR integration does not fire duplicate events
  7. Add the enable-sentry label to a test PR and verify Sentry is enabled in the resulting build
  8. Add the script-debugging label to a test PR and verify the build includes AllowDebugging and Development flags
  9. Check startup logs for the new Sentry status section (Enabled, Environment, Release, DSN)

Additional Testing Notes

  • macOS does not yet support dump collection — ANR events will report "Dump is not available on macOS yet"
  • WebGL uses a single-threaded fallback (coroutine-based timing) — verify no false positives on that platform
  • The watchdog pauses during app background/focus-loss to avoid false reports
  • Dump file readiness is polled with a 5s timeout — if MiniDumpWriteDump takes longer, the dump will fail gracefully
  • Minidump uses MiniDumpNormal | WithThreadInfo | WithHandleData | WithUnloadedModules flags (equivalent to procdump -mt)

Quality Checklist

  • Changes have been tested locally
  • Documentation has been updated (if required)
  • Performance impact has been considered
  • For SDK features: Test scene is included

Code Review Reference

Please review our Branch & PR Standards before submitting. It explains the automated review flow, QA/DEV approval requirements, and what each label does — especially useful for first-time contributors.

@NickKhalow NickKhalow self-assigned this May 19, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 19, 2026

@NickKhalow NickKhalow added force-build Used to trigger a build on draft PR and removed force-build Used to trigger a build on draft PR labels May 19, 2026
@NickKhalow NickKhalow added force-build Used to trigger a build on draft PR clean-build Used to trigger clean build on PR labels May 19, 2026
@NickKhalow NickKhalow added the enable-sentry Enables Sentry integration label May 19, 2026
NickKhalow and others added 2 commits May 20, 2026 19:26
@m3taphysics
Copy link
Copy Markdown
Collaborator

PR #8811, run #26234952932

Builds: Windows change, Windows baseline, macOS change, macOS baseline

Framework 13 i7

Metric Change Baseline Delta Improvement
Samples 2700 2700
CPU average 33.3 ms 33.3 ms 0.0 ms -0.0%
CPU 1% worst 33.8 ms 33.8 ms 0.0 ms -0.1%
CPU 0.1% worst 36.8 ms 36.7 ms 0.1 ms -0.4%
GPU average 8.3 ms 8.4 ms -0.1 ms 1.4%
GPU 1% worst 19.3 ms 18.3 ms 1.1 ms -5.8% 🔴
GPU 0.1% worst 25.1 ms 24.7 ms 0.4 ms -1.7%

@NickKhalow NickKhalow marked this pull request as ready for review May 21, 2026 15:38
@NickKhalow NickKhalow requested review from a team as code owners May 21, 2026 15:38
@claude

This comment has been minimized.

@NickKhalow NickKhalow changed the title feat: anr detection feat: ANR detector and Sentry reporting Minidump files May 21, 2026
@m3taphysics
Copy link
Copy Markdown
Collaborator

PR #8811, run #26238775740

Builds: Windows change, Windows baseline, macOS change, macOS baseline

Framework 13 i7

Metric Change Baseline Delta Improvement
Samples 2700 2699
CPU average 33.3 ms 33.3 ms -0.0 ms 0.0%
CPU 1% worst 33.5 ms 33.6 ms -0.1 ms 0.3%
CPU 0.1% worst 33.7 ms 33.8 ms -0.1 ms 0.4%
GPU average 7.2 ms 7.4 ms -0.1 ms 2.0%
GPU 1% worst 18.6 ms 20.4 ms -1.9 ms 9.1% 🟢
GPU 0.1% worst 24.7 ms 26.6 ms -1.8 ms 6.9% 🟢

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 21, 2026

Claude PR Review — security re-review after minidump collect via native api

Re-reviewing the latest changes on top of the previous review.

  • Read CLAUDE.md and project docs
  • Diff current branch against origin/dev
  • Check the new native minidump path (replaces procdump?)
  • Re-evaluate dependency surface (procdump still bundled?)
  • Re-evaluate workflow changes
  • Post updated review

View job run · branch feat/anr-detection

@NickKhalow NickKhalow changed the title feat: ANR detector and Sentry reporting Minidump files feat: _ANR detector and Sentry reporting Minidump files May 21, 2026
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 21, 2026

Claude PR Review — security re-review after minidump collect via native api

Re-reviewing the latest changes on top of the previous review.

  • Read CLAUDE.md and project docs
  • Diff current branch against origin/dev
  • Check the new native minidump path (replaces procdump?)
  • Re-evaluate dependency surface (procdump still bundled?)
  • Re-evaluate workflow changes
  • Post updated review

View job run · branch feat/anr-detection

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 21, 2026

Claude finished @NickKhalow's task in 4m 13s —— View job


Claude PR Review — security re-review after minidump collect via native api

  • Read CLAUDE.md and project docs
  • Diff current branch against origin/dev
  • Check the new native minidump path (replaces procdump?)
  • Re-evaluate dependency surface (procdump still bundled?)
  • Re-evaluate workflow changes
  • Post updated review

Headline change since my previous review

procdump is gone. The three procdump*.exe files are no longer in the tree (commits 187523902 strip procdump down and b93107794 minidump collect via native api). Dump collection now goes through Dbghelp.dll!MiniDumpWriteDump via P/Invoke in DclAnrIntegration.cs:546-589. That directly addresses recommendation #3 from my previous review and resolves the three MEDIUM concerns that were on it:

Previous MEDIUM concern Status
Microsoft Sysinternals redistribution license Resolved — nothing redistributed
Authenticode signature verification of bundled procdump*.exe Resolved — binaries removed
Disproportionate capability shipped to every Windows user Resolved — in-process API, no external .exe

The new path is also tighter: opens the current process with PROCESS_QUERY_INFORMATION | PROCESS_VM_READ (the minimum that MiniDumpWriteDump requires for self-dump — no SeDebugPrivilege, no full-memory dump), and the chosen MINIDUMP_TYPE is MiniDumpNormal | MiniDumpWithThreadInfo | MiniDumpWithHandleData | MiniDumpWithUnloadedModules — deliberately not MiniDumpWithFullMemory, which keeps the attachment small and limits data exfiltration surface.


Workflow review — .github/workflows/build-unitycloud.yml

Diff is unchanged from the previous review:

  1. New script_debugging input + script-debugging PR label → adds Development + AllowDebugging to build options.
  2. New enable-sentry PR label → alternative path to the existing sentry_enabled input.
  3. DESTINATION_PATH refactored into its own step + new "Print S3 upload URL" step writing to GITHUB_STEP_SUMMARY.

W.1–W.10: N/A — workflow is pull_request (not pull_request_target), no LLM step, no untrusted text routed into a prompt, no new allowedTools/wildcards. The label-driven branches all read from github.event.pull_request.labels, which on pull_request requires repo write to set, so external contributors cannot flip sentry_enabled / AllowDebugging for themselves. Fork PRs do not have access to SENTRY_DSN, SENTRY_CLI_AUTH_TOKEN, EXPLORER_TEAM_AWS_*, or the SSL.com code-signing secrets.

Pre-existing, still worth a follow-up: sslcom/esigner-codesign@develop is pinned to a branch on a step that consumes the Windows code-signing TOTP secrets (build-unitycloud.yml:599). Recommend SHA-pinning (W.7a). Not introduced by this PR.


Dependency / binary surface review

File Change Type Scope Risk Confidence
procdump.exe, procdump64.exe, procdump64a.exe Removed Sysinternals PE runtime, Win n/a HIGH
Explorer/Assets/Plugins/DclNativeProcesses/DCLProcesses.dll (+.exp/.lib) Rebuilt native, built from in-repo dcl_processes.c runtime, Win LOW HIGH
Explorer/Assets/Plugins/DclNativeProcesses/libDCLProcesses.dylib Not rebuilt (last touched 27b999ed9, 2026-05-05) native runtime, macOS LOW HIGH
Utility.asmdef references 1087662aaf… + 75edf6fa50ff… (DCL.Native.Processes) Added asmdef ref LOW HIGH

Net delta vs my previous review: the supply-chain surface has shrunk. The only "binary added by this PR" is a rebuild of the in-repo DCLProcesses.dll, whose source is right next to it.

Native-source review (dcl_processes.c)

  • New export dcl_start_process_blocking (Windows: _spawnvp(_P_WAIT, …); macOS: posix_spawnp + waitpid with EINTR retry). Implementation looks correct.
  • Pre-existing memory-leak fix on the Windows path of get_process_name (now closes hProcess on the success branch) is still good.
  • dcl_start_process_blocking has zero C# callers anywhere in the repo — the move to MiniDumpWriteDump made it dead code in this PR. Worth removing in a follow-up to keep the native surface minimal.
  • libDCLProcesses.dylib drift persists. It wasn't rebuilt with the new dcl_start_process_blocking export. Today this is harmless because nothing calls ExecuteBlocking and the ANR path is #if UNITY_STANDALONE_WIN. Still worth rebuilding (or, since the function is unused, deleting the export from dcl_processes.c / DclProcessesNativeMethods.cs).

DEPENDENCY_REVIEW: PASS


Non-security code review — items from previous review still standing

These are minor and were called out last round; would be nice to clean up before merge:

  • Explorer/Assets/DCL/Chat/Commands/AnrDumpChatCommand.cs:2,6using Cysharp.Threading.Tasks; is still duplicated. Fix this →

  • Explorer/Assets/DCL/Chat/Commands/AnrSimulateChatCommand.cs:23-25int.TryParse return value is still ignored and Thread.Sleep(freezeMs) runs unconditionally. /anr-simulate -1ArgumentOutOfRangeException; /anr-simulate 2147483647 → ~25-day freeze. Clamp to e.g. [100, 60000] ms in ValidateParameters. Fix this →

  • Explorer/Assets/DCL/PerformanceAndDiagnostics/Diagnostics/ReportsHandling/Sentry/DclAnrIntegration.cs:35-73Watchdog.OnApplicationNotResponding += is still outside the if (Watchdog is null) guard. If Register runs twice (editor reload, or two integrations registered), the same ANR is reported once per Register call. Move the += inside the if. Fix this →

New non-security observations on the native-minidump path

  • DclAnrIntegration.cs:104-117 (NewDumpAttachment) — on dump failure, the original .dmp is not cleaned up; the only cleanup happens in CollectDumpInfoBase64. ANR-path attachments live in Application.persistentDataPath and will accumulate over time for users that experience repeated ANRs. Consider deleting on failure or on a TTL sweep at startup.
  • DclAnrIntegration.cs:546-589 (MiniDumpWriteDump) — Microsoft's guidance is that a hung-process dump is best collected out-of-process, because the dumping thread shares state with the hung one. In practice it usually succeeds for a stuck main thread because the watchdog runs from a background thread, but worth knowing it's a known constraint of the in-process approach.
  • Privacy note (informational): even without MiniDumpWithFullMemory, thread-info + handle-data captures thread stack memory and handle tables. Auth tokens, web3 auth chains, encrypted credential material that the main thread has recently touched can end up in the dump that gets uploaded to Sentry. Worth confirming with whoever signed off on Sentry data handling that minidump attachments are acceptable for production rollout — the existing // TODO: validate correctness before production rollout comment from the PR description should probably land in code (SentryBuildTimeConfiguration.cs:32) alongside options.DisableAnrIntegration().

Verdict

The PR is substantially cleaner than the previous revision. Procdump removal + in-process MiniDumpWriteDump resolves the dependency-supply-chain concerns I raised earlier, and the workflow / label changes remain safe (no pull_request_target, no LLM secret exposure).

DEPENDENCY_REVIEW: PASS

The three carry-over nits (duplicate using, unbounded Thread.Sleep, double-subscribed ANR handler) and the suggested macOS dylib rebuild / dead-code cleanup are non-blockers.
· branch feat/anr-detection

@m3taphysics
Copy link
Copy Markdown
Collaborator

PR #8811, run #26248729154

Builds: Windows change, Windows baseline, macOS change, macOS baseline

Framework 13 i7

Metric Change Baseline Delta Improvement
Samples 2700 2700
CPU average 33.3 ms 33.3 ms -0.0 ms 0.0%
CPU 1% worst 33.5 ms 34.0 ms -0.5 ms 1.5%
CPU 0.1% worst 34.2 ms 37.0 ms -2.8 ms 7.7% 🟢
GPU average 8.1 ms 8.3 ms -0.3 ms 3.5% 🟢
GPU 1% worst 17.8 ms 19.3 ms -1.6 ms 8.1% 🟢
GPU 0.1% worst 21.4 ms 23.8 ms -2.5 ms 10.4% 🟢

@NickKhalow NickKhalow changed the base branch from dev to main May 22, 2026 12:31
@balducciv
Copy link
Copy Markdown

Build: v0.149.0-alpha-feat/anr-detection-4b3a988
Tested on: Windows 11 + macOS (Apple M3 Pro)

Passed: ✅

  • Normal operation (old + new account) — no false-positive ANR events on either platform

  • /anr-simulate freezes the main thread and the app recovers normally

  • Sentry status section appears in startup logs (Enabled: True, DSN set)

  • App args formatting with separator lines working

  • WatchDog lifecycle (init → run → clean shutdown) confirmed on both platforms

  • Mac and Windows smoke test passed

Everything I can validate from the client side looks good on both platforms.

Sentry was verified here

Windows with dump
https://decentraland.sentry.io/issues/7497666819/?project=4510719707250688&query=&referrer=issue-stream

MacOS without dump
https://decentraland.sentry.io/issues/7206982691/?project=4510719707250688&query=&referrer=issue-stream
Player-prev tests 2-3-4-5-6.log

Logs:
Player Mac.log
Player old account test 1.log
Player-prev - new account test 1.log

@NickKhalow NickKhalow changed the base branch from main to dev May 22, 2026 13:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enable-sentry Enables Sentry integration force-build Used to trigger a build on draft PR new-dependency

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants