
fix: Handle stray :DOWN messages in LogAccumulator#31

Open
erik-the-implementer wants to merge 5 commits into main from fix/log-accumulator-down-crash

@erik-the-implementer
Contributor

Summary

  • Fixes FunctionClauseError crash in LogAccumulator when receiving :DOWN messages for task refs that have already been cleaned up by block_until_any_task_ready
  • Properly demonitors tasks and removes refs from pending_tasks in all code paths that handle task completion
  • Adds catch-all handle_info clause as safety net for any remaining edge cases

Root cause

The LogAccumulator has two code paths that consume task completion messages:

  1. handle_info/2 — via the normal GenServer message loop
  2. block_until_any_task_ready/1 — via a raw receive block when blocking on concurrent request limits

When block_until_any_task_ready received {ref, result}, it removed the ref from pending_tasks but did not demonitor the task. The subsequent :DOWN message for that same process then arrived via handle_info. With the ref no longer in pending_tasks, the is_map_key(state.pending_tasks, ref) guard failed, and because no catch-all clause existed, the GenServer crashed with a FunctionClauseError.

Additionally, handle_info({ref, result}, state) had a comment saying "Remove the task from the pending tasks map" but the actual code returned state unchanged.
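
The mechanism can be reproduced outside the project. The following is a minimal sketch, not code from this PR: `StrayDownDemo` and `stray_down?/1` are hypothetical names. It assumes only the standard behaviour of `Task.async/1`, which monitors the task process and delivers the result as `{ref, result}`, so a caller that consumes the reply with a raw receive and never demonitors will later receive a `:DOWN` message for the same ref.

```elixir
defmodule StrayDownDemo do
  # Hypothetical helper for illustration only (not part of LogAccumulator).
  # Consumes a task reply with a raw receive, optionally demonitoring,
  # then reports whether a :DOWN message for the same ref still arrives.
  def stray_down?(demonitor?) do
    %Task{ref: ref} = Task.async(fn -> :ok end)

    receive do
      {^ref, :ok} ->
        # With :flush, any :DOWN already in the mailbox is removed as well.
        if demonitor?, do: Process.demonitor(ref, [:flush])
    end

    receive do
      {:DOWN, ^ref, :process, _pid, _reason} -> true
    after
      200 -> false
    end
  end
end
```

Calling `StrayDownDemo.stray_down?(false)` mirrors the old behaviour of block_until_any_task_ready (the stray `:DOWN` arrives), while `StrayDownDemo.stray_down?(true)` mirrors the fix.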

Test plan

  • All 45 tests pass (44 existing + 1 new)
  • New integration test exercises the specific crash scenario: repeated export failures with max_buffer_size=1 and otlp_concurrent_requests=1 to force the block_until_any_task_ready code path

Closes #18

🤖 Generated with Claude Code

erik-the-implementer and others added 5 commits March 20, 2026 16:38
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When block_until_any_task_ready receives a task result {ref, result},
it removes the ref from pending_tasks. The subsequent :DOWN message for
that same task process then arrives via handle_info, but the guard
is_map_key(state.pending_tasks, ref) fails since the ref was already
removed. With no catch-all clause, this causes a FunctionClauseError
crash.

Three fixes applied:
1. handle_info({ref, result}, state) now properly demonitors the task
   and removes the ref from pending_tasks (previously it did neither
   despite a comment saying it did)
2. block_until_any_task_ready now demonitors with flush when receiving
   a task result, preventing the orphaned :DOWN message
3. Added catch-all handle_info clause as safety net for any stray
   messages that slip through edge cases
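
The three fixes above could look roughly like the sketch below. This is an illustration under assumptions, not the actual diff: the module name `LogAccumulatorSketch` is hypothetical, the state is reduced to a bare `pending_tasks` map, and the real module's result handling, logging, and retry logic are omitted.

```elixir
defmodule LogAccumulatorSketch do
  # Hypothetical, simplified stand-in for LogAccumulator; only the
  # task-bookkeeping paths described in the commit message are shown.
  use GenServer

  @impl true
  def init(_opts), do: {:ok, %{pending_tasks: %{}}}

  # Fix 1: a task result arriving through the normal message loop now
  # demonitors the task and removes the ref from pending_tasks.
  @impl true
  def handle_info({ref, _result}, state) when is_map_key(state.pending_tasks, ref) do
    Process.demonitor(ref, [:flush])
    {:noreply, drop_task(state, ref)}
  end

  # A tracked task that exited without replying still gets cleaned up.
  def handle_info({:DOWN, ref, :process, _pid, _reason}, state)
      when is_map_key(state.pending_tasks, ref) do
    {:noreply, drop_task(state, ref)}
  end

  # Fix 3: catch-all so stray messages no longer raise FunctionClauseError.
  def handle_info(_msg, state), do: {:noreply, state}

  # Fix 2: the raw receive demonitors with :flush when it consumes a
  # result, so no orphaned :DOWN reaches handle_info afterwards.
  def block_until_any_task_ready(state) do
    receive do
      {ref, _result} when is_map_key(state.pending_tasks, ref) ->
        Process.demonitor(ref, [:flush])
        drop_task(state, ref)

      {:DOWN, ref, :process, _pid, _reason} when is_map_key(state.pending_tasks, ref) ->
        drop_task(state, ref)
    end
  end

  defp drop_task(state, ref) do
    %{state | pending_tasks: Map.delete(state.pending_tasks, ref)}
  end
end
```

Demonitoring with `:flush` addresses the cause, while the catch-all clause only keeps the process alive if some other unexpected message gets through.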

Closes #18

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Exercises the code path where block_until_any_task_ready consumes task
results, generating stray :DOWN messages that previously crashed the
LogAccumulator.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Development

Successfully merging this pull request may close these issues.

When export fails repeatedly due to the remote server being unavailable, LogAccumulator crashes hard