
fix: Reset streamer in case of L1 reorg causing batch tx not to go through #1045

Open
lukeiannucci wants to merge 2 commits into integration-v3.9.8 from li/fix-streamer-reset

Conversation


@lukeiannucci lukeiannucci commented May 5, 2026

Closes #<ISSUE_NUMBER>

This PR:

This PR does not:

Key places to review:


@lukeiannucci lukeiannucci changed the title from "streamer fixes" to "Reset streamer in case of L1 reorg causing batch tx not to go through" May 5, 2026
@lukeiannucci lukeiannucci changed the title from "Reset streamer in case of L1 reorg causing batch tx not to go through" to "fix: Reset streamer in case of L1 reorg causing batch tx not to go through" May 5, 2026

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces L1 reorg handling by resetting the Espresso streamer if the batch position moves backwards and adds a mechanism to discard duplicate message positions during transaction parsing. Feedback suggests optimizing the reorg logic to prevent redundant resets during startup and moving the duplicate message check before RLP decoding to enhance efficiency.

Comment thread arbnode/batch_poster.go Outdated
Comment on lines +1858 to +1864
if uint64(batchPosition.MessageCount) < b.espressoStreamer.GetCurrentMessagePos() {
// Batch position moved backwards probably due to l1 reorg, try and resync the blocks from hotshot
log.Warn("resetting espresso streamer to parent chain (L1 reorg?)", "messageCount", batchPosition.MessageCount)
b.resetStreamerToParentChainOrConfigHotshotBlock(batchPosition.MessageCount, ctx)
} else {
b.espressoStreamer.AdvanceTo(uint64(batchPosition.MessageCount))
}

medium

The current logic performs a reorg check and potential reset on every iteration where a new batch is being built. On startup (b.espressoRestarting == true), this leads to redundant operations:

  1. b.espressoStreamer.GetCurrentMessagePos() is initialized to 1, so if batchPosition.MessageCount is 0, it triggers a warning and a reset at line 1861.
  2. Line 1869 then calls the exact same reset function again.
  3. If batchPosition.MessageCount > 0, it calls AdvanceTo and then immediately Reset at line 1869, making the AdvanceTo call redundant.

Wrapping the reorg check in if !b.espressoRestarting avoids these redundant calls and prevents a false-positive reorg warning on startup.

			if !b.espressoRestarting {
				if uint64(batchPosition.MessageCount) < b.espressoStreamer.GetCurrentMessagePos() {
					// Batch position moved backwards probably due to l1 reorg, try and resync the blocks from hotshot
					log.Warn("resetting espresso streamer to parent chain (L1 reorg?)", "messageCount", batchPosition.MessageCount)
					b.resetStreamerToParentChainOrConfigHotshotBlock(batchPosition.MessageCount, ctx)
				} else {
					b.espressoStreamer.AdvanceTo(uint64(batchPosition.MessageCount))
				}
			}

Comment thread espressostreamer/espresso_streamer.go Outdated
Comment on lines +394 to +397
if _, exists := s.messageWithMetadataAndPos[indices[i]]; exists {
log.Warn("duplicate message position, discarding", "pos", indices[i])
continue
}

medium

This check for duplicate message positions is a good addition for robustness. However, it would be more efficient to perform this check before the RLP decoding of the message (around line 382), as indices[i] is already available. This would save CPU cycles by avoiding unnecessary decoding for messages that are already in the buffer.
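
A minimal sketch of the reordered loop body under that suggestion, assuming the field and index names from the snippet above; payloads, msg, and decodeMessage are illustrative placeholders for the actual RLP decode step, not the real code:

	for i := range indices {
		// Skip positions that are already buffered, so the first-seen message is kept
		// and no cycles are spent decoding a payload that would be discarded anyway.
		if _, exists := s.messageWithMetadataAndPos[indices[i]]; exists {
			log.Warn("duplicate message position, discarding", "pos", indices[i])
			continue
		}
		msg, err := decodeMessage(payloads[i]) // placeholder for the RLP decode step
		if err != nil {
			log.Warn("failed to decode message", "err", err)
			// Instead of returning an error, just skip this message
			continue
		}
		s.messageWithMetadataAndPos[indices[i]] = msg
	}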

github-actions Bot commented May 5, 2026

✅ All tests successful. No failed tests were found.


@lukeiannucci lukeiannucci marked this pull request as ready for review May 5, 2026 21:03

claude Bot commented May 5, 2026

Claude finished @lukeiannucci's task in 1m 46s


PR #1045 Review — Reset streamer on L1 reorg

  • Read existing review comments
  • Read PR diff and changed files
  • Analyze code changes in context
  • Post review feedback

Overall: This PR looks good. The changes are well-structured and address a real failure mode — an L1 reorg causing batchPosition.MessageCount to move backwards, leaving the streamer out of sync.

Summary of changes

  1. batch_poster.go — Restructured the startup vs. steady-state logic. The AdvanceTo call was previously unconditional before the restart check; now it's correctly placed in the else branch alongside new reorg detection. On each iteration where !espressoRestarting, if the batch position moved backwards, the streamer is reset via resetStreamerToParentChainOrConfigHotshotBlock. This cleanly separates startup reset from reorg reset.

  2. espresso_streamer.go — Added GetCurrentMessagePos() (properly guarded with RLock) and a duplicate message position check in parseEspressoTransaction. The duplicate check was also moved before RLP decoding, which addresses the Gemini review feedback about avoiding unnecessary decode work.

  3. espresso_caff_node_test.go — Mock updated to satisfy the expanded interface.

What looks good

  • The reorg detection logic (batchPosition.MessageCount < GetCurrentMessagePos()) is the right signal — a backwards move in the batch position is the clearest indicator of an L1 reorg affecting the batch poster.
  • The Reset() function correctly clears the entire buffer and resets both the message position and hotshot block number, which is the right thing to do after a reorg (a sketch of this shape follows the list).
  • The startup path remains unchanged in behavior — it still resets and re-enqueues pending transactions.
  • The duplicate message check is a good defensive measure that prevents buffer corruption during reorg recovery, when HotShot blocks may be re-fetched.
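
A hypothetical sketch of that reset shape, assuming illustrative field and type names (the buffer map, the message position cursor, and the HotShot block cursor) rather than the actual streamer struct:

// Hypothetical sketch; field and type names are assumptions, not the actual streamer struct.
func (s *EspressoStreamer) Reset(messagePos uint64, hotshotBlock uint64) {
	s.mutex.Lock()
	defer s.mutex.Unlock()
	// Drop everything buffered and rewind both cursors so fetching restarts
	// from a consistent point after the reorg.
	s.messageWithMetadataAndPos = make(map[uint64]*MessageWithMetadataAndPos)
	s.currentMessagePos = messagePos
	s.currentHotshotBlock = hotshotBlock
}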

Minor feedback

I left two inline comments:

  • A note on the race window between GetCurrentMessagePos() and Reset() — it's safe in practice since Reset() clears everything, but worth being aware of.
  • A suggestion to include the hotshot block number in the duplicate message warning log for easier debugging during reorg scenarios.

No blocking issues found.

Comment thread arbnode/batch_poster.go
if uint64(batchPosition.MessageCount) < b.espressoStreamer.GetCurrentMessagePos() {
// Batch position moved backwards probably due to l1 reorg, try and resync the blocks from hotshot
log.Warn("resetting espresso streamer to parent chain (L1 reorg?)", "messageCount", batchPosition.MessageCount)
b.resetStreamerToParentChainOrConfigHotshotBlock(batchPosition.MessageCount, ctx)

nit: The reorg detection compares uint64(batchPosition.MessageCount) against GetCurrentMessagePos(), but the Reset() call passes batchPosition.MessageCount (type arbutil.MessageIndex) — which resetStreamerToParentChainOrConfigHotshotBlock then casts to uint64 internally. This is fine and consistent with the startup path on line 1862, just noting that the types match up.

One thing worth considering: after the reorg reset, the streamer's messageWithMetadataAndPos buffer is fully cleared (line 193 of espresso_streamer.go). If the streamer's background goroutine has already fetched and buffered messages for positions beyond the new currentMessagePos, those will be lost and need to be re-fetched. This is the correct behavior for a reorg scenario, but I want to confirm: is there any risk that the background fetcher could race and re-populate stale data between the GetCurrentMessagePos() check and the Reset() call? Both happen under different lock acquisitions.

The window is small, but in theory:

  1. GetCurrentMessagePos() acquires RLock, reads pos, releases
  2. Background goroutine acquires Lock, inserts messages at old positions
  3. Reset() (via resetStreamerToParentChainOrConfigHotshotBlock) acquires Lock, clears everything

Step 3 would clean up step 2's stale data, so this is safe in practice. Just flagging for awareness.
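
If that window ever becomes a concern, one option would be to fold the comparison and the reset into a single critical section. A hypothetical sketch, assuming an internal resetLocked helper and illustrative field names rather than the actual streamer API:

// Hypothetical hardening sketch: compare and reset under one write lock, so the
// background fetcher cannot interleave between the position read and the reset.
func (s *EspressoStreamer) ResetIfBehind(batchMessageCount uint64) bool {
	s.mutex.Lock()
	defer s.mutex.Unlock()
	if batchMessageCount >= s.currentMessagePos {
		return false // position did not move backwards; no reorg handling needed
	}
	s.resetLocked(batchMessageCount) // clears the buffer and rewinds the cursors under the held lock
	return true
}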

Comment thread espressostreamer/espresso_streamer.go
Collaborator

@Sneh1999 Sneh1999 left a comment

Can we not create a mock L1 docker image similar to what we did in proxy and use that to do the reorg?

Comment thread arbnode/batch_poster.go
}
b.espressoRestarting = false
} else {
if uint64(batchPosition.MessageCount) < b.espressoStreamer.GetCurrentMessagePos() {
Collaborator

@Sneh1999 Sneh1999 May 6, 2026

what if the MessagePos is after the current message pos and has not yet arrived in the streamer? Will we keep resetting then in this loop?

Author

I don't quite understand the question. But here's how it works now: we only advance the streamer when the batch is posted (it's part of this else check). So the streamer's current position will never be ahead of the batch position, only equal to it. It will only reset if, for some reason, the batch position moved backwards. Does this make sense?

if err != nil {
log.Warn("failed to decode message", "err", err)
// Instead of returnning an error, we should just skip this message
if _, exists := s.messageWithMetadataAndPos[indices[i]]; exists {
Collaborator

what is this code doing? I am not sure I understand

Author

@lukeiannucci lukeiannucci May 6, 2026

It just checks whether the position is already in the streamer. Without this check, a later message would overwrite an existing message instead of the first one being kept.
