fix(exporter): reduce mutex hold time in process pending events #5134

Open

hugehoo wants to merge 16 commits into thomaspoignant:main from hugehoo:perf/reduce-mutex-hold-time-in-process-pending-events

Conversation

@hugehoo hugehoo commented Apr 18, 2026

Description

This PR reduces lock contention in the exporter event store while preserving event delivery guarantees for concurrent flushes.

What was the problem?

  • ProcessPendingEvents() held the event store mutex while running processEventsFunc, which can perform slow I/O such as exporter uploads.
  • While that lock was held, Add() could be blocked unnecessarily.
  • The consumer offset update also incorrectly wrote lastOffset instead of the offset passed into updateConsumerOffset().
  • Concurrent Flush() calls needed to stay safe after shortening the event store lock scope.

How is it resolved?

  • ProcessPendingEvents() now:
    • fetches pending events under lock,
    • releases the lock before calling processEventsFunc,
    • reacquires the lock only to update the consumer offset.
  • updateConsumerOffset() now uses the provided offset argument instead of always writing lastOffset.
  • DataExporter.Flush() now serializes concurrent flushes with a mutex so concurrent calls do not double-process or lose events (see the sketch below this list).
  • Added regression tests for:
    • Add() not being blocked by slow pending-event processing,
    • concurrent flushes not exporting duplicate events,
    • concurrent test bookkeeping in event_store_test.go.
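
As a concrete illustration of the Flush() serialization above, here is a minimal sketch; everything except Flush() and flushMu (the event type, the EventStore interface, the other fields) is an assumption for illustration, not the actual go-feature-flag code:

package exporter

import (
	"context"
	"sync"
)

// Placeholder event type, standing in for the real exporter event.
type event struct{}

// Hypothetical view of the store, reduced to the one method Flush needs.
type EventStore interface {
	ProcessPendingEvents(consumerID string,
		process func(ctx context.Context, events []event) error) error
}

type dataExporterImpl struct {
	flushMu    sync.Mutex // serializes concurrent Flush() calls
	consumerID string
	store      EventStore
	export     func(ctx context.Context, events []event) error
}

// Flush exports all pending events for this exporter's consumer. A
// second concurrent Flush() waits on flushMu instead of fetching and
// double-processing the same pending events.
func (d *dataExporterImpl) Flush() error {
	d.flushMu.Lock()
	defer d.flushMu.Unlock()
	return d.store.ProcessPendingEvents(d.consumerID, d.export)
}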

Closes issue(s)

Resolve #5133

Checklist

  • I have tested this code
  • I have added unit test to cover this code
  • I have updated the documentation (README.md and /website/docs)
  • I have followed the contributing guide

netlify Bot commented Apr 18, 2026

Deploy Preview for go-feature-flag-doc-preview canceled.

🔨 Latest commit: 19cbcb3
🔍 Latest deploy log: https://app.netlify.com/projects/go-feature-flag-doc-preview/deploys/69f21d2fa432d10009e8215f

@github-actions github-actions Bot requested a review from thomaspoignant April 18, 2026 15:07
@hugehoo hugehoo changed the title from "Perf/reduce mutex hold time in process pending events" to "feat: reduce mutex hold time in process pending events" Apr 18, 2026
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request improves concurrency by reducing lock contention in the EventStore and adding a mutex to prevent concurrent flushes in the DataExporter. It also fixes a bug in updateConsumerOffset where the global last offset was incorrectly used instead of the consumer-specific offset. Feedback suggests addressing a potential race condition in EventStore when the same consumer processes events concurrently and recommends an optimization to return early if no events are found during processing.

Comment thread exporter/event_store.go Outdated
Comment on lines 93 to 109
e.mutex.Lock()
eventList, err := e.fetchPendingEvents(consumerID)
e.mutex.Unlock()
if err != nil {
	return err
}

err = processEventsFunc(context.Background(), eventList.Events)
if err != nil {
	return err
}

e.mutex.Lock()
err = e.updateConsumerOffset(consumerID, eventList.NewOffset)
e.mutex.Unlock()
return err
}
Contributor

Severity: medium

The change to release the global mutex before calling processEventsFunc successfully reduces lock contention for Add() operations. However, it introduces a race condition where multiple concurrent calls to ProcessPendingEvents for the same consumerID can fetch and process the same set of events multiple times (since the offset is only updated after processing).

While the addition of flushMu in DataExporter mitigates this for that specific caller, the EventStore itself is no longer safe for concurrent processing by the same consumer. If the EventStore is intended to be a robust standalone component, consider implementing a per-consumer lock or a "busy" state within eventStoreImpl. This would ensure atomicity for a single consumer's processing while still allowing other consumers and Add() operations to proceed concurrently.
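
A minimal sketch of that per-consumer lock idea, using invented names (the consumer struct, the consumers map, the error message) rather than the real eventStoreImpl internals:

package exporter

import (
	"context"
	"fmt"
	"sync"
)

// Placeholder event type for the sketch.
type event struct{}

type consumer struct {
	mutex  sync.Mutex // serializes processing for this consumer only
	offset int64
}

type eventStoreImpl struct {
	mutex     sync.Mutex // guards shared store state and the consumers map
	consumers map[string]*consumer
}

func (e *eventStoreImpl) ProcessPendingEvents(
	consumerID string,
	processEventsFunc func(ctx context.Context, events []event) error,
) error {
	// Hold the global mutex only for the lookup, so Add() stays fast.
	e.mutex.Lock()
	c, ok := e.consumers[consumerID]
	e.mutex.Unlock()
	if !ok {
		return fmt.Errorf("unknown consumer %q", consumerID)
	}

	// The per-consumer mutex makes fetch -> process -> update-offset
	// atomic for this consumer while other consumers run concurrently.
	c.mutex.Lock()
	defer c.mutex.Unlock()

	// Fetch pending events, call processEventsFunc, then advance
	// c.offset, re-taking e.mutex around reads of the shared buffer.
	return nil
}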

Comment thread exporter/event_store.go Outdated
@hugehoo hugehoo changed the title from "feat: reduce mutex hold time in process pending events" to "fix(exporter): reduce mutex hold time in process pending events" Apr 18, 2026

codecov Bot commented Apr 18, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 85.89%. Comparing base (e19ef05) to head (19cbcb3).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5134      +/-   ##
==========================================
+ Coverage   85.81%   85.89%   +0.07%     
==========================================
  Files         153      153              
  Lines        6558     6558              
==========================================
+ Hits         5628     5633       +5     
+ Misses        699      696       -3     
+ Partials      231      229       -2     

@thomaspoignant
Owner

/gemini review

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces more granular locking in the event exporter to prevent slow export operations from blocking the addition of new events. Key changes include the addition of a per-consumer mutex in the event store and a refactor of the ProcessPendingEvents method to release the global lock during event processing. Feedback highlights a data race in the consumer lookup within ProcessPendingEvents that needs to be addressed by holding the global mutex during the lookup. Additionally, the flushMu in DataExporter is now redundant as the event store handles per-consumer synchronization internally.

Comment thread exporter/event_store.go Outdated
Comment thread exporter/data_exporter.go Outdated
Comment on lines +105 to +106
d.flushMu.Lock()
defer d.flushMu.Unlock()
Contributor

Severity: medium

The flushMu mutex in dataExporterImpl is redundant. The eventStore.ProcessPendingEvents method now implements internal per-consumer synchronization using currentConsumer.mutex. Since each DataExporter instance is associated with a specific consumerID, the event store already ensures that concurrent flushes for the same consumer are serialized. Removing this lock reduces unnecessary overhead.

Owner

@hugehoo as Gemini is saying this mutex is not needed here.
Can you guide me why you want to add one?

Author

I added flushMu as a defensive guard: ProcessPendingEvents() released the store lock before exporting and only updated the consumer offset afterward. That made concurrent Flush() calls for the same consumer look risky, so I serialized them at the DataExporter layer.
But after seeing Gemini's comment, I agree this is not the right place to keep that guarantee. The atomicity should live in EventStore, not in DataExporter.

Owner

@thomaspoignant thomaspoignant left a comment

@hugehoo the different reviews from Gemini helped me understand that while you improve the Add() speed, we are not able to assure atomicity per consumer, which can be an issue.

I am happy to discuss the implementation and your ideas to improve it.


hugehoo commented Apr 23, 2026

Thanks for the review! I updated the commit based on your and Gemini's feedback.

hugehoo commented Apr 23, 2026

/gemini review

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request refactors the EventStore to use per-consumer locking, ensuring that slow event exports do not block the addition of new events and preventing duplicate processing during concurrent flushes. It also fixes a bug where consumer offsets were incorrectly updated and adds several tests to verify the new concurrency behavior. Feedback suggests updating the ProcessPendingEvents signature to accept and propagate a context.Context for better management of long-running I/O operations.
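
A sketch of the suggested context propagation (an assumption about shape, not the merged code): accept the caller's context and pass it to the export function instead of using context.Background(); the early return when nothing is pending matches the optimization suggested in the first review:

package exporter

import "context"

// Placeholder event type for the sketch.
type event struct{}

// processPendingEvents is a free-function stand-in for the store method,
// with fetch and process injected to keep the sketch self-contained.
func processPendingEvents(
	ctx context.Context,
	fetch func() []event,
	process func(ctx context.Context, events []event) error,
) error {
	events := fetch()
	if len(events) == 0 {
		return nil // early return: nothing to export
	}
	// The caller's ctx lets a slow upload be cancelled or bounded by a
	// deadline, which context.Background() never allows.
	return process(ctx, events)
}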

Comment thread exporter/event_store.go Outdated
@thomaspoignant
Owner

/gemini review

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request improves the concurrency model of the event store by introducing per-consumer mutexes and fine-grained locking in ProcessPendingEvents. It also fixes a bug in updateConsumerOffset where the wrong offset was being applied and adds comprehensive tests for concurrent scenarios. Review feedback suggests refactoring internal methods to use consumer pointers directly to avoid redundant lookups and potential race conditions if a consumer is re-added during processing. Additionally, it was noted that ProcessPendingEvents should ideally propagate a context to allow for the cancellation of long-running export operations.
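
A sketch of the pointer-passing refactor the review describes (names invented): internal helpers receive the *consumer resolved once under the global lock, instead of re-looking it up by ID:

package exporter

import "sync"

type consumer struct {
	mutex  sync.Mutex
	offset int64
}

type eventStoreImpl struct {
	mutex     sync.Mutex
	consumers map[string]*consumer
}

// updateConsumerOffset takes the already-resolved *consumer, so a
// concurrent remove/re-add of the same consumerID cannot redirect the
// write to a different consumer instance mid-processing.
// c.mutex is assumed to be held by the caller.
func (e *eventStoreImpl) updateConsumerOffset(c *consumer, newOffset int64) {
	c.offset = newOffset
}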

Comment thread exporter/event_store.go Outdated
Comment thread exporter/event_store.go Outdated

hugehoo commented Apr 24, 2026

/gemini review

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces fine-grained locking in the EventStore to prevent event processing from blocking the addition of new events. Key changes include adding a per-consumer mutex, refactoring ProcessPendingEvents to use more granular locking, and updating internal methods to work with consumer pointers. The PR also adds several tests to verify concurrent behavior and fixes a typo in the documentation. The review feedback suggests further improving performance by using read locks instead of write locks when fetching events and updating offsets, as these operations do not modify the global store state.
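
A minimal sketch of that read-lock suggestion, assuming the global mutex becomes a sync.RWMutex and using invented names for the buffer:

package exporter

import "sync"

// Placeholder event type for the sketch.
type event struct{}

type rwEventStore struct {
	mu     sync.RWMutex // RWMutex instead of Mutex, per the suggestion
	events []event
}

// Fetching only reads shared state, so flushes for different consumers
// can hold the read lock at the same time.
func (s *rwEventStore) fetchPendingEvents(offset int) []event {
	s.mu.RLock()
	defer s.mu.RUnlock()
	if offset >= len(s.events) {
		return nil
	}
	return append([]event(nil), s.events[offset:]...)
}

// Add still takes the write lock because it mutates the buffer.
func (s *rwEventStore) Add(ev event) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.events = append(s.events, ev)
}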

Comment thread exporter/event_store.go Outdated
Comment thread exporter/event_store.go
@hugehoo hugehoo requested a review from thomaspoignant April 24, 2026 16:29
Comment thread exporter/event_store.go
Owner

@thomaspoignant thomaspoignant left a comment

@hugehoo thanks a lot for this update, this is a really good one 🙇.
I will merge this PR and it will be part of the next version.

Development

Successfully merging this pull request may close these issues.

(change) reduce lock contention in exporter event processing

2 participants