
ringbuf: add zero-copy read with deferred consumer advancement#1968

Open
orishuss wants to merge 1 commit intocilium:mainfrom
orishuss:ringbuf-zero-copy-commit

Conversation

Contributor

@orishuss orishuss commented Mar 23, 2026

Summary

Adds ReadZeroCopy and Commit to ringbuf.Reader, enabling zero-copy reads from the mmap'd ring buffer with batched consumer position advancement.

Motivation

ReadInto copies every record from mmap into a user buffer and advances the consumer position per-record via atomic store. In high-throughput scenarios (100K+ events/sec), the memmove and per-record atomics become a measurable CPU cost.

This change allows callers to:

  1. Read records directly from the mmap'd region (no copy)
  2. Process multiple records
  3. Advance the consumer position once via Commit

Changes

  • Reader.ReadZeroCopy(rec *Record) error — sets rec.RawSample to a slice of the mmap'd ring, does not advance consumer position
  • Reader.Commit() — single atomic store to release all consumed space
  • readRecord refactored to delegate to readRecordZeroCopy + copy + advance (no duplication)
  • ReadInto and ReadZeroCopy share poll logic via readWait helper
  • Windows: real zero-copy via ringReader delegation (same as Linux)
  • Tests: single read, multi read with discards, commit releases space, deadline, no-op commit, benchmark

API

// Zero-copy read — slice valid until Commit.
err := reader.ReadZeroCopy(&rec)

// Batch commit after processing N records.
reader.Commit()

Existing Read/ReadInto semantics are unchanged.
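The read-then-commit flow described above can be modelled without eBPF at all. The following is a minimal sketch, assuming a toy in-memory ring with fixed-size records; `toyRing`, `readZeroCopy`, `commit` and the `stores` counter are hypothetical names for illustration and are not the cilium/ebpf implementation. It shows that space is only released to the producer when the batched commit publishes the consumer position.

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

var (
	errFull  = errors.New("ring full")
	errEmpty = errors.New("ring empty")
)

// toyRing is a hypothetical stand-in for the mmap'd ring buffer.
type toyRing struct {
	data    []byte
	recSize uint64
	prod    uint64        // producer position in bytes
	cons    atomic.Uint64 // published consumer position
	pending uint64        // reader-local position, not yet published
	stores  int           // number of atomic stores to cons
}

func newToyRing(slots, recSize int) *toyRing {
	return &toyRing{data: make([]byte, slots*recSize), recSize: uint64(recSize)}
}

// write fails while the published consumer position has not caught up.
func (r *toyRing) write(p []byte) error {
	if r.prod-r.cons.Load() >= uint64(len(r.data)) {
		return errFull
	}
	off := r.prod % uint64(len(r.data))
	copy(r.data[off:off+r.recSize], p)
	r.prod += r.recSize
	return nil
}

// readZeroCopy returns a slice aliasing the ring and only moves the
// reader-local position; nothing is released to the producer yet.
func (r *toyRing) readZeroCopy() ([]byte, error) {
	if r.pending == r.prod {
		return nil, errEmpty
	}
	off := r.pending % uint64(len(r.data))
	r.pending += r.recSize
	return r.data[off : off+r.recSize], nil
}

// commit publishes the pending position with a single atomic store.
func (r *toyRing) commit() {
	if r.cons.Load() == r.pending {
		return // nothing to release
	}
	r.cons.Store(r.pending)
	r.stores++
}

func main() {
	r := newToyRing(4, 1)
	for i := 0; i < 4; i++ {
		r.write([]byte{byte('a' + i)})
	}
	sum := 0
	for {
		s, err := r.readZeroCopy()
		if err != nil {
			break
		}
		sum += int(s[0]) // process in place, no copy
	}
	fmt.Println("full before commit:", r.write([]byte{'x'}) == errFull)
	r.commit()
	fmt.Println("write after commit ok:", r.write([]byte{'x'}) == nil)
	fmt.Println("atomic stores:", r.stores, "sum:", sum)
}
```

In this model, four records are processed with one store to the consumer position, where a per-record advance would need four.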

Related

This PR differs by deferring consumer advancement, eliminating per-record atomic stores in addition to the copy.

@orishuss orishuss requested review from a team and florianl as code owners March 23, 2026 10:14
@orishuss orishuss force-pushed the ringbuf-zero-copy-commit branch 3 times, most recently from 2c99f76 to da08000 Compare March 23, 2026 10:46
@orishuss orishuss force-pushed the ringbuf-zero-copy-commit branch from da08000 to 609e92e Compare March 23, 2026 10:50
Contributor

@florianl florianl left a comment

Reader needs to be concurrency-safe, I think (this was also discussed in #915), and I don't see this change meeting that criterion.


err := rr.readRecordZeroCopy(rec)
if err != nil {
return err
Contributor

If we get an error here and don't call rr.advance(), don't we get stuck? What is the idea for recovery?

Contributor Author

Good point: an error after isDiscard=true records will not advance. Fixing.

// to the next record.
atomic.StoreUintptr(rr.cons_pos, cons)
rr.pendingCons = cons
rr.hasPending = true
Contributor

What clears hasPending if we never reach a non-discard record?

Contributor Author

In the event that all records in readRecordZeroCopy are isDiscard=true, advance must still be called by the user. The user's readWait still waits for a record, so it should eventually get one, and the user may successfully read a non-discard record.
However, even if an error is returned to the user (for example, a timeout), Commit must still be called. This is a very important detail that must be added to the documentation.

At first, when I read your comment, I wanted to address it by simply advancing the ring buffer for isDiscard=true records and for errors. This can't be done, though, because the user may still hold a pointer into the internal ring buffer that hasn't been released (Commit wasn't called).


// Set by readRecordZeroCopy to prevent readRecord from reusing
// RawSample as a write destination (it points into read-only mmap memory).
zeroCopy bool
Contributor

Embedding call-scoped state in a user-visible type creates hidden coupling. A Record reused across two different Reader instances carries a zeroCopy flag from reader A into reader B, silently changing allocation behavior. The pending-commit state belongs to the reader or ring, not the record.

Contributor Author

@orishuss orishuss Apr 9, 2026

This was exactly my intention.
If a record points to kernel memory, it must not be written to. This is a property of the buf [] (record).
Indeed, if someone uses a raw record to read without copying, and then uses the same record on another ring buffer, I would want an allocation, because the old memory isn't writable.
This field should be renamed - to something like kernelBacked.

}

// Commits the pending consumer position from readRecordZeroCopy calls.
func (rr *ringReader) advance() {
Contributor

After atomic.StoreUintptr() returns, the kernel may overwrite the mmap'd region immediately. Any RawSample slice handed out by ReadZeroCopy is now a dangling view into potentially live kernel memory. The documentation warns callers, but nothing prevents silent use-after-commit: Go's GC keeps the mapping alive while rec.RawSample is reachable, so there's no crash, just corrupt data.
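The use-after-commit hazard can be reproduced with an ordinary Go slice. This is a minimal sketch under stated assumptions: `ring`, `produce`, `view` and `demo` are hypothetical names, the "kernel" is just a function writing into the same backing array, and releasing space is modelled by setting `cons = prod`. A slice taken before the release still reads without error afterwards, but observes whatever the producer wrote into the reused slot; a copy taken before the release is unaffected.

```go
package main

import "fmt"

// ring is a hypothetical one-byte-per-record ring, standing in for the
// mmap'd region shared with the kernel.
type ring struct {
	buf        []byte
	prod, cons int // positions counted in records
}

// produce writes one record if the consumer has released enough space.
func (r *ring) produce(b byte) bool {
	if r.prod-r.cons == len(r.buf) {
		return false
	}
	r.buf[r.prod%len(r.buf)] = b
	r.prod++
	return true
}

// view returns a one-byte slice aliasing the ring's backing array.
func (r *ring) view(pos int) []byte {
	i := pos % len(r.buf)
	return r.buf[i : i+1]
}

// demo returns what the zero-copy view shows before and after the
// release, plus a defensive copy taken before the release.
func demo() (before, after, kept byte) {
	r := &ring{buf: make([]byte, 2)}
	r.produce('A')
	r.produce('B')

	sample := r.view(r.cons)                 // zero-copy view of the first record
	saved := append([]byte(nil), sample...) // copy made while the view is valid
	before = sample[0]

	r.cons = r.prod // models Commit: both slots are released
	r.produce('X')  // the producer immediately reuses slot 0

	after = sample[0] // no crash, but the data has changed underneath us
	kept = saved[0]
	return before, after, kept
}

func main() {
	before, after, kept := demo()
	fmt.Printf("before=%q after=%q copy=%q\n", before, after, kept)
}
```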

Contributor Author

Yes :)
This will be a problem in every implementation of such a feature.
This may be "hidden" because of the function name. Maybe ReadZeroCopy should be ReadUnsafe?

Contributor

ti-mo commented Apr 1, 2026

Okay, here's my ideal scenario as someone watching this from a distance (and this now being the 3rd proposal 🙂):

  • let's introduce a new ringbuf.UnsafeReader that packages all the zero-copy bits
  • ringbuf-only
    • conceptually I don't see this work for perf given the multiple buffers involved there (I could be wrong ofc), and ringbuf being the efficient, future-proof replacement
    • separate UnsafeReader type makes this separation clear
  • both callback and read/commit should be supported, invoking a callback will always commit implicitly
  • not necessarily Record-driven; callbacks could receive a []byte and the function taking/invoking the callback could return the nr of bytes available in the ring
  • concurrency safety obviously can't be a priority; this is fine if documented
  • reuse bits from Reader wherever it makes sense, this may involve some refactoring
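One possible shape for the callback variant in the list above, as a minimal sketch only. Everything here is an assumption for illustration: the `unsafeReader` type, its `ForEach` method, the `push` helper and the 1-byte length header are hypothetical and not part of cilium/ebpf, and the buffer is linear with no wrap-around handling. The callback receives a `[]byte` that is valid only for the duration of the call, and the consumer position is committed implicitly once after the last callback returns.

```go
package main

import "fmt"

// unsafeReader is a hypothetical reader over a linear byte buffer that
// stands in for the ring's data pages.
type unsafeReader struct {
	buf  []byte
	prod int // bytes written by the producer
	cons int // published consumer position
}

// push appends a record: a 1-byte length header followed by the payload.
// Payloads must be shorter than 256 bytes in this toy format.
func (u *unsafeReader) push(p []byte) {
	u.buf = append(u.buf, byte(len(p)))
	u.buf = append(u.buf, p...)
	u.prod = len(u.buf)
}

// ForEach invokes fn for every unread record. The slice passed to fn
// aliases the buffer and must not be retained after fn returns.
// The consumer position is advanced once, after the last callback, and
// the number of bytes released is returned.
func (u *unsafeReader) ForEach(fn func(sample []byte)) int {
	pos := u.cons
	for pos < u.prod {
		n := int(u.buf[pos])
		fn(u.buf[pos+1 : pos+1+n])
		pos += 1 + n
	}
	released := pos - u.cons
	u.cons = pos // implicit commit
	return released
}

func main() {
	u := &unsafeReader{}
	u.push([]byte("ab"))
	u.push([]byte("cde"))

	payload := 0
	released := u.ForEach(func(sample []byte) {
		payload += len(sample)
	})
	fmt.Println("payload bytes:", payload, "released bytes:", released)
}
```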

@florianl @orishuss @jschwinger233 WDYT?

@orishuss orishuss force-pushed the ringbuf-zero-copy-commit branch 2 times, most recently from d7805dc to 1c6b21c Compare April 9, 2026 13:32
Contributor Author

orishuss commented Apr 9, 2026

Hi @ti-mo, I like your idea. A separation between Reader and UnsafeReader would make it clear to the user that they have to be careful, therefore committing where necessary.
However, I fixed @florianl's comments and I think the code is good now. Making this abstraction between 2 readers, while 99% of the code between them would be the same, is a bit overkill in my opinion.
I also renamed ReadZeroCopy to ReadUnsafe, to make it clear that caution is warranted.
Also: I can add a ReadCallback function if you think there's value in that. Do you still need it, given that the user will be able to ReadUnsafe+Commit?

Does this seem sufficient, or would you still like a separation to two readers?

Signed-off-by: Ori Shussman <orishuss@gmail.com>
@orishuss orishuss force-pushed the ringbuf-zero-copy-commit branch from 1c6b21c to d1f0ef5 Compare April 9, 2026 14:17
Contributor Author

orishuss commented Apr 9, 2026

About concurrency, I can think of 2 ways forward:

  1. Leave it as-is, and make the documentation clear that Commit should be called from a single goroutine after all the read records have finished processing.
  2. Make the reader aware of which records were read (specifically their offsets), and have Commit(offset) instead of Commit(). The reader can then only actually advance once the first record has been returned (committed). For example, if 3 goroutines read records with offsets 1000, 1100, 1200, and Commit(1100) and Commit(1200) are called, nothing is advanced, and we in fact wait for Commit(1000) to be called. This requires maintaining a sorted slice of read records within the reader. I can't think of another way to let multiple goroutines handle memory from within the ring buffer without copying it, while not forcing them to stop and wait for a single goroutine to commit everything. This of course has the problem that a single stuck goroutine could get the entire ring buffer stuck.

3 participants