Skip to content

aya: PerfEventArrayBuffer::next_event#1554

Merged
tamird merged 2 commits into
aya-rs:mainfrom
Brskt:perf-event-array-next-event
May 4, 2026
Merged

aya: PerfEventArrayBuffer::next_event#1554
tamird merged 2 commits into
aya-rs:mainfrom
Brskt:perf-event-array-next-event

Conversation

@Brskt
Copy link
Copy Markdown
Contributor

@Brskt Brskt commented Apr 28, 2026

PerfEventArrayBuffer exposes try_fold, fold, and for_each that
drain available events through a closure. The closure receives
PerfEvent<'_>, which is Sample { head: &[u8], tail: &[u8] } (slices
borrowed directly from the kernel-mapped ring buffer; tail is empty
unless the sample straddles the wrap boundary) or Lost { count }.
data_tail is committed once per drain instead of once per event.
Follow-up to #1550.

A first commit fixes a latent acquire-fence bug in read_events
between the data_head load and the record reads, present since 2021.

A second commit replaces read_events with the closure-based API
and migrates the perf event array integration test to for_each.
data_head is read with ptr::read_volatile + Acquire fence to pair
with the kernel's smp_wmb() + WRITE_ONCE() publish; data_tail is
read with a plain field access since userspace is the sole writer;
writes use ptr::write_volatile + SeqCst fence to pair with the
kernel's READ_ONCE(). The shared u64 path matches
tools/include/linux/ring_buffer.h on every supported arch, including
32-bit hosts that lack a 64-bit acquire-load. Removes read_events,
Events, PerfBufferError::NoBuffers, PerfBufferError::MoreSpaceNeeded,
and the bytes dependency.

A final commit migrates the perf event array integration test to
for_each

Added/updated tests?

  • Yes

Checklist

  • Rust code has been formatted with cargo +nightly fmt.
  • All clippy lints have been fixed.
    You can find failing lints with cargo xtask clippy.
  • Unit tests are passing locally with cargo test.
  • The integration tests are passing locally.
  • I have blessed any API changes with cargo xtask public-api --bless.

This change is Reviewable

@Brskt Brskt requested a review from a team as a code owner April 28, 2026 08:45
@netlify
Copy link
Copy Markdown

netlify Bot commented Apr 28, 2026

Deploy Preview for aya-rs-docs ready!

Built without sensitive environment variables

Name Link
🔨 Latest commit 4ba65d2
🔍 Latest deploy log https://app.netlify.com/projects/aya-rs-docs/deploys/69f87aab960a6600085148be
😎 Deploy Preview https://deploy-preview-1554--aya-rs-docs.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.
🤖 Make changes Run an agent on this branch

To edit notification comments on pull requests, go to your Netlify project configuration.

Copy link
Copy Markdown
Member

@tamird tamird left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't looked at the new API yet but can we aim to replace the old with the new? I also see that the new API allocates; can we make it zero copy? I wouldn't copy the ring buffer API, see #1479 .

@tamird partially reviewed 1 file and made 3 comments.
Reviewable status: 0 of 5 files reviewed, 2 unresolved discussions (waiting on Brskt).


aya/src/maps/perf/perf_buffer.rs line 221 at r1 (raw file):

        };

        let head = unsafe { (*header).data_head } as usize;

Instead of manual fences, can we treat these as atomics? We do something like this in the ring buffer implementation.


aya/src/maps/perf/perf_buffer.rs line 223 at r1 (raw file):

        let head = unsafe { (*header).data_head } as usize;
        let mut tail = unsafe { (*header).data_tail } as usize;
        // Pair with the kernel's smp_store_release on `data_head`: subsequent

This kind of thing needs citations. See the ring buffer implementation which I believe has specific kernel code references.

@Brskt Brskt force-pushed the perf-event-array-next-event branch 2 times, most recently from 5df6323 to 2ca2924 Compare April 29, 2026 15:26
Copy link
Copy Markdown
Contributor Author

@Brskt Brskt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, done. read_events removed, next_event is zero-copy. PerfSample::as_slices exposes both halves of wrap-spanning samples (no Deref<[u8]> since the perf ring isn't double-mapped). Drop advances data_tail.

@Brskt made 3 comments.
Reviewable status: 0 of 6 files reviewed, 2 unresolved discussions (waiting on tamird).


aya/src/maps/perf/perf_buffer.rs line 221 at r1 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

Instead of manual fences, can we treat these as atomics? We do something like this in the ring buffer implementation.

Ring_buf uses AtomicUsize because its kernel positions are unsigned long. Perf data_head/data_tail are fixed __u64, so the equivalent AtomicU64 doesn't compile on 32-bit MIPS. Using ptr::read_volatile + atomic::fence helpers instead.


aya/src/maps/perf/perf_buffer.rs line 223 at r1 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

This kind of thing needs citations. See the ring buffer implementation which I believe has specific kernel code references.

Done.

Copy link
Copy Markdown
Member

@tamird tamird left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you ignored this part:

I wouldn't copy the ring buffer API, see #1479 .

@tamird partially reviewed 6 files, made 8 comments, and resolved 2 discussions.
Reviewable status: 3 of 6 files reviewed, 7 unresolved discussions (waiting on Brskt).


aya/src/maps/perf/perf_buffer.rs line 69 at r5 (raw file):

/// See [`PerfEventArrayBuffer::next_event`].
///
/// [`PerfEventArrayBuffer::next_event`]: crate::maps::perf::PerfEventArrayBuffer::next_event

s/crate::maps::perf/super/ ?


aya/src/maps/perf/perf_buffer.rs line 72 at r5 (raw file):

#[derive(Debug)]
pub enum PerfEvent<'a> {
    /// A `PERF_RECORD_SAMPLE` event. The sample bytes emitted by

instead of saying PERF_RECORD_SAMPLE you should use the doc alias attribute, that way a search in the docs for PERF_RECORD_SAMPLE would produce the right result.

Ditto below.


aya/src/maps/perf/perf_buffer.rs line 97 at r5 (raw file):

/// the kernel observes the buffer as full and stops writing new events.
///
/// [`PerfEventArrayBuffer::next_event`]: crate::maps::perf::PerfEventArrayBuffer::next_event

same


aya/src/maps/perf/perf_buffer.rs line 103 at r5 (raw file):

    data_tail: NonNull<u64>,
    advance: u64,
    _marker: PhantomData<&'a mut ()>,

There should be a comment here indicating this is the lifetime of the PerfBuffer from which this data is owned and also explaining why it isn't written as literally &'a mut PerfBuffer.


aya/src/maps/perf/perf_buffer.rs line 133 at r5 (raw file):

    pub const fn is_empty(&self) -> bool {
        self.len() == 0
    }

do we need this?

Code quote:

    /// Total length of the sample in bytes.
    pub const fn len(&self) -> usize {
        let Self { head, tail, .. } = self;
        head.len() + tail.len()
    }

    /// Returns `true` if the sample is empty.
    pub const fn is_empty(&self) -> bool {
        self.len() == 0
    }

aya/src/maps/perf/perf_buffer.rs line 246 at r5 (raw file):

                    load_volatile_u64(&raw const (*header).data_tail) as usize,
                )
            };

this code is repeated above. we are lacking some abstractions that would make this code reasonable to read. See also comments below. LLMs are not good at this, it's going to take significant human steering to get this right.

Code quote:

            // SAFETY: `header` is valid for the lifetime of `self`.
            let (head, tail) = unsafe {
                (
                    load_acquire_u64(&raw const (*header).data_head) as usize,
                    load_volatile_u64(&raw const (*header).data_tail) as usize,
                )
            };

aya/src/maps/perf/perf_buffer.rs line 357 at r5 (raw file):

///
/// [1]: https://github.com/torvalds/linux/blob/05f7e89a/kernel/events/ring_buffer.c#L113-L114
fn load_acquire_u64(ptr: *const u64) -> u64 {

at the caller we have a reference, is there a good reason to pass a pointer here?

also, we always use load_acquire_u64 to read data_head and always use load_volatile_u64 to read data_tail, can we make these into newtypes so we have reasonable type safety? same for store_release_u64.

lastly: why do we need read_volatile? if userspace is the sole writer, can we just plain old read the thing?

@Brskt Brskt force-pushed the perf-event-array-next-event branch from 2ca2924 to 52debfc Compare April 30, 2026 09:14
Copy link
Copy Markdown
Contributor Author

@Brskt Brskt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be ok now ?

@Brskt made 8 comments.
Reviewable status: 0 of 6 files reviewed, 7 unresolved discussions (waiting on tamird).


aya/src/maps/perf/perf_buffer.rs line 69 at r5 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

s/crate::maps::perf/super/ ?

Done.


aya/src/maps/perf/perf_buffer.rs line 72 at r5 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

instead of saying PERF_RECORD_SAMPLE you should use the doc alias attribute, that way a search in the docs for PERF_RECORD_SAMPLE would produce the right result.

Ditto below.

Done.


aya/src/maps/perf/perf_buffer.rs line 97 at r5 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

same

Done.


aya/src/maps/perf/perf_buffer.rs line 103 at r5 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

There should be a comment here indicating this is the lifetime of the PerfBuffer from which this data is owned and also explaining why it isn't written as literally &'a mut PerfBuffer.

The struct is gone. PerfEvent::Sample { head: &'a [u8], tail: &'a [u8] } carries the lifetime on the slices.


aya/src/maps/perf/perf_buffer.rs line 133 at r5 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

do we need this?

Gone with PerfSample.


aya/src/maps/perf/perf_buffer.rs line 246 at r5 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

this code is repeated above. we are lacking some abstractions that would make this code reasonable to read. See also comments below. LLMs are not good at this, it's going to take significant human steering to get this right.

Done. Pulled out data_head/data_tail/data_tail_ptr/data_pages helpers.


aya/src/maps/perf/perf_buffer.rs line 357 at r5 (raw file):

at the caller we have a reference, is there a good reason to pass a pointer here?

Yes for data_head (kernel writes it; &u64 would be UB). No for data_tail; folded into a method.

also, we always use load_acquire_u64 to read data_head and always use load_volatile_u64 to read data_tail, can we make these into newtypes so we have reasonable type safety? same for store_release_u64.

Done via typed methods (data_head, data_tail, data_tail_ptr on PerfBuffer).

lastly: why do we need read_volatile? if userspace is the sole writer, can we just plain old read the thing?

We don't. data_tail is now a plain read.

Copy link
Copy Markdown
Member

@tamird tamird left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tamird partially reviewed 3 files, made 4 comments, and resolved 7 discussions.
Reviewable status: 3 of 6 files reviewed, 4 unresolved discussions (waiting on Brskt).


aya/src/maps/perf/perf_buffer.rs line 159 at r8 (raw file):

        // for the lifetime of `self`. The kernel writes this field
        // concurrently, so we must use `read_volatile`; an `&u64` would be UB.
        let value = unsafe { ptr::read_volatile(&raw const (*self.buf().as_ptr()).data_head) };

there are two different unsafe operations here, no? can we separate so each can have its own comment?

Code quote:

        // SAFETY: `self.buf()` points to the mmap'd `perf_event_mmap_page`
        // for the lifetime of `self`. The kernel writes this field
        // concurrently, so we must use `read_volatile`; an `&u64` would be UB.
        let value = unsafe { ptr::read_volatile(&raw const (*self.buf().as_ptr()).data_head) };

aya/src/maps/perf/perf_buffer.rs line 176 at r8 (raw file):

    /// commits the local position with a release-store on drop.
    fn data_tail_ptr(&mut self) -> NonNull<u64> {
        // SAFETY: same as `data_head`.

see above: this doesn't make sense to me. the thing that is happening here is effectively NonNull field projection, which is different from what's happening in data_head.


aya/src/maps/perf/perf_buffer.rs line 244 at r8 (raw file):

                        // `TailTracker::drop`).
                        let head = unsafe { slice::from_raw_parts(base.add(sample_start), first) };
                        let tail = unsafe { slice::from_raw_parts(base, sample_size - first) };

in the ring buffer implementation we use a neat trick: mmap the data page twice, which ensures you can always read an event contiguously. It's a bigger change here, but could we do it?

Code quote:

                        let head = unsafe { slice::from_raw_parts(base.add(sample_start), first) };
                        let tail = unsafe { slice::from_raw_parts(base, sample_size - first) };

aya/src/maps/perf/perf_buffer.rs line 362 at r8 (raw file):

    let len = out_buf.len();
    let end = (start_off + len) % mmap_size;
    let start = start_off % mmap_size;

does this mean start_off can be past-the-end? that doesn't seem right to me, there is already mod happening in the caller

@Brskt Brskt force-pushed the perf-event-array-next-event branch 2 times, most recently from 9bd5a65 to 1e1cddc Compare May 1, 2026 10:44
Copy link
Copy Markdown
Contributor Author

@Brskt Brskt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Brskt made 4 comments.
Reviewable status: 0 of 6 files reviewed, 4 unresolved discussions (waiting on tamird).


aya/src/maps/perf/perf_buffer.rs line 159 at r8 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

there are two different unsafe operations here, no? can we separate so each can have its own comment?

Done.


aya/src/maps/perf/perf_buffer.rs line 176 at r8 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

see above: this doesn't make sense to me. the thing that is happening here is effectively NonNull field projection, which is different from what's happening in data_head.

Done. Two unsafe blocks now: one for the field projection, one for NonNull::new_unchecked.


aya/src/maps/perf/perf_buffer.rs line 244 at r8 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

in the ring buffer implementation we use a neat trick: mmap the data page twice, which ensures you can always read an event contiguously. It's a bigger change here, but could we do it?

Not from userspace. perf_mmap rejects any vm_pgoff other than 0 or rb->aux_pgoff (core.c), and __perf_mmap_to_page is bounded by nr_pages (ring_buffer.c:805). The trick would need a kernel patch to rb_alloc + __perf_mmap_to_page mirroring bpf_ringbuf_area_alloc; libbpf, tools/lib/perf, and bcc all do the wrap memcpy for the same reason.


aya/src/maps/perf/perf_buffer.rs line 362 at r8 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

does this mean start_off can be past-the-end? that doesn't seem right to me, there is already mod happening in the caller

Yes. event_start is modded but the call sites add sizeof(header) (and sizeof(u64) for LOST), which can push past mmap_size. Moved the mod to both call sites and added debug_assert!(start < mmap_size).

The kernel publishes new perf records via smp_store_release on
data_head, so userspace must issue an acquire barrier after the
data_head load and before reading the record bytes. Without it,
weakly ordered targets (arm64, ppc64, riscv) may speculatively
read bytes that have not yet been published.

The existing SeqCst fence before the data_tail store is the
release-side barrier; it does nothing for head-side ordering.

See the comment on `data_head` in include/uapi/linux/perf_event.h
and perf_event_open(2).
@Brskt Brskt force-pushed the perf-event-array-next-event branch from 1e1cddc to 9826089 Compare May 1, 2026 12:22
Copy link
Copy Markdown
Member

@tamird tamird left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tamird partially reviewed 6 files, made 6 comments, and resolved 4 discussions.
Reviewable status: 1 of 6 files reviewed, 6 unresolved discussions (waiting on Brskt).


aya/src/maps/perf/perf_buffer.rs line 207 at r11 (raw file):

        let data_head = self.data_head();
        let initial_tail = self.data_tail();
        let mut tracker = TailTracker::new(self.data_tail_ptr(), initial_tail);

in the ringbuf implementation i used scopeguard::defer, which I found to be easier to reason about because it didn't require bouncing between this function and the Drop impl. do you think that would work here?


aya/src/maps/perf/perf_buffer.rs line 212 at r11 (raw file):

        while tracker.tail != data_head {
            let event_start = (tracker.tail % mmap_size as u64) as usize;
            // SAFETY: the kernel guarantees event headers are 8-byte aligned

would you mind adding a citation for this and perhaps a debug assert?

also if the kernel promises they are aligned, why are we using read_unaligned?


aya/src/maps/perf/perf_buffer.rs line 254 at r11 (raw file):

                        let tail = unsafe { slice::from_raw_parts(base, sample_size - first) };
                        (head, tail)
                    };

this code is very similar to the body of fill_from_mmap. Extract this so the helper does this (return two slices) and the caller can do the trivial copy into a 4-byte buffer for reading the length?

Code quote:

                    let (head, tail) = if sample_start + sample_size <= mmap_size {
                        // SAFETY: the if-condition guarantees the slice is
                        // entirely within the mmap data area. The kernel will
                        // not overwrite these bytes until `data_tail` is
                        // committed (via `TailTracker::drop`).
                        let s =
                            unsafe { slice::from_raw_parts(base.add(sample_start), sample_size) };
                        (s, &[][..])
                    } else {
                        let first = mmap_size - sample_start;
                        // SAFETY: `[sample_start, mmap_size)` and
                        // `[0, sample_size - first)` are disjoint ranges within
                        // the mmap data area; together they cover the wrapping
                        // sample exactly. The kernel will not overwrite either
                        // range until `data_tail` is committed (via
                        // `TailTracker::drop`).
                        let head = unsafe { slice::from_raw_parts(base.add(sample_start), first) };
                        let tail = unsafe { slice::from_raw_parts(base, sample_size - first) };
                        (head, tail)
                    };

aya/src/maps/perf/perf_buffer.rs line 260 at r11 (raw file):

                    let mut count_buf = [0u8; size_of::<u64>()];
                    fill_from_mmap(
                        (event_start + size_of::<perf_event_header>() + size_of::<u64>())

if i'm reading this right we skip over the header plus 8 bytes and then read 8 bytes. Can you write a comment explaining why please?


aya/src/maps/perf/perf_buffer.rs line 372 at r11 (raw file):

/// `base` must point to a valid mmap region of at least `mmap_size` bytes.
fn fill_from_mmap(start: usize, base: *const u8, mmap_size: usize, out_buf: &mut [u8]) {
    debug_assert!(start < mmap_size);

consider emitting the values in case of mismatch. here and below


aya/src/maps/perf/perf_buffer.rs line 376 at r11 (raw file):

    debug_assert!(len <= mmap_size);

    if start + len <= mmap_size {
if let Some(second) = (start + len).checked_sub(mmap_size) {
    let first = mmap_size - start;
    # split
} else {
    # contiguous
}

if you enjoy checked arithmetic

@Brskt Brskt force-pushed the perf-event-array-next-event branch from 9826089 to 956dbf5 Compare May 3, 2026 12:33
Copy link
Copy Markdown
Contributor Author

@Brskt Brskt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Brskt made 6 comments.
Reviewable status: 0 of 6 files reviewed, 6 unresolved discussions (waiting on tamird).


aya/src/maps/perf/perf_buffer.rs line 207 at r11 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

in the ringbuf implementation i used scopeguard::defer, which I found to be easier to reason about because it didn't require bouncing between this function and the Drop impl. do you think that would work here?

Yes, done. Used scopeguard::guard instead of defer! since the loop needs to mutate the captured tail/dirty state.


aya/src/maps/perf/perf_buffer.rs line 212 at r11 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

would you mind adding a citation for this and perhaps a debug assert?

also if the kernel promises they are aligned, why are we using read_unaligned?

Citation and debug_assert_eq!(event_start % 8, 0) added. No good reason for read_unaligned, switched to read.


aya/src/maps/perf/perf_buffer.rs line 254 at r11 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

this code is very similar to the body of fill_from_mmap. Extract this so the helper does this (return two slices) and the caller can do the trivial copy into a 4-byte buffer for reading the length?

Done. Extracted into slice_from_mmap returning (&[u8], &[u8]); size and count copy into their stack buffers.


aya/src/maps/perf/perf_buffer.rs line 260 at r11 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

if i'm reading this right we skip over the header plus 8 bytes and then read 8 bytes. Can you write a comment explaining why please?

Comment added explaining the skip past id.


aya/src/maps/perf/perf_buffer.rs line 372 at r11 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

consider emitting the values in case of mismatch. here and below

Done. Values now emitted on mismatch.


aya/src/maps/perf/perf_buffer.rs line 376 at r11 (raw file):

Previously, tamird (Tamir Duberstein) wrote…
if let Some(second) = (start + len).checked_sub(mmap_size) {
    let first = mmap_size - start;
    # split
} else {
    # contiguous
}

if you enjoy checked arithmetic

Done. Switched to checked_sub.

Copy link
Copy Markdown
Member

@tamird tamird left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking really good! Thanks for iterating.

@tamird reviewed 6 files and all commit messages, made 5 comments, and resolved 6 discussions.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on Brskt).


-- commits line 54 at r15:
this should probably be squashed into the prior commit since it would otherwise not compile. right?


aya/src/maps/perf/perf_buffer.rs line 226 at r14 (raw file):

                }
            }
        });

i wonder if you could write let (tail, dirty) = &mut *guard; here to avoid all the guard.0, guard.1 below


aya/src/maps/perf/perf_buffer.rs line 243 at r14 (raw file):

            // [1]: https://github.com/torvalds/linux/blob/05f7e89a/kernel/events/core.c#L8451
            let event: perf_event_header = unsafe { ptr::read(base.add(event_start).cast()) };
            let event_size = event.size as usize;

can you help me understand the relationship between event_size and all the reads we do below?

currently we call slice_from_mmap twice down there but it seem to me we could do it just once for the whole event (and possibly out here, unconditionally).

that might even allow slice_from_mmap to be inlined here


aya/src/maps/perf/perf_buffer.rs line 297 at r14 (raw file):

                    }
                }
                _ => continue,

this predates you but do you know if we expect other event types here? would be good to write a comment and/or a debug assertion

@Brskt Brskt force-pushed the perf-event-array-next-event branch from 956dbf5 to 1e93001 Compare May 3, 2026 21:53
Copy link
Copy Markdown
Contributor Author

@Brskt Brskt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Brskt made 4 comments.
Reviewable status: 0 of 6 files reviewed, 4 unresolved discussions (waiting on tamird).


-- commits line 54 at r15:

Previously, tamird (Tamir Duberstein) wrote…

this should probably be squashed into the prior commit since it would otherwise not compile. right?

Squashed.


aya/src/maps/perf/perf_buffer.rs line 226 at r14 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

i wonder if you could write let (tail, dirty) = &mut *guard; here to avoid all the guard.0, guard.1 below

Sure, done.


aya/src/maps/perf/perf_buffer.rs line 243 at r14 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

can you help me understand the relationship between event_size and all the reads we do below?

currently we call slice_from_mmap twice down there but it seem to me we could do it just once for the whole event (and possibly out here, unconditionally).

that might even allow slice_from_mmap to be inlined here

event_size covers the whole record. Inlined the wrap into the SAMPLE arm and dropped slice_from_mmap.


aya/src/maps/perf/perf_buffer.rs line 297 at r14 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

this predates you but do you know if we expect other event types here? would be good to write a comment and/or a debug assertion

No, only SAMPLE and LOST. Added a comment and debug_assert!.

Copy link
Copy Markdown
Member

@tamird tamird left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pretty much there, just a few nits and a question about the lost events.

@tamird reviewed 6 files and all commit messages, made 5 comments, and resolved 4 discussions.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on Brskt).


aya/src/maps/perf/perf_buffer.rs line 247 at r17 (raw file):

            let event_size = event.size as usize;
            let event_type = event.type_;
            *tail = tail.wrapping_add(event_size as u64);

what is the type of event.size? we're as usizeing it above and then as u64 here


aya/src/maps/perf/perf_buffer.rs line 250 at r17 (raw file):

            *dirty = true;

            let perf_event = match event_type {

consider just event.type_ here and dropping the local variable


aya/src/maps/perf/perf_buffer.rs line 289 at r17 (raw file):

                    // `{ header, u64 id, u64 lost, sample_id }` [1]; skip past
                    // `id` to read the `lost` count, which lives at an 8-aligned
                    // offset and cannot span the wrap.

is there a citation for this?

Code quote:

                    // `id` to read the `lost` count, which lives at an 8-aligned
                    // offset and cannot span the wrap.

aya/src/maps/perf/perf_buffer.rs line 299 at r17 (raw file):

                    PerfEvent::Lost { count }
                }
                _ => {

you can put event_type here if you write match event.type_ above

Copy link
Copy Markdown
Member

@tamird tamird left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tamird made 1 comment.
Reviewable status: all files reviewed, 5 unresolved discussions (waiting on Brskt).


aya/src/maps/perf/perf_buffer.rs line 216 at r17 (raw file):

        //
        // [1]: https://github.com/torvalds/linux/blob/05f7e89a/kernel/events/ring_buffer.c#L202
        let mut guard = scopeguard::guard((initial_tail, false), |(tail, dirty)| {

another thought: we don't need dirty, do we? we can close over initial_tail and replace if dirty with if tail != initial_tail?

@Brskt Brskt force-pushed the perf-event-array-next-event branch from 1e93001 to d787758 Compare May 4, 2026 10:24
Copy link
Copy Markdown
Contributor Author

@Brskt Brskt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Brskt made 5 comments.
Reviewable status: 0 of 6 files reviewed, 5 unresolved discussions (waiting on tamird).


aya/src/maps/perf/perf_buffer.rs line 216 at r17 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

another thought: we don't need dirty, do we? we can close over initial_tail and replace if dirty with if tail != initial_tail?

Correct, we don't need it, removed.


aya/src/maps/perf/perf_buffer.rs line 247 at r17 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

what is the type of event.size? we're as usizeing it above and then as u64 here

u16. Inlined as u64::from(event.size).


aya/src/maps/perf/perf_buffer.rs line 250 at r17 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

consider just event.type_ here and dropping the local variable

Done.


aya/src/maps/perf/perf_buffer.rs line 289 at r17 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

is there a citation for this?

Added.


aya/src/maps/perf/perf_buffer.rs line 299 at r17 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

you can put event_type here if you write match event.type_ above

Done.

Copy link
Copy Markdown
Member

@tamird tamird left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tamird reviewed 6 files and all commit messages, made 1 comment, and resolved 4 discussions.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on Brskt).


aya/src/maps/perf/perf_buffer.rs line 289 at r17 (raw file):

Previously, Brskt wrote…

Added.

this is the same citation as used above for the header. sorry to belabor this. can you help me understand why this cannot span the wrap but the sample case above can?

Replace `PerfEventArrayBuffer::read_events` with closure-based
`try_fold`, `fold`, and `for_each`. The closure receives
`PerfEvent<'_>`, which is `Sample { head, tail }` (slices borrowed
from the kernel-mapped ring buffer; `tail` is empty unless the
sample straddles the wrap boundary) or `Lost { count }`.

The drain accumulates per-event `data_tail` advances locally and
emits a single release-store on scope exit via `scopeguard::guard`,
amortizing the kernel-visible barrier and remaining panic-safe.

Read `data_head` with `ptr::read_volatile` + `Acquire` fence to
pair with the kernel's `smp_wmb()` + `WRITE_ONCE()` publish.
Read `data_tail` with a plain field access since userspace is the
sole writer. Write `data_tail` with `ptr::write_volatile` +
`SeqCst` fence to pair with the kernel's `READ_ONCE()`.

Remove `Events`, `PerfBufferError::NoBuffers`,
`PerfBufferError::MoreSpaceNeeded`, and the `bytes` dependency.

Migrate the perf event array integration test to the new API:
the closure destructures `PerfEvent::Sample { head, .. }` directly,
dropping the `BytesMut` canary and `PADDED_SAMPLE_SIZE` plumbing.
@Brskt Brskt force-pushed the perf-event-array-next-event branch from d787758 to 4ba65d2 Compare May 4, 2026 10:53
Copy link
Copy Markdown
Contributor Author

@Brskt Brskt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Brskt made 1 comment.
Reviewable status: 0 of 6 files reviewed, 1 unresolved discussion (waiting on tamird).


aya/src/maps/perf/perf_buffer.rs line 289 at r17 (raw file):

Previously, tamird (Tamir Duberstein) wrote…

this is the same citation as used above for the header. sorry to belabor this. can you help me understand why this cannot span the wrap but the sample case above can?

I'm sorry for the misunderstanding. Fixed 8 bytes at an 8-aligned offset cannot cross an 8-aligned boundary; the sample payload is variable-length and can. Updated citation.

Copy link
Copy Markdown
Member

@tamird tamird left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@codex review

@tamird reviewed 1 file and all commit messages, made 1 comment, and resolved 1 discussion.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on Brskt).

@chatgpt-codex-connector
Copy link
Copy Markdown

Codex Review: Didn't find any major issues. Can't wait for the next one!

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors PerfEventArrayBuffer’s event-draining API from a copy-into-BytesMut model (read_events) to a closure-based model (try_fold/fold/for_each) that yields PerfEvent<'_> borrowed directly from the kernel-mapped perf ring buffer, and updates integration/unit tests plus dependencies accordingly.

Changes:

  • Replaces read_events/Events with PerfEvent<'_> and the closure-based draining APIs, committing data_tail once per drain.
  • Updates perf buffer synchronization (Acquire fence after data_head read; volatile tail commit with fencing) and rewrites unit tests to cover the new API including wraparound.
  • Updates integration tests and removes the bytes dependency; adds scopeguard for panic-safe tail commit.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
aya/src/maps/perf/perf_buffer.rs Implements new PerfEvent-based draining (try_fold/fold/for_each) and updated memory ordering + tests.
aya/src/maps/perf/perf_event_array.rs Exposes the new public buffer APIs and updates docs/examples to the event-closure model.
test/integration-test/src/tests/perf_event_array.rs Migrates integration test from read_events to for_each + PerfEvent.
aya/Cargo.toml Drops bytes, adds scopeguard.
test/integration-test/Cargo.toml Drops bytes dependency.
xtask/public-api/aya.txt Public API snapshot updated for the new perf buffer API surface.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +84 to +86
head: &'a [u8],
/// Second chunk; empty unless the sample straddles the ring boundary.
tail: &'a [u8],
count: u64,
},
}

@tamird tamird merged commit be9d3fa into aya-rs:main May 4, 2026
25 of 26 checks passed
@tamird
Copy link
Copy Markdown
Member

tamird commented May 4, 2026

Thanks!

@Brskt Brskt deleted the perf-event-array-next-event branch May 4, 2026 11:28
@tamird
Copy link
Copy Markdown
Member

tamird commented May 5, 2026

@Brskt would you mind fixing up the build failure here?

@Brskt
Copy link
Copy Markdown
Contributor Author

Brskt commented May 5, 2026

@Brskt would you mind fixing up the build failure here?

Sure, I will take a look.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants