maps: add ring buffer batch API#1479
Conversation
✅ Deploy Preview for aya-rs-docs ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
There was a problem hiding this comment.
Pull request overview
This PR adds a batch API for ring buffer consumption that amortizes atomic writes of the consumer position. Instead of committing the position after each item, the batch API commits once when the batch is dropped, improving performance when consuming multiple items.
Changes:
- Added
RingBuf::batch()method to create a batch reader - Added
RingBufBatchstruct that defers consumer position commits - Refactored internal logic to support both immediate and batched commits
- Updated all tests and aya-log to use the new batch API
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| aya/src/maps/ring_buf.rs | Core implementation of batch API: new RingBufBatch struct, refactored ProducerData::next() to support deferred commits, split ConsumerPos operations into consume() and commit() |
| aya-log/src/lib.rs | Updated flush() method to use batch API for improved performance |
| test/integration-test/src/tests/uprobe_cookie.rs | Updated test to use batch API, simplified error handling |
| test/integration-test/src/tests/ring_buf.rs | Updated all ring buffer tests to use batch API, removed unused anyhow import, simplified error handling |
| test/integration-test/src/tests/load.rs | Updated test to use batch API with proper scoping |
| xtask/public-api/aya.txt | Added new public API entries for RingBuf::batch() and RingBufBatch |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
582a40e to
3e58029
Compare
|
@codex review |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 14 out of 14 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
Codex Review: Didn't find any major issues. Can't wait for the next one! ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
6104cba to
403a44f
Compare
ajwerner
left a comment
There was a problem hiding this comment.
Did something concrete motivate this change? When I first saw the title I was an expecting a different sort of API that might enable the consumer to do vectorized processing of items and then to selectively choose where to commit from the returned batch. None of the existing APIs let the consumer client examine multiple items with a shared lifetime so you can't really make a tight loop over a batch. Maybe that doesn't matter.
I can see how this patch helps reduce contention between the producer and consumer on the consumer position. I'm okay with this change but would love to see a motivating benchmark or something.
| Item::Discard { len } => consume(consumer, advanced, len), | ||
| Item::Data(data) => { | ||
| // This must be deferred in case `f` panics. | ||
| scopeguard::defer! { consume(consumer, advanced, data.len()) }; |
There was a problem hiding this comment.
If f panics do you definitely want to consume the item? I think it's a reasonable contract for the use case because panicking in a loop is bad, but it is worth documenting and I could see it being pushed into f if needed (like the caller of this function in some way tracks if it panicked)
There was a problem hiding this comment.
That's a reasonable thing to call out. It also suggests that you might want control over consumption - you can imagine a combinator like TakeWhile that needs to peek and thus may elect to halt iteration without consuming the last item.
There was a problem hiding this comment.
Actually the take_while docs explicitly mention this; the rejected element is itself consumed.
https://doc.rust-lang.org/stable/std/iter/trait.Iterator.html#method.take_while
Still, this is a unlike a normal iterator because it's yielding borrowed data. Do you think it would be useful to allow the caller to explicitly (or implicitly, e.g. via panic) decide to not consume data?
There was a problem hiding this comment.
I think it's not worth doing anything about. If some consumer is so worried about panicking they can copy the data out or something like that.
It was sort of like that initially, but it ended up having worse performance than the existing API. You're right that there's no API for iterating in batches. I'm also not exactly sure how you're implement such an API because if you field 2 items and the later item is dropped, what do you do? How do you express this ordering in the type system?
In addition to reducing contention it also optimizes the consumer; the consumer code path is up to 75% faster. I need to figure out how to properly integrate the benchmarks into the integration test harness. |
|
I haven't reviewed yet, but batching makes sense, I've been working on XDP stuff recently and batching vs non batching makes GB/s of difference |
tamird
left a comment
There was a problem hiding this comment.
Reviews appreciated! I haven't had time to investigate the bench harness - running benchmarks in a VM is annoying (for collecting results)
@tamird made 2 comments.
Reviewable status: 0 of 14 files reviewed, 1 unresolved discussion (waiting on ajwerner and vadorovsky).
| Item::Discard { len } => consume(consumer, advanced, len), | ||
| Item::Data(data) => { | ||
| // This must be deferred in case `f` panics. | ||
| scopeguard::defer! { consume(consumer, advanced, data.len()) }; |
There was a problem hiding this comment.
That's a reasonable thing to call out. It also suggests that you might want control over consumption - you can imagine a combinator like TakeWhile that needs to peek and thus may elect to halt iteration without consuming the last item.
tamird
left a comment
There was a problem hiding this comment.
@tamird made 1 comment.
Reviewable status: 0 of 14 files reviewed, 1 unresolved discussion (waiting on ajwerner and vadorovsky).
| Item::Discard { len } => consume(consumer, advanced, len), | ||
| Item::Data(data) => { | ||
| // This must be deferred in case `f` panics. | ||
| scopeguard::defer! { consume(consumer, advanced, data.len()) }; |
There was a problem hiding this comment.
Actually the take_while docs explicitly mention this; the rejected element is itself consumed.
https://doc.rust-lang.org/stable/std/iter/trait.Iterator.html#method.take_while
Still, this is a unlike a normal iterator because it's yielding borrowed data. Do you think it would be useful to allow the caller to explicitly (or implicitly, e.g. via panic) decide to not consume data?
And also not indicative IMO, we should definitely bench on bare metal Linux hosts. If you tell me how do you intend to benchmark it - or even better - add some benchmarks alongside the tests, I'm happy to check it out on a 24 core Epyc machine. |
Yeah i agree with this of course but we still need to support running the benchmarks in the VM so that we can easily check that they actually work. Maybe that narrows the scope enough to be feasible. I have it in a local branch, I'll try to find time to get this over the finish line. |
ajwerner
left a comment
There was a problem hiding this comment.
@ajwerner reviewed 14 files and all commit messages, made 2 comments, and resolved 1 discussion.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on tamird and vadorovsky).
aya/src/maps/ring_buf.rs line 467 at r3 (raw file):
consumer: &'a mut ConsumerPos, mut advanced: Option<&'a mut bool>, ) -> Option<RingBufItem<'a>> {
I feel like I must be missing something -- this function was introduced to take this advanced parameter but then as far as I can tell, it's only ever called with None. It seems to me like this function was intended to be used by try_fold but then some AI bot got frustrated with a borrow-checker and decided to back that out and just inline the code from here into try_fold?
| Item::Discard { len } => consume(consumer, advanced, len), | ||
| Item::Data(data) => { | ||
| // This must be deferred in case `f` panics. | ||
| scopeguard::defer! { consume(consumer, advanced, data.len()) }; |
There was a problem hiding this comment.
I think it's not worth doing anything about. If some consumer is so worried about panicking they can copy the data out or something like that.
tamird
left a comment
There was a problem hiding this comment.
@tamird made 1 comment.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on ajwerner and vadorovsky).
aya/src/maps/ring_buf.rs line 467 at r3 (raw file):
Previously, ajwerner wrote…
I feel like I must be missing something -- this function was introduced to take this
advancedparameter but then as far as I can tell, it's only ever called with None. It seems to me like this function was intended to be used bytry_foldbut then some AI bot got frustrated with a borrow-checker and decided to back that out and just inline the code from here intotry_fold?
It's been a while since I did this but I think it was used in one of the earlier API shapes I tried.
I think this function just goes away completely if we go all in on the batch API.
ajwerner
left a comment
There was a problem hiding this comment.
@ajwerner made 1 comment.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on tamird and vadorovsky).
aya/src/maps/ring_buf.rs line 467 at r3 (raw file):
Previously, tamird (Tamir Duberstein) wrote…
It's been a while since I did this but I think it was used in one of the earlier API shapes I tried.
I think this function just goes away completely if we go all in on the batch API.
fine, but why keep the advanced param if nobody uses it?
403a44f to
797559b
Compare
tamird
left a comment
There was a problem hiding this comment.
@tamird made 1 comment.
Reviewable status: 0 of 14 files reviewed, 1 unresolved discussion (waiting on ajwerner and vadorovsky).
aya/src/maps/ring_buf.rs line 467 at r3 (raw file):
Previously, ajwerner wrote…
fine, but why keep the
advancedparam if nobody uses it?
It's detritus, I'll remove it. I remember now - an earlier shape of this PR tried to yield a batch of items and do the consumer update only once, but that turned out not to be the expensive part, it was just all the bookkeeping of the RingBufItem itself.
I wasn't suggesting that this parameter would stay, I was just trying to redirect the discussion: does this API look reasonable? do we want to keep the old API?
Anyway, this parameter is removed and I reduced the diff a bit by undoing some of the unnecessary movement.
797559b to
872c9fd
Compare
ajwerner
left a comment
There was a problem hiding this comment.
@ajwerner made 2 comments and resolved 1 discussion.
Reviewable status: 0 of 14 files reviewed, all discussions resolved (waiting on vadorovsky).
aya/src/maps/ring_buf.rs line 467 at r3 (raw file):
does this API look reasonable?
Yes, it's reasonable.
do we want to keep the old API?
I think the old API should stay at least with a deprecation for existing users. For use cases that aren't super high throughput what was there was fine.
Not to be overly inspired by bad code, but cilium/ebpf is extremely widely used in Go and it's just always copying out and committing the consumer offset.
Anyway, this parameter is removed and I reduced the diff a bit by undoing some of the unnecessary movement.
Nice, looks good.
This makes it easier to reuse later in new experimental APIs. Put it in a new impl block to reduce churn.
This implementation should have lower overhead than `RingBuffer::next` and by avoiding the overhead associated with `RingBufItem`.
872c9fd to
5b3badd
Compare
This implementation should have lower overhead than
RingBuffer::nextandby avoiding the overhead associated with
RingBufItem.This change is