aya for sched_ext: Add struct_ops program support and kfunc call resolution #1495
rrnewton wants to merge 1 commit into
Add support for BPF_PROG_TYPE_STRUCT_OPS programs and
BPF_MAP_TYPE_STRUCT_OPS maps, enabling use cases like sched_ext
custom schedulers that implement kernel struct interfaces from
Rust eBPF.
aya-obj changes:
- Add EbpfSectionKind::StructOps/StructOpsLink and
ProgramSection::StructOps for .struct_ops ELF sections
- Add Map::StructOps variant with StructOpsMap type
- Parse .struct_ops/.struct_ops.link sections, resolving struct
type names from BTF VAR types
- Make Btf::type_by_id, type_name, string_at public for kernel
BTF introspection
- Make BtfMember and Struct.{size,members} public
- Add Btf::fixup_func_linkage() to patch GLOBAL→STATIC for
struct_ops (Rust compiler emits all as GLOBAL)
- Sanitize EXTERN FUNCs and .struct_ops DATASECs in to_bytes()
to avoid kernel rejection during BPF_BTF_LOAD
- Recompute BTF header in to_bytes() to match serialized sizes
- Detect extern symbol (kfunc) call relocations and patch
src_reg from BPF_PSEUDO_CALL to BPF_PSEUDO_KFUNC_CALL
- Add Object::fixup_kfunc_calls() to resolve kfunc imm fields
against vmlinux BTF
aya changes:
- Add StructOps program type with StructOpsLink
- Wire StructOps into Program enum and all impl_*! macros
- Defer struct_ops map creation until attach time
- Add Ebpf::attach_struct_ops() that loads programs, creates
the struct_ops map with program FDs, and attaches via
BPF_LINK_CREATE
- Cache kernel BTF in Ebpf for reuse during attachment
- Store btf_fd for struct_ops map creation
~215 of the ~855 added lines are unit tests covering section
parsing, BTF fixups, and serialization sanitization.
Tested with a pure-Rust sched_ext FIFO scheduler (scx_purerust)
running for 5+ minutes under normal workload.
Pull request overview
Adds first-class support for BPF_PROG_TYPE_STRUCT_OPS / BPF_MAP_TYPE_STRUCT_OPS across aya-obj (ELF/BTF/relocations) and aya (loader + attachment), enabling Rust eBPF “struct ops” use cases like sched_ext.
Changes:
- Parse .struct_ops/.struct_ops.link as struct_ops map definitions and plumb them through the object model.
- Introduce StructOps program type and an Ebpf::attach_struct_ops() path that loads member programs, creates the struct_ops map, populates it, and attaches via BPF_LINK_CREATE.
- Detect extern call relocations as kfunc calls and resolve them against vmlinux BTF; sanitize EXTERN FUNCs and .struct_ops* DATASECs during BTF serialization.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| aya/src/sys/bpf.rs | Makes bpf_map_create available to the crate for struct_ops map creation. |
| aya/src/programs/struct_ops.rs | Adds the new StructOps program type and link wrapper. |
| aya/src/programs/mod.rs | Wires StructOps into the Program enum and supporting macros/exports. |
| aya/src/bpf.rs | Adds struct_ops map handling, kfunc fixups, and Ebpf::attach_struct_ops(). |
| aya-obj/src/relocation.rs | Patches extern call relocations to BPF_PSEUDO_KFUNC_CALL. |
| aya-obj/src/obj.rs | Adds .struct_ops* section parsing, new section/program kinds, and kfunc imm fixups. |
| aya-obj/src/maps.rs | Introduces Map::StructOps / StructOpsMap and section-kind mapping. |
| aya-obj/src/lib.rs | Re-exports StructOpsMap. |
| aya-obj/src/btf/types.rs | Makes BtfMember and selected Struct fields public for struct_ops handling. |
| aya-obj/src/btf/btf.rs | Exposes BTF query APIs and adds serialization sanitization + linkage fixups for struct_ops. |
    if !sym.is_definition && sym.section_index.is_none() {
        let ins = &mut program.instructions[ins_index];
        ins.set_src_reg(BPF_PSEUDO_KFUNC_CALL as u8);
        ins.imm = 0; // kernel resolves by BTF func name
For BPF_PSEUDO_KFUNC_CALL the kernel expects imm to contain the vmlinux BTF func ID; it doesn’t resolve kfuncs “by name” at verifier time. Setting ins.imm = 0 here is fine as a placeholder, but the comment should reflect that fixup_kfunc_calls() (or similar) must later patch imm to the actual BTF ID before load.
Suggested change:
    - ins.imm = 0; // kernel resolves by BTF func name
    + // Use 0 as a placeholder; a later fix-up pass must
    + // patch this to the vmlinux BTF func ID before load.
    + ins.imm = 0;
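The two-step flow this comment describes (mark the call at relocation time, patch `imm` in a later pass) might look roughly like the sketch below. It is an illustration only, not aya's code: `Insn`, `mark_kfunc_call`, and `patch_kfunc_imm` are hypothetical names, and the instruction is reduced to the two fields that matter here. The `BPF_PSEUDO_CALL` and `BPF_PSEUDO_KFUNC_CALL` values are the ones from the kernel UAPI header.

```rust
/// Kernel UAPI values for the `src_reg` field of a BPF call instruction.
const BPF_PSEUDO_CALL: u8 = 1;
const BPF_PSEUDO_KFUNC_CALL: u8 = 2;

/// Hypothetical, simplified stand-in for a BPF call instruction.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Insn {
    src_reg: u8,
    imm: i32,
}

/// Relocation time: mark the call as a kfunc call and leave `imm`
/// as a placeholder that a later pass must overwrite.
fn mark_kfunc_call(ins: &mut Insn) {
    ins.src_reg = BPF_PSEUDO_KFUNC_CALL;
    ins.imm = 0;
}

/// Fix-up pass: store the vmlinux BTF func ID in `imm`.
/// Fails loudly instead of loading a program with a zero ID.
fn patch_kfunc_imm(ins: &mut Insn, btf_func_id: Option<u32>) -> Result<(), String> {
    if ins.src_reg != BPF_PSEUDO_KFUNC_CALL {
        return Err("instruction is not a kfunc call".to_string());
    }
    let id = btf_func_id.ok_or_else(|| "kfunc not found in vmlinux BTF".to_string())?;
    ins.imm = i32::try_from(id).map_err(|_| "BTF ID does not fit in imm".to_string())?;
    Ok(())
}
```

Returning an error when the lookup fails keeps an unresolved placeholder from reaching the verifier, where the resulting rejection would be harder to trace back to a missing kfunc.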
    // expected_attach_type = member index (u32 reinterpreted
    // as bpf_attach_type by the kernel)
    struct_ops_prog.data.expected_attach_type = Some(
        // SAFETY: the kernel interprets this field as a raw u32
        // member index for struct_ops programs, not as a real
        // bpf_attach_type enum variant.
        unsafe {
            core::mem::transmute::<u32, aya_obj::generated::bpf_attach_type>(
                member_idx as u32,
            )
        },
    );
expected_attach_type is set using core::mem::transmute::<u32, bpf_attach_type>(member_idx as u32). Since bpf_attach_type is a #[repr(u32)] enum, transmuting an arbitrary member index into it can create an invalid enum discriminant, which is undefined behavior in Rust (even if later cast back to u32). Consider changing ProgramData.expected_attach_type / EbpfLoadProgramAttrs.expected_attach_type to store a raw u32 (or a dedicated newtype) so struct_ops can pass the member index safely.
Suggested change:
    - // expected_attach_type = member index (u32 reinterpreted
    - // as bpf_attach_type by the kernel)
    - struct_ops_prog.data.expected_attach_type = Some(
    -     // SAFETY: the kernel interprets this field as a raw u32
    -     // member index for struct_ops programs, not as a real
    -     // bpf_attach_type enum variant.
    -     unsafe {
    -         core::mem::transmute::<u32, aya_obj::generated::bpf_attach_type>(
    -             member_idx as u32,
    -         )
    -     },
    - );
    + // expected_attach_type = member index (raw u32 value)
    + // For struct_ops programs, the kernel interprets this field
    + // as a member index, not as a bpf_attach_type enum variant.
    + struct_ops_prog.data.expected_attach_type = Some(member_idx as u32);
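One way to realize the "dedicated newtype" option mentioned above is a small enum that keeps the raw value and records which meaning it has. This is a sketch under assumptions: `ExpectedAttach` and `as_raw` are hypothetical names, not part of aya, and real attach types are reduced to their raw `u32` value here.

```rust
/// Hypothetical replacement for an enum-typed `expected_attach_type` field.
/// Every `u32` is a valid payload, so no `transmute` is needed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ExpectedAttach {
    /// A genuine attach type, stored as its raw kernel value.
    AttachType(u32),
    /// For struct_ops programs: index of the member being implemented.
    StructOpsMember(u32),
}

impl ExpectedAttach {
    /// Value to write into the `expected_attach_type` slot of the
    /// program-load attributes.
    fn as_raw(self) -> u32 {
        match self {
            Self::AttachType(v) | Self::StructOpsMember(v) => v,
        }
    }
}
```

Compared with storing a bare `u32`, the two variants keep the member-index case visible at call sites, at the cost of one more type to maintain.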
    let data_off = s.members.iter().find_map(|m| {
        let name = kernel_btf.string_at(m.name_offset).unwrap_or_default();
        (name == "data").then_some(m.offset / 8)
    }).unwrap_or(0);
    (s.size, data_off as usize)
The wrapper struct data field offset is computed with find_map(...).unwrap_or(0). If the wrapper struct layout changes or the expected data member is missing, silently defaulting to offset 0 will build an invalid map value and lead to confusing verifier/runtime failures. Prefer returning an explicit error when the data member cannot be found (and/or when its computed offset would exceed wrapper_size).
Suggested change:
    - let data_off = s.members.iter().find_map(|m| {
    -     let name = kernel_btf.string_at(m.name_offset).unwrap_or_default();
    -     (name == "data").then_some(m.offset / 8)
    - }).unwrap_or(0);
    - (s.size, data_off as usize)
    + // Find the `data` member in the wrapper struct
    + let data_member = s.members.iter().find(|m| {
    +     let name = kernel_btf.string_at(m.name_offset).unwrap_or_default();
    +     name == "data"
    + }).ok_or_else(|| {
    +     EbpfError::StructOpsError(format!(
    +         "wrapper struct `{wrapper_name}` does not contain a `data` field"
    +     ))
    + })?;
    + let wrapper_size = s.size;
    + let wrapper_size_usize = wrapper_size as usize;
    + let data_offset = (data_member.offset / 8) as usize;
    + if data_offset >= wrapper_size_usize {
    +     return Err(EbpfError::StructOpsError(format!(
    +         "`data` field offset {data_offset} exceeds wrapper struct `{wrapper_name}` size {wrapper_size_usize}"
    +     )));
    + }
    + (wrapper_size, data_offset)
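The same check can be exercised in isolation with a stripped-down model. In this sketch `Member` and `data_offset` are hypothetical stand-ins for the BTF types, and the alignment check is an extra assumption of mine: it rejects a bit offset that is not a whole number of bytes, which the `/ 8` in the snippets above would silently round down.

```rust
/// Hypothetical, simplified view of a BTF struct member.
struct Member<'a> {
    name: &'a str,
    /// Member offset in bits, as the snippets above assume.
    offset_bits: u32,
}

/// Byte offset of the `data` member inside a wrapper struct of
/// `wrapper_size` bytes, or an error describing what is wrong.
fn data_offset(members: &[Member<'_>], wrapper_size: u32) -> Result<usize, String> {
    let member = members
        .iter()
        .find(|m| m.name == "data")
        .ok_or_else(|| "wrapper struct has no `data` member".to_string())?;
    if member.offset_bits % 8 != 0 {
        return Err(format!("`data` is not byte-aligned ({} bits)", member.offset_bits));
    }
    let offset = member.offset_bits / 8;
    if offset >= wrapper_size {
        return Err(format!("`data` offset {offset} exceeds wrapper size {wrapper_size}"));
    }
    Ok(offset as usize)
}
```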
    let link_fd = bpf_link_create(
        map_fd.as_fd(),
        LinkTarget::Iter,
        bpf_attach_type::BPF_STRUCT_OPS,
bpf_link_create is invoked with LinkTarget::Iter to express “no target FD”. This currently works because Iter maps to leaving the target unset, but it’s semantically misleading and couples struct_ops to iterator-specific naming/comments. Consider introducing a dedicated LinkTarget::None (or similar) and using that here for struct_ops attachments.
    // Determine the struct type name from BTF if available
    let struct_type_name = if let Some(btf) = &self.btf {
        // Look up the BTF type for this variable
        let mut found_type_name = None;
        for t in btf.types() {
            if let BtfType::Var(var) = t {
                if let Ok(var_name) = btf.type_name(t) {
                    if var_name == *name {
                        // Follow the type to find the struct
                        if let Ok(inner_type) = btf.type_by_id(var.btf_type) {
                            if let Ok(type_name) = btf.type_name(inner_type) {
                                found_type_name = Some(type_name.to_string());
                            }
                        }
                        break;
                    }
                }
            }
        }
        found_type_name.unwrap_or_default()
    } else {
        String::new()
    };
struct_type_name falls back to an empty string when BTF is missing or the VAR→type lookup fails (unwrap_or_default() / String::new()). That pushes the failure to a later stage (and may yield confusing errors). It would be more robust to return a ParseError if BTF is unavailable for .struct_ops* sections or if the struct type name cannot be resolved.
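A fail-fast version of this lookup could be shaped like the sketch below. The types are deliberately simplified and hypothetical: `Var` stands in for a BTF VAR together with the name of the type it points at, and the `ParseError` here is a local illustration, not aya-obj's actual error enum.

```rust
/// Hypothetical error type for this sketch only.
#[derive(Debug, PartialEq, Eq)]
enum ParseError {
    MissingBtf { section: String },
    UnresolvedStructType { var: String },
}

/// Simplified stand-in for a BTF VAR: its own name plus the name of
/// the type it refers to, if that type has one.
struct Var<'a> {
    name: &'a str,
    type_name: Option<&'a str>,
}

/// Resolve the struct type name for `var_name`, or report why it
/// cannot be resolved instead of returning an empty string.
fn struct_type_name(
    btf_vars: Option<&[Var<'_>]>,
    section: &str,
    var_name: &str,
) -> Result<String, ParseError> {
    let vars = btf_vars.ok_or_else(|| ParseError::MissingBtf {
        section: section.to_string(),
    })?;
    vars.iter()
        .find(|v| v.name == var_name)
        .and_then(|v| v.type_name)
        .filter(|n| !n.is_empty())
        .map(str::to_string)
        .ok_or_else(|| ParseError::UnresolvedStructType {
            var: var_name.to_string(),
        })
}
```

The iterator chain also flattens the five levels of nested `if let` in the original into a single expression.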
    let end = start + symbol.size as usize;
    let data = if end <= section.data.len() {
        section.data[start..end].to_vec()
    } else {
        section.data.to_vec()
When extracting the section initializer bytes, if end > section.data.len() the code falls back to section.data.to_vec(). This ignores start and can attach incorrect bytes if the symbol isn’t at offset 0, and it also hides malformed symbol metadata. Prefer validating start <= end <= section.data.len() and returning a ParseError (or at least clamping to start..section.data.len()) rather than copying the entire section.
Suggested change:
    - let end = start + symbol.size as usize;
    - let data = if end <= section.data.len() {
    -     section.data[start..end].to_vec()
    - } else {
    -     section.data.to_vec()
    + let end = start.saturating_add(symbol.size as usize);
    + let data = if start <= end && end <= section.data.len() {
    +     section.data[start..end].to_vec()
    + } else if start < section.data.len() {
    +     // Clamp to the available data while still honoring the start offset.
    +     section.data[start..section.data.len()].to_vec()
    + } else {
    +     // Start is out of bounds; no valid data can be extracted for this symbol.
    +     Vec::new()
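The stricter alternative named in the comment, validating the range and returning an error rather than clamping, can be written with `checked_add` and `slice::get`. This is a sketch: `symbol_bytes` is a hypothetical helper, and a `String` error stands in for whatever `ParseError` variant the crate would use.

```rust
/// Bytes of a symbol that starts at `start` and spans `size` bytes
/// within `section`, or an error if the range is not fully inside it.
fn symbol_bytes(section: &[u8], start: usize, size: usize) -> Result<&[u8], String> {
    // `checked_add` rejects overflow instead of wrapping or saturating.
    let end = start
        .checked_add(size)
        .ok_or_else(|| format!("symbol range overflows: start {start}, size {size}"))?;
    // `get` returns `None` when the range falls outside the slice.
    section.get(start..end).ok_or_else(|| {
        format!(
            "symbol range {start}..{end} is outside section of {} bytes",
            section.len()
        )
    })
}
```

With this shape, malformed symbol metadata surfaces at parse time instead of producing a truncated or empty initializer.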
    for ty in self.types() {
        match ty {
            BtfType::Func(func) if func.linkage() == FuncLinkage::Extern => {
                // Replace EXTERN FUNC with INT (same size: 12 bytes)
The comment says the EXTERN FUNC is replaced with an INT placeholder of the “same size: 12 bytes”, but Int::to_bytes() serializes 16 bytes (includes the extra data word). Either adjust the comment or choose a placeholder kind with the intended record size so future changes don’t rely on an incorrect assumption.
Suggested change:
    - // Replace EXTERN FUNC with INT (same size: 12 bytes)
    + // Replace EXTERN FUNC with INT placeholder
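The size mismatch follows from the kernel's BTF encoding: every type record starts with a 12-byte `struct btf_type` header, a FUNC carries nothing after it, and an INT is followed by one extra 32-bit word. A small sketch of that arithmetic, where `record_len` is a hypothetical helper covering only these two kinds:

```rust
/// BTF kind numbers from the kernel's BTF format.
const BTF_KIND_INT: u32 = 1;
const BTF_KIND_FUNC: u32 = 12;

/// Size of the common `struct btf_type` header: three u32 fields.
const BTF_TYPE_HEADER_LEN: usize = 12;

/// Serialized length of a BTF type record for the two kinds
/// discussed here; other kinds are out of scope for this sketch.
fn record_len(kind: u32) -> Option<usize> {
    match kind {
        // INT: header plus one u32 describing encoding, offset and bits.
        BTF_KIND_INT => Some(BTF_TYPE_HEADER_LEN + 4),
        // FUNC: header only.
        BTF_KIND_FUNC => Some(BTF_TYPE_HEADER_LEN),
        _ => None,
    }
}
```

Each FUNC swapped for an INT therefore grows the type section by 4 bytes, so any length stored in the BTF header has to be computed from the bytes actually serialized.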
Dear project maintainers,
I can work on addressing the comments above if there is support for this idea. I'd love to get some high-level feedback from humans.
I work at Meta on the team that does custom schedulers (and in fact introduced sched_ext to the kernel). This branch on our primary repo has a proof-of-concept all-Rust scheduler: https://github.com/sched-ext/scx/tree/aya-next
Most of us are big Rust fans. If we could write all-Rust schedulers, I think there would be substantial benefits, especially for factoring out reusable libraries using Rust's abstraction and polymorphism mechanisms. Obviously, support for struct_ops in Aya makes it much, much nicer to support clean schedulers in Rust. Let me know what you think.
Best,
P.S. As you can see, the current idea is to add basic struct_ops + kfunc support to Aya, and leave the rest of the sched_ext-specific stuff in libraries outside of Aya. In principle, the struct_ops support would also be common to other non-sched_ext uses like tcp_congestion_ops. However, it could also be argued that -- if y'all were willing to accept the changes -- Aya could also provide full, idiomatic sched_ext support out of the box.
Hi Ryan! Thanks for the contribution, great to see someone from Meta and the sched_ext team. Are you aware that there are "competing" PRs for the topics you're addressing? Notably:
- kfuncs/ksyms - #1372 - which I've been reviewing and hoping to get in
- struct_ops - #1444 - but that one is very heavily (and badly) vibe coded and not great with git hygiene, so I'm actually happy to prioritize your solution
What I definitely like about your PR is that it's smaller than any of these. I have yet to look at your code and compare it with #1372 to figure out why they are so much bigger, and whether yours is missing something.
Some general feedback for now:
- Could you split the kfunc change into a separate commit? I consider it a separate feature and it would be great to have it as such in the git history.
- Or, as an alternative, could you see if you can rebase on top of #1372? I think that would be a great way of testing whether that PR works. But also feel free to tell us if you think that PR is too complex and yours is better.
- Could you hand-write the PR description (and keep it aligned with the commit message)? I would prefer it to be just a short 1-2 paragraph description of what you're doing, instead of the LLM-generated bullet points. I think the whole "Changes by file" could go away, then you could rewrite the other bullet points as paragraphs. Nothing against using LLMs in general (as long as you review the output, keep the quality, don't leave the obvious signs of LLMs etc.), but I don't find them great with commit messages.
Hmm, OK, one of the reasons this PR is small is that it doesn't have integration tests. That's something to address.
@vadorovsky - thanks! It's great to hear that you're open to struct_ops/kfunc support, one way or the other! I'll have a look at the other PRs, try the rebase, rewrite the PR description, and look at integration tests. It wouldn't be too much code to include a simple SCX scheduler as the test, like this:
P.S. I'm racing ahead trying to port more realistic schedulers, which does require more features (CO-RE accesses to kernel structs, many more kfuncs). So there may be other tweaks or features needed.
Hey @rrnewton and @vadorovsky, I looked through your PR and I think the kfunc implementation covers the happy path but misses a few cases that #1372 handles: specifically, variable ksyms (which need LD_IMM64 patching rather than BPF_PSEUDO_KFUNC_CALL), the /proc/kallsyms fallback for variables not present in vmlinux BTF, weak symbol support, and BTF type compatibility checking. These all follow from how libbpf implements the full .ksyms contract.
I'm happy to help move this forward since I need sched_ext too. If CO-RE is a blocker I think it might be time to get it done. |
Great, reviewing #1372 would be a start - it contains a more complete implementation of ksyms support and we all agreed above that it should go in first.
No, I wouldn't say CO-RE is a blocker for merging this. This PR is just for the user-space side, it doesn't even touch aya-ebpf, so we could start with an integration test based on a C program. We could do the aya-ebpf part separately.
Summary
Adds support for BPF_PROG_TYPE_STRUCT_OPS programs and BPF_MAP_TYPE_STRUCT_OPS maps, enabling use cases like sched_ext custom schedulers that implement kernel struct interfaces from Rust eBPF.
- .struct_ops/.struct_ops.link ELF sections as struct_ops map definitions
- StructOps program type with loading, map creation, and BPF_LINK_CREATE attachment
Changes by file
aya-obj (parsing and BTF):
- obj.rs: EbpfSectionKind::StructOps, ProgramSection::StructOps, section parsing, fixup_kfunc_calls()
- maps.rs: Map::StructOps variant, StructOpsMap type
- btf/btf.rs: public type_by_id/type_name/string_at, fixup_func_linkage(), to_bytes() sanitization
- btf/types.rs: public BtfMember, Struct.{size,members}
- relocation.rs: kfunc call detection and BPF_PSEUDO_KFUNC_CALL patching
aya (loading and attachment):
- programs/struct_ops.rs: new StructOps program type
- programs/mod.rs: Program::StructOps variant wired into all macros
- bpf.rs: attach_struct_ops(), struct_ops map creation with wrapper struct BTF, kernel BTF caching
- sys/bpf.rs: bpf_map_create visibility to pub(crate)
Test plan
- cargo test -p aya-obj: 95 tests pass (11 new)
- cargo clippy --lib -p aya -p aya-obj: zero warnings