[wip] emulated SMMU #3458

Draft
jstarks wants to merge 8 commits into microsoft:main from jstarks:smmu

jstarks (Member) commented May 11, 2026

No description provided.

Copilot AI review requested due to automatic review settings May 11, 2026 21:36
Copilot AI (Contributor) left a comment

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR adds support for an emulated SMMUv3 on aarch64 and updates PCIe MSI routing to support GICv3 ITS (device-id based routing) in addition to GICv2m.

Changes:

  • Introduces SMMUv3 emulation (spec types + translation logic) and plumbs per-device bus-range identity to support ITS/SMMU requester/device ID composition.
  • Adds ACPI IORT generation (and DT iommu-map) for PCIe interrupt/DMA remapping; adds MADT ITS entry and backend ITS capability detection (KVM).
  • Updates MSI/irqfd plumbing to carry an optional device identity (devid) end-to-end.

Reviewed changes

Copilot reviewed 70 out of 71 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| vmm_core/vmotherboard/src/lib.rs | Re-exports PCIe bus-range identity type for consumers. |
| vmm_core/vmotherboard/src/chipset/builder/mod.rs | Threads optional PCIe device identity through builder registrations. |
| vmm_core/vmotherboard/src/chipset/backing/arc_mutex/services.rs | Extends PCIe registration API to accept optional device identity. |
| vmm_core/vmotherboard/src/chipset/backing/arc_mutex/pci.rs | Stores and forwards optional device identity during PCIe bus resolution. |
| vmm_core/vmotherboard/src/chipset/backing/arc_mutex/device.rs | Adds builder hook to attach PCIe bus-range identity to devices. |
| vmm_core/vmotherboard/src/base_chipset.rs | Forwards optional device identity into PCIe enumerator device attach. |
| vmm_core/virt_whp/src/synic.rs | Adapts SignalMsi signature to optional device identity. |
| vmm_core/virt_whp/src/lib.rs | Switches to topology-provided MSI controller config; advertises ITS support=false. |
| vmm_core/virt_whp/src/device.rs | Adapts SignalMsi signature to optional device identity. |
| vmm_core/virt_mshv/src/x86_64/mod.rs | Adapts SignalMsi signature to optional device identity. |
| vmm_core/virt_mshv/src/irqfd.rs | Adapts IrqFdRoute::enable signature to accept optional device identity. |
| vmm_core/virt_mshv/src/aarch64/mod.rs | Adapts MSI signaling for new SignalMsi API; advertises ITS support=false. |
| vmm_core/virt_kvm/src/lib.rs | Stores MSI controller config and ITS device FD; prepares KVM backend for ITS. |
| vmm_core/virt_kvm/src/gsi.rs | Plumbs optional devid into KVM irq routing builder path. |
| vmm_core/virt_kvm/src/arch/x86_64/mod.rs | Sets devid=None for x86 MSI routes; adapts SignalMsi signature. |
| vmm_core/virt_kvm/src/arch/aarch64/mod.rs | Probes ITS support, creates in-kernel ITS, adds ITS irqfd/MSI routing support. |
| vmm_core/virt_hvf/src/lib.rs | Advertises ITS support=false. |
| vmm_core/virt/src/x86/apic_software_device.rs | Adapts MSI forwarding to new SignalMsi API. |
| vmm_core/virt/src/generic.rs | Extends PlatformInfo with ITS capability and adapts SignalMsi signature. |
| vmm_core/virt/src/aarch64/gic_v2m.rs | Adapts SignalMsi signature to optional device identity. |
| vmm_core/virt/src/aarch64/gic_software_device.rs | Adapts SignalMsi signature to optional device identity. |
| vmm_core/src/device_builder.rs | Accepts per-device bus-range identity and passes into PCIe device builder. |
| vmm_core/src/acpi_builder.rs | Adds IORT construction + SMMU config, MADT ITS entries, and extensive tests. |
| vm/vmcore/vm_topology/src/processor/aarch64.rs | Replaces gic_v2m with gic_msi controller enum (None/V2m/Its). |
| vm/vmcore/src/irqfd.rs | Extends irqfd route enable API with optional device identity. |
| vm/kvm/src/lib.rs | Adds MSI route devid support and propagates flags into KVM irq routing. |
| vm/devices/virtio/virtio/src/transport/core.rs | Forces access_platform feature bit for virtio devices behind an IOMMU. |
| vm/devices/user_driver_emulated_mock/src/lib.rs | Updates MSI controller mock to ignore device identity. |
| vm/devices/storage/nvme_test/src/tests/test_helpers.rs | Updates MSI test helper to new SignalMsi signature. |
| vm/devices/storage/nvme/src/tests/test_helpers.rs | Updates MSI test helper to new SignalMsi signature. |
| vm/devices/pci/vpci/src/test_helpers/mod.rs | Updates MSI test helper to new SignalMsi signature. |
| vm/devices/pci/pcie/src/switch.rs | Uses port-side-effecting cfg write path; plumbs optional bus-range identity. |
| vm/devices/pci/pcie/src/root.rs | Plumbs optional bus-range identity, ensures port tracks bus-range on cfg writes, adds tests. |
| vm/devices/pci/pcie/src/port.rs | Adds shared assigned-bus-range tracking and cfg-write side effects. |
| vm/devices/pci/pcie/src/lib.rs | Exposes new bus_range + its modules. |
| vm/devices/pci/pcie/src/its.rs | Adds ITS wrappers for SignalMsi and IrqFd that inject device IDs. |
| vm/devices/pci/pcie/src/bus_range.rs | Adds shared atomic bus-range tracking and device/stream ID composition helpers. |
| vm/devices/pci/pcie/fuzz/fuzz_pcie.rs | Updates fuzz harness for new PCIe add-device signature. |
| vm/devices/pci/pcie/Cargo.toml | Adds pal_event dependency for irqfd route wrapper event access. |
| vm/devices/pci/pci_core/src/test_helpers/mod.rs | Updates MSI test helper to new SignalMsi signature. |
| vm/devices/pci/pci_core/src/msi.rs | Updates SignalMsi API; adds route/target helpers to pass optional device identity. |
| vm/devices/pci/pci_core/src/capabilities/msix.rs | Updates MSI-X delivery to new MsiTarget API. |
| vm/devices/iommu/smmu/src/translate.rs | Adds SMMUv3 STE/CD lookup and stage-1 page table walker + tests. |
| vm/devices/iommu/smmu/src/spec/ste.rs | Adds SMMUv3 STE layout/types + tests. |
| vm/devices/iommu/smmu/src/spec/registers.rs | Adds SMMUv3 register offsets/bitfields + tests. |
| vm/devices/iommu/smmu/src/spec/pt.rs | Adds AArch64 stage-1 page table descriptor helpers + tests. |
| vm/devices/iommu/smmu/src/spec/mod.rs | Exposes SMMU spec modules. |
| vm/devices/iommu/smmu/src/spec/events.rs | Adds SMMU event queue entry types + constructors + tests. |
| vm/devices/iommu/smmu/src/spec/commands.rs | Adds SMMU command queue entry types + helpers + tests. |
| vm/devices/iommu/smmu/src/spec/cd.rs | Adds SMMU context descriptor layout/types + tests. |
| vm/devices/iommu/smmu/src/lib.rs | Introduces new smmu crate module surface. |
| vm/devices/iommu/smmu/Cargo.toml | Adds new smmu crate definition + dependencies. |
| vm/acpi_spec/src/madt.rs | Adds MADT GIC ITS structure support. |
| vm/acpi_spec/src/lib.rs | Exposes new ACPI IORT module. |
| vm/acpi_spec/src/iort.rs | Adds IORT node/mapping structures used by ACPI builder. |
| tmk/tmk_vmm/src/run.rs | Updates aarch64 platform config to use gic_msi. |
| openvmm/openvmm_entry/src/lib.rs | Adds CLI/config wiring for GIC MSI controller selection and SMMU instances. |
| openvmm/openvmm_entry/src/cli_args.rs | Adds --gic-msi and --smmu CLI flags for aarch64. |
| openvmm/openvmm_defs/src/config.rs | Adds defaults for ITS/SMMU MMIO layout and SMMU/GIC MSI config structs. |
| openvmm/openvmm_core/src/worker/vm_loaders/linux.rs | Builds DT with ITS and SMMU nodes + iommu-map; passes SMMU configs. |
| openvmm/openvmm_core/src/worker/dispatch.rs | Selects ITS vs v2m, instantiates SMMU devices, wraps per-device MSI/irqfd/memory. |
| openvmm/openvmm_core/Cargo.toml | Adds smmu dependency to OpenVMM core. |
| openhcl/virt_mshv_vtl/src/lib.rs | Updates SignalMsi implementation signature. |
| openhcl/underhill_core/src/loader/mod.rs | Extends loader config to include (placeholder) SMMU base field. |
| openhcl/bootloader_fdt_parser/src/lib.rs | Updates parsed platform config to use gic_msi. |
| Guide/src/reference/emulated/pcie/overview.md | Documents aarch64 MSI routing via ITS vs v2m and the new CLI flag. |
| Guide/src/reference/devices/firmware/linux_direct.md | Updates docs to mention ITS/IORT in ACPI mode for PCIe routing. |
| Cargo.toml | Adds new workspace crate smmu. |

Comments suppressed due to low confidence (4)

vmm_core/src/acpi_builder.rs:1

  • The IORT RC mapping logic uses a global rc_mapping_count and defaults an unmapped RC to its_group_offset even when there is no ITS. If has_smmu == true and has_its == false (and not every RC is covered by an SMMU), RCs without an SMMU will incorrectly map to offset IORT_NODE_OFFSET (which will be the first SMMU node), effectively claiming they are behind the wrong SMMU. Fix by computing the mapping count and target per root complex: emit an RC ID mapping only if that RC has an SMMU offset, or if an ITS is actually present; otherwise set that RC node’s mapping_count to 0 and append no IortIdMapping entry.

vmm_core/src/acpi_builder.rs:1

  • The IORT RC mapping logic uses a global rc_mapping_count and defaults an unmapped RC to its_group_offset even when there is no ITS. If has_smmu == true and has_its == false (and not every RC is covered by an SMMU), RCs without an SMMU will incorrectly map to offset IORT_NODE_OFFSET (which will be the first SMMU node), effectively claiming they are behind the wrong SMMU. Fix by computing the mapping count and target per root complex: emit an RC ID mapping only if that RC has an SMMU offset, or if an ITS is actually present; otherwise set that RC node’s mapping_count to 0 and append no IortIdMapping entry.

vmm_core/src/acpi_builder.rs:1

  • The test suite exercises IORT generation with ITS and with SMMU+ITS, but doesn’t cover the important configuration where has_smmu == true and has_its == false (including the case where only a subset of RCs are covered by SMMUs). Adding tests for “SMMU without ITS” and “partial RC coverage” would catch incorrect RC mapping counts/targets (and would have exposed the current incorrect unwrap_or(its_group_offset) fallback when no ITS exists).
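
The per-root-complex rule suggested in the comments above can be sketched as follows. This is only an illustration: `RcMapping` and `rc_mapping_target` are hypothetical names, not identifiers from acpi_builder.rs, and the offsets stand in for whatever node offsets the builder computes.

```rust
/// Hypothetical outcome of deciding where one root complex's ID mapping points.
#[derive(Debug, PartialEq, Eq)]
enum RcMapping {
    /// Neither an SMMU nor an ITS applies: mapping_count = 0, no entry appended.
    None,
    /// One ID mapping whose output reference is the given IORT node offset.
    To(u32),
}

/// Decide the mapping target for a single root complex.
/// `smmu_offset` is Some only if this RC sits behind an SMMU;
/// `its_group_offset` is Some only if an ITS group node actually exists.
fn rc_mapping_target(smmu_offset: Option<u32>, its_group_offset: Option<u32>) -> RcMapping {
    match smmu_offset.or(its_group_offset) {
        Some(offset) => RcMapping::To(offset),
        None => RcMapping::None,
    }
}

/// The mapping_count field for the RC node follows from the decision above.
fn rc_mapping_count(mapping: &RcMapping) -> u32 {
    match mapping {
        RcMapping::None => 0,
        RcMapping::To(_) => 1,
    }
}
```

Because the ITS offset is an Option rather than a default value, the "SMMU without ITS" and "partial RC coverage" cases fall out of the same function and are easy to cover in tests.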

vm/devices/pci/pci_core/src/capabilities/msix.rs:217

  • With the new optional devid plumbing intended for ITS routing, this MSI-X delivery path always signals with devid=None, which prevents identifying the correct PCI function for multi-function devices (where ITS device ID must include the function number). If multi-function endpoints are in scope for ITS mode, consider extending the MSI-X interrupt target state to carry the function’s BDF (or RID) and signaling with signal_msi_with_rid(...) (or passing Some(bdf) down to the ITS wrapper) so the composed ITS device ID is accurate.
    fn deliver(&self) {
        let mut state = self.0.lock();
        if state.enabled {
            state.target.signal_msi(state.address, state.data);
        } else {
            state.pending = true;
        }
    }

Comment on lines +502 to +503
// through its SMMU instance.
node = node.add_u32_array(p_iommu_map, &[0, *phandle, 0, 0x10000])?;
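
For context, the four cells written above read as (rid-base, IOMMU phandle, iommu-base, length) under the usual device tree iommu-map binding. A minimal sketch of how a consumer would apply such an entry, with `IommuMapEntry` as a hypothetical type that is not part of this PR:

```rust
/// Hypothetical decoded form of one iommu-map entry:
/// (rid-base, iommu phandle, iommu-base, length).
struct IommuMapEntry {
    rid_base: u32,
    phandle: u32,
    iommu_base: u32,
    length: u32,
}

impl IommuMapEntry {
    fn from_cells(cells: [u32; 4]) -> Self {
        Self {
            rid_base: cells[0],
            phandle: cells[1],
            iommu_base: cells[2],
            length: cells[3],
        }
    }

    /// Map a PCI requester ID to a stream ID, or None if the RID is outside
    /// the window [rid_base, rid_base + length).
    fn translate(&self, rid: u32) -> Option<u32> {
        let offset = rid.checked_sub(self.rid_base)?;
        if offset < self.length {
            self.iommu_base.checked_add(offset)
        } else {
            None
        }
    }
}
```

With the entry `[0, phandle, 0, 0x10000]`, every 16-bit RID maps to an identical stream ID on the referenced SMMU.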
Comment on lines +206 to +225
fn compute_start_level(tg0: Tg0, t0sz: u8) -> Option<(u8, u8)> {
    let va_bits = 64u8.checked_sub(t0sz)?;
    let bits_per_level = tg0.bits_per_level()?;
    let page_shift = tg0.page_shift()?;

    // Number of address bits resolved by the page table walk (excluding page
    // offset). For 4K/9 bits per level: va_bits - 12 bits are resolved by
    // the walk.
    let resolve_bits = va_bits.checked_sub(page_shift)?;

    // Number of full levels needed = ceil(resolve_bits / bits_per_level).
    // Start level = 4 - num_levels (levels are numbered 0..3).
    let num_levels = resolve_bits.div_ceil(bits_per_level);
    if num_levels > 4 {
        return None;
    }
    let start_level = 4 - num_levels;

    Some((start_level, va_bits))
}
Comment on lines 176 to 179
  if state.pending {
-     state.target.signal_msi(0, address, data);
+     state.target.signal_msi(address, data);
      state.pending = false;
  }
github-actions bot added the unsafe label (Related to unsafe code) May 11, 2026
@github-actions

⚠️ Unsafe Code Detected

This PR modifies files containing unsafe Rust code. Extra scrutiny is required during review.

For more on why we check whole files, instead of just diffs, check out the Rustonomicon

jstarks added 8 commits May 13, 2026 11:04
The previous MSI architecture required each ITS wrapper to carry its own
AssignedBusRange and perform BDF resolution internally, and the
MsiConnection had to be constructed with an IrqFd upfront. This made
it impossible to wire MSI for PCIe switch downstream ports (they did not
have access to the right bus range or signal target at construction
time), breaking hotplug on switches and creating a tangle of push-based
state synchronization (set_rid, write_cfg, sync_msi_rid) in the
port layer.

This change restructures the MSI model around two principles:

1. Lazy BDF resolution: MsiConnection::new(bus_range, devfn) takes
   a bus range at construction. When a device signals an MSI with
   devid = None, the MsiTarget resolves the BDF from the bus range's
   current secondary bus and the configured devfn. This means the guest
   can reprogram bus numbers and MSI delivery automatically picks up the
   new values -- no push-based synchronization needed.

2. Late-bind connect: Both SignalMsi and IrqFd are connected
   after construction via connect() and connect_irqfd(), not passed
   at creation time. This separates device resolution (which needs the
   target) from interrupt wiring (which needs platform knowledge like
   whether ITS is active), and allows the same pattern for all device
   types.
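
A minimal sketch of the two principles, under stated assumptions: `BusRange`, the field layout of `MsiTarget`, and the `SignalMsi` trait shape below are simplified stand-ins invented for illustration, not the types in this change.

```rust
use std::sync::atomic::{AtomicU8, Ordering};
use std::sync::{Arc, OnceLock};

/// Hypothetical shared bus-range state, updated when the guest reprograms
/// the port's secondary bus number.
#[derive(Default)]
struct BusRange {
    secondary: AtomicU8,
}

/// Simplified stand-in for the platform MSI sink.
trait SignalMsi: Send + Sync {
    fn signal_msi(&self, devid: Option<u32>, address: u64, data: u32);
}

struct MsiTarget {
    bus_range: Arc<BusRange>,
    devfn: u8,
    /// Empty until connect() is called (late binding).
    signal: OnceLock<Arc<dyn SignalMsi>>,
}

impl MsiTarget {
    /// Late-bind the interrupt sink; returns false if already connected.
    fn connect(&self, sink: Arc<dyn SignalMsi>) -> bool {
        self.signal.set(sink).is_ok()
    }

    /// Lazy resolution: read the secondary bus at signal time, so guest
    /// renumbering is picked up without any push-based synchronization.
    fn resolve_bdf(&self) -> u32 {
        let bus = self.bus_range.secondary.load(Ordering::Relaxed);
        (u32::from(bus) << 8) | u32::from(self.devfn)
    }

    fn signal_msi(&self, address: u64, data: u32) {
        // Before connect(), there is nowhere to deliver; drop the MSI.
        if let Some(sink) = self.signal.get() {
            sink.signal_msi(Some(self.resolve_bdf()), address, data);
        }
    }
}
```

The design choice illustrated here is that the target holds a reference to shared state rather than a copy of the bus number, which is what removes the need for set_rid-style updates.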

The ITS wrappers (ItsSignalMsi, ItsIrqFd) are simplified to pure
segment prependers -- they just compose (segment << 16) | bdf from the
already-resolved BDF. They no longer carry bus ranges or perform range
validation.
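
The segment-prepending arithmetic is small enough to show directly. The helper names below are hypothetical; only the `(segment << 16) | bdf` composition comes from the commit message, and the BDF layout is the standard PCI bus/device/function encoding.

```rust
/// Pack a PCI bus/device/function triple into a 16-bit BDF
/// (bus in bits 15:8, device in bits 7:3, function in bits 2:0).
fn bdf(bus: u8, device: u8, function: u8) -> u16 {
    (u16::from(bus) << 8) | (u16::from(device & 0x1f) << 3) | u16::from(function & 0x7)
}

/// Compose an ITS device ID by prepending the PCI segment to the BDF.
fn its_device_id(segment: u16, bdf: u16) -> u32 {
    (u32::from(segment) << 16) | u32::from(bdf)
}
```

For example, device 2 function 3 on bus 1 of segment 2 composes to 0x0002_0113.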

For the switch, GenericPcieSwitchDefinition now takes an MsiTarget
instead of an MsiConnection. The switch uses MsiTarget::with_bus_range
to re-derive targets using the upstream port's bus range, then
with_devfn for each downstream port. This means switch downstream
ports get properly wired MSI targets that share the parent connection's
SignalMsi and IrqFd -- fixing hotplug on switches.

The resolve_and_add_pci_device helper is simplified to take &MsiTarget
directly, with callers owning the MsiConnection and handling connect
calls themselves.
Copilot AI review requested due to automatic review settings May 13, 2026 21:01
Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 50 out of 51 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (5)

vm/devices/pci/pci_core/src/msi.rs:1

  • devid = None is always resolved into a concrete BDF and forwarded as Some(resolved). This breaks callers/wrappers that rely on None meaning “identity not yet available” (e.g. your updated pcie::its::ItsSignalMsi explicitly drops MSIs when devid is None). With the current code, devices will signal with bus=0 before secondary bus assignment, potentially producing incorrect device IDs and hard-to-debug interrupt routing. Consider preserving None semantics by (a) passing through None when the default bus-range is unassigned (e.g. secondary bus == 0), or (b) forwarding devid unchanged and moving ‘default BDF’ resolution to callers that truly want it.
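
Option (a) from this comment might look like the sketch below. `resolve_devid` is a hypothetical free function, and treating secondary bus 0 as "unassigned" is the assumption stated in the comment, not something confirmed by the PR's code.

```rust
/// Resolve the device ID to forward with an MSI.
/// - An explicit ID from the caller is passed through unchanged.
/// - With no explicit ID and an unassigned secondary bus (assumed to be 0),
///   None is preserved so downstream wrappers can drop or defer the MSI.
/// - Otherwise the default BDF is derived from the bus range and devfn.
fn resolve_devid(explicit: Option<u32>, secondary_bus: u8, devfn: u8) -> Option<u32> {
    match explicit {
        Some(id) => Some(id),
        None if secondary_bus == 0 => None,
        None => Some((u32::from(secondary_bus) << 8) | u32::from(devfn)),
    }
}
```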

vm/devices/iommu/smmu/src/translate.rs:1

  • compute_start_level can return start_level = 4 when resolve_bits == 0 (e.g., VA bits equal to granule page shift). walk_s1 then uses level = start_level and computes shift = page_shift + (3 - level) * bits_per_level, which underflows for level=4 and yields a huge shift amount. This is a concrete correctness bug and can lead to incorrect indexing or panics. Fix by rejecting configurations where resolve_bits == 0 (or any invalid t0sz range for the selected granule) so start_level is always in 0..=3, and/or by using checked arithmetic in the shift computation and returning an F_TRANSLATION fault when parameters are invalid.
// Copyright (c) Microsoft Corporation.
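
One way to apply both suggested fixes is sketched below. `Granule` is a hypothetical stand-in for the PR's `Tg0` type (whose accessors return Option in the original), and `level_shift` is an assumed helper name for the shift computation in walk_s1.

```rust
/// Hypothetical translation granule; stands in for the PR's Tg0.
#[derive(Clone, Copy)]
enum Granule {
    K4,
    K16,
    K64,
}

impl Granule {
    fn page_shift(self) -> u8 {
        match self {
            Self::K4 => 12,
            Self::K16 => 14,
            Self::K64 => 16,
        }
    }

    /// Each table holds 8-byte descriptors, so a level resolves
    /// page_shift - 3 bits.
    fn bits_per_level(self) -> u8 {
        self.page_shift() - 3
    }
}

/// Returns (start_level, va_bits), with start_level guaranteed in 0..=3.
fn compute_start_level(granule: Granule, t0sz: u8) -> Option<(u8, u8)> {
    let va_bits = 64u8.checked_sub(t0sz)?;
    let resolve_bits = va_bits.checked_sub(granule.page_shift())?;
    // Reject the degenerate case that would otherwise yield start_level = 4.
    if resolve_bits == 0 {
        return None;
    }
    let num_levels = resolve_bits.div_ceil(granule.bits_per_level());
    let start_level = 4u8.checked_sub(num_levels)?;
    Some((start_level, va_bits))
}

/// Shift of the VA index field for `level`, using checked arithmetic so an
/// out-of-range level yields None instead of underflowing.
fn level_shift(granule: Granule, level: u8) -> Option<u8> {
    let levels_below = 3u8.checked_sub(level)?;
    levels_below
        .checked_mul(granule.bits_per_level())?
        .checked_add(granule.page_shift())
}
```

A caller would map a None from either function to the appropriate translation fault rather than continuing the walk.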

vmm_core/src/device_builder.rs:1

  • build_vpci_device now creates an MsiConnection locally and passes only msi_conn.target() into device resolution, but the MsiConnection is dropped at the end of the function and there is no subsequent connect(...) / connect_irqfd(...). That means the vPCI device’s MSI target will remain disconnected and cannot be wired up later. To fix, either (1) accept an msi_target: &MsiTarget (similar to build_pcie_device) from the caller that owns/keeps the MsiConnection, or (2) restore returning the MsiConnection so the caller can connect it after building.

vmm_tests/vmm_tests/tests/tests/multiarch/pcie.rs:1

  • This writes directly to the first nvme* block device detected. On some guests that can be the boot/root disk or otherwise mounted, making the test destructive/flaky (and potentially corrupting the VM state before shutdown). Recommend selecting the specific NVMe behind the SMMU root complex via a stable path (e.g., /dev/disk/by-path for the segment/port), and additionally filtering out any device that backs / (e.g., compare against the output of findmnt -n -o SOURCE / or lsblk -no NAME,MOUNTPOINT).

vmm_core/src/acpi_builder.rs:1

  • The updated comment for id_count no longer matches the earlier note in this code path that referenced the IORT spec’s ‘minus 1’ behavior. If IortIdMapping::new expects the IORT-defined encoding (commonly “number of IDs minus 1”), then 0xFFFF represents 0x10000 IDs, but the current comment reads like it represents exactly 0xFFFF IDs. Please align the comment with the actual IORT field semantics (and/or rename the constructor parameter) to avoid future off-by-one regressions.
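
If the field does use the "number of IDs minus one" encoding this comment refers to, making the conversion explicit would remove the ambiguity. The helper names below are hypothetical and not part of acpi_spec:

```rust
/// Encode a count of IDs into the IORT ID-mapping "number of IDs" field,
/// assuming the field stores (count - 1). A count of zero is not encodable.
fn encode_id_count(num_ids: u32) -> Option<u32> {
    num_ids.checked_sub(1)
}

/// Decode the field back into the number of IDs it covers.
fn decode_id_count(field: u32) -> u64 {
    u64::from(field) + 1
}
```

Under that assumption, a field value of 0xFFFF covers 0x10000 IDs, which is the full 16-bit requester ID space.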

Comment on lines +126 to +149
/// Returns the output address for a 16KB granule.
pub fn output_address_16k(&self, level: u8) -> u64 {
    let raw = self.addr_bits() << 12;
    match level {
        // L1 block: 32MB (bits [47:25]), but 16K L1 blocks are unusual
        1 => {
            if self.is_block() {
                raw & !((1u64 << 25) - 1)
            } else {
                raw
            }
        }
        // L2 block: 32MB (bits [47:25])
        2 => {
            if self.is_block() {
                raw & !((1u64 << 25) - 1)
            } else {
                raw
            }
        }
        3 => raw, // page address, 16KB aligned
        _ => raw,
    }
}
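
As a cross-check on the block sizes in the snippet above, the alignment for any granule and level can be derived instead of hard-coded. This is a sketch with hypothetical helper names, assuming 8-byte descriptors (so each level resolves page_shift - 3 bits):

```rust
/// Number of low address bits covered by a block or page mapped at `level`.
fn block_shift(page_shift: u8, level: u8) -> Option<u8> {
    let bits_per_level = page_shift.checked_sub(3)?;
    let levels_below = 3u8.checked_sub(level)?;
    levels_below
        .checked_mul(bits_per_level)?
        .checked_add(page_shift)
}

/// Mask `raw` down to the alignment of a block or page at `level`.
fn block_output_address(raw: u64, page_shift: u8, level: u8) -> Option<u64> {
    let shift = block_shift(page_shift, level)?;
    // checked_shl returns None for shifts of 64 or more.
    let size = 1u64.checked_shl(u32::from(shift))?;
    Some(raw & !(size - 1))
}
```

For a 16KB granule (page_shift = 14) this gives shifts of 14, 25, and 36 for levels 3, 2, and 1. The level 2 value matches the 32MB comment in the snippet, while by this arithmetic a level 1 block would span 64GB (bits [47:36]) rather than 32MB.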
Comment on lines +3527 to +3533
if let Some(shared) = smmu_shared {
    let inner_msi =
        base_signal_msi.unwrap_or_else(|| partition.as_signal_msi(Vtl::Vtl0).unwrap());
    let (translating_gm, smmu_msi) =
        shared.create_device_context(bus_range.clone(), 0, gm, inner_msi);
    let irqfd =
        base_irqfd.map(|fd| shared.create_irqfd(0, fd) as Arc<dyn vmcore::irqfd::IrqFd>);
Labels: Guide, unsafe (Related to unsafe code)

2 participants