nvme: fix io_count underflow panic on invalid namespace commands #3474

Open
jstarks wants to merge 2 commits into microsoft:main from jstarks:nvme_fix

Conversation

@jstarks
Member

@jstarks jstarks commented May 13, 2026

When an I/O command targets a non-existent namespace, the I/O worker creates an inline completion without incrementing io_count. However, the completion-processing code unconditionally decremented io_count, causing a subtraction overflow panic.
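As a minimal sketch of the failure mode, assuming `io_count` is an unsigned counter (the `SqState` type and method name below are hypothetical, not the worker's real API): a plain `-= 1` on an unsigned zero panics when overflow checks are enabled, which is the default in debug builds. The sketch uses `checked_sub` so the underflow surfaces as an error value rather than a panic.

```rust
/// Hypothetical stand-in for the per-SQ state; the real field lives in the
/// NVMe I/O worker and its exact type is an assumption here.
pub struct SqState {
    pub io_count: usize,
}

impl SqState {
    /// Mirrors the buggy path: decrement whether or not dispatch incremented.
    /// `checked_sub` reports the underflow as an error instead of panicking.
    pub fn complete_unconditionally(&mut self) -> Result<(), &'static str> {
        match self.io_count.checked_sub(1) {
            Some(n) => {
                self.io_count = n;
                Ok(())
            }
            None => Err("io_count underflow"),
        }
    }
}
```

An inline completion arriving with `io_count == 0` takes the `None` branch; with a bare `self.io_count -= 1` that same case would panic.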

This was a pre-existing bug that became exposed when the AMD IOMMU is active, because the NVMe controller's queues may be probed before the guest IOMMU driver has fully programmed translation tables.

Fix by tracking whether each completion originated from a dispatched I/O (which incremented io_count) or from an inline response (which did not). Only decrement io_count for dispatched I/Os.
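The origin-tracking idea can be sketched roughly as follows. This is an illustration only: `Origin`, `Completion`, `Sq`, and the method names are hypothetical and are not taken from `io.rs`.

```rust
/// Where a completion came from (hypothetical names for illustration).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Origin {
    /// The command was dispatched and io_count was incremented.
    Dispatched,
    /// The command was rejected inline and io_count was left untouched.
    Inline,
}

#[derive(Debug)]
pub struct Completion {
    pub cid: u16,
    pub origin: Origin,
}

#[derive(Debug, Default)]
pub struct Sq {
    pub io_count: usize,
}

impl Sq {
    /// Dispatch a real I/O: count it and tag the eventual completion.
    pub fn dispatch(&mut self, cid: u16) -> Completion {
        self.io_count += 1;
        Completion { cid, origin: Origin::Dispatched }
    }

    /// Reject a command inline (e.g. invalid namespace): no increment.
    pub fn reject_inline(&self, cid: u16) -> Completion {
        Completion { cid, origin: Origin::Inline }
    }

    /// Post a completion, decrementing only for dispatched I/Os.
    pub fn post(&mut self, completion: &Completion) {
        if completion.origin == Origin::Dispatched {
            self.io_count -= 1;
        }
    }
}
```

The design cost is that every path that posts or drains a completion has to consult the tag correctly.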

Add a regression test that creates an I/O queue pair and sends a read command to a non-existent namespace (NSID=0xFFFF), verifying it returns INVALID_NAMESPACE_OR_FORMAT without panicking.

Copilot AI review requested due to automatic review settings May 13, 2026 18:15
@jstarks jstarks requested review from a team as code owners May 13, 2026 18:15
Contributor

Copilot AI left a comment


Pull request overview

This PR attempts to fix NVMe I/O worker io_count underflow when invalid namespace commands are completed inline, and adds a regression-style controller test.

Changes:

  • Tracks whether a completion should decrement io_count.
  • Skips decrementing for invalid namespace inline completions.
  • Adds an invalid-namespace I/O command test.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| vm/devices/storage/nvme/src/workers/io.rs | Updates I/O completion handling to conditionally decrement io_count. |
| vm/devices/storage/nvme_test/src/tests/controller_tests.rs | Adds a test for invalid namespace I/O command completion behavior. |

Comment thread vm/devices/storage/nvme/src/workers/io.rs Outdated
Comment thread vm/devices/storage/nvme_test/src/tests/controller_tests.rs Outdated
Comment thread vm/devices/storage/nvme_test/src/tests/controller_tests.rs Outdated
Comment thread vm/devices/storage/nvme/src/workers/io.rs Outdated
PR review surfaced two related ways the I/O worker could underflow or leak sq.io_count. Inline invalid-namespace completions skipped the io_count increment on dispatch, and the worker relied on a per-completion flag to skip the matching decrement on the posting paths. The flag was set incorrectly for completions that took the slow path through state.completions and were drained later as Event::CompletionReady: the decrement was skipped even though dispatch had already incremented, leaking io_count and eventually throttling the SQ permanently once the leak reached MAX_IO_QUEUE_DEPTH. The delete-SQ path had the symmetric bug, unconditionally decrementing io_count for queued completions that never incremented it, causing a subtraction-overflow panic.
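To make the leak concrete, here is a deliberately simplified model, not the worker's code. It assumes every command takes the slow path, and the function name and `max_depth` parameter are hypothetical stand-ins (the real limit is `MAX_IO_QUEUE_DEPTH`, whose value is not stated here).

```rust
/// Counts how many commands a queue accepts before it stalls.
///
/// `max_depth` stands in for MAX_IO_QUEUE_DEPTH. When `skip_decrement` is
/// true, the completion path forgets to decrement io_count, modelling the
/// incorrectly set flag on the slow path.
pub fn accepted_before_stall(max_depth: usize, commands: usize, skip_decrement: bool) -> usize {
    let mut io_count: usize = 0;
    let mut accepted: usize = 0;
    for _ in 0..commands {
        // Throttle: the SQ is not serviced once io_count reaches the limit.
        if io_count >= max_depth {
            break;
        }
        // Dispatch increments the counter.
        io_count += 1;
        accepted += 1;
        // Completion should give the slot back; the leak skips this step.
        if !skip_decrement {
            io_count -= 1;
        }
    }
    accepted
}
```

With the leak, the queue stops accepting work after `max_depth` commands no matter how many completions were posted; without it, all commands are accepted.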

Both bugs disappear if the inline invalid-namespace path also briefly increments io_count. The flag and its match-time bookkeeping go away, the decrement at the bottom of the loop becomes unconditional, and delete_sq no longer has to special-case inline completions. The increment now lives at the top of the Event::Sq arm so both branches share it.
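A rough sketch of that shape, under stated assumptions: the `Event` variant names follow the description above, while `Worker`, `Status`, `valid_nsid`, and the single-queue model are hypothetical simplifications rather than the actual `io.rs` code.

```rust
use std::collections::VecDeque;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    Success,
    InvalidNamespaceOrFormat,
}

pub enum Event {
    /// A command arrived on the submission queue.
    Sq { nsid: u32, cid: u16 },
    /// A queued completion can now be posted.
    CompletionReady,
}

pub struct Worker {
    pub io_count: usize,
    pub completions: VecDeque<(u16, Status)>,
    pub posted: Vec<(u16, Status)>,
    valid_nsid: u32,
}

impl Worker {
    pub fn new(valid_nsid: u32) -> Self {
        Worker { io_count: 0, completions: VecDeque::new(), posted: Vec::new(), valid_nsid }
    }

    pub fn handle(&mut self, event: Event) {
        match event {
            Event::Sq { nsid, cid } => {
                // Shared increment: both the dispatched and inline branches count.
                self.io_count += 1;
                let status = if nsid == self.valid_nsid {
                    Status::Success
                } else {
                    Status::InvalidNamespaceOrFormat
                };
                self.completions.push_back((cid, status));
            }
            Event::CompletionReady => {
                if let Some(completion) = self.completions.pop_front() {
                    self.posted.push(completion);
                    // Unconditional decrement: every queued completion was counted.
                    self.io_count -= 1;
                }
            }
        }
    }

    /// Deleting the SQ drains queued completions with no special cases.
    pub fn delete_sq(&mut self) {
        while self.completions.pop_front().is_some() {
            self.io_count -= 1;
        }
    }
}
```

Because every completion in the queue corresponds to exactly one increment, the invariant "queued completions never exceed `io_count`" holds on every path, so neither the posting path nor `delete_sq` can underflow in this model.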

Three regression tests live in the nvme crate alongside the bug rather than in nvme_test, which has its own copy of the I/O worker that doesn't share the buggy code. The first exercises the basic inline path; the second deletes an SQ while inline completions sit in state.completions; the third saturates a single-entry CQ with sixteen FLUSH commands against a ram-disk-backed namespace, which stalls the SQ after roughly nine completions without the fix. The corresponding, incorrectly placed test in nvme_test is removed.