nvme: fix io_count underflow panic on invalid namespace commands#3474
Open
jstarks wants to merge 2 commits into
Open
nvme: fix io_count underflow panic on invalid namespace commands#3474jstarks wants to merge 2 commits into
jstarks wants to merge 2 commits into
Conversation
When an I/O command targets a non-existent namespace, the I/O worker creates an inline completion without incrementing io_count. However, the completion-processing code unconditionally decremented io_count, causing a subtraction overflow panic. This was a pre-existing bug that became exposed when the AMD IOMMU is active, because the NVMe controller's queues may be probed before the guest IOMMU driver has fully programmed translation tables. Fix by tracking whether each completion originated from a dispatched I/O (which incremented io_count) or from an inline response (which did not). Only decrement io_count for dispatched I/Os. Add a regression test that creates an I/O queue pair and sends a read command to a non-existent namespace (NSID=0xFFFF), verifying it returns INVALID_NAMESPACE_OR_FORMAT without panicking.
Contributor
There was a problem hiding this comment.
Pull request overview
This PR attempts to fix NVMe I/O worker io_count underflow when invalid namespace commands are completed inline, and adds a regression-style controller test.
Changes:
- Tracks whether a completion should decrement
io_count. - Skips decrementing for invalid namespace inline completions.
- Adds an invalid-namespace I/O command test.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
vm/devices/storage/nvme/src/workers/io.rs |
Updates I/O completion handling to conditionally decrement io_count. |
vm/devices/storage/nvme_test/src/tests/controller_tests.rs |
Adds a test for invalid namespace I/O command completion behavior. |
PR review surfaced two related ways the I/O worker could underflow or leak sq.io_count. Inline invalid-namespace completions skipped the io_count increment on dispatch, and the worker relied on a per-completion flag to skip the matching decrement on the posting paths. The flag was set incorrectly for completions that took the slow path through state.completions and were drained later as Event::CompletionReady: the decrement was skipped even though dispatch had already incremented, leaking io_count and eventually throttling the SQ permanently once the leak reached MAX_IO_QUEUE_DEPTH. The delete-SQ path had the symmetric bug, unconditionally decrementing io_count for queued completions that never incremented it, causing a subtraction-overflow panic. Both bugs disappear if the inline invalid-namespace path also briefly increments io_count. The flag and its match-time bookkeeping go away, the decrement at the bottom of the loop becomes unconditional, and delete_sq no longer has to special-case inline completions. The increment now lives at the top of the Event::Sq arm so both branches share it. Three regression tests live in the nvme crate alongside the bug rather than in nvme_test, which has its own copy of the I/O worker that doesn't share the buggy code. The first exercises the basic inline path; the second deletes an SQ while inline completions sit in state.completions; the third saturates a single-entry CQ with sixteen FLUSH commands against a ram-disk-backed namespace, which stalls the SQ after roughly nine completions without the fix. The corresponding (incorrectly-placed) test in nvme_test is removed.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
When an I/O command targets a non-existent namespace, the I/O worker creates an inline completion without incrementing io_count. However, the completion-processing code unconditionally decremented io_count, causing a subtraction overflow panic.
This was a pre-existing bug that became exposed when the AMD IOMMU is active, because the NVMe controller's queues may be probed before the guest IOMMU driver has fully programmed translation tables.
Fix by tracking whether each completion originated from a dispatched I/O (which incremented io_count) or from an inline response (which did not). Only decrement io_count for dispatched I/Os.
Add a regression test that creates an I/O queue pair and sends a read command to a non-existent namespace (NSID=0xFFFF), verifying it returns INVALID_NAMESPACE_OR_FORMAT without panicking.