Fix VMBus channel count validation for netvsc devices #4513
knelsonmeister wants to merge 2 commits into
The verify_vmbus_count test was failing because it expected 8 channels (based on the legacy VRSS_CHANNEL_DEFAULT = 8) but observed 16 channels on VMs with >16 vCPUs.

Root cause: Linux commit 646f071d315b changed VRSS_CHANNEL_DEFAULT from 8 to 16 and added netif_get_num_default_rss_queues() to the channel calculation. However, DEFAULT_MAX_NUM_RSS_QUEUES (8) caps that function's return value, so max(16, <=8) = 16 on every VM with >16 vCPUs.

Fix: Update the validation logic to accept channel counts matching either:
- Pre-646f071d formula: min(64, max(16, core_count // 2))
- Post-646f071d formula: min(64, max(16, thread_count // 2))

Both formulas yield 16 for typical VM sizes, but they differ on VMs where core_count != thread_count (SMT) or on very large VMs. This handles the case where different Azure kernel versions may be present.

Tested on: D64ds_v6 (x86_64, 64 vCPUs) and D32ps_v5 (ARM64, 32 vCPUs) with Ubuntu 22.04, 24.04, and 25.04 kernels; all PASSED.

Fixes: ADO Bug 57291856
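The root-cause arithmetic can be checked with a toy model. This is a sketch under stated assumptions, not kernel code: it treats the return value of netif_get_num_default_rss_queues() as a plain input capped at DEFAULT_MAX_NUM_RSS_QUEUES, exactly as this description states, and ignores any other limits the driver or host may apply. The function names num_chn_before and num_chn_after are hypothetical.

```python
# Simplified model of the netvsc default channel count around Linux commit
# 646f071d315b, following this PR's description. Not kernel code.
VRSS_CHANNEL_DEFAULT_OLD = 8
VRSS_CHANNEL_DEFAULT_NEW = 16
DEFAULT_MAX_NUM_RSS_QUEUES = 8  # cap on the RSS queue count, as stated in the PR


def num_chn_before() -> int:
    # Before the commit: a fixed default of VRSS_CHANNEL_DEFAULT (8).
    return VRSS_CHANNEL_DEFAULT_OLD


def num_chn_after(default_rss_queues: int) -> int:
    # After the commit: max(VRSS_CHANNEL_DEFAULT, netif_get_num_default_rss_queues()).
    return max(VRSS_CHANNEL_DEFAULT_NEW, default_rss_queues)


# If the queue count never exceeds DEFAULT_MAX_NUM_RSS_QUEUES, every possible
# input yields 16, which matches what the test observed on >16 vCPU VMs.
results = {num_chn_after(q) for q in range(1, DEFAULT_MAX_NUM_RSS_QUEUES + 1)}
print(num_chn_before(), results)  # 8 {16}
```

The model makes the failure mode visible: the old test hard-coded the value from num_chn_before(), while the kernel now returns the value from num_chn_after().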
Pull request overview
Updates LISA’s VMBus channel-count validation for netvsc devices to handle the Linux kernel change that increases default channel counts on larger VM sizes, preventing false failures in verify_vmbus_devices_channels on VMs with >16 vCPUs.
Changes:
- Expands the netvsc expected-channel validation to accept both the pre- and post-646f071d315b formulas for vCPU counts >16.
- Improves in-test diagnostics/logging to clearly indicate which formula matched (or that neither matched).
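The two changes above (accepting either formula and logging which one matched) can be sketched as a single check. This is a minimal illustration, not the code in this PR: check_netvsc_channels and the logger name are hypothetical, and core_count/thread_count are assumed to be the physical-core and logical-CPU counts the test already collects.

```python
import logging
from typing import Optional

log = logging.getLogger("netvsc_channel_check")  # hypothetical logger name


def check_netvsc_channels(actual: int, core_count: int, thread_count: int) -> Optional[str]:
    """Return the label of the formula that matches `actual`, or None (hypothetical helper)."""
    candidates = {
        "pre-646f071d (core-based)": min(64, max(16, core_count // 2)),
        "post-646f071d (thread-based)": min(64, max(16, thread_count // 2)),
    }
    for label, expected in candidates.items():
        if actual == expected:
            log.info("netvsc: %d channels match the %s formula", actual, label)
            return label
    # Neither formula matched: report every value that would have been accepted.
    log.warning(
        "netvsc: %d channels match neither formula (expected one of %s)",
        actual,
        sorted(set(candidates.values())),
    )
    return None


logging.basicConfig(level=logging.INFO)
# 64 logical CPUs on 32 cores: pre = 16, post = 32.
print(check_netvsc_channels(32, 32, 64))  # post-646f071d (thread-based)
```

When both formulas give the same value (for example 16 on a 32-vCPU VM without SMT), the check reports the first label; the test would fail only when the function returns None.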
✅ AI Test Selection: PASSED. 1 test case(s) selected. Marketplace image: canonical 0001-com-ubuntu-server-jammy 22_04-lts-gen2 latest
```python
new_expected = min(64, max(16, thread_count // 2))
old_expected = min(64, max(16, core_count // 2))
if actual_channels == old_expected:
    expected_network_channel_count = old_expected
```
Nice compatibility fix. I think this would be easier to read if we computed the allowed channel counts directly instead of updating expected_network_channel_count through fallback branches.
For example:
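```python
from assertpy import assert_that

# Legacy kernels: VRSS_CHANNEL_DEFAULT = 8, capped by the vCPU count.
allowed = {min(thread_count, 8)}
if thread_count <= 16:
    allowed.add(thread_count)
else:
    allowed.add(min(64, max(16, core_count // 2)))
    allowed.add(min(64, max(16, thread_count // 2)))
# assertpy's is_in() takes the candidates as separate arguments, so unpack the set.
assert_that(actual_channels).is_in(*sorted(allowed))
```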
That makes the old/new kernel behavior explicit and keeps the core_count vs thread_count logic easier to follow.
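The reviewer's allowed-set idea can be exercised as a standalone function to see which counts it accepts for a few VM shapes. A sketch under assumptions: allowed_channel_counts is a hypothetical name, it returns the set instead of asserting on it, and the (cores, threads) pairs are illustrative rather than specific Azure sizes.

```python
from typing import Set


def allowed_channel_counts(core_count: int, thread_count: int) -> Set[int]:
    """Channel counts accepted by the reviewer's suggested check (hypothetical helper)."""
    # Legacy kernels: VRSS_CHANNEL_DEFAULT = 8, never more than the vCPU count.
    allowed = {min(thread_count, 8)}
    if thread_count <= 16:
        # Newer kernels on small VMs: the suggestion expects one channel per vCPU.
        allowed.add(thread_count)
    else:
        # Newer kernels on larger VMs: either formula from this PR.
        allowed.add(min(64, max(16, core_count // 2)))
        allowed.add(min(64, max(16, thread_count // 2)))
    return allowed


for cores, threads in [(2, 4), (8, 16), (16, 32), (32, 64)]:
    print(cores, threads, sorted(allowed_channel_counts(cores, threads)))
```

Returning a set keeps the old and new kernel behavior in one place; the test then only needs a membership check, for example assert_that(actual).is_in(*sorted(allowed)) with assertpy, which expects the candidates as separate arguments.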
Summary
Fix the verify_vmbus_count test, which fails on VMs with >16 vCPUs because it expects 8 channels but observes 16.
Root Cause
Linux commit 646f071d315b changed:
- VRSS_CHANNEL_DEFAULT from 8 to 16
- num_chn to max(VRSS_CHANNEL_DEFAULT, netif_get_num_default_rss_queues())

Since DEFAULT_MAX_NUM_RSS_QUEUES = 8 caps the return value of netif_get_num_default_rss_queues(), the effective result for VMs with >16 vCPUs is always max(16, ≤8) = 16 channels.
Fix
Update the validation logic to accept channel counts matching either:
- min(64, max(16, core_count // 2))
- min(64, max(16, thread_count // 2))

Both formulas yield 16 for typical VM sizes but differ on VMs where core_count ≠ thread_count (SMT) or on very large VMs.
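To see where the two formulas disagree, one can tabulate them for a few core/thread combinations. This is only an illustration: expected_channels is a hypothetical helper, and the shapes are made up (cores < threads stands for 2-way SMT), not a list of Azure sizes.

```python
def expected_channels(count: int) -> int:
    # Shared shape of both formulas: min(64, max(16, count // 2)).
    return min(64, max(16, count // 2))


# Illustrative (cores, threads) pairs; cores < threads means 2-way SMT.
shapes = [(32, 32), (64, 64), (16, 32), (32, 64), (48, 96), (96, 192)]
for cores, threads in shapes:
    by_core = expected_channels(cores)
    by_thread = expected_channels(threads)
    status = "same" if by_core == by_thread else "differ"
    print(f"cores={cores:3} threads={threads:3} -> {by_core:2} vs {by_thread:2} ({status})")
```

In these examples the values only diverge when SMT is present and thread_count // 2 exceeds 16, which is why a 32-vCPU VM passes under either formula while a 64-vCPU VM with SMT can report 16 or 32 depending on the kernel.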
Testing
Related