Fix VMBus channel count validation for netvsc devices #4513

Open: knelsonmeister wants to merge 2 commits into main from knelsonmeister/lsvmbus

knelsonmeister (Collaborator) commented Jun 3, 2026

Summary

Fix the verify_vmbus_count test, which fails on VMs with >16 vCPUs because it expects 8 channels but observes 16.

Root Cause

Linux commit 646f071d315b changed:

  • VRSS_CHANNEL_DEFAULT from 8 to 16
  • Default num_chn to max(VRSS_CHANNEL_DEFAULT, netif_get_num_default_rss_queues())

Since DEFAULT_MAX_NUM_RSS_QUEUES = 8 caps the return of netif_get_num_default_rss_queues(), the effective formula for VMs with >16 vCPUs is always max(16, ≤8) = 16 channels.
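The arithmetic above can be sketched in a few lines of Python. This is only a model of the behavior as described in this PR, not the kernel code: `default_rss_queues` is a hypothetical stand-in for netif_get_num_default_rss_queues(), assumed here to return the vCPU count capped at DEFAULT_MAX_NUM_RSS_QUEUES.

```python
# Constants as stated in this PR description.
VRSS_CHANNEL_DEFAULT = 16        # value after commit 646f071d315b
DEFAULT_MAX_NUM_RSS_QUEUES = 8   # cap on the default RSS queue count


def default_rss_queues(vcpus: int) -> int:
    """Hypothetical model of netif_get_num_default_rss_queues().

    Assumption: the result is the vCPU count capped at
    DEFAULT_MAX_NUM_RSS_QUEUES, as the PR description states.
    """
    return min(vcpus, DEFAULT_MAX_NUM_RSS_QUEUES)


def default_num_chn(vcpus: int) -> int:
    """Default channel count for the >16 vCPU case discussed here."""
    return max(VRSS_CHANNEL_DEFAULT, default_rss_queues(vcpus))


if __name__ == "__main__":
    # Illustrative vCPU counts only, not measured results.
    for vcpus in (32, 64, 96):
        print(f"{vcpus} vCPUs: {default_num_chn(vcpus)} channels")
```

Under these assumptions the second argument to max() never exceeds 8, so the result stays at 16 for every vCPU count above 16.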

Fix

Update the validation logic to accept channel counts matching either:

  • Pre-646f071d formula: min(64, max(16, core_count // 2))
  • Post-646f071d formula: min(64, max(16, thread_count // 2))

Both formulas yield 16 for typical VM sizes; they can differ only when core_count ≠ thread_count (SMT) and the VM is large enough that thread_count // 2 exceeds 16.
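To see where the two formulas agree and where they diverge, here is a small sketch that evaluates both side by side. The function names and the sample (core_count, thread_count) pairs are hypothetical and chosen for illustration; they are not taken from the LISA code or from measured VMs.

```python
def pre_expected(core_count: int) -> int:
    """Pre-646f071d formula from this PR: based on physical cores."""
    return min(64, max(16, core_count // 2))


def post_expected(thread_count: int) -> int:
    """Post-646f071d formula from this PR: based on hardware threads."""
    return min(64, max(16, thread_count // 2))


if __name__ == "__main__":
    # Hypothetical topologies: (core_count, thread_count).
    samples = [(32, 32), (32, 64), (48, 96), (128, 256)]
    for core_count, thread_count in samples:
        old = pre_expected(core_count)
        new = post_expected(thread_count)
        note = "same" if old == new else "differ"
        print(f"cores={core_count:3d} threads={thread_count:3d} "
              f"pre={old:2d} post={new:2d} ({note})")
```

With these inputs the formulas agree when cores equal threads and again once both hit the 64-channel ceiling, and they diverge for the SMT cases in between.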

Testing

| VM Size | Arch | Kernel | Channels | Result |
|---|---|---|---|---|
| D64ds_v6 | x86_64 | 6.8.0-1052-azure | 16 | PASS |
| D64ds_v6 | x86_64 | 6.17.0-1017-azure | 16 | PASS |
| D64ds_v6 | x86_64 | 6.14.0-1017-azure | 16 | PASS |
| D32ps_v5 | ARM64 | 6.8.0-1052-azure | 16 | PASS |

Related

The verify_vmbus_count test was failing because it expected 8 channels
(based on legacy VRSS_CHANNEL_DEFAULT=8) but observed 16 channels on
VMs with >16 vCPUs.

Root cause: Linux commit 646f071d315b changed VRSS_CHANNEL_DEFAULT from
8 to 16 and added netif_get_num_default_rss_queues() to the channel
calculation. However, DEFAULT_MAX_NUM_RSS_QUEUES (8) caps that function's
return value, so max(16, <=8) = 16 always for >16 vCPU VMs.

Fix: Update the validation logic to accept channel counts matching either:
- Pre-646f071d formula: min(64, max(16, core_count // 2))
- Post-646f071d formula: min(64, max(16, thread_count // 2))

Both formulas yield 16 for typical VM sizes, but differ on VMs where
core_count != thread_count (SMT) or very large VMs. This handles the
case where different Azure kernel versions may be present.

Tested on: D64ds_v6 (x86_64, 64 vCPUs), D32ps_v5 (ARM64, 32 vCPUs)
with Ubuntu 22.04, 24.04, 25.04 kernels - all PASSED.

Fixes: ADO Bug 57291856

Copilot AI review requested due to automatic review settings June 3, 2026 01:14

Copilot AI (Contributor) left a comment

Pull request overview

Updates LISA’s VMBus channel-count validation for netvsc devices to handle the Linux kernel change that increases default channel counts on larger VM sizes, preventing false failures in verify_vmbus_devices_channels on VMs with >16 vCPUs.

Changes:

  • Expands the netvsc expected-channel validation to accept both pre- and post-646f071d315b formulas for vCPU counts >16.
  • Improves in-test diagnostics/logging to clearly indicate which formula matched (or that neither matched).

github-actions (Bot) commented Jun 3, 2026

✅ AI Test Selection — PASSED

1 test case selected

Marketplace image: canonical 0001-com-ubuntu-server-jammy 22_04-lts-gen2 latest

| | Count |
|---|---|
| ✅ Passed | 1 |
| ❌ Failed | 0 |
| ⏭️ Skipped | 0 |
| Total | 1 |

Test case details

| Test Case | Status | Time (s) | Message |
|---|---|---|---|
| verify_vmbus_devices_channels (lisa_0_0) | ✅ PASSED | 19.853 | |

new_expected = min(64, max(16, thread_count // 2))
old_expected = min(64, max(16, core_count // 2))
if actual_channels == old_expected:
    expected_network_channel_count = old_expected

Collaborator

Nice compatibility fix. I think this would be easier to read if we computed the allowed channel counts directly instead of updating expected_network_channel_count through fallback branches.

For example:

allowed = {min(thread_count, 8)}
if thread_count <= 16:
    allowed.add(thread_count)
else:
    allowed.add(min(64, max(16, core_count // 2)))
    allowed.add(min(64, max(16, thread_count // 2)))

assert_that(actual_channels).is_in(allowed)

That makes the old/new kernel behavior explicit and keeps the core_count vs thread_count logic easier to follow.
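A self-contained version of that suggestion might look like the sketch below. The function name `allowed_channel_counts` is hypothetical, and a plain membership check stands in for the `assert_that(...).is_in(...)` call so the example runs without any test library; it is not the code in this PR.

```python
from typing import Set


def allowed_channel_counts(thread_count: int, core_count: int) -> Set[int]:
    """Return every channel count the validation would accept.

    Mirrors the reviewer's suggestion: start from the legacy default
    (at most 8), then add the values for small and large VMs.
    """
    allowed = {min(thread_count, 8)}
    if thread_count <= 16:
        allowed.add(thread_count)
    else:
        # Pre-646f071d formula (physical cores).
        allowed.add(min(64, max(16, core_count // 2)))
        # Post-646f071d formula (hardware threads).
        allowed.add(min(64, max(16, thread_count // 2)))
    return allowed


if __name__ == "__main__":
    # Hypothetical example: 64 threads on 32 cores, 16 channels observed.
    actual_channels = 16
    allowed = allowed_channel_counts(thread_count=64, core_count=32)
    assert actual_channels in allowed, (
        f"got {actual_channels}, expected one of {sorted(allowed)}"
    )
    print(sorted(allowed))
```

Returning a set keeps the accepted values in one place, so a failure message can list all of them instead of only the last fallback that was tried.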
