Skip to content

Crucial fixes for xfstests and kernel_source_packager code #4526

Open
psachdeva-ms wants to merge 7 commits into
microsoft:mainfrom
psachdeva-ms:master
Open

Crucial fixes for xfstests and kernel_source_packager code #4526
psachdeva-ms wants to merge 7 commits into
microsoft:mainfrom
psachdeva-ms:master

Conversation

@psachdeva-ms

Copy link
Copy Markdown
Collaborator

Changes are put in the order of importance

  • Adds parallel execution support for the xfstests Azure File Share test suites (SMB/CIFS and NFSv4.1), splitting test cases across multiple workers (default 4) to cut total runtime.
  • Also fixes a paramiko SSH transport-lock deadlock during parallel result collection via a two-phase (parallel-exec → sequential-collect) approach
  • Adds graceful handling of the "expunged" result category
  • Adds a PV2 storage-account fallback in get_or_create_file_share.

Type of Change

  • Bug fix
  • New feature
  • Breaking change
  • Refactoring
  • Documentation update

Checklist

  • Description is filled in above
  • No credentials, secrets, or internal details are included
  • Peer review requested (if not, add required peer reviewers after raising PR)
  • Tests executed and results posted below

Test Validation

Key Test Cases:
verify_azure_file_share|verify_azure_file_share_nfsv4

Impacted LISA Features:
AzureFileShare

Tested Azure Marketplace Images:

  • canonical ubuntu-24_04-lts server latest
  • canonical 0001-com-ubuntu-server-jammy 22_04-lts-gen2 latest

Test Results

Image VM Size Result
canonical ubuntu-24_04-lts server latest Standard_D8s_v5 PASSED
canonical 0001-com-ubuntu-server-jammy 22_04-lts-gen2 lates Standard_D8s_v5 PASSED

psachdeva-ms and others added 7 commits June 10, 2026 11:36
The data plane SDK now natively supports provisioned_iops and
provisioned_bandwidth_mibps parameters in create_share() API.

Signed-off-by: Piyush Sachdeva <psachdeva@microsoft.com>
Signed-off-by: Piyush Sachdeva <s.piyush1024@gmail.com>
- Add error handling for PV2 unsupported storage accounts
- Automatically retry without provisioned_iops/provisioned_bandwidth_mibps
  when UnsupportedHeader error occurs
- Increase quota to 100 GiB minimum when falling back to non-PV2
  (PV1 minimum requirement)

Signed-off-by: Piyush Sachdeva <psachdeva@microsoft.com>
Signed-off-by: Piyush Sachdeva <s.piyush1024@gmail.com>
Parallel xfstests workers deadlocked during result collection when
multiple threads simultaneously made rapid SSH calls (shell.exists,
Cat.run, copy_back) competing for paramiko's transport lock. Workers
would hang for hours until timeout.

Fix by splitting parallel execution into two phases:
- Phase 1: Execute xfstests across all workers in parallel via
  _execute_test(), which issues a single long-running check command
  per worker (no rapid SSH-call churn, no deadlock).
- Phase 2: Collect results sequentially via check_test_results(),
  avoiding the rapid SSH channel churn that triggers the deadlock.

Because Phase 2 runs on the main thread one worker at a time, subtest
notifications are now sent inline during collection. This removes the
need for the deferred-notification path, so the following are deleted:
- XfstestsParallelRunner.send_deferred_notifications()
- XfstestsRunResult.raw_message / data_disk fields
- the send_notifications parameter on check_test_results()
Subtests are now tagged with the filesystem section (cifs/nfs) instead
of the per-worker id (cifs_worker_N), matching non-parallel runs.

Also simplify run_test() back to the original run_async + pgrep
pattern for single-worker filesystem tests (btrfs, ext4), keeping the
parallel-specific _execute_test() as a private method used only by
XfstestsParallelRunner.run_parallel().

Signed-off-by: Piyush Sachdeva <psachdeva@microsoft.com>
Signed-off-by: Piyush Sachdeva <s.piyush1024@gmail.com>
Signed-off-by: Piyush Sachdeva <psachdeva@microsoft.com>
Signed-off-by: Piyush Sachdeva <s.piyush1024@gmail.com>
- Add function to support deleting detached worktrees and linked branches.
- Modularize the code by moving checkout of target ref in a method.
- Don't let the user set a local tracking branch for the worktree, rather use a
  pattern: "<remote_name>-<target_ref>"

Signed-off-by: Piyush Sachdeva <psachdeva@microsoft.com>
Signed-off-by: Piyush Sachdeva <s.piyush1024@gmail.com>
Signed-off-by: Piyush Sachdeva <psachdeva@microsoft.com>
Signed-off-by: Piyush Sachdeva <s.piyush1024@gmail.com>
Signed-off-by: Piyush Sachdeva <psachdeva@microsoft.com>
Signed-off-by: Piyush Sachdeva <s.piyush1024@gmail.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves LISA’s Azure File Share xfstests execution by parallelizing xfstests runs (with a safer “parallel execute → sequential collect” flow to avoid SSH/paramiko contention), and includes related robustness updates in kernel source worktree handling and Azure file share provisioning.

Changes:

  • Add a two-phase parallel execution model for xfstests workers, collecting results sequentially to avoid SSH channel deadlocks.
  • Update kernel repo worktree checkout logic (tag-aware checkout + detached worktree cleanup).
  • Add Azure Files share creation fallback when PV2 headers aren’t supported, and bump azure-storage-file-share dependency.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
pyproject.toml Bumps azure-storage-file-share dependency version.
lisa/transformers/kernel_source_packager.py Adds worktree cleanup and tag-aware ref checkout logic for kernel source packaging.
lisa/transformers/kernel_source_installer.py Adjusts ccache directory handling for kernel builds.
lisa/tools/git.py Updates worktree_remove parameter type to align with worktree listing output usage.
lisa/sut_orchestrator/azure/common.py Adds PV2-parameter fallback logic when creating Azure file shares.
lisa/microsoft/testsuites/xfstests/xfstests.py Implements parallel xfstests execution with sequential result collection and related refactors.
lisa/microsoft/testsuites/xfstests/xfstesting.py Removes now-unneeded deferred notification calls after parallel execution changes.

Comment on lines +447 to +449
# Reconstruct paths from run_id and worker index
worker_path = self.worker_paths[idx]
console_log_name = f"xfstest_{exec_result.run_id}.log"
Comment on lines +816 to +836
# run ./check command
self.run_async(
cmd,
sudo=True,
shell=True,
force_run=True,
cwd=self.get_xfstests_path(),
)

pgrep = self.node.tools[Pgrep]
# this is the actual process name, when xfstests runs.
# monitor till process completes or timesout
try:
pgrep.wait_processes("check", timeout=timeout)
finally:
run_result = self.check_test_results(
log_path=log_path,
test_section=test_section if test_section else "generic",
result=result,
data_disk=data_disk,
)
Comment on lines +429 to +434
result = self._node.execute(
f"git tag -l {ref}",
shell=True,
cwd=target_path,
)
return bool(result.stdout.strip())
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants