Fix race condition in 04-persistent-data-volume.sh #4831

Merged
AkihiroSuda merged 1 commit into lima-vm:master from jandubois:fix-data-volume-race
Apr 22, 2026

@jandubois (Member) commented Apr 11, 2026

A race between growpart/e2fsck and udev device probing can delete the /dev/disk/by-label/data-volume symlink, causing mount failures and potential data loss on Alpine ramdisk VMs.

growpart triggers a kernel partition table re-read, which generates a udev re-probe for the data partition. When e2fsck runs concurrently, udevd's libblkid probe reads a partially modified ext4 superblock, fails with "incorrect ext4 checksum", and removes the symlink.

Core fixes:

  1. Resolve the device via blkid output (which probes devices directly) instead of the udev symlink. This eliminates the dependency on udev state entirely. The sed pattern anchors LABEL= on whitespace so it does not match PARTLABEL= on util-linux blkid, and quits after the first match so a stray duplicate label never produces a newline-separated device list.

  2. Add udevadm settle after growpart to prevent the concurrent probe from racing with e2fsck. Both settle calls (after growpart and after mkfs) tolerate failure with || true; the blkid resolution already removes the core dependency on udev state, so a missing or stuck udevadm must not crash the boot under set -e.

  3. Fix the else-branch "disk in use" check to reject any disk that already has partitions or carries a filesystem signature, not just mounted devices in /proc/mounts. The old check could misidentify a partitioned-but-unmounted or raw-formatted data disk as unused and reformat it.
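As a rough sketch of the three fixes above (this is not the code from the PR): the function names `device_for_label`, `settle_udev`, and `disk_is_unused` are hypothetical, the leading-space anchor only approximates "anchored on whitespace", and the `lsblk` columns used for the in-use check are an assumption. The logic might look like this:

```shell
#!/bin/sh
# Illustrative sketch only; names and exact patterns are assumptions.

# Fix 1: print the first device whose blkid line carries LABEL="<label>".
# Reads blkid-style lines on stdin, e.g.
#   /dev/vdb1: LABEL="data-volume" UUID="..." TYPE="ext4"
# The space before LABEL= keeps PARTLABEL="<label>" from matching, and "q"
# quits after the first hit so duplicate labels never yield several lines.
device_for_label() {
    sed -n "/ LABEL=\"$1\"/{s/:.*//p;q;}"
}

# Fix 2: wait for pending udev events, but never fail the caller.
# A missing udevadm (exit status 127) is tolerated, even under "set -e".
settle_udev() {
    udevadm settle 2>/dev/null || true
}

# Fix 3 (approximation): succeed only if the disk has no partitions and no
# filesystem signature. Reads the output of
#   lsblk --noheadings --list --output NAME,FSTYPE "$disk"
# on stdin; an unused disk yields exactly one line with an empty FSTYPE.
disk_is_unused() {
    awk 'NR > 1 || NF > 1 { used = 1 }
         END { if (NR == 1 && !used) exit 0; exit 1 }'
}
```

On a real system the first helper would be fed from `blkid` itself, for example `blkid | device_for_label data-volume`, with `settle_udev` called after the partition is grown and again after a new filesystem is created.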

Derive DATA_DISK via lsblk --output pkname instead of stripping a single trailing digit, so the parent disk is resolved correctly for any partition naming scheme. Scope the /proc/mounts awk substitution to $1.
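To illustrate why stripping one trailing digit is fragile, here is a hedged sketch. `legacy_parent` is a reconstruction of the "strip a single trailing digit" approach, `parent_disk` shows one way to ask `lsblk` for `pkname`, and `mount_sources_stripped` only demonstrates what scoping an awk substitution to `$1` means. All three names are hypothetical, and the script's real awk program is not shown in this PR description.

```shell
#!/bin/sh
# Illustrative sketch only; function names are assumptions.

# Old approach (reconstruction): drop one trailing digit from the partition.
#   /dev/vdb1      -> /dev/vdb       (fine)
#   /dev/nvme0n1p1 -> /dev/nvme0n1p  (wrong: the "p" separator is left behind)
legacy_parent() {
    printf '%s\n' "$1" | sed 's/[0-9]$//'
}

# New approach: let lsblk report the parent kernel device name (pkname).
# --nodeps restricts the output to the named device itself.
parent_disk() {
    pk=$(lsblk --noheadings --nodeps --output pkname "$1") || return 1
    [ -n "$pk" ] || return 1
    printf '/dev/%s\n' "$pk"
}

# Field scoping: reads /proc/mounts-style lines on stdin and edits only the
# device column. Without the third argument, sub() would act on the whole
# line and strip the trailing fsck-pass digit instead of the partition number.
mount_sources_stripped() {
    awk '{ sub(/[0-9]+$/, "", $1); print $1 }'
}
```

A call such as `DATA_DISK=$(parent_disk "$DATA_PART")` would then work for `vdb1`, `nvme0n1p1`, and `mmcblk0p1` alike, assuming `lsblk` is available in the guest.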

Ref: rancher-sandbox/rancher-desktop#10133
Closes: #4830

Signed-off-by: Jan Dubois <jan.dubois@suse.com>
@jandubois force-pushed the fix-data-volume-race branch from 17462f4 to 34a1f47 on April 20, 2026 18:59
@jandubois marked this pull request as ready for review on April 20, 2026 19:02
@jandubois (Member, Author) commented:

Round 2 AI review: https://jandubois.github.io/lima/20260420-120300-pr-4831.html

I don't think any of the issues are worth addressing.

@jandubois requested a review from a team on April 21, 2026 20:31

@AkihiroSuda (Member) left a comment

Thanks

@AkihiroSuda merged commit 1d55d6d into lima-vm:master on Apr 22, 2026
60 of 62 checks passed
Linked issue: Race condition in 04-persistent-data-volume.sh causes mount failure and potential data loss