Bug Report: ZFS reversion from /dev/disk/by-partuuid/ to device path/device node

I'm running Zorin OS Core 16.2 with the ZFS filesystem on an HP 17-cp1035cl laptop.

My drive setup:
bpool
\ bpool l2arc cache

rpool - rpool mirror
\ rpool l2arc cache

It is the rpool l2arc cache drive that repeatedly reverts. The other cache drive works just fine. The drives are identical 32 GB USB sticks, and I've swapped them: the problem still occurs on whichever stick is serving as the rpool l2arc cache (perhaps because enumeration takes a bit longer when the rpool l2arc cache drive has more data on it?).

A long-standing problem (I've seen it discussed as far back as 2012) is that drives revert from a /dev/disk/by-partuuid/ path (e.g. /dev/disk/by-partuuid/9044761d-9a05-435e-8681-050da7312951) to a device path/device node (e.g. /dev/sdd). On subsequent reboots, if the drive ordering changes, that causes the drive status to go "FAULT".

It apparently stems from the fact that the drive isn't enumerated quickly enough during boot (using /dev/disk/by-partuuid/), so some later process enumerates that drive as device path/device node.
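A quick way to spot the reversion is to look for vdev names in the zpool status output that look like kernel device nodes instead of partition UUIDs. This is only a sketch: list_devnode_vdevs is a made-up helper name, and the name patterns (sdX, nvmeXnYpZ) are an assumption about how device nodes are named on the machine.

```shell
#!/bin/sh
# Hypothetical helper (not part of ZFS): reads `zpool status` text on stdin
# and prints vdev names that look like kernel device nodes (sdd, sdb1,
# nvme0n1p2) rather than /dev/disk/by-partuuid/ names.
list_devnode_vdevs() {
    awk '
        # Only consider rows whose second column is a vdev state.
        $2 ~ /^(ONLINE|DEGRADED|FAULTED|UNAVAIL|OFFLINE|REMOVED)$/ &&
        $1 ~ /^(sd[a-z]+[0-9]*|nvme[0-9]+n[0-9]+(p[0-9]+)?)$/ { print $1 }
    '
}
```

Usage would be something like: sudo zpool status rpool | list_devnode_vdevs. It prints nothing when every vdev is still listed by partition UUID. A pool that happens to be named like a device node (e.g. "sda") would be a false positive.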

Ubuntu has a fix for it called MountAll. Can the devs push out a package containing that, with all the necessary dependencies and set up for Zorin OS... or adjust the code so the enumeration doesn't time out during boot?

You could hard-code the USB sticks' UUIDs in fstab so they mount at boot. This would ensure the drives are mounted in a specific order, but it will complain if they are not present at boot.

Unfortunately, devices owned by ZFS don't really work when put into fstab... they're not really 'mounted' (if you go into the Disks application, it'll show they're unmounted, and you cannot mount them in the traditional manner).

Given that the device which is glitching is USB-attached, I'm experimenting with increasing the /sys/module/usb_storage/parameters/delay_use setting from 1 second to 2 seconds.

That'll slow boot time down by N * (new setting - old setting) seconds, where N is the number of USB-attached drives, but if it properly enumerates them, it's a trade-off I'm willing to accept.
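Taking that estimate at face value, the arithmetic can be sketched as below. extra_boot_delay is a hypothetical helper name, and the figure of two USB-attached drives is an assumption based on the two cache sticks described earlier.

```shell
#!/bin/sh
# Hypothetical helper: extra_boot_delay N OLD NEW prints N * (NEW - OLD),
# the extra boot time in seconds estimated in the post.
extra_boot_delay() {
    echo $(( $1 * ($3 - $2) ))
}

# The current setting can be read with:
#   cat /sys/module/usb_storage/parameters/delay_use

# Two USB sticks, delay_use raised from 1 second to 2 seconds.
extra_boot_delay 2 1 2   # prints 2
```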

If you don't mind me asking, why is full disk encryption so important to you?

With access to the repositories, why not use Pop!_OS, which I think is the distro with ZFS in use by default, and install the Zorin desktop?

I'm not using encryption, I'm using the ZFS file system.

ZFS is an encrypted file system, is it not? I don't trust Oracle. Hats off to you for trusting a company that is hypocritical regarding things they didn't make money on, even if open source and freely shared for decades.

It can be encrypted, or not. Just as it can be compressed, or not.

Another thing I'm trying, recommended by OpenZFS:

To prevent udisks2 from creating /dev/mapper entries that must be manually removed or maintained during zvol remove / rename, create a udev rule such as /etc/udev/rules.d/80-udisks2-ignore-zfs.rules with the following contents:

ENV{ID_PART_ENTRY_SCHEME}=="gpt", ENV{ID_FS_TYPE}=="zfs_member", ENV{ID_PART_ENTRY_TYPE}=="6a898cc3-1dd2-11b2-99a6-080020736631", ENV{UDISKS_IGNORE}="1"

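... where "6a898cc3-1dd2-11b2-99a6-080020736631" is the GPT partition type GUID shared by the ZFS partitions (not each partition's own PARTUUID), which you can see via sudo blkid -p on the partition (PART_ENTRY_TYPE) or lsblk -o NAME,PARTTYPE.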
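The rule matches that GUID against ID_PART_ENTRY_TYPE, which udev fills in from the GPT partition type, so it helps to confirm which partitions actually carry it. A minimal sketch, assuming lsblk's raw output with the NAME and PARTTYPE columns; list_zfs_partitions and ZFS_PARTTYPE are made-up names for illustration.

```shell
#!/bin/sh
# GUID taken from the udev rule above.
ZFS_PARTTYPE="6a898cc3-1dd2-11b2-99a6-080020736631"

# Hypothetical helper: reads "NAME PARTTYPE" pairs on stdin (as produced by
# `lsblk -rno NAME,PARTTYPE`) and prints the names whose partition type
# matches the GUID, ignoring letter case.
list_zfs_partitions() {
    awk -v t="$ZFS_PARTTYPE" 'tolower($2) == t { print $1 }'
}
```

Usage would be something like: lsblk -rno NAME,PARTTYPE | list_zfs_partitions. Any ZFS partition missing from the output would not be covered by the rule.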

Another thing I'm trying:
sudoedit /etc/default/zfs

# Wait for this many seconds in the initrd pre_mountroot?
# This delays startup and should be '0' on most systems.
# Only applicable for Debian GNU/Linux {dkms,initramfs}.
ZFS_INITRD_PRE_MOUNTROOT_SLEEP='1'

# Wait for this many seconds in the initrd mountroot?
# This delays startup and should be '0' on most systems. This might help on
# systems which have their ZFS root on a USB disk that takes just a little
# longer to be available
# Only applicable for Debian GNU/Linux {dkms,initramfs}.
ZFS_INITRD_POST_MODPROBE_SLEEP='1'
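Since both settings are applied inside the initrd, the initramfs presumably has to be regenerated (sudo update-initramfs -u) before a changed value takes effect; that step is an assumption, not something stated in the file. A small sketch for double-checking what is currently set, using a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: print the ZFS_INITRD_*_SLEEP assignments found in a
# defaults file (pass /etc/default/zfs on a real system). Commented-out
# lines are skipped because the match is anchored at the start of the line.
show_initrd_sleeps() {
    grep -E '^ZFS_INITRD_[A-Z_]+_SLEEP=' "$1"
}
```

Usage would be something like: show_initrd_sleeps /etc/default/zfs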

Seems an awful lot of work. I like to keep it simple 'cos I'm stupid (kiss) - Ext4 forever!

This might have something to do with it:
sudoedit /etc/grub.d/10_linux

# Default to disabling partition uuid support to maintain compatibility with older kernels.
GRUB_DISABLE_LINUX_PARTUUID=${GRUB_DISABLE_LINUX_PARTUUID-true}

I've been importing the drives for bpool and rpool with /dev/disk/by-partuuid.

I've commented out that line, updated grub and rebooted a couple of times. I'll continue to use the system and see if the reversion from /dev/disk/by-partuuid to /dev/sdX continues. If not, that would appear to be yet another Ubuntu dev-introduced bug (the other being the grub recordfail issue).

sudoedit /etc/grub.d/10_linux_zfs

#    local initial_pools="$(zpool list | awk '{if (NR>1) print $1}')"
# CHANGED 24 DEC 2022 to use only 'zpool list' specific commands. Returns same as line above.
    local initial_pools="$(zpool list -Ho name)"

That didn't work. I'm reverting the change.
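For reference, the two initial_pools forms in 10_linux_zfs should yield the same pool names: the awk filter drops the header row of zpool list and keeps the first column, while zpool list -Ho name asks zpool for header-less name output directly. The awk half can be tried on captured output; names_from_listing is a made-up wrapper name and the sample listing in the comment is invented for illustration.

```shell
#!/bin/sh
# Hypothetical wrapper around the original awk filter: skip the header
# line (NR == 1) and print the first column of every remaining line.
names_from_listing() {
    awk '{ if (NR > 1) print $1 }'
}

# Example with an invented listing:
#   printf 'NAME SIZE\nbpool 1.88G\nrpool 230G\n' | names_from_listing
# prints bpool and rpool, one per line.
```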

Now I'm trying this:
sudoedit /lib/systemd/system/zfs-import-scan.service
Comment the line: ConditionPathExists=!/etc/zfs/zpool.cache
sudo systemctl enable zfs-import-scan
sudo systemctl start zfs-import-scan
Go into Stacer application and enable zfs-import-scan to run at boot.
sudo zpool set cachefile=none bpool
sudo zpool set cachefile=none rpool
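Edits made directly under /lib/systemd/system can be overwritten when the ZFS package is updated. An alternative worth considering is a systemd drop-in that clears the condition instead of commenting it out. This is a sketch, not a tested fix for this bug: write_scan_dropin and the override.conf file name are made up, and the target directory is passed in so the helper can be tried somewhere harmless first.

```shell
#!/bin/sh
# Hypothetical helper: write a drop-in for zfs-import-scan.service under
# the given base directory (/etc/systemd/system on a real system, which
# needs root).
write_scan_dropin() {
    dir="$1/zfs-import-scan.service.d"
    mkdir -p "$dir"
    cat > "$dir/override.conf" <<'EOF'
[Unit]
# An empty assignment resets the conditions inherited from the packaged
# unit, including ConditionPathExists=!/etc/zfs/zpool.cache.
ConditionPathExists=
EOF
}
```

After writing it to /etc/systemd/system, sudo systemctl daemon-reload makes systemd pick up the change.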

There are reports that the zpool.cache setup is borked: updating zpool.cache programmatically for one pool somehow wipes out the information in that file for other pools, so rpool isn't imported in its entirety at boot, which may be what's causing this.

The zpool.cache setup is being considered for deprecation, with all pools imported at boot by scanning, so that's what I've done.

I get a complaint in the logs:

Sender: gnome-shell
Message: Unable to mount volume rpool: Gio.IOErrorEnum: Error mounting /dev/sdb1 at /media/owner/rpool: unknown filesystem type 'zfs_member'

... but sudo zpool status shows that everything's online and working:

pool: rpool
 state: ONLINE
  scan: scrub repaired 0B in 0 days 00:03:59 with 0 errors on Sat Dec 24 05:08:30 2022
config:

	NAME                                      STATE     READ WRITE CKSUM
	rpool                                     ONLINE       0     0     0
	  mirror-0                                ONLINE       0     0     0
	    039f5cf3-27cb-7844-b211-fe486fc2e3af  ONLINE       0     0     0
	    4c69078a-70bc-49bb-b39c-3ae75e5b694e  ONLINE       0     0     0
	logs	
	  mirror-3                                ONLINE       0     0     0
	    26bc760d-7692-1a4f-adc8-6ac5c71bf44a  ONLINE       0     0     0
	    cb0deab2-54a5-8146-a5eb-fb5f67c3574b  ONLINE       0     0     0

As you can see from above, I've repurposed my L2ARC cache drives as ZIL SLOGs to improve write performance. I'll probably burn through them in short order due to ZIL being write-heavy, but they're cheapie 32 GB USB sticks, so after that, I'll get better SLOG devices (and a couple cheapie 32 GB USB sticks to use as L2ARC cache drives). I've got the mirror drive to increase read performance.

As always, I'll use the machine and see if the changes fix the problem, then report back.