-
-
Notifications
You must be signed in to change notification settings - Fork 197
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Random boot failues due to partition confusion because disko now relies on by-partlabel
#551
Comments
hmm, you would need to override the device of each partition. you can set it to /dev/disk/by-id/$my_id-part$x. Another alternative would be to override the disk name. so something like:
that should ensure a unique id according to the device? not sure if that's good enough for you. otherwise overriding each partition is exactly what you want, although a bit tedious if you have a lot of them |
That's what I did yesterday to work around the issue for the time being, but I was hoping there would be a better approach. The current partition numbers obviously only correspond to the order with legacy tables from before and if I were to resetup now, then the partitions would be in a different order because of alphabetical sorting and it would break again. Adding a partition would also break it if it isn't ordered last. Previously we could choose where to place the partitions so this would not happen. I then tried to use the canonical device name via
This only solves part of the problem. I'd like to use the unique device identifier for the name anyway, but that results in huge partlabels. Hashing it here would make it shorter, but if you also have a longer name for the partition it will still exceed the limit. So that would need to be hashed too. It's a bit awkward having to set a hashed name for each device and partition from now on. Please excuse my ignorance, but can you explain to me why the legacy tables are deprecated now? I searched the git history but failed to find any explanation or rationale for why they are deprecated. From my (limited) point of view we previously had stable and predictable disk names, which were humanly readable at runtime, no issues with long names, no possibility for duplicates and thus confusion and we also had predictable partition order. Anyone who would intentionally want to create a name clash with a USB stick would have had to emulate a whole disk with one of my disk ids, now it just requires to trivially set a partition label to make my system boot something else. What am I missing that makes the new approach preferable? |
you can add a priority to each partition to ensure a stable partition order. just think of them as an index in that case.
yeah I haven't figured out the limit yet, maybe its 64 characters? we could automatically hash them if they exceed the limit so you wouldn't have to override them on each device.
the old legacy tables used lists, which are hard to override or reference in nixos modules. also the device numbering was cumbersome and not portable (meaning, that if you changed your device from something like /dev/sda to /dev/disk/by-id/... the whole logic for device numbering changed) and this made testing the code harder. With the old tables the vm tests or image generation doesn't work because we can't really override all the device names in every partition. With the new attrset based tables we now have the possibility to set the device to an environment variable and have portable install scripts that way. the old tables also broke easily if people set it to devices like /dev/sda and /dev/sdb and they sometimes flipped. Also the old device nubmering code https://github.com/nix-community/disko/blob/master/lib/default.nix#L76 was brittle and incomplete. |
Also another feature this enables is the non destructive updates I'm working on right now. Since you can add partitions later and they won't mess up the old partitions for mounting. |
Yeah that'd definitely be a lot easier.
I see, thanks for the explanation. I'll test the partlabel hashing approach. |
FWIW I ran into this too, the approach I've taken is to add a |
Well, I'm not getting random boot failures since in my case it's not random at all. Errors before unlocking because it wants I have no clue on what to do. |
Probably open a new issue with more insight into your config would help. Since disko is not run automatically to you probably have to either xhange the label in your config or change the partition labels with gdisk/parted or something like that |
complete flake configuration: |
RESOLVED... |
Hello @Lassulus! My machine has several disks inside. The first one is for NixOS, the second is for testing NixOS (I use VMs to test it too), the third is for Windows and others are for RAID. I use disko-install to install NixOS on the first or second disk. I wasn't aware of the new implementation relying on /dev/disk/by-partlabel instead of /dev/disk/by-id. disko-install surprisingly overwrote the Windows disk with important data. Fortunately, I was able to recover the data, but I got some new gray hairs. :) Now I'm using a workaround with a device ID hash for the name. Thank you for the hint! I think using /dev/disk/by-id for a device should work as expected without additional knowledge of internal implementation and hacks. Can I hope this issue will be resolved in the future? |
I'm confused in how that change could result in overriding your windows partition? does the windows partition share label names with the nixos ones? it seems like a different issue here that maybe merits it's own new issue :) |
@nick-korsakov Could you run In summary of this thread here, the workaround is definitely to use Looking at the old code I am definitely in favor of the change to |
I just updated disko and switched to
type = "gpt";
which a warning informed me to do. Now my device now randomly fails to boot. This is the case whenever I have another USB stick plugged into my system that coincidentally shares some partition labels (likedisk-root-root
) with the partition labels of my actual system.The new
gpt.nix
implementation seems to rely solely on/dev/disk/by-partlabel
instead of/dev/disk/by-id
which was used previously. As a consequence of this change, my system now sometimes confuses a USB stick partition and the system confusion which share the same name, which seemingly dependent on udev device initialization order. So sometimes my system uses the wrong one and it doesn't boot.Is there any way to switch back to
/dev/disk/by-id
? I'd prefer to have stable and unique ids to refer to my partitions that don't cause random clashes. If there is absolutely no way to useby-id
now because we don't know the partition order anymore, would it be possible to require manually setting a unique id on the parent gpt layout that is prepended to all labels? This mostly mitigate the issue, although one could still easily craft a USB stick to disrupt a system.The text was updated successfully, but these errors were encountered: