DKMS question

This is not a complaint about Zorin's kernel version. I knew I could mess stuff up, did, and got myself out of it already. That said, the mess is preamble to the question, so:

I updated my kernel to 6.13 using Mainline. This was successful (yay), but I came back up with only one monitor. When I started going through the usual commands to remove and replace video drivers, apt told me there WERE no packages matching the usual Nvidia regex, which was my clue that this was going to be a brief experiment indeed. I adjusted GRUB so I could choose a Zorin-supplied kernel, booted back up, uninstalled 6.13, and am good to go.

The question here is, why'd the video driver fail? It's my (limited) understanding that DKMS exists to keep from having to recompile the kernel for driver changes. Something wasn't happy, which is unfortunate, but right now I'm concerned with correcting my understanding of DKMS, not getting on 6.13.

(For the inevitable "Why do you need the latest?", the answer is that 6.13 includes improvements for AMD CPUs with 3D V-Cache, which I am using.)

Do you have an Nvidia card?

If you have Nvidia:

"#

Updating Nvidia Drivers on Ubuntu 22.04

When updating the mainline kernel in Ubuntu 22.04, the Nvidia driver may not be recognized due to issues with the kernel headers or the driver installation process. To resolve this, you can try reinstalling the kernel headers and then the Nvidia driver. For example, you can install the kernel headers with the command sudo apt install linux-headers-$(uname -r) and then reinstall the Nvidia driver using sudo ubuntu-drivers autoinstall .

If the issue persists, you might need to remove the newly installed kernel and all associated Nvidia drivers, then reboot your system. The command to remove the kernel is sudo apt remove linux-*5.17* .

Ensure that secure boot is correctly configured if it is enabled, as it can interfere with the loading of Nvidia drivers."

Source: Brave A.I.
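As a quick sanity check for the header step above, here is a minimal shell sketch. The function name headers_present and its optional second argument are hypothetical, added for illustration. It assumes the Ubuntu convention that a kernel's headers package provides a /lib/modules/<kernel>/build link, which is what DKMS compiles modules against.

```shell
#!/usr/bin/env bash
# Sketch: report whether kernel headers are present for a given kernel.
# DKMS builds modules against /lib/modules/<kernel>/build, which the
# headers package for that kernel provides on Ubuntu-based systems.
# The second argument (module root) is a hypothetical override for testing.
headers_present() {
  local kver="${1:-$(uname -r)}"
  local modroot="${2:-/lib/modules}"
  # -e follows symlinks, so a dangling build link counts as missing
  if [ -e "$modroot/$kver/build" ]; then
    echo "headers present for $kver"
    return 0
  fi
  echo "headers MISSING for $kver"
  return 1
}
```

On a real system, run headers_present with no arguments to check the running kernel, or pass a version string (for example the mainline kernel you are about to boot) to check it before rebooting.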

Yep, 4090.

This is basically what I ended up doing, except I did it using the Mainline app, which provides a GUI for installation, removal, and locking of kernels on Ubuntu. I ended up not having to remove the video driver, as it was somehow "gone" while running 6.13, according to apt, but when I booted back up in 6.8, the 565 drivers were back. I just removed and updated them to 570s anyway.

I have the 570 drivers running great with the latest Xanmod kernel. Works really well.

Install kernel:
sudo apt install build-essential dkms linux-headers-$(uname -r) -y

wget -qO - https://dl.xanmod.org/archive.key | sudo gpg --dearmor -vo /etc/apt/keyrings/xanmod-archive-keyring.gpg

echo 'deb [signed-by=/etc/apt/keyrings/xanmod-archive-keyring.gpg] http://deb.xanmod.org releases main' | sudo tee /etc/apt/sources.list.d/xanmod-release.list

sudo apt update && sudo apt install linux-xanmod-x64v3

sudo update-initramfs -u -k all

sudo update-grub

sudo reboot

Install Nvidia 570:

sudo add-apt-repository ppa:graphics-drivers/ppa -y
sudo apt update

sudo apt install nvidia-driver-570 -y
sudo update-initramfs -u -k all
sudo update-grub
sudo reboot

I'll give this a try shortly, thank you!

If you are going to move to 570, I recommend removing the older Nvidia drivers first. You could have issues otherwise.

Before uninstalling, unload any active NVIDIA kernel modules:

sudo rmmod nvidia_drm nvidia_modeset nvidia_uvm nvidia

Now, remove all NVIDIA packages:

sudo apt remove --purge '^nvidia-.*'

sudo apt autoremove -y

sudo apt autoclean

Delete residual configuration files:
sudo rm -rf /etc/X11/xorg.conf

sudo rm -rf /usr/share/X11/xorg.conf.d/10-nvidia.conf

sudo rm -rf /lib/modules/$(uname -r)/kernel/drivers/video/nvidia*

Regenerate Initramfs & Update GRUB

sudo update-initramfs -u

sudo update-grub

Reboot
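To confirm the purge actually removed everything, a small sketch like this can filter dpkg -l output. The function name is hypothetical. It assumes dpkg -l's usual layout, where the first column is the status (ii = installed, rc = removed but config files remain) and the second is the package name. Matching "nvidia" anywhere in the name also catches libnvidia-* packages, which the '^nvidia-.*' pattern does not match by name (they are usually cleaned up by the autoremove step).

```shell
#!/usr/bin/env bash
# Sketch: read `dpkg -l` output on stdin and print NVIDIA packages that are
# still installed (ii) or removed with leftover config files (rc).
leftover_nvidia_pkgs() {
  # column 1 = status, column 2 = package name (may carry an :arch suffix)
  awk '($1 == "ii" || $1 == "rc") && $2 ~ /nvidia/ { print $1, $2 }'
}
```

Usage: dpkg -l | leftover_nvidia_pkgs. Empty output means nothing NVIDIA-related is left; rc lines can be cleared with another purge.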

Interesting. You've got extra steps in removing stuff; I usually just do the sudo apt remove --purge '^nvidia-.*'

After you learned of the issue, you might have been able to use

dkms status

to see whether there was a module build issue or

sudo dkms autoinstall

to try reinstalling the modules needed.
If the system booted without loading the modules, you may have had luck using

sudo update-initramfs -u -k all

sudo update-grub

These assume that you checked what @swarfendor437 suggested above: Secure Boot, and whether the mainline headers were properly installed.
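The dkms status check above can be narrowed to one module and one kernel with a sketch like this. The function name is hypothetical, and it assumes the line format recent dkms releases print, "module/version, kernel, arch: state", where a bare "module/version: added" line means the source is registered but not yet built for any kernel.

```shell
#!/usr/bin/env bash
# Sketch: read `dkms status` output on stdin and print the state of MODULE
# for kernel KVER. Assumed line formats:
#   nvidia/570.86.16, 6.8.0-52-generic, x86_64: installed
#   nvidia/570.86.16: added     (source registered, built for no kernel)
dkms_state() {
  local module="$1" kver="$2" line state=""
  while IFS= read -r line; do
    case "$line" in
      # quoted parts are literal, so the kernel version must match exactly
      "$module"/*", $kver, "*) state="${line##*: }" ;;
    esac
  done
  echo "${state:-not built for $kver}"
}
```

Usage: dkms status | dkms_state nvidia "$(uname -r)". Anything other than "installed" for the kernel you are booting is a sign the module build did not complete for it.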

If none of that worked, the next thing I can think of is that the necessary modules are not included in the 6.13 kernel.
That is the definition of a regression.

If there are incompatibilities with the Nvidia source code and the new kernel API changes, you can get a module build failure.
It is also possible that Nvidia has not yet validated their files for 6.13.

Sadly, both of the above fall into the "likely" category. The former might seem likelier than the latter, but... Nvidia still has strained relations with Linux, even if they have improved.

Lots of good information in this thread, thank you. As for the Secure Boot issue Swarfendor mentioned, I keep it off on my personal machines and on for company machines (which only run Windows anyway). I'll keep these things in mind as I try what JGlover suggested above in a bit. (His tip came in as my food was delivered, and I haven't eaten yet today. :P)

Yep. I've worked in game QA. LOOOOTS of experience testing for regressions.

Upon trying this, only nvidia_uvm unloads; the others are in use and refuse. Still, things appear to have gone cleanly. I should've read the commands thoroughly before beginning, as it's 6.13 that I'm after, specifically for the improvements to 3D-Vcache processor support, but as long as things work, I'm perfectly happy to be on 6.12 in the meantime.
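That refusal is expected while the desktop is running: rmmod will not unload a module that is still in use. A minimal sketch for seeing which NVIDIA modules are loaded and what holds them, reading /proc/modules (columns: name, size, use count, dependent modules, state, address). The function name and its optional file argument are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: list loaded NVIDIA kernel modules with their use counts.
# /proc/modules columns: name size use_count dependents state address
# A non-zero use_count is why rmmod reports a module as "in use".
nvidia_module_usage() {
  local src="${1:-/proc/modules}"   # optional file, handy for testing
  awk '$1 ~ /^nvidia/ { print $1, "use_count=" $3, "used_by=" $4 }' "$src"
}
```

Typically nvidia_uvm shows a use count of 0 when no CUDA work is running, which would match it being the only one that unloaded.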

I notice that your instructions don't call for installing nvidia-dkms-570. Is there any particular reason why not? That's what this thread was really about: not getting me a new kernel, but correcting my misunderstanding of DKMS.

Edit to include a thank you for this. Seriously, how rude can I get? I appreciate this information, and I'll keep an eye out for a 6.13 I can apply this way.

From my notes, and if I'm understanding your ask, here is how I installed 570 plus a few extras (mostly related to gaming and the like):

Install the NVIDIA 570 Proprietary Drivers

Add the NVIDIA PPA to Get the Latest Drivers:

sudo add-apt-repository ppa:graphics-drivers/ppa -y
sudo apt update

Install NVIDIA 570 Drivers:

sudo apt install nvidia-driver-570 -y
sudo update-initramfs -u -k all
sudo update-grub
sudo reboot

Install Vulkan for Better Gaming Performance

sudo apt install libvulkan1 libvulkan-dev vulkan-tools -y

Enable Performance Mode for NVIDIA

Ensure your system is using the NVIDIA GPU and not the integrated one.

sudo prime-select nvidia
sudo reboot

(Optional) Install NVIDIA OpenCL and CUDA (for GPU Acceleration)

sudo apt install nvidia-cuda-toolkit -y

Also, Xanmod may have an Edge version of the kernel, which is 6.13, though I have not tried it yet. https://xanmod.org/

What I was asking was if there's a reason nvidia-dkms-570 isn't included in what you install.

They do, it seems. Thanks.

Their table below the instructions is a bit odd though. It makes it look like my processor falls under x86-64-v4 rather than v3, and that column says "no kernel benefit." Given the 3d-vcache improvements in 6.13, and that most (all?) such CPUs are Zen 4 or 5, that seems odd. There's a LOT I don't know where this stuff is concerned though; I may be misunderstanding something.
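To see which level your own CPU reports, here is a hedged sketch that checks a CPU flag list against the four x86-64 microarchitecture levels. The function name is hypothetical; the flag sets are the ones commonly used by x86-64 psABI level-check scripts (LZCNT appears as abm in /proc/cpuinfo). It only tells you what the CPU supports, not which Xanmod build benefits you most.

```shell
#!/usr/bin/env bash
# Sketch: print the highest x86-64 level (v1..v4) satisfied by a flag list.
# Each level requires every flag of that level and all lower levels.
x86_64_level() {
  local flags=" $1 " level=0 group f ok
  local v1="lm cmov cx8 fpu fxsr mmx syscall sse2"
  local v2="cx16 lahf_lm popcnt sse4_1 sse4_2 ssse3"
  local v3="avx avx2 bmi1 bmi2 f16c fma abm movbe xsave"
  local v4="avx512f avx512bw avx512cd avx512dq avx512vl"
  for group in "$v1" "$v2" "$v3" "$v4"; do
    ok=1
    for f in $group; do            # word splitting on purpose
      case "$flags" in *" $f "*) ;; *) ok=0 ;; esac
    done
    [ "$ok" -eq 1 ] || break       # stop at the first level not met
    level=$((level + 1))
  done
  echo "x86-64-v$level"
}
```

Usage: x86_64_level "$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)".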

Have you tried this with 6.13, which just moved from Edge to Main, by any chance? nvidia-dkms-570 and nvidia-driver-570 both fail to install. The output points me to a make.log in /var/lib/dkms/nvidia/570.86.16/build/.

Is this something that needs to be waited on (that is, the 570 drivers just aren't ready for the new kernel) or is this something I can or must troubleshoot locally?
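Rather than paste the whole make.log, the first compiler errors are usually enough to tell whether the driver source simply doesn't compile against the new kernel's API. A minimal sketch, assuming gcc's usual "file:line:col: error: message" format; the function name is hypothetical and the default path is the one from this thread (adjust the version to match yours).

```shell
#!/usr/bin/env bash
# Sketch: print the first N compiler errors from a DKMS make.log.
# Default path taken from this thread; change the version as needed.
first_build_errors() {
  local log="${1:-/var/lib/dkms/nvidia/570.86.16/build/make.log}"
  local n="${2:-5}"
  if [ ! -r "$log" ]; then
    echo "cannot read $log" >&2
    return 1
  fi
  # gcc/clang report problems as "file:line:col: error:" or "fatal error:"
  grep -m "$n" -E ': (fatal )?error:' "$log" || echo "no compiler errors found in $log"
}
```

Errors about undeclared functions or changed struct members in kernel headers would point to a kernel API change rather than a local problem.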

When I doubled down (if I'm going to mess stuff up, why not go all in and learn more?) and updated initramfs and grub with the bad state, it came back up VERY like the initial broken state that prompted this thread, when trying to set 6.13 using the mainline utility.

Well, I'm here again, so: the output of dkms status is nvidia/570.86.16: added.

sudo dkms autoinstall returns the same log mentioned above, which is too long to include here.

If I just need to wait longer, that's fine, but it would be nice if there were a way to check when things are ready, short of trying to install and breaking stuff. The answer is clearly "before Zorin" and "after the kernel is released," but that's a significant stretch of time.

Waiting is all you can do, for now. The 6.13 kernel is very new.

That, then, circles back to the original intent of this thread. While I'm grateful for the information and tips I've received, what I was after was clarification of my (now obviously mistaken) understanding of DKMS. I believed the purpose was to remove the need to recompile per kernel. That seems not to be the case, so... what is the case?

No, it is the case...

But in this instance, you are trying to use DKMS with the Nvidia driver on a just-released kernel. The 570 driver source that DKMS builds from hasn't been updated for it yet.

That source will build against a broad range of kernels, but you cannot expect it to support kernels that weren't released when that driver version was put together...

I see now, thanks. The problem was that I had interpreted it as something forward looking. For example, that DKMS was a method that allowed the driver to bring whatever the kernel would need to understand it, such that if a kernel supported DKMS, it would be fine. I understand now: it supports a range via inclusion rather than "DKMS enables universal support." This was the answer I'd been looking for at the very start of the thread.

Ah, I see, now.
Yes, it would be ideal if DKMS could just install, assigning modules to whatever kernel you have. Sadly, this is not possible.
Kernel releases can include changes, moved files, and configuration alterations, as well as different API (Application Programming Interface) and ABI (Application Binary Interface) builds.
A module may not build correctly against a newer kernel due to deprecated or modified APIs. The DKMS package (or the people who make it) cannot anticipate these kinds of changes in advance.

When a new kernel is released, the driver has to be tested against it and updated for compatibility before that kernel can be supported.
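To make the per-kernel nature concrete, here is a dry-run sketch that prints the build command DKMS effectively needs for each installed kernel. It compiles nothing. The function name and its optional module-root argument are hypothetical; "dkms build -m <module> -v <version> -k <kernel>" is the real dkms syntax for building one module version against one kernel. The point it illustrates: the same driver source is compiled separately against each kernel's headers, so one kernel whose internal API changed can fail while the others build fine.

```shell
#!/usr/bin/env bash
# Sketch (dry run): list the per-kernel builds DKMS would need for one
# driver version. Only prints commands; nothing is compiled.
plan_dkms_builds() {
  local module="$1" version="$2" modroot="${3:-/lib/modules}" dir kver
  for dir in "$modroot"/*/; do
    [ -d "$dir" ] || continue      # glob matched nothing
    dir="${dir%/}"                 # drop the trailing slash from the glob
    kver="${dir##*/}"              # kernel version = directory name
    if [ -e "$dir/build" ]; then
      echo "sudo dkms build -m $module -v $version -k $kver"
    else
      echo "# skip $kver: no headers at $dir/build"
    fi
  done
}
```

Usage: plan_dkms_builds nvidia 570.86.16. Each printed line is an independent compile; on a system like the one in this thread, the 6.8 line would succeed while the 6.13 line would fail until the driver source catches up.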