Zorin 18 on Ryzen 8945HS Radeon 780M - DirectMap4K consuming half of total RAM

I bought a Minisforum UM890 with an AMD Ryzen 9 8945HS CPU, integrated Radeon 780M graphics, and 64GB of RAM.

Initially I just swapped in the Zorin 17 (kernel 6.8) SSD from my Intel notebook and updated the packages, and it worked great for months.

Recently I backed up /home, wiped the SSD, installed Zorin 18 with LUKS encryption (as I had with Zorin 17) and restored /home from the backup. Since that time, I've been running into memory exhaustion despite running the same apps that run just fine on my Intel notebooks with just 16GB of RAM.

After upgrading to Zorin 18 R3 (which updates to the 6.17 kernel), the oom-killer is still invoked when the only desktop app I'm running is Brave. A symptom of the issue is that the DirectMap4K entry in /proc/meminfo grows to over half my system memory, after which the oom-killer is eventually invoked. This is making my 64GB Ryzen 9 less capable than my 16GB Intel notebook. Very frustrating.
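For anyone who wants to watch the same symptom, here is a minimal Python sketch that pulls the DirectMap entries out of /proc/meminfo and reports the 4 KiB share of MemTotal. Note that the kernel spells the field `DirectMap4k` (lowercase k). The function names are made up for this example. As far as I understand it, these fields describe how the kernel's direct mapping is split across page sizes, so a growing value is probably better read as a symptom than as an allocation in its own right.

```python
import os
import re

def directmap_kib(meminfo_text):
    """Return {field: KiB} for every DirectMap* line in /proc/meminfo text."""
    entries = {}
    for line in meminfo_text.splitlines():
        m = re.match(r"(DirectMap\w+):\s+(\d+)\s+kB", line)
        if m:
            entries[m.group(1)] = int(m.group(2))
    return entries

def mem_total_kib(meminfo_text):
    """Return MemTotal in KiB, or None if the field is missing."""
    m = re.search(r"^MemTotal:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    return int(m.group(1)) if m else None

def directmap4k_fraction(meminfo_text):
    """Fraction of MemTotal currently covered by 4 KiB direct-map pages."""
    total = mem_total_kib(meminfo_text)
    small = directmap_kib(meminfo_text).get("DirectMap4k")
    if not total or small is None:
        return None
    return small / total

if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        text = f.read()
    for name, kib in directmap_kib(text).items():
        print(f"{name}: {kib / 1024**2:.2f} GiB")
    frac = directmap4k_fraction(text)
    if frac is not None:
        print(f"DirectMap4k / MemTotal = {frac:.1%}")
```

Running it from cron or a loop and appending the output to a file gives a history that can later be lined up against journalctl timestamps.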

Gemini seems to think Wayland is triggering a scatter/gather bug but also seems to think that was fixed in 6.15+, and I'm now running 6.17 and still experiencing the issue.

Has anyone else run into this issue? Any suggestions?

Have you tried selecting X11 (Xorg) instead of Wayland to see if that helps?
From your login screen:
Click on your user to reveal a cog/gearwheel icon at the bottom right of the screen.
Select the Xorg or X11 option (not Wayland).
Log in and test the effect.

Did you try disabling Hardware Acceleration in the Brave settings?

Yes, I tried Xorg. It eventually happens, but not as fast as with Wayland.

No, but after I posted this morning, I rebooted, logged back in without launching any apps, and locked the screen. In under 3 hours, DirectMap4K had grown to 32GB.

Poking around again in journalctl, I see that there is a pattern where I get messages like the following in the hours leading up to the oom-killer:

$ journalctl --since "Feb 1 00:00:00" |grep -i -e hogged -e oom-killer -e rebooting
Feb 03 11:19:02 host systemd-logind[1308]: System is rebooting.
Feb 03 12:05:53 host kernel: workqueue: dm_irq_work_func [amdgpu] hogged CPU for >10000us 4 times, consider switching to WQ_UNBOUND
Feb 03 12:06:03 host kernel: workqueue: dm_irq_work_func [amdgpu] hogged CPU for >10000us 5 times, consider switching to WQ_UNBOUND
Feb 03 13:58:34 host kernel: workqueue: dm_irq_work_func [amdgpu] hogged CPU for >10000us 7 times, consider switching to WQ_UNBOUND
Feb 03 13:59:16 host kernel: workqueue: dm_irq_work_func [amdgpu] hogged CPU for >10000us 11 times, consider switching to WQ_UNBOUND
Feb 03 14:00:35 host kernel: workqueue: dm_irq_work_func [amdgpu] hogged CPU for >10000us 19 times, consider switching to WQ_UNBOUND
Feb 03 14:03:14 host kernel: workqueue: dm_irq_work_func [amdgpu] hogged CPU for >10000us 35 times, consider switching to WQ_UNBOUND
Feb 03 14:08:31 host kernel: workqueue: dm_irq_work_func [amdgpu] hogged CPU for >10000us 67 times, consider switching to WQ_UNBOUND
Feb 03 14:19:24 host kernel: workqueue: dm_irq_work_func [amdgpu] hogged CPU for >10000us 131 times, consider switching to WQ_UNBOUND
Feb 03 18:42:32 host kernel: workqueue: dm_irq_work_func [amdgpu] hogged CPU for >10000us 259 times, consider switching to WQ_UNBOUND
Feb 03 19:59:29 host systemd-logind[1088]: System is rebooting.
Feb 03 21:31:39 host kernel: workqueue: dm_irq_work_func [amdgpu] hogged CPU for >10000us 4 times, consider switching to WQ_UNBOUND
Feb 03 21:37:20 host kernel: workqueue: dm_irq_work_func [amdgpu] hogged CPU for >10000us 5 times, consider switching to WQ_UNBOUND
Feb 03 21:54:23 host kernel: workqueue: dm_irq_work_func [amdgpu] hogged CPU for >10000us 7 times, consider switching to WQ_UNBOUND
Feb 04 12:06:20 host kernel: workqueue: dm_irq_work_func [amdgpu] hogged CPU for >10000us 11 times, consider switching to WQ_UNBOUND
Feb 05 14:32:48 host kernel: workqueue: dm_irq_work_func [amdgpu] hogged CPU for >10000us 19 times, consider switching to WQ_UNBOUND
Feb 11 10:15:54 host systemd-logind[1280]: System is rebooting.
Feb 11 11:12:07 host kernel: workqueue: dm_irq_work_func [amdgpu] hogged CPU for >10000us 4 times, consider switching to WQ_UNBOUND
Feb 11 11:12:17 host kernel: workqueue: dm_irq_work_func [amdgpu] hogged CPU for >10000us 5 times, consider switching to WQ_UNBOUND
Feb 11 11:12:37 host kernel: workqueue: dm_irq_work_func [amdgpu] hogged CPU for >10000us 7 times, consider switching to WQ_UNBOUND
Feb 11 11:13:16 host kernel: workqueue: dm_irq_work_func [amdgpu] hogged CPU for >10000us 11 times, consider switching to WQ_UNBOUND
Feb 11 11:14:36 host kernel: workqueue: dm_irq_work_func [amdgpu] hogged CPU for >10000us 19 times, consider switching to WQ_UNBOUND
Feb 11 11:17:15 host kernel: workqueue: dm_irq_work_func [amdgpu] hogged CPU for >10000us 35 times, consider switching to WQ_UNBOUND
Feb 11 11:22:32 host kernel: workqueue: dm_irq_work_func [amdgpu] hogged CPU for >10000us 67 times, consider switching to WQ_UNBOUND
Feb 11 11:33:08 host kernel: workqueue: dm_irq_work_func [amdgpu] hogged CPU for >10000us 131 times, consider switching to WQ_UNBOUND
Feb 11 11:54:51 host kernel: workqueue: dm_irq_work_func [amdgpu] hogged CPU for >10000us 259 times, consider switching to WQ_UNBOUND
Feb 11 19:03:46 host kernel: workqueue: dm_irq_work_func [amdgpu] hogged CPU for >10000us 4 times, consider switching to WQ_UNBOUND
Feb 11 19:03:56 host kernel: workqueue: dm_irq_work_func [amdgpu] hogged CPU for >10000us 5 times, consider switching to WQ_UNBOUND
Feb 11 19:04:16 host kernel: workqueue: dm_irq_work_func [amdgpu] hogged CPU for >10000us 7 times, consider switching to WQ_UNBOUND
Feb 11 19:04:56 host kernel: workqueue: dm_irq_work_func [amdgpu] hogged CPU for >10000us 11 times, consider switching to WQ_UNBOUND
Feb 11 19:06:15 host kernel: workqueue: dm_irq_work_func [amdgpu] hogged CPU for >10000us 19 times, consider switching to WQ_UNBOUND
Feb 11 19:08:54 host kernel: workqueue: dm_irq_work_func [amdgpu] hogged CPU for >10000us 35 times, consider switching to WQ_UNBOUND
Feb 11 19:14:12 host kernel: workqueue: dm_irq_work_func [amdgpu] hogged CPU for >10000us 67 times, consider switching to WQ_UNBOUND
Feb 11 19:24:47 host kernel: workqueue: dm_irq_work_func [amdgpu] hogged CPU for >10000us 131 times, consider switching to WQ_UNBOUND
Feb 11 19:45:58 host kernel: workqueue: dm_irq_work_func [amdgpu] hogged CPU for >10000us 259 times, consider switching to WQ_UNBOUND
Feb 11 20:28:19 host kernel: workqueue: dm_irq_work_func [amdgpu] hogged CPU for >10000us 515 times, consider switching to WQ_UNBOUND
Feb 11 21:53:18 host kernel: workqueue: dm_irq_work_func [amdgpu] hogged CPU for >10000us 1027 times, consider switching to WQ_UNBOUND
Feb 12 00:42:44 host kernel: workqueue: dm_irq_work_func [amdgpu] hogged CPU for >10000us 2051 times, consider switching to WQ_UNBOUND
Feb 12 01:23:23 host kernel: brave invoked oom-killer: gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), order=0, oom_score_adj=300
Feb 12 06:25:37 host kernel: gnome-shell invoked oom-killer: gfp_mask=0x40cc0(GFP_KERNEL|__GFP_COMP), order=0, oom_score_adj=0
Feb 12 06:25:43 host kernel: fwupd invoked oom-killer: gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), order=0, oom_score_adj=0
Feb 12 06:25:43 host kernel: systemd-udevd invoked oom-killer: gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), order=0, oom_score_adj=-1000
Feb 12 06:25:43 host kernel: 9 invoked oom-killer: gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), order=0, oom_score_adj=100
Feb 12 06:25:43 host kernel: systemd-udevd invoked oom-killer: gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), order=0, oom_score_adj=-1000
Feb 12 06:25:43 host kernel: 9 invoked oom-killer: gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), order=0, oom_score_adj=100
Feb 12 06:25:43 host kernel: systemd invoked oom-killer: gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), order=0, oom_score_adj=0
Feb 12 06:25:43 host kernel: 9 invoked oom-killer: gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), order=0, oom_score_adj=100
Feb 12 06:25:43 host kernel: polkitd invoked oom-killer: gfp_mask=0x140cca(GFP_HIGHUSER_MOVABLE|__GFP_COMP), order=0, oom_score_adj=0
Feb 12 06:25:43 host kernel: workqueue: delayed_fput hogged CPU for >10000us 4 times, consider switching to WQ_UNBOUND
Feb 12 06:25:43 host kernel: workqueue: delayed_fput hogged CPU for >10000us 5 times, consider switching to WQ_UNBOUND
Feb 12 06:25:45 host kernel: workqueue: delayed_fput hogged CPU for >10000us 7 times, consider switching to WQ_UNBOUND
Feb 12 06:32:09 host kernel: workqueue: dm_irq_work_func [amdgpu] hogged CPU for >10000us 4099 times, consider switching to WQ_UNBOUND
Feb 12 06:54:15 host systemd-logind[1450]: System is rebooting.
Feb 12 06:59:18 host kernel: workqueue: dm_irq_work_func [amdgpu] hogged CPU for >10000us 4 times, consider switching to WQ_UNBOUND
Feb 12 06:59:28 host kernel: workqueue: dm_irq_work_func [amdgpu] hogged CPU for >10000us 5 times, consider switching to WQ_UNBOUND
Feb 12 06:59:48 host kernel: workqueue: dm_irq_work_func [amdgpu] hogged CPU for >10000us 7 times, consider switching to WQ_UNBOUND
Feb 12 07:00:28 host kernel: workqueue: dm_irq_work_func [amdgpu] hogged CPU for >10000us 11 times, consider switching to WQ_UNBOUND
Feb 12 07:01:47 host kernel: workqueue: dm_irq_work_func [amdgpu] hogged CPU for >10000us 19 times, consider switching to WQ_UNBOUND
Feb 12 07:04:26 host kernel: workqueue: dm_irq_work_func [amdgpu] hogged CPU for >10000us 35 times, consider switching to WQ_UNBOUND
Feb 12 07:09:44 host kernel: workqueue: dm_irq_work_func [amdgpu] hogged CPU for >10000us 67 times, consider switching to WQ_UNBOUND
Feb 12 07:20:19 host kernel: workqueue: dm_irq_work_func [amdgpu] hogged CPU for >10000us 131 times, consider switching to WQ_UNBOUND
Feb 12 07:41:30 host kernel: workqueue: dm_irq_work_func [amdgpu] hogged CPU for >10000us 259 times, consider switching to WQ_UNBOUND
Feb 12 08:23:51 host kernel: workqueue: dm_irq_work_func [amdgpu] hogged CPU for >10000us 515 times, consider switching to WQ_UNBOUND
Feb 12 09:48:34 host kernel: workqueue: dm_irq_work_func [amdgpu] hogged CPU for >10000us 1027 times, consider switching to WQ_UNBOUND

So it's looking like an issue with amdgpu. Searching on the above workqueue line, I also just found this bug report for jetkvm which references the same amdgpu workqueue messages and suggests that the issue is triggered by monitor sleep/wake events. The user reports the issue on Ubuntu 24.04.3 LTS which should closely match Zorin 18 R3 by my understanding.
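One detail in the excerpt above: every reported count (4, 5, 7, 11, 19, ...) has the form 3 + 2^k, which suggests the kernel only prints this warning at exponentially spaced counts rather than for every event. The Python sketch below extracts the counts from journalctl lines and splits them into runs wherever the counter restarts. The regular expression is written against the lines shown here, and the function names are hypothetical.

```python
import re

# Matches e.g. "workqueue: dm_irq_work_func [amdgpu] hogged CPU for >10000us 19 times"
HOGGED = re.compile(
    r"workqueue: (?P<func>\S+)(?: \[(?P<module>\w+)\])? hogged CPU for "
    r">(?P<thresh_us>\d+)us (?P<count>\d+) times"
)

def hogged_counts(journal_lines, func="dm_irq_work_func"):
    """Return the reported counts for one work function, in log order."""
    counts = []
    for line in journal_lines:
        m = HOGGED.search(line)
        if m and m.group("func") == func:
            counts.append(int(m.group("count")))
    return counts

def is_three_plus_power_of_two(n):
    """True if n == 3 + 2**k for some k >= 0 (the pattern seen in the log)."""
    k = n - 3
    return k >= 1 and (k & (k - 1)) == 0

def split_runs(counts):
    """Split counts into runs; a non-increasing count marks a counter
    restart (for example after a reboot)."""
    runs = []
    for c in counts:
        if runs and c > runs[-1][-1]:
            runs[-1].append(c)
        else:
            runs.append([c])
    return runs
```

Read this way, the spacing of the log lines reflects the reporting schedule, not bursts of activity. In the Feb 11 evening run, going from 4 to 5 took 10 seconds and going from 259 to 515 took about 42 minutes, which both work out to roughly one event every ten seconds.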

I went back and reviewed my logs. I have tracked the values of the DirectMap entries in /proc/meminfo for the past couple of weeks and labeled those logs as Wayland or Xorg. While DirectMap4K did grow to 8GB under Xorg, I don't see any oom-killer invocations while running Xorg. I thought I had at least one instance, but I did not.
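The kind of comparison described above could be scripted roughly like this. The log format used for tracking is not shown in this thread, so the `Sample` record (timestamp, session label, DirectMap4k in KiB) and the 1 GiB jump threshold are assumptions made purely for illustration.

```python
from collections import namedtuple

# Hypothetical record layout; adapt to however the samples were actually logged.
Sample = namedtuple("Sample", "timestamp session directmap4k_kib")

def find_jumps(samples, min_jump_kib=1024 * 1024):
    """Return (timestamp, session, delta_kib) for each consecutive pair of
    samples where DirectMap4k rose by at least min_jump_kib (default 1 GiB).
    Drops, such as the reset after a reboot, are ignored."""
    jumps = []
    for prev, curr in zip(samples, samples[1:]):
        delta = curr.directmap4k_kib - prev.directmap4k_kib
        if delta >= min_jump_kib:
            jumps.append((curr.timestamp, curr.session, delta))
    return jumps

def peak_by_session(samples):
    """Return {session: highest DirectMap4k value seen, in KiB}."""
    peaks = {}
    for s in samples:
        peaks[s.session] = max(peaks.get(s.session, 0), s.directmap4k_kib)
    return peaks
```

The timestamps returned by `find_jumps` are the ones worth feeding back into `journalctl --since ... --until ...` to see what else was active at that moment.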

Looking at the times that DirectMap4K jumps in value, I see that gnome-shell was active in the logs:

Xorg:

Feb 03 14:00:32 host rtkit-daemon[1431]: Successfully made thread 5052 of process 5037 owned by '1000' RT at priority 20.
Feb 03 14:00:32 host rtkit-daemon[1431]: Supervising 8 threads of 5 processes of 1 users.
Feb 03 14:00:32 host rtkit-daemon[1431]: Successfully made thread 5052 of process 5037 owned by '1000' high priority at nice level 0.
Feb 03 14:00:32 host rtkit-daemon[1431]: Supervising 8 threads of 5 processes of 1 users.
Feb 03 14:00:32 host rtkit-daemon[1431]: Supervising 7 threads of 4 processes of 1 users.
Feb 03 14:00:32 host rtkit-daemon[1431]: Supervising 7 threads of 4 processes of 1 users.
Feb 03 14:00:32 host rtkit-daemon[1431]: Successfully made thread 5052 of process 5037 owned by '1000' RT at priority 20.
Feb 03 14:00:32 host rtkit-daemon[1431]: Supervising 8 threads of 5 processes of 1 users.
Feb 03 14:00:35 host kernel: workqueue: dm_irq_work_func [amdgpu] hogged CPU for >10000us 19 times, consider switching to WQ_UNBOUND
Feb 03 14:00:35 host rtkit-daemon[1431]: Successfully made thread 5052 of process 5037 owned by '1000' high priority at nice level 0.
Feb 03 14:00:35 host rtkit-daemon[1431]: Supervising 8 threads of 5 processes of 1 users.
Feb 03 14:00:35 host rtkit-daemon[1431]: Supervising 7 threads of 4 processes of 1 users.
Feb 03 14:00:35 host rtkit-daemon[1431]: Supervising 7 threads of 4 processes of 1 users.
Feb 03 14:00:35 host rtkit-daemon[1431]: Successfully made thread 5052 of process 5037 owned by '1000' RT at priority 20.

Feb 03 12:06:04 host dbus-daemon[1063]: [system] Activating via systemd: service name='net.reactivated.Fprint' unit='fprintd.service' requested by ':1.144' (uid=1000 pid=5037 comm="/usr/bin/gnome-shell" label="unconfined")
Feb 03 14:21:38 host dbus-daemon[1063]: [system] Activating via systemd: service name='net.reactivated.Fprint' unit='fprintd.service' requested by ':1.144' (uid=1000 pid=5037 comm="/usr/bin/gnome-shell" label="unconfined")

Wayland:

Feb 11 20:08:18 host rtkit-daemon[11917]: Supervising 1 threads of 1 processes of 1 users.
Feb 11 20:08:18 host rtkit-daemon[11917]: Supervising 0 threads of 0 processes of 1 users.
Feb 11 20:08:18 host rtkit-daemon[11917]: Supervising 0 threads of 0 processes of 1 users.
Feb 11 20:08:18 host rtkit-daemon[11917]: Successfully made thread 2633 of process 2618 owned by '1000' RT at priority 20.
Feb 11 20:08:18 host rtkit-daemon[11917]: Supervising 1 threads of 1 processes of 1 users.
Feb 11 20:08:22 host gnome-shell[2618]: Failed to make thread 'KMS thread' normally scheduled: Message recipient disconnected from message bus without replying
Feb 11 20:08:22 host systemd[1]: rtkit-daemon.service: Main process exited, code=killed, status=9/KILL
Feb 11 20:08:22 host systemd[1]: rtkit-daemon.service: Failed with result 'signal'.
Feb 11 20:08:25 host dbus-daemon[1424]: [system] Activating via systemd: service name='org.freedesktop.RealtimeKit1' unit='rtkit-daemon.service' requested by ':1.71' (uid=1000 pid=2618 comm="/usr/bin/gnome-shell" label="unconfined")
Feb 11 20:08:25 host systemd[1]: Starting rtkit-daemon.service - RealtimeKit Scheduling Policy Service...
Feb 11 20:08:25 host dbus-daemon[1424]: [system] Successfully activated service 'org.freedesktop.RealtimeKit1'
Feb 11 20:08:25 host rtkit-daemon[15896]: Successfully called chroot.
Feb 11 20:08:25 host rtkit-daemon[15896]: Successfully dropped privileges.
Feb 11 20:08:25 host systemd[1]: Started rtkit-daemon.service - RealtimeKit Scheduling Policy Service.
Feb 11 20:08:25 host rtkit-daemon[15896]: Successfully limited resources.
Feb 11 20:08:25 host rtkit-daemon[15896]: Running.
Feb 11 20:08:25 host rtkit-daemon[15896]: Canary thread running.
Feb 11 20:08:25 host rtkit-daemon[15896]: Watchdog thread running.
Feb 11 20:08:25 host rtkit-daemon[15896]: Successfully made thread 2633 of process 2618 owned by '1000' high priority at nice level 0.
Feb 11 20:08:25 host rtkit-daemon[15896]: Supervising 1 threads of 1 processes of 1 users.
Feb 11 20:08:25 host rtkit-daemon[15896]: Supervising 0 threads of 0 processes of 1 users.
Feb 11 20:08:25 host rtkit-daemon[15896]: Supervising 0 threads of 0 processes of 1 users.
Feb 11 20:08:25 host rtkit-daemon[15896]: Successfully made thread 2633 of process 2618 owned by '1000' RT at priority 20.
Feb 11 20:08:25 host rtkit-daemon[15896]: Supervising 1 threads of 1 processes of 1 users.
Feb 11 20:08:25 host rtkit-daemon[15896]: Successfully made thread 2633 of process 2618 owned by '1000' high priority at nice level 0.

Feb 11 20:28:16 host rtkit-daemon[15896]: Successfully made thread 2633 of process 2618 owned by '1000' high priority at nice level 0.
Feb 11 20:28:16 host rtkit-daemon[15896]: Supervising 1 threads of 1 processes of 1 users.
Feb 11 20:28:16 host rtkit-daemon[15896]: Supervising 0 threads of 0 processes of 1 users.
Feb 11 20:28:16 host rtkit-daemon[15896]: Supervising 0 threads of 0 processes of 1 users.
Feb 11 20:28:16 host rtkit-daemon[15896]: Successfully made thread 2633 of process 2618 owned by '1000' RT at priority 20.
Feb 11 20:28:16 host rtkit-daemon[15896]: Supervising 1 threads of 1 processes of 1 users.
Feb 11 20:28:19 host kernel: workqueue: dm_irq_work_func [amdgpu] hogged CPU for >10000us 515 times, consider switching to WQ_UNBOUND
Feb 11 20:28:19 host rtkit-daemon[15896]: Successfully made thread 2633 of process 2618 owned by '1000' high priority at nice level 0.
Feb 11 20:28:19 host rtkit-daemon[15896]: Supervising 1 threads of 1 processes of 1 users.
Feb 11 20:28:19 host rtkit-daemon[15896]: Supervising 0 threads of 0 processes of 1 users.
Feb 11 20:28:19 host rtkit-daemon[15896]: Supervising 0 threads of 0 processes of 1 users.
Feb 11 20:28:19 host rtkit-daemon[15896]: Successfully made thread 2633 of process 2618 owned by '1000' RT at priority 20.
Feb 11 20:28:19 host rtkit-daemon[15896]: Supervising 1 threads of 1 processes of 1 users.
Feb 11 20:28:23 host rtkit-daemon[15896]: Successfully made thread 2633 of process 2618 owned by '1000' high priority at nice level 0.

More recently, I have also been watching radeontop. While running in Wayland, there were times when the screen was locked that GTT would grow close to, or all the way up to, the GTT limit of half of my memory. With Xorg, I've only seen GTT reach a few GB, and it usually stays under 1GB. So perhaps that's why I am seeing the oom-killer under Wayland and not Xorg.
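As an alternative to watching radeontop by hand, the amdgpu driver also exposes GTT counters in sysfs (`mem_info_gtt_used` and `mem_info_gtt_total`, both in bytes), so the same numbers can be logged from a script. A minimal Python sketch follows. The card number varies between machines, so it searches /sys/class/drm; the helper names are hypothetical.

```python
from pathlib import Path

def find_amdgpu_devices(drm_root="/sys/class/drm"):
    """List card*/device directories that expose the amdgpu GTT counters."""
    root = Path(drm_root)
    return sorted(p.parent for p in root.glob("card*/device/mem_info_gtt_used"))

def read_gtt_bytes(device_dir):
    """Return (used, total) GTT bytes from an amdgpu device's sysfs directory."""
    device_dir = Path(device_dir)
    used = int((device_dir / "mem_info_gtt_used").read_text().strip())
    total = int((device_dir / "mem_info_gtt_total").read_text().strip())
    return used, total

if __name__ == "__main__":
    # Prints nothing on machines without an amdgpu device.
    for dev in find_amdgpu_devices():
        used, total = read_gtt_bytes(dev)
        print(f"{dev}: GTT {used / 2**30:.2f} / {total / 2**30:.2f} GiB")
```

Logging this alongside the DirectMap values while the screen is locked would show whether the two rise together.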

Now I'm trying to figure out the conditions to reproduce the gnome-shell activity when the screen is locked under both Wayland and Xorg.

After a lot of troubleshooting with AI, I ended up with this understanding:

When I lock my screen and it transitions to a sleep mode, Wayland attempts to update a placeholder screen for gnome-shell and the apps, triggering bugs in amdgpu. I would see GTT rise to 32GB and DirectMap4K rise to 32GB or more.

| Component | What it did wrong | Result |
| --- | --- | --- |
| Wayland (Mutter) | Kept requesting display refreshes during a "stalled" blanking state. | Triggered the allocation loop. |
| amdgpu Driver | Got stuck in an interrupt loop (dm_irq) and stopped releasing buffers. | GTT usage exploded. |
| Linux Kernel (TTM) | Failed to unmap the 4KB pages from the DirectMap after the GTT reset. | Permanent RAM loss until reboot. |

At first glance, the grub options below seem to have mitigated the memory leaks I was experiencing. I'll see how things play out longer term.

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash iommu=pt amdgpu.sg_display=0 amdgpu.dcfeaturemask=0x8 amdgpu.dcdebugmask=0x410 amdgpu.gttsize=4096 consoleblank=0 amdgpu.runpm=0"

I also had to disable screen blanking in Settings > Power.
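On Ubuntu-based systems, a change to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub normally only takes effect after `sudo update-grub` and a reboot (assuming Zorin follows Ubuntu here). A small Python sketch for confirming that the running kernel actually booted with the parameters listed above, by comparing them with /proc/cmdline, is shown below. `EXPECTED` is copied from the GRUB line in this post; the function names are hypothetical.

```python
import os
import shlex

# Copied from the GRUB_CMDLINE_LINUX_DEFAULT value quoted in this thread.
EXPECTED = (
    "quiet splash iommu=pt amdgpu.sg_display=0 amdgpu.dcfeaturemask=0x8 "
    "amdgpu.dcdebugmask=0x410 amdgpu.gttsize=4096 consoleblank=0 amdgpu.runpm=0"
)

def parse_cmdline(cmdline):
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    for token in shlex.split(cmdline):
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params

def missing_or_different(expected, running):
    """Return {name: (expected_value, running_value)} for every mismatch.
    A parameter that is not on the running command line shows as '<absent>'."""
    return {
        name: (value, running.get(name, "<absent>"))
        for name, value in expected.items()
        if name not in running or running[name] != value
    }

if __name__ == "__main__" and os.path.exists("/proc/cmdline"):
    with open("/proc/cmdline") as f:
        running = parse_cmdline(f.read())
    for name, (want, got) in missing_or_different(parse_cmdline(EXPECTED), running).items():
        print(f"{name}: expected {want!r}, running kernel has {got!r}")
```

No output means every expected parameter is present with the same value. The values the amdgpu module ended up with can also be read from /sys/module/amdgpu/parameters/.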