My computer will not come back from a black screen (this is not S3 sleep) because of a GPU hang.
I used to get this crash constantly on Manjaro and assumed it was something Arch-related, so I went back to Zorin, which didn't crash. Now Zorin is crashing as well.
This bug has been a thorn in my side for months and nothing I've found online has helped so far.
I can still SSH to the machine and retrieve the following information:
$ uname -r
5.11.0-37-generic
$ lspci | grep -i --color 'vga\|3d\|2d'
00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics Controller (rev 06)
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Cape Verde XT [Radeon HD 7770/8760 / R7 250X]
$ journalctl --reverse --lines=5
Dec 01 16:04:11 REXTRON-Z kernel: i915 0000:00:02.0: [drm] Resetting chip for stopped heartbeat on rcs0
Dec 01 16:04:11 REXTRON-Z kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 7:0:00000000
Dec 01 16:04:08 REXTRON-Z kernel: i915 0000:00:02.0: [drm] Resetting chip for stopped heartbeat on rcs0
Dec 01 16:04:08 REXTRON-Z kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 7:0:00000000
Dec 01 16:04:05 REXTRON-Z kernel: i915 0000:00:02.0: [drm] Resetting chip for stopped heartbeat on rcs0
Extract the file.
Using either a root-elevated instance of the file manager or the terminal, copy just the files (not the folder that contains them) to /lib/firmware/i915/
In terminal run:
I followed your instructions and installed the firmware you provided; however, I got another GPU hang today while using my system. Thank you for your help so far, by the way. It's appreciated.
Troubleshooting info:
$ journalctl --reverse
Dec 03 10:23:21 REXTRON-Z kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 7:1:00dfffff, in Xorg [3074]
Dec 03 10:23:18 REXTRON-Z kernel: i915 0000:00:02.0: [drm] Xorg[3074] context reset due to GPU hang
Dec 03 10:23:18 REXTRON-Z kernel: i915 0000:00:02.0: [drm] Resetting chip for stopped heartbeat on rcs0
Dec 03 10:23:18 REXTRON-Z kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 7:1:00dfffff, in Xorg [3074]
Dec 03 10:23:17 REXTRON-Z NetworkManager[1921]: <info> [1638548597.2404] device (wlp116s0): supplicant interface state: scanning -> inactive
Dec 03 10:23:16 REXTRON-Z kernel: i915 0000:00:02.0: [drm] Xorg[3074] context reset due to GPU hang
Dec 03 10:23:15 REXTRON-Z kernel: i915 0000:00:02.0: [drm] Resetting chip for stopped heartbeat on rcs0
Dec 03 10:23:15 REXTRON-Z kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 7:1:00dfffff, in Xorg [3074]
Dec 03 10:23:12 REXTRON-Z kernel: i915 0000:00:02.0: [drm] Xorg[3074] context reset due to GPU hang
Dec 03 10:23:12 REXTRON-Z NetworkManager[1921]: <info> [1638548592.9052] device (wlp116s0): supplicant interface state: inactive -> scanning
Dec 03 10:23:12 REXTRON-Z kernel: i915 0000:00:02.0: [drm] Resetting chip for stopped heartbeat on rcs0
Dec 03 10:23:12 REXTRON-Z kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 7:1:00dfffff, in Xorg [3074]
Dec 03 10:23:09 REXTRON-Z kernel: i915 0000:00:02.0: [drm] Xorg[3074] context reset due to GPU hang
Dec 03 10:23:09 REXTRON-Z kernel: i915 0000:00:02.0: [drm] Resetting chip for stopped heartbeat on rcs0
Dec 03 10:23:09 REXTRON-Z kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 7:1:00dfffff, in Xorg [3074]
Dec 03 10:23:08 REXTRON-Z kernel: Asynchronous wait on fence 0000:00:02.0:gnome-shell[3256]:118f32 timed out (hint:intel_atomic_commit_ready [i915])
Dec 03 10:23:08 REXTRON-Z kernel: Asynchronous wait on fence 0000:00:02.0:gnome-shell[3256]:118f32 timed out (hint:intel_atomic_commit_ready [i915])
Dec 03 10:23:08 REXTRON-Z NetworkManager[1921]: <info> [1638548588.2567] device (wlp116s0): supplicant interface state: scanning -> inactive
Dec 03 10:23:07 REXTRON-Z kernel: i915 0000:00:02.0: [drm] Xorg[3074] context reset due to GPU hang
Dec 03 10:23:06 REXTRON-Z kernel: i915 0000:00:02.0: [drm] Resetting chip for stopped heartbeat on rcs0
Dec 03 10:23:06 REXTRON-Z kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 7:1:00dfffff, in Xorg [3074]
Dec 03 10:23:03 REXTRON-Z kernel: i915 0000:00:02.0: [drm] Xorg[3074] context reset due to GPU hang
Dec 03 10:23:03 REXTRON-Z NetworkManager[1921]: <info> [1638548583.9061] device (wlp116s0): supplicant interface state: inactive -> scanning
Dec 03 10:23:03 REXTRON-Z kernel: i915 0000:00:02.0: [drm] Resetting chip for stopped heartbeat on rcs0
Dec 03 10:23:03 REXTRON-Z kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 7:1:00dfffff, in Xorg [3074]
Dec 03 10:22:59 REXTRON-Z NetworkManager[1921]: <info> [1638548579.2686] device (wlp116s0): supplicant interface state: scanning -> inactive
Dec 03 10:22:54 REXTRON-Z NetworkManager[1921]: <info> [1638548574.9052] device (wlp116s0): supplicant interface state: inactive -> scanning
Dec 03 10:22:50 REXTRON-Z NetworkManager[1921]: <info> [1638548570.2843] device (wlp116s0): supplicant interface state: scanning -> inactive
Dec 03 10:22:45 REXTRON-Z NetworkManager[1921]: <info> [1638548565.9052] device (wlp116s0): supplicant interface state: inactive -> scanning
Dec 03 10:22:41 REXTRON-Z NetworkManager[1921]: <info> [1638548561.2390] device (wlp116s0): supplicant interface state: scanning -> inactive
Dec 03 10:22:36 REXTRON-Z NetworkManager[1921]: <info> [1638548556.9051] device (wlp116s0): supplicant interface state: inactive -> scanning
Dec 03 10:22:32 REXTRON-Z NetworkManager[1921]: <info> [1638548552.2458] device (wlp116s0): supplicant interface state: scanning -> inactive
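To see how often each hang code occurs without scrolling past the NetworkManager lines, something like the following could be used. This is only a sketch: `count_gpu_hangs` is a made-up helper name, and it assumes the kernel messages keep the `GPU HANG: ecode ...` wording shown in the output above.

```shell
# Hypothetical helper (not part of any tool): reads journal text on stdin
# and prints one line per hang code in the form "<ecode> <count>".
count_gpu_hangs() {
    grep 'GPU HANG: ecode' \
        | sed -E 's/.*GPU HANG: ecode ([^, ]+).*/\1/' \
        | sort \
        | uniq -c \
        | awk '{ print $2, $1 }'
}

# Example use on a live system (kernel messages, current boot only):
# journalctl -k -b | count_gpu_hangs
```

Piping `journalctl -k -b` into it limits the input to kernel messages from the current boot, so unrelated services do not appear in the count.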
$ cat /sys/class/drm/card1/error
GPU HANG: ecode 7:1:00dfffff, in Xorg [3074]
Kernel: 5.11.0-37-generic x86_64
Driver: 20201103
Time: 1638548583 s 879016 us
Boottime: 6535 s 983559 us
Uptime: 6535 s 103193 us
Capture: 4296526272 jiffies; 317792 ms ago
Active process (on ring rcs0): Xorg [3074]
Reset count: 0
Suspend count: 0
Platform: HASWELL
Subplatform: 0x0
PCI ID: 0x0412
PCI Revision: 0x06
PCI Subsystem: 1458:d000
IOMMU enabled?: 0
RPM wakelock: yes
PM suspended: no
GT awake: yes
EIR: 0x00000000
IER: 0xfc080421
GTIER[0]: 0x00401821
PGTBL_ER: 0x00000000
FORCEWAKE: 0x00000000
DERRMR: 0xffffffff
fence[0] = 00000000
fence[1] = 00000000
fence[2] = 00000000
fence[3] = 00000000
fence[4] = 00000000
fence[5] = 00000000
fence[6] = 00000000
fence[7] = 00000000
fence[8] = 00000000
fence[9] = 00000000
fence[10] = 00000000
fence[11] = 00000000
fence[12] = 3b340b300b0b001
fence[13] = 95f50b3065cc001
fence[14] = cf450b309f1c001
fence[15] = 00000000
fence[16] = 00000000
fence[17] = 00000000
fence[18] = 00000000
fence[19] = 00000000
fence[20] = 00000000
fence[21] = 00000000
fence[22] = 00000000
fence[23] = 00000000
fence[24] = 00000000
fence[25] = 00000000
fence[26] = 00000000
fence[27] = 00000000
fence[28] = 00000000
fence[29] = 00000000
fence[30] = 00000000
fence[31] = 00000000
ERROR: 0x00000000
DONE_REG: 0xffffffff
ERR_INT: 0x00000000
rcs0 command stream:
CCID: 0x7ffa9109
START: 0x00301000
HEAD: 0xbaa0097c [0x00000978]
TAIL: 0x00001208 [0x00000a78, 0x00000a90]
CTL: 0x00003001
MODE: 0x00004000
HWS: 0x7fffe000
ACTHD: 0x00000000 baa0097c
IPEIR: 0x00000000
IPEHR: 0xff000000
ESR: 0x00000001
INSTDONE: 0xffdfffff
SC_INSTDONE: 0xffffffff
SAMPLER_INSTDONE[0][0]: 0xffffffff
ROW_INSTDONE[0][0]: 0xffffffff
batch: [0x00000000_13e2b000, 0x00000000_13e2c000]
BBADDR: 0x00000000_13e2b32c
BB_STATE: 0x00000020
INSTPS: 0x80000101
INSTPM: 0x00006280
FADDR: 0x00000000 00301b40
RC PSMI: 0x00000010
FAULT_REG: 0x00000000
GFX_MODE: 0x00002a00
PP_DIR_BASE: 0x7fda0000
hung: 1
engine reset count: 0
Active context: Xorg[3074] prio 0, guilty 0 active 0, runtime total 0ns, avg 0ns
rcs0 --- WA context = 0x00000000 7ffba000
# garbage text removed - edison
available engines: 47
slice total: 1, mask=0001
subslice total: 2
slice0: 2 subslices, mask=00000003
EU total: 20
EU per subslice: 10
has slice power gating: no
has subslice power gating: no
has EU power gating: no
slice0: 2 subslice(s) (0x00000003):
subslice0: 10 EUs (0x3ff)
subslice1: 10 EUs (0x3ff)
Num Pipes: 3
PWR_WELL_CTL2: c0000000
Pipe [0]:
Power: on
SRC: 077f0437
STAT: 00000000
Plane [0]:
CNTR: d9000400
STRIDE: 00005a00
SURF: 09f2b000
TILEOFF: 00000000
Cursor [0]:
CNTR: 00000000
POS: 00000000
BASE: 00000000
Pipe [1]:
Power: on
SRC: 077f0437
STAT: 00000000
Plane [1]:
CNTR: d9000400
STRIDE: 00005a00
SURF: 09f3a000
TILEOFF: 00000000
Cursor [1]:
CNTR: 00000000
POS: 00000000
BASE: 00000000
Pipe [2]:
Power: on
SRC: 00000000
STAT: 00000000
Plane [2]:
CNTR: 00000000
STRIDE: 00000000
SURF: 00000000
TILEOFF: 00000000
Cursor [2]:
CNTR: 00000000
POS: 00000000
BASE: 00000000
CPU transcoder: A
Power: on
CONF: c0000000
HTOTAL: 0897077f
HBLANK: 0897077f
HSYNC: 080307d7
VTOTAL: 04640437
VBLANK: 04640437
VSYNC: 0440043b
CPU transcoder: B
Power: on
CONF: c0000000
HTOTAL: 0897077f
HBLANK: 0897077f
HSYNC: 080307d7
VTOTAL: 04640437
VBLANK: 04640437
VSYNC: 0440043b
CPU transcoder: C
Power: on
CONF: 00000000
HTOTAL: 00000000
HBLANK: 00000000
HSYNC: 00000000
VTOTAL: 00000000
VBLANK: 00000000
VSYNC: 00000000
CPU transcoder: EDP
Power: on
CONF: 00000000
HTOTAL: 00000000
HBLANK: 00000000
HSYNC: 00000000
VTOTAL: 00000000
VBLANK: 00000000
VSYNC: 00000000
gen: 7
gt: 2
iommu: disabled
memory-regions: 5
page-sizes: 1000
platform: HASWELL
ppgtt-size: 31
ppgtt-type: 1
dma_mask_size: 40
is_mobile: no
is_lp: no
require_force_probe: no
is_dgfx: no
has_64bit_reloc: no
gpu_reset_clobbers_display: no
has_reset_engine: no
has_fpga_dbg: yes
has_global_mocs: no
has_gt_uc: no
has_l3_dpf: yes
has_llc: yes
has_logical_ring_contexts: no
has_logical_ring_elsq: no
has_logical_ring_preemption: no
has_master_unit_irq: no
has_pooled_eu: no
has_rc6: yes
has_rc6p: no
has_rps: yes
has_runtime_pm: yes
has_snoop: no
has_coherent_ggtt: yes
unfenced_needs_alignment: no
hws_needs_physical: no
cursor_needs_physical: no
has_csr: no
has_ddi: yes
has_dp_mst: yes
has_dsb: no
has_dsc: no
has_fbc: yes
has_gmch: no
has_hdcp: no
has_hotplug: yes
has_hti: no
has_ipc: no
has_modular_fia: no
has_overlay: no
has_psr: yes
has_psr_hw_tracking: yes
overlay_needs_physical: no
supports_tv: no
rawclk rate: 125000 kHz
CS timestamp frequency: 12500000 Hz
Has logical contexts? yes
scheduler: 0
i915.vbt_firmware=(null)
i915.modeset=-1
i915.lvds_channel_mode=0
i915.panel_use_ssc=-1
i915.vbt_sdvo_panel_type=-1
i915.enable_dc=-1
i915.enable_fbc=0
i915.enable_psr=-1
i915.psr_safest_params=no
i915.enable_psr2_sel_fetch=no
i915.disable_power_well=1
i915.enable_ips=1
i915.invert_brightness=0
i915.enable_guc=0
i915.guc_log_level=-1
i915.guc_firmware_path=(null)
i915.huc_firmware_path=(null)
i915.dmc_firmware_path=(null)
i915.mmio_debug=0
i915.edp_vswing=0
i915.reset=3
i915.inject_probe_failure=0
i915.fastboot=-1
i915.enable_dpcd_backlight=-1
i915.force_probe=
i915.fake_lmem_start=0
i915.enable_hangcheck=yes
i915.load_detect_test=no
i915.force_reset_modeset_test=no
i915.error_capture=yes
i915.disable_display=no
i915.verbose_state_checks=yes
i915.nuclear_pageflip=no
i915.enable_dp_mst=yes
i915.enable_gvt=no
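Since the full dump is long, a short summary may be easier to compare between crashes. The following is a minimal sketch, assuming the field labels appear as they do in the output above; `summarize_error_state` is a hypothetical name, not an existing command.

```shell
# Hypothetical helper: reads an i915 error state on stdin and keeps only
# the headline fields (hang code, kernel, platform, PCI ID, active process).
summarize_error_state() {
    grep -E '^[[:space:]]*(GPU HANG:|Kernel:|Driver:|Platform:|PCI ID:|Active process|Reset count:|hung:)'
}

# Example use on a live system (the card number may differ per machine):
# sudo cat /sys/class/drm/card1/error | summarize_error_state
```

The pattern is anchored to the start of the line, so `Reset count:` is kept while `engine reset count:` is not.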
To anyone reading this in the future: I never solved this problem. However, I did revert to kernel 4.4.297-1-MANJARO and it works, though that is on Manjaro. Perhaps an older kernel would work on Zorin as well.