Nvidia drivers non-functional for GeForce GTX 1660

I have been using Zorin 15 for many years, and with the upgrade option now available I upgraded to Zorin 16.3.

For the first 2 days things ran fine, except for a weird bug where Heroic Launcher (Flatpak) installed 3 separate Nvidia drivers and launched games with a blank screen.

While trying to fix that, I noticed that my system was running on the nvidia-525 driver, so I updated to nvidia-535. After reboot I got:

[ 8.032517] nvidia-gpu 0000:26:00.3: i2c timeout error e0000000
[ 8.032542] ucsi_ccg 3-0008: i2c_transfer failed -110
[ 8.032556] ucsi_ccg 3-0008: ucsi_ccg_init failed - -110

followed by a blank screen with a cursor flashing in the top-left corner and no mouse; the second monitor was not receiving any signal.

So I blacklisted the i2c driver and got the same problem (without the error message). I then swapped back to nvidia-525 and got the same problem, again without the error text first.
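The post does not say how the i2c driver was blacklisted. A common approach is a modprobe.d entry; the sketch below assumes the module behind the "nvidia-gpu ... i2c timeout" message is i2c_nvidia_gpu, and the file name is a hypothetical choice.

```shell
# Sketch: write a modprobe blacklist entry for the NVIDIA I2C bus driver.
# Assumption: the module to blacklist is i2c_nvidia_gpu.
# The file name blacklist-i2c-nvidia-gpu.conf is hypothetical.
write_i2c_blacklist() {
    dir="${1:-/etc/modprobe.d}"   # target directory; /etc/modprobe.d needs root
    printf 'blacklist i2c_nvidia_gpu\n' > "$dir/blacklist-i2c-nvidia-gpu.conf"
}
```

When run as root against /etc/modprobe.d, this is usually followed by `sudo update-initramfs -u` so the blacklist also applies in early boot. As the rest of this thread shows, it only silences the timeout message and does not address the blank screen.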



Deciding that a reinstall was in order, I then performed a fresh install (with updates during install and third-party software enabled) and ran into the same i2c timeout followed by a blank screen.

So I blacklisted i2c on the new install too: no more i2c timeout error, but no change to the blank screen.

I then manually ran

ubuntu-drivers autoinstall

on the existing install (by doing a chroot from the live disk), and after that I had a picture and boot completed successfully.

However, it was booting on the llvm (llvmpipe software rendering) driver, so I had "unknown monitor", reduced resolution, no second monitor, etc.



I tried manually installing the nvidia-driver-535-server-open (recommended, tested) driver as well as the nvidia-driver-525-open driver.

I have tried

prime-select nvidia

and they all revert to llvm

I am currently running on nouveau to have decent resolution and both monitors, but I would really like to get the Nvidia drivers running for Wine applications.

ubuntu-drivers devices
== /sys/devices/pci0000:00/0000:00:03.1/0000:26:00.0 ==
modalias : pci:v000010DEd00002184sv000019DAsd00002543bc03sc00i00
vendor : NVIDIA Corporation
model : TU116 [GeForce GTX 1660]
driver : nvidia-driver-535-open - distro non-free
driver : nvidia-driver-535-server - distro non-free
driver : nvidia-driver-525-server - distro non-free
driver : nvidia-driver-450-server - distro non-free
driver : nvidia-driver-418-server - distro non-free
driver : nvidia-driver-535 - distro non-free
driver : nvidia-driver-525-open - distro non-free
driver : nvidia-driver-535-server-open - distro non-free recommended
driver : nvidia-driver-525 - distro non-free
driver : nvidia-driver-470 - distro non-free
driver : nvidia-driver-470-server - distro non-free
driver : xserver-xorg-video-nouveau - distro free builtin

As can be seen, all of the drivers are valid for the graphics card, and the kernel can detect and communicate with the device just fine.
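For a long driver list like the one above, a small filter can pull out just the entry marked "recommended". This is a minimal sketch that assumes the line layout shown in the output above; the function name is made up for illustration.

```shell
# Print the driver that `ubuntu-drivers devices` marks as recommended.
# Reads the command's output on stdin, so it also works on saved output.
# recommended_driver is a hypothetical helper name.
recommended_driver() {
    awk '$1 == "driver" && $NF == "recommended" { print $3 }'
}

# Usage: ubuntu-drivers devices | recommended_driver
```

On the listing above this would print nvidia-driver-535-server-open.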



Here are what I think are the relevant extracts from logs:

Are you using USB-C to connect your monitor?

Others and I have noticed that the Nvidia 535 drivers simply do not work correctly on some machines. On my Nvidia 3060, the driver won't report the wattage and the GPU won't properly increase fan speed in response to usage.
If your i2c driver had no trouble before, then I would not recommend blacklisting it.
https://bugzilla.kernel.org/show_bug.cgi?id=206653

Test removing the blacklist on i2c.
Instead, completely purge all Nvidia packages and their configuration files.

sudo apt remove --purge '^nvidia-.*'

Reinstall Nvidia:

sudo ubuntu-drivers install

Launch Software & Updates, navigate to the Additional Drivers tab and select the 470 (proprietary) driver and test if it is working.
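To confirm that the purge really left nothing behind before reinstalling, the dpkg package states can be checked. The sketch below is one way to do that; the helper name is hypothetical, and it only reads `dpkg -l` output without changing anything.

```shell
# List NVIDIA-related packages and their dpkg state from `dpkg -l` output on stdin.
# "ii" = installed, "rc" = removed but configuration files remain (not purged).
# nvidia_pkg_states is a hypothetical helper name.
nvidia_pkg_states() {
    awk '$1 ~ /^[a-z][a-zA-Z]/ && $2 ~ /nvidia/ { print $1, $2 }'
}

# Usage: dpkg -l | nvidia_pkg_states
```

After a successful purge this should print nothing; any remaining "rc" lines point at packages whose configuration is still on disk.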

No, not plugged in using USB-C: one of the monitors is on an HDMI cable and the other is on an old parallel cable (using adapters to plug into a DP port). Neither monitor is new enough to have USB-C (or even DP) connection ports.

(from the bugzilla link you posted:)

NVIDIA GTX 1660 Ti doesn't have USB Type-C interface. (See Graphics cards of the NVIDIA GeForce GTX 16 series)

NVIDIA I2C driver is loaded based on "PCI_VENDOR_ID_NVIDIA, PCI_ANY_ID, PCI_ANY_ID, PCI_ANY_ID PCI_CLASS_SERIAL_UNKNOWN" (refer A) and then it loads ucsi_ccg driver which fails after i2c transfer timeouts since there is no Type-C interface.

I already tried the blacklisting option (twice, as described above), and it does get rid of the timeout error. However, the screen still comes up blank with a cursor (or boots using llvm as the driver).

I don't think it's a case of the i2c driver not troubling me before; from what I understand, it's a new feature of Ubuntu 20, so Zorin 15 didn't include it.

Yes, the 535 driver gives problems; however, the 525 driver isn't working either...

I haven't tried a purge, and I haven't tried the 470 driver either. Should I try the 470 driver first and, if it doesn't work, purge and retry it, or should I just purge immediately and then try the 470 driver?

When picking a driver, which is preferable: the "open", the "server" or the basic version?

Please correct anything I am misunderstanding. Reading your posts above, it looked like you went from 15 to 16, but were on 16 for a time without trouble.
You then noticed that Flatpak Installed some drivers - But I am not aware of Flatpak ever handling Nvidia drivers, due to the absolute necessity for System Handling of those drivers. Flatpak is containerized.
You can double check with

flatpak list

I recommend Purging all Nvidia drivers first and then immediately run

sudo ubuntu-drivers install

Do not reboot, log out or shut down in between.
After running the above install, then open Software & Updates > Additional Drivers and try the 470 (Proprietary) driver. I suggest not trying to use "server" or "open" or the 470 (Proprietary Tested).
Just 470 (Proprietary).

Yes, it ran in 16 for between 24 and 48 hours; however, it was the 'first' run of 16 (after the Zorin upgrade program finished), and IIRC the completion window does not mention that a reboot is required.

I was fiddling around, learning my way around the new system, but I'm about 85% sure that it was still running on the nvidia-525 driver that had been initialized in Zorin 15 BEFORE the upgrade.

I was trying out Heroic Launcher (Flatpak) (in Zorin 15 I used PoL for my Wine needs). Due to its containerized nature, Flatpak needs to install its own Nvidia drivers (in addition to the system-installed version), and it's all handled automatically by the Heroic Launcher installer.

I was having a bit of trouble launching the game I chose as a test in Heroic, but the same game launched just fine in PoL, so I started looking around, learning the GUI and asking questions on the Heroic Discord. They got me to check the Nvidia drivers installed in Flatpak (yes, using flatpak list; I didn't know about the Flatpak Nvidia drivers until then either), and I found that 525, 535 and another had been installed. I guess that's because I already had 525 and the other on my system from using Zorin 15 and upgrading Nvidia drivers from the other to 525 at some point, and 535 was installed because it's the latest... but that's a completely uneducated guess at this point.

When looking into that, I found that my system Nvidia driver was not up to date either, so I updated that.

While trying to use GIMP to crop screenshots to show the people on the Heroic Discord what was happening, I got an error from GIMP that the GEGL library was too old for this version of GIMP. When I attempted to update GEGL (apt search gegl: oh good, the version is high enough; apt install gegl), I got a message that dpkg needed to be initialized, and apt provided the command to run. I ran it, installed GEGL and received the same error message from GIMP.

So I rebooted and got i2c timeout errors followed by a blank screen.

So while the nvidia-525 driver was functioning perfectly in Zorin 16, it had been initialized by the Ubuntu 18 kernel from Zorin 15 and not actually by Zorin 16 itself.

.....

After the completely clean install showed that the blank screen issue was not directly related to the upgrade, was not related to i2c and was not limited to 535, I left all of that out of the initial description since it was irrelevant.

I will do so tomorrow and report back, thank you for the assistance.


Well, this is weird....

The following packages will be REMOVED
nvidia-compute-utils-525* nvidia-compute-utils-535-server*
nvidia-dkms-525-open* nvidia-dkms-535-server-open* nvidia-kernel-common-525*
nvidia-kernel-common-535-server* nvidia-prime* nvidia-settings*
0 to upgrade, 0 to newly install, 8 to remove and 5 not to upgrade.
After this operation, 52.2 kB disk space will be freed.
Do you want to continue? [Y/n]

Both nvidia-535-server-open and nvidia-525-open were installed but not purged, and each is 400+ MB. I can swap to them without needing to redownload them (I know, since it takes a few hours for me to download them).

0 to upgrade, 19 to newly install, 0 to remove and 5 not to upgrade.
Need to get 36.5 MB/312 MB of archives.
After this operation, 911 MB of additional disk space will be used.

The command

sudo ubuntu-drivers install

takes several hours to run?

downloads do

sudo ubuntu-drivers install

was surprisingly fast: even though it was 911 MB, it only took about 45 minutes.

then open Software & Updates > Additional Drivers and try the 470 (Proprietary) driver.

is busy taking a long time; it's only about 20% along the progress bar.

...so far, I am actually finding Zorin 16 significantly faster than Zorin 15:
starting times for LibreOffice, GIMP, etc. are noticeably improved.

It can take a while. Might get up and do other tasks while waiting...

Nope, it's a bust.

[image: temp1]

[image: temp2]

...
Now it won't even work when I try to swap back to nouveau; it stays on llvm,
and doing ubuntu-drivers autoinstall isn't making nouveau work either (even though that's what worked before the purge).

Oh, and it looks like my network adapter is borked too now.

What behavior is it showing on the 470 driver?
You did have a blank screen earlier...

urm.. Odd... May need to make a separate thread on that.

You can see the behavior of the nvidia-470 driver in the images above.

The GUI (Software & Updates -> Additional Drivers) believes the 470 driver is running; the system (Settings -> About) knows that it's running on llvm instead.

So: borked resolution, no ability to change resolution, no second monitor, and the operating monitor recognized as 'unknown monitor'.


No blank screen.
No different behavior than when trying nvidia-535 or nvidia-525.

The only difference between swapping drivers for testing without the purge and swapping them with a purge is that now nouveau no longer works.


What is your terminal output for

glxinfo | grep OpenGL

lsmod

It looks like the drivers are installed - but the modules are not loaded.

glxinfo | grep OpenGL

Command 'glxinfo' not found, but can be installed with:
sudo apt install mesa-utils

(Can't do that until I fix my networking,)
(or manage to hijack my cellphone's connection over USB... tomorrow's problem.)


lsmod

Module                  Size  Used by
uas                    28672  0
usb_storage            77824  2 uas
nls_iso8859_1          16384  2
btrfs                1540096  2
blake2b_generic        20480  0
zstd_compress         225280  1 btrfs
nvidia_uvm           1556480  0
nvidia_drm             77824  0
nvidia_modeset       1445888  1 nvidia_drm
binfmt_misc            24576  1
nvidia               7524352  2 nvidia_uvm,nvidia_modeset
kvm_amd               155648  0
kvm                  1015808  1 kvm_amd
crct10dif_pclmul       16384  1
crc32_pclmul           16384  0
ghash_clmulni_intel    16384  0
aesni_intel           376832  0
crypto_simd            16384  1 aesni_intel
cryptd                 24576  2 crypto_simd,ghash_clmulni_intel
joydev                 32768  0
input_leds             16384  0
drm_kms_helper        307200  1 nvidia_drm
cec                    61440  1 drm_kms_helper
rc_core                61440  1 cec
fb_sys_fops            16384  1 drm_kms_helper
ccp                   102400  1 kvm_amd
syscopyarea            16384  1 drm_kms_helper
sysfillrect            20480  1 drm_kms_helper
sysimgblt              16384  1 drm_kms_helper
sch_fq_codel           24576  1
msr                    16384  0
parport_pc             53248  0
ppdev                  24576  0
lp                     28672  0
parport                69632  3 parport_pc,lp,ppdev
drm                   618496  3 drm_kms_helper,nvidia,nvidia_drm
efi_pstore             16384  0
ip_tables              32768  0
x_tables               53248  1 ip_tables
autofs4                49152  2
raid10                 69632  0
raid456               163840  0
async_raid6_recov      24576  1 raid456
async_memcpy           20480  2 raid456,async_raid6_recov
async_pq               24576  2 raid456,async_raid6_recov
async_xor              20480  3 async_pq,raid456,async_raid6_recov
async_tx               20480  5 async_pq,async_memcpy,async_xor,raid456,async_raid6_recov
xor                    24576  2 async_xor,btrfs
raid6_pq              122880  4 async_pq,btrfs,raid456,async_raid6_recov
libcrc32c              16384  2 btrfs,raid456
raid0                  24576  0
multipath              20480  0
linear                 20480  0
raid1                  49152  1
hid_generic            16384  0
usbhid                 65536  0
hid                   147456  2 usbhid,hid_generic
ahci                   45056  9
xhci_pci               24576  0
libahci                45056  1 ahci
xhci_pci_renesas       20480  1 xhci_pci

Well, the modules are loading.
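Picking the GPU-related entries out of a long lsmod listing by eye is error-prone, so a small filter can help. This is a minimal sketch; the helper name is made up for illustration.

```shell
# Print which GPU kernel modules appear in `lsmod` output (read from stdin).
# Matches the nvidia* modules and nouveau; the first line (header) is skipped.
# gpu_modules is a hypothetical helper name.
gpu_modules() {
    awk 'NR > 1 && ($1 ~ /^nvidia/ || $1 == "nouveau") { print $1 }'
}

# Usage: lsmod | gpu_modules
```

On the listing above this would print nvidia_uvm, nvidia_drm, nvidia_modeset and nvidia, with no nouveau entry.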

We know that the drivers are installed.

For any of the next steps, we will need networking.

I see that Nvidia_DRM is showing loading, but didn't your initial errors when you started the thread say that loading it resulted in an error?

The network issue is definitely software related. Using the live disk it works just fine.

No idea where to even begin to fix the network card issue.

Should I just reinstall?
Sorry about the delay; IRL is seldom convenient.

I really would ask that we stick to troubleshooting one topic at a time. I fully agree that troubleshooting network should take priority, since having a working network is conducive to repairing anything else.
Have you started a thread on your Network issue?
If not, can you please start a new one?

Network up and running and machine rebooted.

Since we swapped kernels, Software & Updates believes we are running on the nvidia-535-server-open driver instead of the 470 driver, and it's still really running llvm.

Other than that, it should be in the same state as before.

I haven't tested whether nouveau or any other drivers are working.

I have now installed mesa-utils, and here is the output:

borgrel@coffeehouse:~$ glxinfo | grep OpenGL
OpenGL vendor string: Mesa/X.org
OpenGL renderer string: llvmpipe (LLVM 12.0.0, 256 bits)
OpenGL core profile version string: 4.5 (Core Profile) Mesa 21.2.6
OpenGL core profile shading language version string: 4.50
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile
OpenGL core profile extensions:
OpenGL version string: 3.1 Mesa 21.2.6
OpenGL shading language version string: 1.40
OpenGL context flags: (none)
OpenGL extensions:
OpenGL ES profile version string: OpenGL ES 3.2 Mesa 21.2.6
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.20
OpenGL ES profile extensions:
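The renderer string is the line that matters in this output: "llvmpipe" means Mesa is rendering in software on the CPU, whereas with the Nvidia driver in use the vendor string would normally name NVIDIA. A minimal sketch of that check, with a made-up helper name:

```shell
# Succeeds (exit status 0) if the `glxinfo` output on stdin reports
# Mesa's llvmpipe software renderer.
# is_software_rendering is a hypothetical helper name.
is_software_rendering() {
    grep 'OpenGL renderer string' | grep -q 'llvmpipe'
}

# Usage: glxinfo | is_software_rendering && echo "software rendering (llvmpipe)"
```

It is handy for re-checking after each driver swap without reading the full output again.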

ok... How about

ls /usr/share/X11/xorg.conf.d/

This may also be a VAAPI issue. We can try Disabling Intel Hardware Acceleration.

Source:

/usr/share/X11/xorg.conf.d/10-nvidia.conf
Add Option "Accel" "off".

Section "OutputClass"
    Identifier "nvidia"
    MatchDriver "nvidia-drm"
    Driver "nvidia"
    Option "AllowEmptyInitialConfiguration"
    Option "Accel" "off"
    ModulePath "/usr/lib/x86_64-linux-gnu/nvidia/xorg"
EndSection