Showing posts with label nVidia. Show all posts

Friday, November 4, 2016

Surviving an nVidia Driver Update

Scenario: I'm running Linux Mint 17.3 Rebecca (based on Ubuntu 14.04) on a PC with a GeForce 6150SE nForce 430 graphics card. My desktop environment is Cinnamon. The graphics card is a bit long in the tooth, but it's been running fine with the supported nVidia proprietary driver for quite some time. Unfortunately, having no reason to do so, I had not noted down what version of the driver I had.
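For what it's worth, noting the version down takes one command. A minimal sketch, assuming the driver was installed from the distribution's packages (a driver put in place by NVIDIA's own installer would presumably not show up here). The name `list_nvidia_packages` and the notes file are hypothetical; the function just filters the standard `dpkg-query -W` listing.

```shell
# Filter a "package<TAB>version" listing (the default output of
# `dpkg-query -W`) down to packages whose names start with "nvidia".
# list_nvidia_packages is a hypothetical helper name.
list_nvidia_packages() {
    awk '$1 ~ /^nvidia/ { print $1, $2 }'
}

# Possible use: append a dated record to a notes file of your choosing.
#   { date; dpkg-query -W | list_nvidia_packages; } >> ~/driver-notes.txt
```

Run before any batch of "recommended" updates, this leaves a record of what to roll back to.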

Yesterday I installed a couple of "recommended" updates, one of which bumped the nVidia driver to version 304.132. Apparently this driver is "recommended" for people who don't want to see their desktops. On the next boot after the upgrade, I got a black screen. To be clear, there's no problem before the desktop loads -- I can see the BIOS messages at the start of the boot and the grub menu where I get to choose which version of the operating system to boot. It's only when we get to the desktop that the display fails.

A bit of searching showed me that I'm far from the only person experiencing this. What's lacking (at least as of this writing) is a definitive fix. I'll skip the gory details of a day and a half of fruitless hacking and cut to the chase scene.

Getting to a terminal


The first step was to escape the black hole of my desktop. That was easier said than done. You can't right-click on an invisible desktop (or at least if you do it's unproductive). Ditto trying to get to the application menu. Fortunately, control+alt+F2 did manage to switch me away from the (worthless) X session and get me to a terminal. (The display worked fine in terminal mode.)

Getting to a desktop


It's a bit like cutting off your leg to get rid of a broken toe, but one way to get out of nVidia Hell is to get nVidia off your system. So in the terminal I ran

sudo apt-get purge 'nvidia*'

(which deleted all nVidia packages) followed by

sudo reboot

(which did exactly what you would think). With nVidia gone, the system was forced to use the open source "nouveau" driver. Unfortunately, the nouveau driver seemed to be hopelessly confused about what sort of display I had (it called it a "laptop display") and what resolution to use. The result was a largely unusable (but at least mostly visible) desktop.

Rolling way, way back


My hope was to roll back to the previous nVidia driver, but that hope was quickly dashed. I was able to run the device manager. (You have two ways to do this, depending on how good or bad the nouveau display is. One is to use the Mint menu to run "Device Manager", if you can. The other is to open a terminal and run "gksudo device-manager".) The device manager listed three possible drivers for me. The first was the multiple-expletives-deleted 304.132 nVidia driver, the second was the nouveau driver, and the third was the version 173.14.39 nVidia driver. So I picked the third, applied the changes and restarted.

This got me a fully functional desktop (at the correct resolution), but performance was less than stellar, as one might expect from that old a driver. There were noticeable lags between some operations (such as clicking the close control on a window) and their results (window actually closing). More importantly, if I suspended the machine, when I tried to resume I could not get the desktop back. So version 173 was not the permanent solution.

Rolling back just a little


I've mentioned the sgfxi script before. I tried running it, but it wanted to install the latest supported version, which was that nasty 304.132 thing. After screwing around for way too long, I discovered I could in fact roll back with the script.

The first step was to kill the X server, since sgfxi won't run while X is running. So I used control+alt+F2 to get to a terminal, logged in, and ran

sudo service mdm stop

to get rid of the display manager. That forced me to log in again, after which I ran

sudo su -
sgfxi --advanced-options | less

for two reasons. One was to find the correct option for specifying a particular driver version (it's -o). The other was to get a list of available versions, which appears near the top of the output.

I tried a few of the more recent ones listed, but was told either that they weren't supported (despite appearing in the list) or that some other package was the wrong version to mesh with them. Fortunately, 304.131 could be installed. I assume that was released immediately before the ill-fated 304.132. So once more unto the breach: running (as root)

sgfxi -o 304.131

worked. I was prompted to make a few decisions (one or two of which I simply guessed about), and I got one error message during a cleanup phase, but the script did install the driver and terminated. I rebooted and the system seems to be working normally. It doesn't feel sluggish any more, and returning from a nap is no problem.

The earlier purge had removed the nvidia-settings package, so I used the Synaptic package manager to reinstall version 304 of that. It provides a graphical interface to adjust display settings, although so far the defaults seem to work just fine for me.

Now I just need to be sure never, ever, ever again to update that driver.
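One way that might be enforced, at least for drivers that come from the package repositories, is to put the packages on hold with `apt-mark hold` so routine updates skip them. The sketch below only prints the commands rather than running them. The function name `print_hold_commands` is hypothetical, and whether a hold would have prevented this particular update is an assumption, not something tested here. A driver installed outside the package system would not be covered by a hold.

```shell
# Read a "package<TAB>version" listing on stdin (for example from
# `dpkg-query -W`) and print an apt-mark hold command for every
# package whose name starts with "nvidia". Nothing is executed.
# print_hold_commands is a hypothetical helper name.
print_hold_commands() {
    while read -r pkg rest; do
        case $pkg in
            nvidia*) printf 'sudo apt-mark hold %s\n' "$pkg" ;;
        esac
    done
}

# Possible use: review the output, then paste the lines you agree with.
#   dpkg-query -W | print_hold_commands
```

Held packages can be listed later with `apt-mark showhold` and released with `apt-mark unhold`.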

Sunday, December 8, 2013

Mint Petra Odds and Ends

As I complete (hopefully) the upgrade of my home PC to Linux Mint 16 Petra (which is based on Ubuntu 13.10 Saucy Salamander), I'm making notes on glitches small and large.
  • I initially had some display problems booting and resuming from hibernation. By "display problems" I mean totally corrupted displays, or what appeared to be hang-ups during the boot process. This is not the first time I've experienced display problems of this nature. So I reinstalled the sgfxi script (this time to my home partition, where it should survive future system upgrades) and ran it. Since installing the latest NVIDIA proprietary drivers, I've had no boot failures. The display is still sometimes a bit slow redrawing itself, something I'll need to investigate when I have time.
  • As noted in my previous post, Petra has the annoying habit of opening the Disk Usage Analyzer (a.k.a. "Baobab") when it ought to be displaying a folder in Nemo, the replacement for Nautilus. The fix (in the second update of the previous post) is to add
    inode/directory=nemo.desktop;baobab.desktop;
    to  ~/.local/share/applications/mimeapps.list
  • Back in June, I posted about using the µswsusp package to accelerate hibernation and return from hibernation. Unfortunately, while the package seems to work as far as saving a compressed image to disk, Petra is unable to return from hibernation. I get a message that it is "Resuming from /dev/disk/by-uuid/you-don't-want-to-know", and I get to stare at that message for as long as I want before rebooting the machine. So I uninstalled µswsusp. Fortunately, Petra seems to be faster than Nadia was at both hibernating and (more importantly, at least to me) resuming. If I need to bring back µswsusp at a later date, the answer may be here (change the configuration file for µswsusp so that "resume device" is set to the UUID of the drive rather than its name). [Update: No joy. I reinstalled µswsusp, confirmed that it hung on resume, edited its config file to use /dev/disk/by-uuid/you-don't-want-to-know as the resume device, ran update-initramfs -u (as root), and it still hung on resume. Apparently I need to give up on µswsusp.]
  • While messing around with the hibernation stuff, I periodically encountered a message "No support for locale en_US.utf8". This seemed harmless enough, but I decided to do a quick search, and turned up clear instructions (albeit for an older version, Mint 13) on how to fix it. Running
sudo locale-gen --purge --no-archive
hopefully fixes it.
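
The mimeapps.list fix from the second bullet can also be applied from a terminal. This is a minimal sketch, and `set_dir_handler` is a hypothetical helper name. It replaces an existing inode/directory line wherever it finds one, and otherwise appends the line at the end of the file. That is only right if the last section of the file is where the association belongs, so check the result by eye.

```shell
# Make sure a mimeapps.list file maps folders to Nemo ahead of Baobab.
# set_dir_handler is a hypothetical helper name; uses GNU sed's -i.
set_dir_handler() {
    file=$1
    entry='inode/directory=nemo.desktop;baobab.desktop;'
    if grep -q '^inode/directory=' "$file" 2>/dev/null; then
        # An association already exists: rewrite it in place.
        sed -i "s|^inode/directory=.*|$entry|" "$file"
    else
        # No association yet: append one at the end of the file.
        printf '%s\n' "$entry" >> "$file"
    fi
}

# Possible use (back up the file first):
#   cp ~/.local/share/applications/mimeapps.list ~/mimeapps.list.bak
#   set_dir_handler ~/.local/share/applications/mimeapps.list
```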

Tuesday, January 1, 2013

2013 Off to a Shaky Start

Here's a bit of irony for you: I had just finished reading a blog post in which the author pooh-poohed the notion that the year 2013 would be unlucky due to the last two digits when my system crashed ... repeatedly. Maybe the triskaidekaphobics are onto something (or maybe the Mayans were just off by a week and change).

When I first installed Linux Mint 14 (Cinnamon), upgrading from Mint 11, it appeared to me that my display seemed a bit dimmer than it had been. (My monitor is a Samsung SyncMaster 932GW; the video card is an NVIDIA GeForce 6150SE nForce 430.) My eyes adjusted to the difference, which I suspect was caused by the open-source Nouveau driver that ships with Mint. Until today, that's been the only issue with the Nouveau driver.

After reading the aforementioned blog post about 2013, I moved on to another post on the same blog, one containing a modest amount of graphics and nothing that, to my eye, would require any sort of hardware acceleration. As I scrolled down the post, my system suddenly did a spontaneous reboot. I think it was the X system rather than Linux itself, but I'm no expert on these things. The symptoms were a crash to a black screen, then the Mint log-in screen.

So I logged in again, and as soon as the password was submitted, I got a black screen (which is actually normal between log-in and desktop) and then the log-in screen again. Uh-oh! After logging in (again), the cycle repeated, only this time what I assume was the log-in screen was unreadable (the sort of colored "sleet" you get when there is a mismatch between monitor and display card with regard to scan frequency). I couldn't restart the X display with ctrl-alt-F1, so I powered down the computer the old-fashioned way (the power switch), started it up again, and managed to log in successfully.

Poking around in the syslog file with the system log viewer (Menu > System Tools > System Log), I found the following at what I think was the point of the initial crash:

Jan  1 17:49:20 HomePC kernel: [15178.991600] [drm] nouveau 0000:00:0d.0: Failed to idle channel 1.
Jan  1 17:49:20 HomePC gnome-session[1620]: Gdk-WARNING: gnome-session: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.#012
Jan  1 17:49:20 HomePC mdm[1193]: WARNING: mdm_slave_xioerror_handler: Fatal X error - Restarting :0

Note the reference to nouveau having a problem, followed by a fatal X error. The next chunk of the log shows the same pattern as my two log-in attempts lead to more crashes:

Jan  1 17:49:21 HomePC acpid: 1 client rule loaded
Jan  1 17:49:35 HomePC kernel: [15193.814381] [TTM] Failed to expire sync object before buffer eviction
Jan  1 17:49:35 HomePC kernel: [15193.814443] [TTM] Failed to expire sync object before buffer eviction
Jan  1 17:49:35 HomePC kernel: [15193.817930] [TTM] Failed to expire sync object before buffer eviction
Jan  1 17:49:51 HomePC kernel: [15209.778576] [drm] nouveau 0000:00:0d.0: Failed to idle channel 1.
Jan  1 17:49:51 HomePC kernel: [15209.782509] [drm] nouveau 0000:00:0d.0: Setting dpms mode 3 on vga encoder (output 0)
Jan  1 17:49:51 HomePC gnome-session[5894]: Gdk-WARNING: gnome-session: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.#012
Jan  1 17:49:51 HomePC mdm[5858]: WARNING: mdm_slave_xioerror_handler: Fatal X error - Restarting :0
Jan  1 17:49:51 HomePC pulseaudio[6317]: [pulseaudio] client-conf-x11.c: xcb_connection_has_error() returned true
Jan  1 17:49:51 HomePC pulseaudio[6322]: [pulseaudio] client-conf-x11.c: xcb_connection_has_error() returned true
Jan  1 17:49:51 HomePC pulseaudio[6325]: [pulseaudio] pid.c: Stale PID file, overwriting.
Jan  1 17:49:51 HomePC pulseaudio[6325]: [pulseaudio] bluetooth-util.c: org.bluez.Manager.ListAdapters() failed: org.freedesktop.DBus.Error.AccessDenied: Rejected send message, 2 matched rules; type="method_call", sender=":1.111" (uid=1000 pid=6325 comm="/usr/bin/pulseaudio --start --log-target=syslog ") interface="org.bluez.Manager" member="ListAdapters" error name="(unset)" requested_reply="0" destination="org.bluez" (uid=0 pid=853 comm="/usr/sbin/bluetoothd ")
Jan  1 17:49:51 HomePC pulseaudio[6325]: [pulseaudio] server-lookup.c: Unable to contact D-Bus: org.freedesktop.DBus.Error.NoServer: Failed to connect to socket /tmp/dbus-z2bkq6PwPB: Connection refused
Jan  1 17:49:51 HomePC pulseaudio[6325]: [pulseaudio] main.c: Unable to contact D-Bus: org.freedesktop.DBus.Error.NoServer: Failed to connect to socket /tmp/dbus-z2bkq6PwPB: Connection refused
Jan  1 17:49:51 HomePC pulseaudio[6330]: [pulseaudio] pid.c: Daemon already running.
Jan  1 17:49:54 HomePC acpid: client 5871[0:0] has disconnected
Jan  1 17:49:54 HomePC acpid: client connected from 6332[0:0]
Jan  1 17:49:54 HomePC acpid: 1 client rule loaded
Jan  1 17:49:54 HomePC kernel: [15213.146184] [drm] nouveau 0000:00:0d.0: Failed to idle channel 4.
Jan  1 17:49:54 HomePC kernel: [15213.173003] [drm] nouveau 0000:00:0d.0: Setting dpms mode 0 on vga encoder (output 0)
Jan  1 17:49:54 HomePC kernel: [15213.173011] [drm] nouveau 0000:00:0d.0: Output VGA-1 is running on CRTC 0 using output A
Jan  1 17:50:18 HomePC kernel: [15237.313716] [drm] nouveau 0000:00:0d.0: Failed to idle channel 1.
Jan  1 17:50:18 HomePC mdm[6318]: WARNING: mdm_slave_xioerror_handler: Fatal X error - Restarting :0
Jan  1 17:50:21 HomePC kernel: [15240.366064] [drm] nouveau 0000:00:0d.0: Failed to idle channel 4.
Jan  1 17:50:21 HomePC acpid: client 6332[0:0] has disconnected
Jan  1 17:50:21 HomePC acpid: client connected from 6745[0:0]
Jan  1 17:50:21 HomePC acpid: 1 client rule loaded
Jan  1 17:50:28 HomePC kernel: [15247.081273] [drm] nouveau 0000:00:0d.0: Failed to idle channel 3.
Jan  1 17:50:41 HomePC kernel: Kernel logging (proc) stopped.
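
The pattern in these excerpts (a nouveau "Failed to idle channel" message followed by a fatal X error) can be counted without scrolling through the log viewer. A small sketch follows. The function name `crash_summary` is made up for illustration, and the two patterns are taken from the log lines above.

```shell
# Count the two tell-tale messages in a syslog read from stdin.
# crash_summary is a hypothetical helper name.
crash_summary() {
    awk '
        /nouveau.*Failed to idle channel/ { idle++ }
        /Fatal X error/                   { fatal++ }
        END {
            printf "nouveau idle failures: %d\n", idle
            printf "fatal X errors: %d\n", fatal
        }
    '
}

# Possible use (the log path is the usual one on Ubuntu-based systems):
#   crash_summary < /var/log/syslog
```

A nonzero count on both lines at around the same timestamps would match what happened here.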

So I decided to switch to the proprietary NVIDIA driver. The process is easy, although the download is a bit time consuming: Menu > Preferences > Software Sources, switch to the last tab (Additional Drivers), select the NVIDIA binary driver (proprietary and tested version in my case). The system did not demand a reboot after installation was complete, but inxi -G produced output that did not specify the new driver, so I rebooted just to be safe. After the reboot, inxi -G reports "FAILED: nvidia", which is apparently a bug (it reported "FAILED: nouveau" before the driver change) but shows "GLX Version: 2.1.2 NVIDIA 304.43", indicating I've successfully switched to the NVIDIA driver.
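
Since the GLX line is the part of the inxi output that actually identifies the driver, it can be pulled out and saved for later reference. A minimal sketch, assuming the line looks like the one quoted above; `glx_nvidia_version` is a hypothetical helper name.

```shell
# Print the NVIDIA driver version found on a "GLX Version:" line read
# from stdin. Prints nothing if no such line is present, as would be
# the case with another driver.
# glx_nvidia_version is a hypothetical helper name.
glx_nvidia_version() {
    sed -n 's/.*GLX Version:.* NVIDIA \([0-9][0-9.]*\).*/\1/p'
}

# Possible use (-c 0 asks inxi for uncolored output):
#   inxi -c 0 -G | glx_nvidia_version
```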

We'll see if my decision to switch drivers proves rash. The Nouveau driver worked fine for months, and it's possible some other thing will blow up the NVIDIA driver.

Wednesday, September 12, 2012

Boot Bug: Anniversary Edition

A year ago July, I ran into a problem in which my AMD-64 Mint Katya system started to hang on boots. As noted in that post, the problem stemmed from messing with the video settings for the boot loader. Reinstalling the nVidia display driver might or might not have helped eliminate the problem, but eventually it sorted itself out.

A month later, I ran into a second problem linked to the nVidia driver, in which the X server decided to spontaneously reboot on occasion. I got rid of that problem by using the sgfxi script to update to the latest version of the nVidia driver.

Do gremlins celebrate anniversaries? Slightly over one year after fixing the second problem, the first one came back. It started when I tried to run a program that crashed because it could not find a way to do GLX hardware graphics acceleration. That led me to the Additional Drivers dialog in the Control Center, where I discovered that the proprietary nVidia driver was not selected. As it turns out, that's probably because the Control Center sees the one the operating system installed but not the newer one that the sgfxi script installed. I'm not sure that either version was running, though, since both should provide GLX.

Anyway, I decided to turn on the proprietary driver in the Additional Drivers dialog. When I rebooted, the machine hung at the battery test stage. (Note to self: if this happens again, do not turn on the proprietary driver. Get out of X and run the sgfxi script.) I could boot into safe mode, but no way could I boot regularly (despite repeated attempts). Safe mode worked, but it used too low a resolution, with the desktop off-center (left edge cut off, blank screen on the right), and I could not change the resolution. In Control Center > Monitors I discovered that the system could not detect what kind of monitor I had, so it gave me a wimpy default choice that I could not edit.

Eventually I booted into safe mode, went back to Additional Drivers and discovered that I now had a new option: an "experimental" driver. This was the nouveau driver, which I selected. With the nouveau driver, I could boot normally and get to the desktop, but the resolution was still wrong, and the desktop was still off-center. So I switched back to the nVidia driver, and the next boot predictably hung.

I booted into safe mode again, but this time I selected the option to repair packages, thinking perhaps one of the X packages was damaged. That was apparently not the case, but it did remind me that I had seven updates I had not installed. These were updates to the Linux core, including the X system. Normally I don't install those updates because I like to upgrade to a new version of the entire system at once. This was not a normal time, though (particularly as it was getting on 1:30 in the morning), so I let the system install the updates and rebooted again ... and everything worked. Mint automatically detected my monitor correctly, the resolution was set correctly, everything was back the way it had been.  As an added bonus, the nVidia driver was being used, and the program that triggered this whole cluster worked correctly. Woo-hoo!

I subsequently ran the sgfxi script, and sure enough I was on a one-iteration-old nVidia driver. I let the script upgrade me to the latest version, and everything still seems to work properly.

There won't always be a core update waiting to bail me out, so I guess the solution process next time (a year from now?) will be as follows:
  1. Try the sgfxi script.
  2. If that fails, see if there's an update to any of the X packages that can be installed. If not, try reinstalling the X system.
  3. If that also fails, swear mightily and see if there's a new version of Mint I can download.

Tuesday, August 16, 2011

Spontaneous Reboots

Mint Katya (a derivative of Ubuntu Natty Narwhal) on my AMD-64 system has suddenly developed a penchant for spontaneously rebooting.  So far, it has only happened while I'm typing. I think it has happened twice when I was typing messages in Thunderbird and once or perhaps twice when I was typing in the Yoono plug-in for Firefox. I don't recall any recent updates to T-bird, Firefox or Yoono, nor have I changed my keyboard (in case this is related to n-key rollover -- my fingers sometimes get ahead of my brain when I'm typing). The mosquitoes have gotten quite aggressive (and numerous) around my house; maybe they drove some gremlins indoors?

Update #1: I was sloppy above. It's not Mint that is rebooting; it's the X server. This is apparently a common problem, although the variety of symptoms posted there suggests that this is in fact multiple distinct bugs with a common end result (X crashing). Things I have discovered:
  • Unlike several of those reports, I have not experienced any crashes as a result of clicking. Only typing triggers a crash.
  • Unlike the reports from laptop users, power management is not the culprit (the machine that crashes for me is a desktop).
  • Someone pointed the finger at running the AMD-64 version of Natty on an Intel CPU. My copy of Mint is in fact based on the AMD-64 Natty, but my box has an AMD-64 CPU.
  • Turning the proprietary nVidia driver off and then on did not help. (I have a GPU from nVidia.)
  • Installing xserver-xorg-video-nv did not help.
Update #2: I used the wizardous sgfxi script to update my nVidia drivers a few days ago. Since then I have had no spontaneous X crashes. I hesitated to claim victory lest it be premature, but today I wrote a moderately lengthy blog post (touch-typing at my usual subsonic speed) with nary a glitch. So hopefully the driver reload cured this problem. Thanks to Harald Hope for a handy script!

Update #3: See this post for earlier problems and this post for a later round of nVidia issues (and how I resolved them).