I've come to like
R quite a bit for statistical computing, even though as a language it can be rather quirky. (Case in point: The
anova() function compares two or more models using analysis of variance; if you want to fit an
ANOVA model, you need to use the
aov() function.) I don't use it that often, though, which is a mixed blessing. The bad news is that infrequent use makes it hard for me to remember everything I've learned about the language. The good news is that infrequent use means I'm not having to do statistical analysis very often.
I don't think I'm alone in believing that consistent coding patterns (paradigms, idioms, whatever you want to call them) are very helpful when using a language infrequently. That motivates today's post, on testing the significance of a regression model. By model significance, I mean (in somewhat loose terms) testing
H0: the null model (no predictors other than a constant term) fits the data at least as well as our model
versus
H1: our model fits the data better than the null model.
When performing a standard linear regression, the usual test of model significance is an
F-test. As with most (all?) statistics packages, R helpfully prints out the
p-value for this test in the summary output of the regression, so you can see whether your model is (literally) better than nothing without any extra work. To test whether a second model (call it
model2) significantly improves on model1, you use the anova() command:
anova(model1, model2)
which is easy enough to remember.
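For instance, a minimal sketch (dat, y, x1, and x2 are all hypothetical names):
model1 <- lm(y ~ x1, data = dat)        # fit the initial model
summary(model1)                         # the F statistic and its p-value appear on the last line of output
model2 <- lm(y ~ x1 + x2, data = dat)   # a nested model that adds x2
anova(model1, model2)                   # F test: does adding x2 significantly improve the fit?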
When performing a
generalized linear regression, however, R does not automatically give you a model significance test. I'll focus here on a
binary logit model (binary dependent variable), but I'm pretty sure the various approaches apply to other uses of the GLM, perhaps with some tweaks.
Let's say that
model1 is a binary logistic regression model I've fitted in R. The most common test for significance of a binary logistic model is a
chi-square test, based on the change in
deviance when you add your predictors to the null model. R will automatically calculate the deviance for both your model and the null model when you run the
glm() command to fit the model. The approach to testing significance that I've seen on a number of web pages, including
this one, involves calculating the p-value manually, using some variation of the following syntax:
with(model1, pchisq(null.deviance - deviance, df.null - df.residual, lower.tail = FALSE))
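Unpacked a bit, that one-liner is equivalent to the following (each piece is a component of the fitted glm object):
chi2 <- model1$null.deviance - model1$deviance   # drop in deviance from the null model
df   <- model1$df.null - model1$df.residual      # degrees of freedom = number of added predictors
pchisq(chi2, df, lower.tail = FALSE)             # upper-tail probability of the chi-square statistic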
That's fine (nice and compact), but the prospects of my remembering it are slender at best. Fortunately, we can use the aforementioned
anova() command by manually fitting the null model. First rerun the logistic regression using just a constant term. Call the resulting fit
null.model. Now compare
null.model to
model1 using the
anova() command, adding an argument to tell R that you want a chi-square test (by default, anova() performs no test at all on GLM fits):
anova(null.model, model1, test = "Chisq")
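In full, assuming model1 came from something like glm(y ~ x1 + x2, data = dat, family = binomial) (names hypothetical):
null.model <- glm(y ~ 1, data = dat, family = binomial)   # constant term only
anova(null.model, model1, test = "Chisq")                 # analysis of deviance: is model1 significant?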
You can also use the same syntax to compare two fitted logistic models for the same data, say where
model2 adds some predictors to
model1. For me, that's a lot easier to remember than the manual approach.
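Concretely, with a hypothetical extra predictor x3:
model2 <- glm(y ~ x1 + x2 + x3, data = dat, family = binomial)   # model1 plus x3
anova(model1, model2, test = "Chisq")                            # does adding x3 significantly improve the fit?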
Here's some heavily annotated code (or you can
download it), if you want to see an example: