## Sunday, December 1, 2013

### Testing Regression Significance in R

I've come to like R quite a bit for statistical computing, even though as a language it can be rather quirky. (Case in point: the anova() function compares two or more fitted models using analysis of variance; if you want to fit an ANOVA model, you need the aov() function.) I don't use it that often, though, which is a mixed blessing. The bad news is that infrequent use makes it hard for me to remember everything I've learned about the language. The good news is that infrequent use means I'm not having to do statistical analysis very often.

I don't think I'm alone in believing that consistent coding patterns (paradigms, idioms, whatever you want to call them) are very helpful when using a language infrequently. That motivates today's post, on testing the significance of a regression model. By model significance, I mean (in somewhat loose terms) testing

- H0: the null model (no predictors other than a constant term) fits the data at least as well as our model; versus
- H1: our model fits the data better than the null model.
When performing a standard linear regression, the usual test of model significance is an F-test. As with most (all?) statistics packages, R helpfully prints the p-value for this test in the regression's summary output, so you can see whether your model is (literally) better than nothing without any extra work. To test whether a second model (call it model2) significantly improves on model1, you use the anova() command:
anova(model1, model2)

which is easy enough to remember.
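To make the pattern concrete, here's a minimal, self-contained sketch using the built-in mtcars data (my example, not one from the post). It shows that comparing a model to a hand-fitted null model with anova() reproduces the overall F-test p-value that summary() already reports:

```r
# Fit a null model (intercept only) and a model with one predictor.
null.model <- lm(mpg ~ 1, data = mtcars)
model1 <- lm(mpg ~ wt, data = mtcars)

# The overall F-test p-value from summary() ...
f <- summary(model1)$fstatistic
p.summary <- pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE)

# ... agrees with an explicit comparison against the null model.
p.anova <- anova(null.model, model1)$"Pr(>F)"[2]

all.equal(unname(p.summary), p.anova)  # TRUE
```

Both calls perform the same F-test, so the agreement is exact up to floating-point noise.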

When performing a generalized linear regression, however, R does not automatically give you a model significance test. I'll focus here on a binary logit model (binary dependent variable), but I'm pretty sure the various approaches apply, perhaps with some tweaks, to other uses of the GLM.

Let's say that model1 is a binary logistic regression model I've fitted in R. The most common test for significance of a binary logistic model is a chi-square test, based on the change in deviance when you add your predictors to the null model. R will automatically calculate the deviance for both your model and the null model when you run the glm() command to fit the model. The approach to testing significance that I've seen on a number of web pages, including this one, involves calculating the p-value manually, using some variation of the following syntax:
with(model1, pchisq(null.deviance - deviance, df.null - df.residual, lower.tail = FALSE))

That's fine (nice and compact), but prospects of my remembering it are slender at best. Fortunately, we can use the aforementioned anova() command by manually fitting the null model. First, refit the logistic regression using just a constant term, and call the resulting fit null.model. Now compare null.model to model1 using anova(), adding an argument to tell R that you want a chi-square test (for glm fits, anova() prints no test statistic at all by default):
anova(null.model, model1, test = "Chisq")
You can also use the same syntax to compare two fitted logistic models for the same data, say where model2 adds some predictors to model1. For me, that's a lot easier to remember than the manual approach.
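Here's a compact sketch confirming that the two approaches agree, again using built-in data (mtcars with its binary am column, my choice rather than anything from the post):

```r
# Logistic regression: does car weight predict transmission type (am)?
model1 <- glm(am ~ wt, data = mtcars, family = "binomial")

# Manual chi-square test: change in deviance relative to the null model.
p.manual <- with(model1,
                 pchisq(null.deviance - deviance,
                        df.null - df.residual, lower.tail = FALSE))

# Same test via anova() against an explicitly fitted null model.
null.model <- glm(am ~ 1, data = mtcars, family = "binomial")
p.anova <- anova(null.model, model1, test = "Chisq")$"Pr(>Chi)"[2]

all.equal(p.manual, p.anova)  # TRUE
```

Both routes compute the same likelihood-ratio chi-square statistic, so the p-values match.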

Here's some heavily annotated code (or you can download it), if you want to see an example:

#
# Linear and logit regression examples.
#
# (c) 2013 Paul A. Rubin
#
library(datasets);
#
# To demonstrate linear regression, we use the ChickWeight data set.
# Dependent variable:
#   weight = the weight (grams) of a chick.
# Predictors:
#   Time = age of the chick (days)
#   Diet = factor indicating which of four diets was fed to the chick
# Not used:
#   Chick = subject identifier
#
# First model: regress weight on just age.
#
model1 <- lm(weight ~ Time, data = ChickWeight);
summary(model1);
#
# Shocking discovery: weight increases with age!
#
# Second model: regress weight on both age and diet.
#
model2 <- lm(weight ~ Time + Diet, data = ChickWeight);
summary(model2);
#
# Diet is also significant, and diets 2-4 all apparently have
# different effects on weight gain than diet 1.
#
# Is model 2 better than the "null" model (constant term only)?
# The summary output includes the approximate p-value (< 2.2e-16,
# so essentially zero) for an F-test comparing model 2 to the null
# model. We can get the same information as follows:
#
null.model <- lm(weight ~ 1, data = ChickWeight); # actual null model
summary(null.model);
anova(null.model, model2); # compare model 2 to null model
#
# Is model 2 significantly better than model 1?
#
anova(model1, model2); # yes (p < 2.2e-16 again)
#
# We now switch to logit regression. To demonstrate it, we create
# a new 0-1 variable indicating whether a chick is heavier than
# 170 grams (1) or not (0), and append it to the data set.
#
ChickWeight <- cbind(ChickWeight, chubby = ChickWeight$weight > 170);
#
# Next, we run a logit model to see if age and diet predict whether
# a chick is chubby.
#
model3 <- glm(chubby ~ Time + Diet, data = ChickWeight, family = "binomial");
summary(model3);
# All terms except Diet2 seem significant (suggesting that diets 1 and
# 2 may have the same tendency to produce chubby chicks, while diets
# 3 and 4, whose coefficients are positive, are more inclined to
# do so).
#
# Fourth model: add interactions between age and diet.
#
model4 <- glm(chubby ~ Time*Diet, data = ChickWeight, family = "binomial");
summary(model4);
#
# Main effects of diet are no longer significant, but, somewhat oddly,
# the interaction of time with diet 4 is.
#
# We use a chi-square test to test for overall model significance,
# analogous to the F test for a linear regression. The catch is
# that R does not provide a significance value in the summary output
# for the glm method. We can compute a p-value manually, as follows.
#
with(model3, pchisq(null.deviance - deviance, df.null - df.residual, lower.tail = FALSE));
#
# The p-value is essentially zero, so we reject the null and conclude
# that model 3 is better than nothing.
#
# Manual computation may be faster on large data sets (the deviances
# have already been calculated), but it is arguably easier (at least
# on the user's memory) to generate the null model (as before) and then
# run an ANOVA to compare the two models.
#
null.model <- glm(chubby ~ 1, data = ChickWeight, family = "binomial");
anova(null.model, model3, test = "Chisq");
#
# The anova() printout shows less precision (p < 2.2e-16 rather than
# p = 7e-69), but that extra precision is probably spurious (the weight
# data is not accurate to 69 decimal places). We still reject the null
# hypothesis that the null model is at least as good as model3 at
# predicting chubbiness. To test whether the model with interaction terms is
# better than the model without them, we can again run an ANOVA.
#
anova(model3, model4, test = "Chisq");
#
# The p-value (0.03568) suggests that we should again reject the
# null hypothesis (that first model is at least as good as the
# second model) and conclude that inclusion of the interaction terms
# improves prediction (although the significant interaction with
# no significant main effect for diet is rather suspicious).
#
# One other way to test significance of a logit model is to run
# an ANOVA with the model as the sole argument.
#
anova(model3, test = "Chisq");
#
# R adds terms to the model sequentially and shows the significance
# of each change (using a chi-square test). If any of the terms
# are significant, the model is better than the null model. In
# the output of the previous command, the Time variable is very
# significant (p < 2.2e-16), so the null hypothesis that our model
# is no better than the null model is rejected even before we see
# the results for adding Diet (again significant).