Friday, November 9, 2018

Stepwise Regression Code Revisited

I've added a few more tweaks to the stepwise regression code I published back in 2011. (If you wish, see here for the original post and here for a subsequent update.) The code does stepwise regression using F tests (or, equivalently, p-values of coefficients), which is a bit old-fashioned but apparently is still how it is taught in some places. The latest update supplies default values for the alpha-to-enter and alpha-to-leave values. The default values (0 and 1 respectively) are consistent with forward and backward stepwise: with alpha-to-enter left at 0, no variable is significant enough to enter, and with alpha-to-leave left at 1, no variable is ever insignificant enough to be dropped. For forward stepwise, you would start with a bare-bones initial model, set your alpha-to-enter, and omit alpha-to-leave. For backward stepwise, you would start with a large initial model, set alpha-to-leave, and omit alpha-to-enter. Both are demonstrated in the notebook.
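To make the mechanics concrete, here is a minimal sketch of a single forward step and a single backward step using base R's add1 and drop1 with F tests. It is not the stepwise() function from the notebook, just an illustration of the kind of test involved. The built-in mtcars data set, the candidate variables, and the threshold values are all assumptions chosen for the example.

```r
# Illustration only: one forward step and one backward step with F tests.
# The data set (mtcars), candidate variables and thresholds are assumptions.
data(mtcars)

alpha_to_enter <- 0.05  # illustrative value, not a default from the post
alpha_to_leave <- 0.10  # illustrative value, not a default from the post

# Forward step: start from an intercept-only model and test each candidate.
small <- lm(mpg ~ 1, data = mtcars)
candidates <- add1(small, scope = ~ wt + hp + qsec, test = "F")
p_in <- candidates[["Pr(>F)"]]
names(p_in) <- rownames(candidates)
p_in <- p_in[!is.na(p_in)]            # drop the "<none>" row
best <- names(which.min(p_in))        # candidate with the smallest p-value
if (p_in[best] < alpha_to_enter) {
  small <- update(small, as.formula(paste(". ~ . +", best)))
}

# Backward step: start from a larger model and test each term for removal.
big <- lm(mpg ~ wt + hp + qsec, data = mtcars)
removable <- drop1(big, test = "F")
p_out <- removable[["Pr(>F)"]]
names(p_out) <- rownames(removable)
p_out <- p_out[!is.na(p_out)]         # drop the "<none>" row
worst <- names(which.max(p_out))      # term with the largest p-value
if (p_out[worst] > alpha_to_leave) {
  big <- update(big, as.formula(paste(". ~ . -", worst)))
}

formula(small)
formula(big)
```

A full stepwise procedure would repeat these steps until no variable qualifies to enter or leave.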

The update also allows you to use the R shortcut of typing "." in a formula (meaning "all variables except the dependent variable"). The "." shortcut only works if you specify the data source as an argument to the function. You cannot use "." while omitting the data argument and relying on having the data source attached. Again, there are demonstrations in the notebook.
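The same restriction applies to lm itself, which makes for a quick way to see the behavior. The sketch below uses base R only (not the stepwise() function) and assumes the built-in mtcars data set with an arbitrary subset of columns.

```r
# Illustration only: how "." expands in a formula, using lm from base R.
# The data set (mtcars) and the chosen columns are assumptions.
data(mtcars)
cars <- mtcars[, c("mpg", "wt", "hp", "qsec")]

# With a data argument, "." means every column except the response.
fit_dot <- lm(mpg ~ ., data = cars)
fit_explicit <- lm(mpg ~ wt + hp + qsec, data = cars)
same_fit <- isTRUE(all.equal(coef(fit_dot), coef(fit_explicit)))

# Without a data argument, "." cannot be expanded, even if the data
# frame is attached, so the call fails. The error is captured here.
attach(cars)
res <- tryCatch(lm(mpg ~ .), error = function(e) e)
detach(cars)

same_fit
inherits(res, "error")
```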

The code is free to use under a Creative Commons license. It comes in the form of an R notebook, which both defines the stepwise() function and does some demonstrations. From that web page, you should be able to download the notebook file using the select control labeled "Code" in the upper right corner. You can also get the files from my Git repository. The Git repository also has an issue tracker, although I think you will need to create an account in order to add an issue.


  1. Thanks for the update Paul.

    Question - Does it work with types of models other than lm? Thank you.

    1. As written, no; the function uses lm to fit models. Hypothetically, you could change lm to glm inside the function to fit generalized linear models. The catch is that the add1 and drop1 functions use a deviance F-test for glm models, so I'm not sure it would truly conform to the original intent (picking variables based on p-values). Still, the deviance test is an F-test, so maybe this would be closer to what you want than what leaps::regsubsets does.

