OR in an OB World: Updated Stepwise Regression Function

Monday, August 21, 2017

Updated Stepwise Regression Function

Back in 2011, when I was still teaching, I cobbled together some R code to demonstrate stepwise regression using F-tests for variable significance. It was a bit unrefined, not intended for production work, and a few recent comments on that post raised some issues with it. So I've worked up a new and (slightly) improved version of it.
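
For readers who have not seen the earlier post, here is a rough sketch of what one pass of F-test-based selection looks like using only base R. It is not the notebook's code: the `mtcars` data, the candidate terms, and the 0.05 threshold are all assumptions picked for illustration, and it leans on `add1` and `drop1` from the "stats" package rather than on anything in my function.

```r
# One forward step and one backward check using partial F-tests.
# mtcars, the scope, and the 0.05 cutoff are illustrative assumptions.
current <- lm(mpg ~ wt, data = mtcars)
scope <- mpg ~ wt + hp + qsec

# F-test for each candidate term that could enter the current model.
entering <- add1(current, scope = scope, test = "F")
p.enter <- entering[["Pr(>F)"]][-1]          # drop the "<none>" row
names(p.enter) <- rownames(entering)[-1]
best <- names(which.min(p.enter))
if (p.enter[best] < 0.05) {
  # Add the most significant candidate to the model.
  current <- update(current, as.formula(paste(". ~ . +", best)))
}

# F-test for each term that could leave the updated model.
leaving <- drop1(current, test = "F")
p.leave <- leaving[["Pr(>F)"]][-1]           # drop the "<none>" row
names(p.leave) <- rownames(leaving)[-1]
worst <- names(which.max(p.leave))           # weakest term, candidate to drop
```

A full stepwise routine repeats these two checks until no term enters or leaves. Note that the F-test decision is already a comparison of a p-value (the `Pr(>F)` column) against an alpha threshold.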

The new version is provided in an R notebook that contains both the stepwise function itself and some demonstration code using it. It does not require any R libraries besides the "base" and "stats" packages. There is at least one butt-ugly hack in it that would keep me from being hired in any sort of programming job, but so far it has passed all the tests I've thrown at it. If you run into issues with it, feel free to use the comment section below to let me know. I'm no longer teaching, though, so be warned that maintenance on this is not my highest priority.

The updated function has a few new features:

it returns the final model (as an lm object), which I didn't bother to do in the earlier version;
you can specify the initial and full models as either formulas (y~x+z) or strings ("y~x+z"), i.e., quotes are strictly optional; and
as with the lm function, it has an optional data = ... argument that allows you to specify a data frame.
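
To make the last two items concrete, here is a minimal sketch of how a function might accept either a formula or a string and pass an optional data frame through to `lm`. The helper names `to.formula` and `fit.model` are hypothetical and are not taken from the notebook; the notebook may well do this differently.

```r
# Hypothetical helper: accepts y ~ x + z or "y ~ x + z" and returns a formula.
to.formula <- function(model) {
  if (inherits(model, "formula")) model else as.formula(model)
}

# Hypothetical wrapper showing the optional data argument being handed to lm.
fit.model <- function(model, data = NULL) {
  f <- to.formula(model)
  if (is.null(data)) lm(f) else lm(f, data = data)
}

# The quoted and unquoted forms should produce the same fit.
fit1 <- fit.model(mpg ~ wt + hp, data = mtcars)
fit2 <- fit.model("mpg ~ wt + hp", data = mtcars)
same <- isTRUE(all.equal(coef(fit1), coef(fit2)))
```

Both calls return an ordinary lm object, so the usual tools (`summary`, `coef`, `predict`) apply to the result.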

There are also a few bug fixes:

if you set the alpha-to-enter greater than the alpha-to-leave, which could throw the function into an indefinite loop, the function will now crab at you and return NA;
if you try to fit a model with more parameters than you have observations, the function will now crab at you and return NA; and
the function no longer gets confused (I think) if you pick variable/column names that clash with variable names used inside the function.
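
The first two checks can be sketched roughly as follows. This is an illustration under stated assumptions, not the notebook's code: `check.inputs` and its warning messages are hypothetical, and the observation count here is simply the number of rows in the data frame (rows with missing values are not accounted for). The loop in the first case can arise because a variable whose p-value falls between the two thresholds is significant enough to enter but not significant enough to stay.

```r
# Hypothetical pre-flight checks; the name and messages are illustrative only.
check.inputs <- function(full.model, data, alpha.to.enter, alpha.to.leave) {
  if (alpha.to.enter > alpha.to.leave) {
    warning("alpha.to.enter should not exceed alpha.to.leave")
    return(NA)
  }
  # Count the coefficients the full model would need (intercept included).
  n.params <- ncol(model.matrix(as.formula(full.model), data = data))
  if (n.params > nrow(data)) {
    warning("the full model has more parameters than observations")
    return(NA)
  }
  TRUE
}

ok <- check.inputs("mpg ~ wt + hp", mtcars, 0.05, 0.10)
# Thresholds reversed: the check complains and returns NA.
bad <- suppressWarnings(check.inputs("mpg ~ wt + hp", mtcars, 0.10, 0.05))
# Four parameters but only three rows: the check complains and returns NA.
small <- suppressWarnings(
  check.inputs("mpg ~ wt + hp + qsec", mtcars[1:3, ], 0.05, 0.10)
)
```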

As always, the code is provided with a Creative Commons license, as-is, no warranty express or implied, your mileage may vary.

Update (11/09/18): I've tweaked the code to add a few features. See here for a post about the updates.

16 comments:

Anonymous, November 6, 2017 at 12:10 PM
Thanks for the updated function. It helped me in my project.

Anonymous, April 20, 2018 at 1:13 PM
I am having trouble integrating my own data frame. How do I modify the function arguments to use a data frame generated within the script?

Thank you!

Unknown, October 27, 2018 at 7:23 PM
Hello, is it possible to use the p-value, instead of the F-test, as the criterion for adding or dropping variables?

Brent Daniel, October 29, 2018 at 11:52 AM
Fantastic! My manager is more familiar with SPSS than R and wants to see it done like this. My only issue is that, despite setting alpha to enter less than alpha to exit, my dataset keeps looping with the same variable entering then exiting over and over. I pulled the first variable from the data set that triggered that, and then it started doing that to another variable.

I'm thinking about adding some code that stores the previously dropped variable to make sure we don't add it right back in, but I wanted to check in here, first, before messing with it too much.

Have you seen that happen?

Adasz, November 8, 2018 at 8:41 AM
Hi Paul,

this post at Cross Validated mentions that it is possible to use your function not only for stepwise regression but also for backward elimination and forward selection. Can I ask you how?

If I wish to do backward elimination, I guess I should only define the arguments full.model and alpha.to.leave? Analogously, for forward selection, should I start with an initial.model and only set alpha.to.enter?

Thank you!