Comments on OR in an OB World: Updated Stepwise Regression Function
Blog author: Paul A. Rubin. Feed last updated 2024-03-14. 16 comments, newest first.

Adasz, 2018-11-12:
OK, thank you!

Paul A. Rubin, 2018-11-09:
You had the right idea, but the previous code might not have worked as expected with omitted alpha values. I've just updated the code again. You can now skip alpha.to.enter for backward regression, or skip alpha.to.leave for forward regression. Keep an eye out for a short post with the latest updates.
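The backward-only mode described above can be approximated with base R alone. The sketch below is not the blog's stepwise() function (its code is not reproduced here). It assumes the built-in mtcars data and a hypothetical helper, backward_eliminate(), which repeatedly calls drop1(..., test = "F") and removes the term with the largest p-value until every remaining p-value is at most alpha.to.leave.

```r
# Sketch of backward elimination with base R's drop1().
# backward_eliminate() is a hypothetical helper, not the blog's stepwise() function.
backward_eliminate <- function(model, alpha.to.leave = 0.10) {
  repeat {
    # Stop if only the intercept is left
    if (length(attr(terms(model), "term.labels")) == 0) break
    # drop1() refits the model without each term and reports a partial F test;
    # row 1 is "<none>" (the current model), whose p-value is NA
    tests <- drop1(model, test = "F")
    pvals <- tests[["Pr(>F)"]][-1]
    worst <- rownames(tests)[-1][which.max(pvals)]
    # Keep the model once every remaining term is significant at alpha.to.leave
    if (max(pvals, na.rm = TRUE) <= alpha.to.leave) break
    model <- update(model, as.formula(paste(". ~ . -", worst)))
  }
  model
}

full <- lm(mpg ~ wt + hp + qsec + drat + disp, data = mtcars)
reduced <- backward_eliminate(full, alpha.to.leave = 0.10)
print(formula(reduced))
```

Forward selection is the mirror image: call add1() with the full model as the scope and add the term with the smallest p-value for as long as that p-value is below alpha.to.enter.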

Adasz, 2018-11-08:
Hi Paul,

this post at Cross Validated mentions that it is possible to use your function not only for stepwise selection but also for backward elimination and forward selection. Can I ask you how?

If I wish to do backward elimination, I guess I should only define the arguments full.model and alpha.to.leave? Analogously, for forward selection, should I start with an initial.model and only enter alpha.to.enter?

Thank you!

Paul A. Rubin, 2018-10-30:
I'm not sure I was entirely clear in my previous response. The code currently uses p-values for the F test of each coefficient to pick which variable gets in (or out). Those p-values are identical to what you would get from the two-sided t-tests (what you normally see in regression output from software), so it's basically already doing what you had in mind. (One-sided t-tests are possible, but not commonly used in this context.)

Paul A. Rubin, 2018-10-29:
No, that's not it. The first two arguments to the add1() function are the current model and the full model. The full model is used only as a source of variables to consider. The add1() function takes each variable that is in the full model but not in the current model and evaluates what happens when that variable is added to the current model. It does not use p-values or t-statistics from the full model.

Two questions come to mind. First, what are your choices for alpha-to-enter and alpha-to-leave? Second, what does the code say the p-values are for the "yo-yo" variable (both when it is chosen to enter and when it is chosen to leave)? Those are not currently printed, so you'll need to hack the code to print them (pmin and pmax, respectively).

Brent Daniel, 2018-10-29:
Yes, that's exactly what is happening. I start with "output" ~ 1, and it adds about five variables; then, when it adds the sixth, it drops it, adds it again, drops it again, and continues to do so until I interrupt it.

I looked pretty carefully at the code and I think I know why. When it attempts to add, it looks into the full model results. When it attempts to drop, it looks at the current model results.

Paul A. Rubin, 2018-10-29:
Are you saying that a variable enters, exits immediately (without any intervening changes to the model), enters again, etc.? That should be impossible. If it happened, I would start to wonder whether there was some numerical instability in either the regression fitting or the calculation of the p-values. On the other hand, if it's something like "x enters, y enters, x leaves, z enters, y leaves, x enters, ...", I think that may be possible with annoying correlated predictors ... maybe. I don't recall ever seeing it happen, though. As far as blocking the variable that just left, that won't help in the second case, and might turn the first case into the second case. Hard to say.
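To see what add1() does, and to print the kind of value the reply above refers to as pmin, the sketch below runs one forward step by hand. It assumes the built-in mtcars data and an illustrative candidate set; it is not the blog's code. Because each candidate adds a single coefficient, the partial F statistic has one numerator degree of freedom and equals t², the square of that coefficient's t statistic in the enlarged model, so the add1() p-value should match the two-sided t-test p-value reported by summary().

```r
# One forward step done by hand with base R's add1() (a sketch, not the blog's code).
# The full formula only supplies candidate terms; nothing is fitted from it directly.
current <- lm(mpg ~ 1, data = mtcars)
full_formula <- mpg ~ wt + hp + qsec + drat + disp

# add1() fits current + term for each scope term not already in `current`
# and reports the partial F test; row 1 ("<none>") is the current model itself
tests <- add1(current, scope = full_formula, test = "F")
pvals <- tests[["Pr(>F)"]][-1]
names(pvals) <- rownames(tests)[-1]

# Smallest p-value among the candidates
pmin_value <- min(pvals)
best <- names(pvals)[which.min(pvals)]
cat("best candidate:", best, "  F-test p-value:", format(pmin_value), "\n")

# Refit with the chosen term: its two-sided t-test p-value should match,
# because for a single added coefficient F = t^2
bigger <- update(current, as.formula(paste(". ~ . +", best)))
t_p <- summary(bigger)$coefficients[best, "Pr(>|t|)"]
cat("two-sided t-test p-value:", format(t_p), "\n")
```

Printing the same two quantities inside a stepwise loop (the minimum add1() p-value on entry, the maximum drop1() p-value on exit) is one way to check whether a "yo-yo" variable sits right at the alpha thresholds.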

Anonymous, 2018-10-29:
Ok, thank you very much for your reply!

Brent Daniel, 2018-10-29:
Fantastic! My manager is more familiar with SPSS than R and wants to see it done like this. My only issue is that, despite setting alpha-to-enter lower than alpha-to-exit, my dataset keeps looping, with the same variable entering and then exiting over and over. I removed from the data set the first variable that triggered this, and then it started doing the same thing with another variable.

I'm thinking about adding some code that stores the previously dropped variable to make sure we don't add it right back in, but I wanted to check in here first, before messing with it too much.

Have you seen that happen?

Paul A. Rubin, 2018-10-28:
Does the code currently allow that? No. Could the code be modified to use the p-value? Yes (assuming you are talking about the two-sided p-value, which is what is generated in the summary of lm models in R, and in most other software). Would it make a difference? No. Using two-sided p-values would be equivalent to using alpha-to-enter/alpha-to-leave with a two-sided t-test of each coefficient. If my memory serves me correctly (iffy these days), the two-sided t-test and the F-test are algebraically equivalent (the F statistic is the square of the t statistic), so decisions on which terms were significant (and thus which variables to add or drop) would be identical to what the current code gives.

Anonymous, 2018-10-27:
Hello, is it possible to use the p-value instead of the F-test as the criterion for adding or dropping variables?

Paul A. Rubin, 2018-04-22:
No. The "data = NULL" in the existing code declares a data argument and gives it the default value NULL. If you invoke the function without specifying the data argument, it expects the model variables to be globally defined. If you pass in a data frame in the data argument, it looks for the model variables in that data frame. This is exactly how the lm() function works.

Anonymous, 2018-04-22:
Do you need to modify the function at all? Do you remove the data = NULL?

Paul A. Rubin, 2018-04-20:
You just use the optional "data" argument to specify the data frame. So if your data frame is in the variable "whatever", you just call stepwise(..., data = whatever).

Anonymous, 2018-04-20:
I am having trouble integrating my own data frame. How do I modify the function arguments to use a data frame generated within the script?

Thank you!

Anonymous, 2017-11-06:
Thanks for the updated function. It helped me in my project.
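The data = NULL behaviour described in the April 2018 replies can be seen with lm() directly. In this sketch, fit_with_optional_data() is a hypothetical stand-in for any wrapper that forwards an optional data argument to lm(); the vectors x and y and the data frame whatever are made-up examples. The data frame deliberately holds different y values from the global y, so the two fits differ, which shows that a supplied data frame is searched before the globals.

```r
# Hypothetical wrapper: forwards an optional data frame to lm(), as the replies
# describe for stepwise(). With data = NULL, lm() looks up the model variables
# in the formula's environment (here, the global environment).
fit_with_optional_data <- function(formula, data = NULL) {
  lm(formula, data = data)
}

# Made-up example data stored as global variables
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)
fit_global <- fit_with_optional_data(y ~ x)

# The same formula, but the variables now come from a data frame
# whose y column is twice the global y
whatever <- data.frame(x = x, y = 2 * y)
fit_df <- fit_with_optional_data(y ~ x, data = whatever)

print(coef(fit_global))
print(coef(fit_df))  # both coefficients double, so the data frame was used
```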