Last night we began our discussion on regression. Tonight, I want to talk about a few more things related to this topic. As I was thinking about what to write, I realized how difficult it is to explain these topics without a flip chart! The flip chart is the best teaching device ever created, but I digress.
When dealing with simple linear regression, there are a few key points to check once you have collected and analyzed your data:
- What is the R-Sq (adj) value?
- What is the P value?
- How do the residuals look?
Remember, we are assuming the measurement system is repeatable and reproducible. This can be a big assumption if a proper measurement systems analysis has not been completed, but that is a topic for another evening.
What is the R-Sq (adj) value?
Now then, what is this R-Sq (adj) number all about? First of all, it stands for R-Squared (adjusted). There are entire textbooks written on regression that offer far more sophisticated explanations than the one I am about to give. But here goes.
The R-Sq (adj) value tells us what proportion of the variation in the output (Y) is explained by the model. So the higher the R-Sq (adj) value, the better off we are, because it tells us we have captured a lot of the variation. The (adjusted) portion of R-Sq takes into account the number of inputs the model has: plain R-Sq never goes down when you add another input, even a useless one, while R-Sq (adj) applies a penalty for each extra input. In many cases, R-Sq and R-Sq (adj) will be close to one another. But to be safe, always use R-Sq (adj).
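To make the difference concrete, here is a minimal Python sketch (standard library only) that fits a simple linear regression by least squares and reports both values. It assumes the usual textbook definitions, R-Sq = 1 - SS_residual / SS_total and R-Sq (adj) = 1 - (1 - R-Sq)(n - 1)/(n - p - 1), where n is the number of data points and p is the number of inputs. The function names and the small data set are made up for illustration.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for simple linear regression."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(x, y):
    """Share of the variation in y explained by the fitted line."""
    slope, intercept = fit_line(x, y)
    mean_y = sum(y) / len(y)
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_residual = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return 1 - ss_residual / ss_total


def adjusted_r_squared(r_sq, n, p):
    """Penalize R-Sq for the number of inputs p, given n data points."""
    return 1 - (1 - r_sq) * (n - 1) / (n - p - 1)


# Hypothetical data: one input (X) and one output (Y).
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r_sq = r_squared(x, y)
r_sq_adj = adjusted_r_squared(r_sq, n=len(x), p=1)
print(f"R-Sq = {r_sq:.1%}, R-Sq (adj) = {r_sq_adj:.1%}")

# Same R-Sq and sample size, but more inputs: the adjustment bites harder.
print(f"R-Sq (adj) with 1 input:  {adjusted_r_squared(0.90, n=10, p=1):.1%}")
print(f"R-Sq (adj) with 5 inputs: {adjusted_r_squared(0.90, n=10, p=5):.1%}")
```

With 10 data points and an R-Sq of 90%, the adjusted value works out to about 88.75% with one input but only 77.5% with five, which is exactly the penalty for extra inputs described above.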
The question is often asked: how high should R-Sq (adj) be? Well, typically we like to see values over 80%, but you have to be careful. Sometimes sample size issues can bite you! With only a handful of data points, a high R-Sq (adj) can show up purely by chance. Keith Bower wrote a great article related to this very topic.
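One way to see how small samples can bite is a quick simulation. The Python sketch below (standard library only) generates X and Y values that are completely unrelated, fits a line, and counts how often R-Sq (adj) still clears the 80% bar. The function names, the normally distributed noise, the 80% threshold, and the number of trials are illustrative assumptions, not a prescribed procedure.

```python
import random


def adjusted_r_squared_simple(x, y):
    """R-Sq (adj) for a simple linear regression of y on x (one input)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    r_sq = sxy * sxy / (sxx * syy)  # for one input, R-Sq is the squared correlation
    return 1 - (1 - r_sq) * (n - 1) / (n - 2)


def chance_of_high_fit(n, trials=10_000, threshold=0.80, seed=1):
    """Fraction of random, unrelated data sets of size n with R-Sq (adj) above threshold."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = [rng.gauss(0, 1) for _ in range(n)]  # y has no relationship to x
        if adjusted_r_squared_simple(x, y) > threshold:
            hits += 1
    return hits / trials


for n in (3, 5, 10, 30):
    print(f"n = {n:2d}: {chance_of_high_fit(n):.1%} of random data sets have R-Sq (adj) > 80%")
```

Because Y has nothing to do with X here, every "high" R-Sq (adj) is a fluke. With n = 3 the fraction should land near 20%; by n = 30 it should be essentially zero.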
What is the P value?
Remember our friend the P value from our hypothesis testing discussion? Well, he’s back! When we do regression, we also state a null (Ho) and an alternative (Ha) hypothesis. With regression it goes something like this:
Ho: Slope of the line is 0
Ha: Slope of the line is not 0
So, if we chose an alpha value (the risk we are willing to take of rejecting Ho when it is actually true) of 0.05 and our regression model spits out a P value of 0.01, we can use the saying “P is low, so Ho must go!” This means we REJECT the null hypothesis and state that the slope of the line is not 0 at the specified degree of risk.
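For a simple linear regression, the P value for the slope usually comes from a t-test: t = slope / (standard error of the slope), compared against a t distribution with n - 2 degrees of freedom. The sketch below shows that calculation in plain Python. Statistical software does this for you; to avoid extra dependencies, this version approximates the t distribution by integrating its density with Simpson's rule, which is an illustrative shortcut rather than how any particular package computes it. The data and function names are hypothetical.

```python
import math


def t_density(t, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_coef) * (1 + t * t / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=10_000):
    """P(|T| >= |t|): integrate the density from 0 to |t| with Simpson's rule (steps even)."""
    b = abs(t)
    h = b / steps
    total = t_density(0.0, df) + t_density(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    area_0_to_t = total * h / 3
    return max(0.0, 1.0 - 2.0 * area_0_to_t)


def slope_p_value(x, y):
    """Slope, t statistic and P value for Ho: slope = 0 in simple linear regression."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_residual = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    std_error = math.sqrt(ss_residual / (n - 2) / sxx)
    t = slope / std_error
    return slope, t, two_sided_p(t, n - 2)


alpha = 0.05  # risk of rejecting Ho when it is actually true
x = [1, 2, 3, 4, 5]
for label, y in [("strong trend", [2.1, 3.9, 6.2, 7.8, 10.1]),
                 ("no clear trend", [3, 1, 4, 1, 5])]:
    slope, t, p = slope_p_value(x, y)
    verdict = "P is low, reject Ho" if p < alpha else "P is high, fail to reject Ho"
    print(f"{label}: slope = {slope:.2f}, t = {t:.2f}, P = {p:.4f} -> {verdict}")
```

The first data set gives a P value well below 0.05, so Ho goes; the second does not, so we cannot say the slope differs from 0.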
Using R-Sq (adj) and P value Together
OK, so now we have two things to look at: R-Sq (adj) and a P value. Whoopee. Stay with me here… it’s about to get fun! There are four quadrants our results can fall into (this is where my flip chart comes in handy).
- High R-Sq (adj) and Low P value: This means the variation is explained and the model is statistically significant. You are a genius (assuming you chose a nice sample size, of course)!
- High R-Sq (adj) and High P value: This means the variation is explained but is not statistically significant. The recommendation is to get more data, as sample size issues may be biting you here.
- Low R-Sq (adj) and Low P value: This means the variation is partially explained and is statistically significant. There may be other X’s impacting the model here. Try to find them. You are almost a genius, but not quite yet!
- Low R-Sq (adj) and High P value: This means the variation is not explained and not statistically significant. Rats! Keep going and get more data. Also, check for non-linear relationships in the model. A nice Black Belt can help you here.
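The four quadrants can be written down as a small lookup. In this Python sketch the cut-offs are assumptions taken from the discussion above (over 80% counts as a "high" R-Sq (adj), and a P value below an alpha of 0.05 counts as "low"); adjust them to suit your project, and treat the output as a prompt for judgment rather than a verdict. The function name and messages are hypothetical.

```python
def regression_quadrant(r_sq_adj, p_value, r_sq_cutoff=0.80, alpha=0.05):
    """Place a simple linear regression result into one of the four quadrants."""
    high_r_sq = r_sq_adj > r_sq_cutoff  # "over 80%" counts as high
    low_p = p_value < alpha             # "P is low, so Ho must go"
    advice = {
        (True, True): "High R-Sq (adj), low P: explained and significant. "
                      "Nice work (check your sample size).",
        (True, False): "High R-Sq (adj), high P: explained but not significant. "
                       "Get more data.",
        (False, True): "Low R-Sq (adj), low P: partially explained but significant. "
                       "Look for other X's.",
        (False, False): "Low R-Sq (adj), high P: not explained, not significant. "
                        "Get more data and check for non-linear relationships.",
    }
    return advice[(high_r_sq, low_p)]


# Hypothetical results, one per quadrant.
for r_sq_adj, p in [(0.92, 0.01), (0.85, 0.20), (0.45, 0.003), (0.10, 0.60)]:
    print(f"R-Sq (adj) = {r_sq_adj:.0%}, P = {p}: {regression_quadrant(r_sq_adj, p)}")
```

The second pair (85% with a P value of 0.20) is the sample-size trap: a line that looks good on paper but could easily be a fluke.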
Tomorrow night I will discuss residuals a bit, as well as anything else that pops into my mind before then!