Next up I want to discuss something called the least squares method and residuals. I will wrap it all up with a short discussion on the differences between correlation, causation, and extrapolation. Yikes, this sounds serious.
Least Squares Method
Our regression equation used to predict things is determined by a procedure known as the method of least squares. There is some math involved to sort this all out, but the basic idea is simple. All we are doing is plotting the actual data points and drawing a line down the middle of them. This line is called the “best fit” line because, of all the straight lines we could draw, it has the smallest total squared vertical distance between the points and the line (that last bit is for the statistics nerds out there).
So basically, we plot the actual data points and fit a line down the middle of them. That is the least squares method and I didn’t even need an entire book!
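For anyone who wants to see the math in action, here is a minimal sketch of fitting a straight line by least squares in plain Python. The function name `fit_line` and the temperature and weight numbers are made up for illustration; they are not real injection molding data.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least squares line y = intercept + slope * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired data points")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sum of squared deviations of x, and sum of cross deviations of x and y
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical, so no line can be fit")
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


# Hypothetical data: machine temperature (degrees) and part weight (grams)
temps = [400, 425, 450, 475, 500]
weights = [10.1, 10.4, 10.4, 10.8, 10.9]

slope, intercept = fit_line(temps, weights)
print(f"weight = {intercept:.2f} + {slope:.3f} * temperature")
```

The slope and intercept that come out are the ones that make the total squared vertical distance as small as it can be for a straight line.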
I mentioned how the lack of a flip chart was slowing me down last night. Well, I am trying out my scanner, and while it is not the best, it is better than nothing. As my nice little picture (compliments are very welcome by the way… hee hee) demonstrates, a residual is simply the vertical distance between the actual data point and the predicted data point (also called the “fit”), that is, actual minus fit. Put another way, the residual is the leftover variation in Y after using X to predict it.
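If the picture is not doing it for you, here is a small sketch of the same idea in Python. The data, the slope of 0.008, and the intercept of 6.92 are assumed values for illustration only.

```python
def residuals(xs, ys, slope, intercept):
    """Return actual minus fit for each data point."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


# Hypothetical data and an assumed best fit line: weight = 6.92 + 0.008 * temperature
temps = [400, 425, 450, 475, 500]
weights = [10.1, 10.4, 10.4, 10.8, 10.9]

res = residuals(temps, weights, slope=0.008, intercept=6.92)
for temp, r in zip(temps, res):
    # Positive means the actual point sits above the line, negative means below
    print(f"temperature {temp}: residual {r:+.2f}")
```

When the line really is the least squares line for the data, the residuals add up to zero (give or take rounding), which makes for a handy sanity check.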
We like to look at our residuals when doing regression as it can help us spot any issues with data collection, variation issues, operator error, etc. There are a few assumptions we make with residuals, namely:
- They are not related to the inputs
- They don’t change over time – they are consistent
- They are normal (bell shaped)
A nice Black Belt can help you ensure these assumptions are in check. If they are not in check, you need to proceed with much caution (i.e., don’t try to predict anything).
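As a very rough illustration of the "don't change over time" assumption, the sketch below compares the average residual from the first half of a run with the average from the second half. This is a crude screen I am assuming for illustration, not a formal test; the function name `drift_gap` and both residual lists are hypothetical. It says nothing about the other two assumptions, and a residual plot is still the better tool.

```python
from statistics import mean


def drift_gap(residuals_in_run_order):
    """Return mean of the later half minus mean of the earlier half of the residuals."""
    n = len(residuals_in_run_order)
    if n < 2:
        raise ValueError("need at least two residuals")
    half = n // 2  # with an odd count, the middle residual is left out
    early = mean(residuals_in_run_order[:half])
    late = mean(residuals_in_run_order[-half:])
    return late - early


# Hypothetical residuals, listed in the order the parts were made
stable = [0.05, -0.03, 0.02, -0.04, 0.03, -0.03]
drifting = [-0.10, -0.06, -0.02, 0.02, 0.06, 0.10]

stable_gap = drift_gap(stable)
drifting_gap = drift_gap(drifting)
print(f"stable gap: {stable_gap:+.3f}, drifting gap: {drifting_gap:+.3f}")
```

A gap that is large compared with the typical size of the residuals hints that something shifted during data collection and is worth a closer look.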
Correlation, Causation, and Extrapolation
Typing those three words made me cringe. They sound so serious. Well, don’t sweat it. I will do my best to bring it down to earth for us normal people. Yes, I am normal. I swear. I am!
Correlation means that two things seem to be varying in a similar manner. If raising the temperature on our injection molding machine seems to be impacting the weight of the part we may say there is correlation.
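One common way to put a number on "varying in a similar manner" is the Pearson correlation coefficient, which runs from -1 to 1. Here is a minimal sketch; the function name `pearson_r` and the data are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient between two paired lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired data points")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable never changes")
    return sxy / sqrt(sxx * syy)


# Hypothetical data: machine temperature (degrees) and part weight (grams)
temps = [400, 425, 450, 475, 500]
weights = [10.1, 10.4, 10.4, 10.8, 10.9]

r = pearson_r(temps, weights)
print(f"r = {r:.3f}")  # a value close to 1 means weight tends to rise with temperature
```

Keep in mind that a value near 1 or -1 only says the two variables move together in a straight-line fashion. It does not say why.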
Taking it one step further, causation means that changing one variable actually makes the other variable change, rather than the two just happening to move together. So, in our injection molding example, we may be able to demonstrate causation by deliberately setting X, predicting what our Y will be, and then testing the theory! The 11th commandment of Six Sigma is “Thou Shall Confirm.”
Finally, the term extrapolation means that we attempt to predict Y outside the range of what was tested. So if you only tested up to 500 degrees with your injection molding machine you should not try to predict what will happen at 600 degrees. We have no data and do not know if there is a linear relationship.
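One way to keep yourself honest is to make the prediction code refuse to extrapolate. The sketch below assumes a hypothetical fitted line (intercept 6.92, slope 0.008) and a tested range of 400 to 500 degrees; the function name `predict_within_range` is made up for illustration.

```python
def predict_within_range(x, slope, intercept, x_min, x_max):
    """Predict Y from the fitted line, but only for X inside the tested range."""
    if not (x_min <= x <= x_max):
        raise ValueError(
            f"{x} is outside the tested range {x_min} to {x_max}; refusing to extrapolate"
        )
    return intercept + slope * x


# Assumed fitted line and tested temperature range (hypothetical values)
line = dict(slope=0.008, intercept=6.92, x_min=400, x_max=500)

inside = predict_within_range(450, **line)
print(f"predicted weight at 450 degrees: {inside:.2f}")

try:
    predict_within_range(600, **line)
    blocked = False
except ValueError as err:
    blocked = True
    print(err)
```

The guard does not make the model any smarter. It just stops us from quoting a number the data cannot back up.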
Well that about sums things up for our regression discussion. I hope you found it useful. As with anything, the best way to learn something is to give it a shot! So go collect some variable data and fit a line through it. Until next time, I wish you all the best on your journey towards continuous improvement.