One debate that often arises amongst my Six Sigma cohorts is when to use the standard deviation of a data set and when to use another measure of dispersion, namely the range.
Descriptive Statistics Overview
Let’s do a quick review from our descriptive statistics class. When we look at the dispersion, or spread, of a data set, there are three primary methods at our disposal.
If you work for GE we can add a fourth – span. But let’s leave that for a future blog.
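As a rough sketch, assuming the three measures in question are the range, the variance, and the standard deviation, here is how they could be computed with Python's built-in statistics module. The cycle times are made up for illustration.

```python
import statistics

# Hypothetical cycle times in days (illustrative data only).
cycle_times = [2, 3, 3, 4, 5, 5, 6, 12]

# Range: distance between the largest and smallest observation.
data_range = max(cycle_times) - min(cycle_times)

# Sample variance and sample standard deviation (both divide by n - 1).
variance = statistics.variance(cycle_times)
std_dev = statistics.stdev(cycle_times)

print(f"range = {data_range}")
print(f"variance = {variance:.2f}")
print(f"standard deviation = {std_dev:.2f}")
```

Notice that the range only looks at two data points, while the variance and standard deviation use every observation.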
Am I Normal?
So the debate on when it is OK to use standard deviation versus the range deals with something called “normality.” And my, oh my, can statisticians get their undergarments in an uproar over this!
The standard normal distribution has a mean of zero and a variance of one. When you look at a normally distributed data set on a graph it resembles a bell, hence the term bell curve.
However, sometimes data will not follow a normal distribution. Instead, the distribution may be skewed positively or negatively.
A good example here is cycle time with a natural boundary of 0. You can’t have less than 0 seconds, days, months, etc., which makes the whole bell curve tough to create! Thus, cycle time is often non-normal in nature. Below is how a graph of cycle time may look.
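To get a feel for that shape, here is a minimal simulation sketch. It assumes cycle time follows a lognormal distribution, which is just one common right-skewed model that never drops below zero; the parameters and the text histogram are my own illustration, not real process data.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

# Assumption: cycle time (in days) is modeled as lognormal, one common
# right-skewed shape that can never go below zero.
cycle_times = [random.lognormvariate(0.0, 0.75) for _ in range(10_000)]

mean_ct = statistics.mean(cycle_times)
median_ct = statistics.median(cycle_times)

# Crude text histogram: one row per whole day, one '#' per 100 observations.
for day in range(6):
    count = sum(1 for t in cycle_times if day <= t < day + 1)
    print(f"{day}-{day + 1} days | {'#' * (count // 100)}")

print(f"mean = {mean_ct:.2f}, median = {median_ct:.2f}")
```

The bars pile up near zero and trail off to the right, and the mean lands above the median, which is the signature of a positively skewed distribution.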
Some practitioners will say that if your data are not normal you should not use standard deviation; instead, you should use the range.
The rationale for using the range is that, for non-normal distributions like cycle time or home prices, the standard deviation may be misleading.
It’s all about Central Tendency
The reason it could be misleading, they say, has to do with the “measure of central tendency” employed. We can use the mean, median, or mode to describe the central tendency of data. For normally distributed data we generally use the mean.
However, in the case of home prices (non-normal data), a millionaire’s home may skew the mean away from the general population, making it a bit misleading. So, in these types of non-normal situations we generally use the median as the measure of central tendency, not the mean.
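Here is a quick sketch of that effect. The prices are invented for illustration, with one big-ticket home tacked onto an otherwise ordinary neighborhood.

```python
import statistics

# Hypothetical home prices in dollars; the last one is the millionaire's home.
home_prices = [180_000, 195_000, 210_000, 225_000, 240_000, 2_500_000]

mean_price = statistics.mean(home_prices)
median_price = statistics.median(home_prices)

print(f"mean   = {mean_price:,.0f}")
print(f"median = {median_price:,.0f}")
```

In this made-up neighborhood the mean ends up higher than every home except the mansion, while the median still sits in the middle of the ordinary homes.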
This is significant for the simple reason that in order to calculate standard deviation we depend on the mean (it is part of the formula). So if you cannot trust the mean how can you trust standard deviation?
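For reference, the usual sample standard deviation formula makes that dependence explicit. The notation is my own, since none is defined above: x_i are the observations, n is how many there are, x-bar is the mean, and s is the sample standard deviation.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
\qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}
```

Every deviation is measured from the mean, so anything that drags the mean around changes every term in the sum.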
What to Do?
So what is a well-balanced Six Sigma practitioner to do?
Personally, I usually state both the standard deviation and range when my data are not normal. After all, we are talking about variation and I know I want to kill it no matter what it is called!
I have seen people come to blows over this topic… no kidding. So, if you have a hot sports opinion on this topic please do share!