
Descriptive Statistics – Part 1

By Ron Pereira Updated on January 13th, 2011

Tonight we are starting a two-part series on descriptive statistics.

Yeah, I know, many of you are likely having nasty flashbacks of some professor with bad breath, but by the end of this series I hope to make all those bad thoughts a distant memory.

Overview

In general terms descriptive statistics do just what the name implies… they describe things! Specifically, we do two things with descriptive statistics: 1) we study central tendency and 2) we study spread or dispersion.

The Mean

The most common measure of central tendency is the mean or average. To calculate the mean we simply add up the data points and divide by the number of data points.
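As a quick illustration, here is a minimal Python sketch of that calculation. The function name `mean` is just a label chosen for this example, and the numbers are the same data set used in the median section of this post.

```python
def mean(values):
    """Arithmetic mean: add up the data points, divide by how many there are."""
    if not values:
        raise ValueError("mean requires at least one data point")
    return sum(values) / len(values)


data = [1, 9, 11, 22, 9, 1, 6, 1, 2]

# The nine values sum to 62, so the mean is 62 / 9
print(round(mean(data), 2))  # prints 6.89
```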

An important point that many miss is that the mean describes central tendency well only when the data are roughly symmetric, as they are in a normal distribution. If, for example, our data are from an exponential distribution (as many “time” related metrics are), the long tail pulls the mean away from the typical value, so we should not use the mean to describe their central tendency. Instead, we should use the median.
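To see why this matters, here is a small simulation sketch in Python. The numbers are assumptions made up for illustration: a “time” metric drawn from an exponential distribution with a true mean of 10 minutes, and a fixed random seed so the run is repeatable. For an exponential distribution the median equals the mean multiplied by ln(2), so the median always sits well below the mean.

```python
import math
import random
from statistics import mean, median

random.seed(42)  # fixed seed so the sketch is repeatable

TRUE_MEAN = 10  # assumed average time in minutes (illustrative only)

# random.expovariate takes the rate, which is 1 / mean
times = [random.expovariate(1 / TRUE_MEAN) for _ in range(10_000)]

sample_mean = mean(times)
sample_median = median(times)

print(f"sample mean        = {sample_mean:.2f}")
print(f"sample median      = {sample_median:.2f}")
print(f"theoretical median = {TRUE_MEAN * math.log(2):.2f}")  # about 6.93
```

The sample mean lands near 10 while the sample median lands near 7, which is why the two can tell quite different stories for this kind of data.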

The Median

Let’s say we have the following data set: 1, 9, 11, 22, 9, 1, 6, 1, 2 and want to know the median. Here is how we do it.

1) Sort the data in descending order (ascending works just as well): 22, 11, 9, 9, 6, 2, 1, 1, 1
2) Find the middle of the data set, which will be the median: the integer 6 is in the middle and is our median.

If there is an even number of data points, you simply average the two numbers at the center, since there will not be one single number dead in the middle.
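The sort-and-pick-the-middle steps, along with the even-count rule, can be sketched in Python roughly like this. The function name `median` is only a label for this example.

```python
def median(values):
    """Middle value of the sorted data; average the two middle values if the count is even."""
    if not values:
        raise ValueError("median requires at least one data point")
    ordered = sorted(values, reverse=True)  # descending, as in the steps above
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: one value sits exactly in the middle
        return ordered[mid]
    # Even count: average the two values that straddle the middle
    return (ordered[mid - 1] + ordered[mid]) / 2


data = [1, 9, 11, 22, 9, 1, 6, 1, 2]
print(median(data))          # prints 6
print(median([1, 2, 3, 4]))  # prints 2.5
```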

We generally use the median when dealing with skewed distributions. The most common example is home prices.

Since there are almost always a few “expensive” homes in the neighborhood, the distribution is often skewed to the right. This makes the median more appropriate, since these “outliers” are not able to skew our result. So if you ever have a realtor telling you about the “average” home prices, be on guard!
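Here is a small Python sketch of the effect. The sale prices are invented purely for illustration and do not come from any real market.

```python
from statistics import mean, median

# Hypothetical sale prices for a neighborhood (made-up numbers)
typical_homes = [210_000, 225_000, 230_000, 240_000]

# The same neighborhood after one "expensive" home sells
with_expensive_home = typical_homes + [1_500_000]

# Before: mean 226,250 and median 227,500, so the two agree closely
print(mean(typical_homes), median(typical_homes))

# After: mean 481,000 but median only 230,000
print(mean(with_expensive_home), median(with_expensive_home))
```

One expensive sale more than doubles the “average” price, while the median moves by only 2,500.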

The Mode

The least used measure of central tendency is the mode. The mode is simply the most frequent value in a data set. Using the example above, {22, 11, 9, 9, 6, 2, 1, 1, 1}, the mode would be 1, since it occurs more times (3) than any other number.
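Counting occurrences is easy to sketch in Python with `collections.Counter`. The helper name `mode_with_count` is made up for this example, and note that if two values tie for most frequent, this sketch simply returns whichever one appears first in the data.

```python
from collections import Counter


def mode_with_count(values):
    """Return the most frequent value and how many times it occurs."""
    if not values:
        raise ValueError("mode requires at least one data point")
    counts = Counter(values)
    # most_common(1) gives a one-item list of (value, count) pairs
    value, count = counts.most_common(1)[0]
    return value, count


data = [22, 11, 9, 9, 6, 2, 1, 1, 1]
print(mode_with_count(data))  # prints (1, 3)
```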

MS Excel Tips

We can use Excel to figure these things out for us. Here is how we do it, where “data set” is the cell range holding your data (for example A1:A9).

For the mean use the following: =AVERAGE(data set)
For the median use the following: =MEDIAN(data set)
For the mode use the following: =MODE(data set)

That’s pretty much all there is to it, folks. Tomorrow night we will discuss the 3 primary measures of dispersion, so please subscribe to this blog via RSS feed lest you miss all the fun!

Ron Pereira

Ron Pereira is a co-founder and the Managing Director of Gemba Academy and has more than 30 years of experience helping organizations improve performance through Lean, Six Sigma, and continuous improvement. Prior to starting Gemba Academy, Ron served in a variety of manufacturing, supply chain, and leadership roles, including process engineer, engineering manager, Master Black Belt, and director of manufacturing and continuous improvement. Today, Ron works with leaders around the world to develop problem-solving capabilities, strengthen leadership, and build cultures where continuous improvement thrives. Through his writing, podcast interviews, and educational programs, he shares practical insights that help organizations achieve lasting operational excellence.
