Descriptive Statistics – Part 1

Tonight we are starting a two-part series on descriptive statistics.

Yeah, I know: many of you are probably having nasty flashbacks to some professor with bad breath, but by the end of this series I hope to make all those bad thoughts a distant memory.

Overview

In general terms, descriptive statistics do just what the name implies… they describe things! Specifically, we use descriptive statistics to study two things: 1) central tendency and 2) spread, or dispersion.

The Mean

The most common measure of central tendency is the mean or average.  To calculate the mean we simply add up the data points and divide by the number of data points. 
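
To make the arithmetic concrete, here is a minimal Python sketch. The function name mean is just an illustrative choice, and the sample values are the same nine numbers used in the median example below.

```python
def mean(data):
    """Arithmetic mean: add up the data points, then divide by how many there are."""
    if not data:
        raise ValueError("mean requires at least one data point")
    return sum(data) / len(data)


values = [1, 9, 11, 22, 9, 1, 6, 1, 2]
print(mean(values))
```

For these values the sum is 62 and there are 9 data points, so the mean is 62 / 9, roughly 6.89.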

An important point that many people miss is that the mean is a good summary only when the data are roughly symmetric, as in a normal distribution. If the data are strongly skewed, as with an exponential distribution (many “time” related metrics, such as wait times, look like this), a handful of very large values drags the mean upward, so it no longer describes a typical value. In that case we should use the median to describe central tendency instead.
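
A quick simulation shows the effect. This sketch assumes hypothetical wait times drawn from an exponential distribution with an average of 5 minutes; the sample size and random seed are arbitrary choices that just make the run repeatable.

```python
import random
import statistics

# Hypothetical wait times (minutes) from an exponential distribution whose
# average is 5 minutes; expovariate takes the rate, which is 1 / average.
rng = random.Random(42)  # fixed seed so the results are repeatable
wait_times = [rng.expovariate(1 / 5) for _ in range(10_000)]

mean_wait = statistics.mean(wait_times)
median_wait = statistics.median(wait_times)

print(f"mean:   {mean_wait:.2f} minutes")
print(f"median: {median_wait:.2f} minutes")
```

The mean lands near 5 minutes while the median sits near 3.5 (for an exponential distribution the median is about 0.69 times the mean), because the long tail of unusually long waits pulls the mean upward.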

The Median

Let’s say we have the following data set: 1, 9, 11, 22, 9, 1, 6, 1, 2, and we want to find the median. Here is how we do it.

  1. Sort the data (descending order is used here; ascending works just as well): 22, 11, 9, 9, 6, 2, 1, 1, 1
  2. Find the middle value, which is the median. With nine data points the middle is the fifth value, so the median is 6.

If there is an even number of data points, there is no single value dead in the middle, so you simply average the two values at the center.
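
Both cases, odd and even, can be sketched in a few lines of Python. The function name median is illustrative, and the second call simply drops the last value from the example to get an even count.

```python
def median(data):
    """Median: the middle value of the sorted data, or the average of the
    two middle values when there is an even number of data points."""
    if not data:
        raise ValueError("median requires at least one data point")
    ordered = sorted(data)  # ascending order; descending gives the same median
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # odd count: one value sits dead in the middle
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average the middle pair


print(median([1, 9, 11, 22, 9, 1, 6, 1, 2]))  # nine values
print(median([1, 9, 11, 22, 9, 1, 6, 1]))     # eight values
```

The first call returns 6, matching the hand calculation. In the second, the two center values after sorting are 6 and 9, so the median is 7.5.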

We generally use the median when dealing with skewed distributions. The classic example is home prices.

Because most neighborhoods have a few “expensive” homes, the distribution of prices is often skewed to the right. Those few high prices pull the mean upward, while the median barely moves, which makes the median the more appropriate measure. So if a realtor ever tells you about the “average” home price, be on guard!
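
Here is a small illustration of why. The prices below are made up for the example: nine ordinary homes, then the same neighborhood after one hypothetical $2.5 million mansion is listed.

```python
import statistics

# Hypothetical prices (in dollars) for nine ordinary homes in one neighborhood.
prices = [210_000, 225_000, 230_000, 240_000, 245_000,
          250_000, 260_000, 270_000, 285_000]

# The same neighborhood after one hypothetical mansion goes on the market.
with_mansion = prices + [2_500_000]

for label, data in [("ordinary homes", prices), ("with mansion", with_mansion)]:
    print(f"{label:>15}: mean = ${statistics.mean(data):,.0f}, "
          f"median = ${statistics.median(data):,.0f}")
```

One listing moves the mean from about $246,111 to $471,500, while the median only shifts from $245,000 to $247,500.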

The Mode

The least used measure of central tendency is the mode. The mode is simply the most frequent value in a data set. Using the sorted example above, {22, 11, 9, 9, 6, 2, 1, 1, 1}, the mode is 1 because it occurs three times, more than any other number.
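
Counting frequencies is all it takes. This sketch uses Python's collections.Counter; the function name modes is illustrative, and it returns every value tied for the top count because a data set can have more than one mode.

```python
from collections import Counter


def modes(data):
    """Return every value tied for the highest frequency, plus that frequency."""
    if not data:
        raise ValueError("mode requires at least one data point")
    counts = Counter(data)          # value -> number of occurrences
    top = max(counts.values())      # highest frequency in the data
    return sorted(v for v, c in counts.items() if c == top), top


print(modes([22, 11, 9, 9, 6, 2, 1, 1, 1]))
```

For the example data this prints ([1], 3): the value 1 appears three times, more than any other.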

MS Excel Tips

We can use Excel to figure these things out for us.  Here is how we do it.

  1. For the mean use the following: =AVERAGE(range), for example =AVERAGE(A1:A9)
  2. For the median use the following: =MEDIAN(range)
  3. For the mode use the following: =MODE(range) (Excel 2010 and later also offer =MODE.SNGL(range), which does the same job)
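
If you work outside a spreadsheet, Python's standard-library statistics module has functions that play roughly the same roles. Treat this as an analog rather than a replica of Excel: edge cases such as ties or data with no repeated values are handled differently.

```python
import statistics

data = [1, 9, 11, 22, 9, 1, 6, 1, 2]

mean_value = statistics.mean(data)      # roughly like =AVERAGE(range)
median_value = statistics.median(data)  # roughly like =MEDIAN(range)
mode_value = statistics.mode(data)      # roughly like =MODE(range)

print(mean_value, median_value, mode_value)
```

For the example data this gives a mean of about 6.89, a median of 6, and a mode of 1.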

That’s pretty much all there is to it, folks. Tomorrow night we will discuss the three primary measures of dispersion, so please subscribe to this blog via the RSS feed lest you miss all the fun!