If you hate statistics, this post is for you. Why? Because by the end of this article I intend for you to understand, AND be in a position to teach others, one of the more complicated and misunderstood statistical concepts of our time – the central limit theorem (CLT). If you are up for the challenge, read on.
Central Limit Theorem – What is it?
To start things off, here’s an official CLT definition.
The central limit theorem (CLT) states that the means of random samples of size n drawn from any distribution with mean μ and finite variance σ² will have an approximately normal distribution with a mean equal to μ and a variance equal to σ²/n, and the approximation improves as n grows.
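For readers who like symbols, here is the same statement written out. It assumes the sample values X₁, …, Xₙ are independent draws from one population with mean μ and finite variance σ², and writes \bar{X}_n for their average (standard notation, not taken from the definition above). The mean and variance identities hold for every n; only the normal shape is a large-n approximation.

```latex
\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad
\mathbb{E}[\bar{X}_n] = \mu, \qquad
\operatorname{Var}(\bar{X}_n) = \frac{\sigma^2}{n},
\qquad
\frac{\bar{X}_n - \mu}{\sigma / \sqrt{n}} \xrightarrow{d} \mathcal{N}(0, 1)
\quad \text{as } n \to \infty .
```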
For those new to statistics… this definition may seem a bit intimidating. Fear not.
You see, all this intimidating definition is really saying is that as n, our sample size, increases, the distribution of the sample means drawn from just about any population (normal or non-normal) will tend to look normal.
How can it be?
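The key to this theorem is the σ²/n part of the formula. The population variance, σ², does not change, but the variance of the sample means, σ²/n, shrinks as the sample size n increases. Less variance means the averages cluster more tightly around μ. The other half of the theorem covers shape: averaging also washes out skew, so as n grows those tightly clustered means settle into the familiar bell curve.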
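A quick way to check the σ²/n claim is to simulate it. This sketch assumes an exponential population with rate 1 (mean 1, variance 1), chosen only because it is clearly non-normal; the function name `variance_of_sample_means` is hypothetical. It draws many samples of size n and compares the variance of their means with σ²/n = 1/n.

```python
import random
import statistics

def variance_of_sample_means(n, trials=20000, seed=1):
    """Draw `trials` samples of size n from an Exponential(rate=1) population
    (mean 1, variance 1) and return the variance of the sample means."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    means = [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
             for _ in range(trials)]
    return statistics.variance(means)

for n in (1, 5, 25):
    # Theory: Var(sample mean) = sigma^2 / n = 1 / n for this population.
    print(n, round(variance_of_sample_means(n), 3), round(1 / n, 3))
```

The simulated column should track the 1/n column closely: the population never changes, yet the spread of its averages shrinks.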
Prove it to me
Remember, I told you that you would be able to teach others about this concept, so here are some teacher’s notes. Around this point in your explanation, your students may want to give up. That’s good. You have them right where you want them as you slowly set the hook. Once they bite, and they will, you can reel them in with ease. Let’s press on.
Time to Simulate
I came across this sweet little Java applet that lets you demonstrate the CLT perfectly. It was developed by some folks at Seton Hall and, from what I can tell, is free for anyone to use.
Fun with the Weibull Distribution
So here is the situation. Let’s assume we have a process whose data follow a Weibull distribution that is skewed to the right, so it would fail to pass as “normal” data. This means we cannot rely on so-called “parametric” hypothesis tests that assume normality. We often see Weibull distributions in reliability/failure analysis data.
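To put a number on that right skew, the sketch below computes the theoretical skewness of a Weibull distribution from its gamma-function moments, then checks a simulated sample, whose mean should land above its median as it does for right-skewed data. The scale of 1 and shape of 1.5 are illustrative assumptions, not values from the post; `weibull_skewness` is a hypothetical helper, while `random.weibullvariate(alpha, beta)` is the standard library's Weibull generator (alpha is the scale, beta the shape).

```python
import math
import random
import statistics

SCALE, SHAPE = 1.0, 1.5  # illustrative values; shapes below about 3.6 give right skew

def weibull_skewness(shape, scale=1.0):
    """Theoretical skewness of a Weibull(scale, shape) distribution."""
    def g(k):
        # E[X^k] = scale**k * Gamma(1 + k / shape)
        return math.gamma(1 + k / shape)
    mu = scale * g(1)
    var = scale**2 * (g(2) - g(1) ** 2)
    third_moment = scale**3 * g(3)  # E[X^3]
    # E[(X - mu)^3] = E[X^3] - 3*mu*var - mu^3
    return (third_moment - 3 * mu * var - mu**3) / var**1.5

rng = random.Random(42)
data = [rng.weibullvariate(SCALE, SHAPE) for _ in range(10_000)]

print(f"theoretical skewness: {weibull_skewness(SHAPE):.3f}")  # about 1.07
print(f"sample mean {statistics.fmean(data):.3f} > median {statistics.median(data):.3f}")
```

A normal distribution has skewness 0, so a value near 1.07 is a clear sign this process is not normal.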
Now let’s pretend we send someone out on 50 different occasions (trials) to collect data from this process. Let’s also tell this person to collect only 1 data point per trip/trial. After the 50th trial we take the 50 data points and study their shape (Top Figure).
The blue bars are our data, and the yellow outline is what a typical Weibull distribution looks like. As you can see, our distribution looks pretty Weibull-ish.
Next, we tell the person to go back out 50 more times. Only this time we ask them to collect 5 samples instead of 1 on each trip. We average the 5 data points from each trial. After the 50th trial we take the 50 data points (remember, each data point is an average of 5 numbers) and study their shape (Middle/Right Figure).
Notice how the distribution is beginning to look a bit more normal, although it still keeps a little Weibullishness? Yes, that’s a word… at least in my dictionary.
OK, now we really push our luck and ask the person to go back out one last time. This time we ask them to collect 25 samples during each trial. We also buy them lunch at this point, as we are beginning to whip them pretty good!
So they go back out and collect 25 samples per trial. Again, we average the 25 samples from each trial, just as we averaged 5 data points in the second round. After the 50th trial we take the 50 averages and study their shape (Bottom Figure).
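If the applet is not available, the three rounds of field trips can be mimicked in a few lines of Python. This sketch assumes a Weibull process with scale 1 and shape 1.5 (illustrative values, not read off the figures); `run_trials` and `text_histogram` are hypothetical helpers that record the 50 trial averages and print a crude histogram with fixed bin edges, so the three rounds can be compared side by side.

```python
import random
import statistics

SCALE, SHAPE = 1.0, 1.5  # hypothetical right-skewed Weibull process
TRIALS = 50              # number of trips out to the process, as in the post

def run_trials(n, rng):
    """Each trial collects n points from the process and records their average."""
    return [statistics.fmean(rng.weibullvariate(SCALE, SHAPE) for _ in range(n))
            for _ in range(TRIALS)]

def text_histogram(values, bins=10, lo=0.0, hi=3.0):
    """Print a crude histogram with fixed bin edges and return the bin counts."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        i = min(max(int((v - lo) / width), 0), bins - 1)  # clamp outliers into edge bins
        counts[i] += 1
    for i, c in enumerate(counts):
        print(f"{lo + i * width:4.1f} | {'#' * c}")
    return counts

rng = random.Random(7)
for n in (1, 5, 25):
    print(f"\nn = {n}")
    text_histogram(run_trials(n, rng))
```

With only 50 trials per round the bars are lumpy, much like the figures, but the n = 25 round should pile up in a narrow band around the process mean (about 0.9 for these parameters).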
Now we clearly see a normal, bell-shaped distribution beginning to appear. And all we did was increase the sample size, n, from 1 to 5 and finally to 25.
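To measure “more normal” instead of eyeballing it, this sketch tracks the sample skewness of the averages (0 for a perfectly symmetric bell curve) as n grows. It assumes the same hypothetical Weibull(scale 1, shape 1.5) process but runs 20,000 trials per round instead of 50 so the estimate is stable. For independent draws, the skewness of a mean equals the population skewness divided by √n, so the output should come out roughly 1.07, 0.48 and 0.21; `sample_skewness` and `skewness_of_means` are hypothetical helpers.

```python
import random
import statistics

SCALE, SHAPE = 1.0, 1.5  # hypothetical right-skewed Weibull process

def sample_skewness(xs):
    """Simple moment-based skewness: average cubed z-score."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return statistics.fmean(((x - m) / s) ** 3 for x in xs)

def skewness_of_means(n, trials=20000, seed=3):
    """Skewness of `trials` averages, each taken over n Weibull draws."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.weibullvariate(SCALE, SHAPE) for _ in range(n))
             for _ in range(trials)]
    return sample_skewness(means)

for n in (1, 5, 25):
    print(n, round(skewness_of_means(n), 3))
```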
When you teach this, this is the point where you reel them in. Then turn the simulation tool over to them so they can play around with it. Tell them to prove the theorem wrong if they can. No worries, they can’t.
Break out the dice
Another fun way to demonstrate the CLT is with fair dice. Have someone roll 1 die 50 times, noting the result of each roll. When they graph the results, the distribution will be roughly flat, because each face is equally likely. Then give them 2 dice and have them roll both at the same time 50 times, averaging the two results on each roll. Finally, give them 5 dice and repeat. You will see the distribution become more and more normal as the sample size, n, increases.
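Fifty rolls only approximate the underlying shapes. This sketch instead computes the exact distribution of the average of k fair dice by convolving one die with itself k times, using exact fractions so there is no rounding. The helper names `mean_of_dice_distribution` and `variance` are hypothetical.

```python
from collections import Counter
from fractions import Fraction

def mean_of_dice_distribution(k):
    """Exact distribution of the average of k fair six-sided dice,
    built by convolving one die with itself k times."""
    sums = Counter({0: Fraction(1)})
    for _ in range(k):
        nxt = Counter()
        for total, p in sums.items():
            for face in range(1, 7):
                nxt[total + face] += p / 6  # each face has probability 1/6
        sums = nxt
    # Divide each possible total by k to get the average.
    return {Fraction(total, k): p for total, p in sorted(sums.items())}

def variance(dist):
    """Variance of a discrete distribution given as {value: probability}."""
    mu = sum(x * p for x, p in dist.items())
    return sum((x - mu) ** 2 * p for x, p in dist.items())

for k in (1, 2, 5):
    dist = mean_of_dice_distribution(k)
    print(f"{k} dice: variance of average = {variance(dist)}")  # 35/12, 35/24, 7/12
```

One die gives six equal probabilities (the flat graph), two dice already peak at an average of 3.5, and the variance follows σ²/n exactly: 35/12 divided by 1, 2 and 5.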
Update: In case you don’t read the comments, Rob from LearnSigma shared a link to the coolest dice game applet. So check it out. Thanks, Rob.