Explaining the Central Limit Theorem

If you hate statistics, this post is for you. Why? Because it’s my intention to have you understand AND be in a position to teach others one of the more complicated and misunderstood statistical concepts of our time – the central limit theorem (CLT) – by the end of this article. If you are up for the challenge, read on.

Central Limit Theorem – What is it?

For those new to statistics… this definition may seem a bit intimidating. Fear not.

You see, all this confusing definition is really saying is that as n, our sample size, increases, the averages of samples drawn from just about any distribution (normal or non-normal) will tend to behave normally.
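For reference, here is one common textbook way to write the theorem. This is a sketch in standard notation, not necessarily the notation of the definition discussed above: μ and σ² (the true mean and variance of the process, which the sample statistics estimate) and the sample average X̄ₙ are symbols introduced here, and the data points are assumed to be independent draws from one distribution with finite variance.

```latex
\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i ,
\qquad
\frac{\sqrt{n}\,\left(\bar{X}_n - \mu\right)}{\sigma}
\;\xrightarrow{\;d\;}\; \mathcal{N}(0,\,1)
\quad \text{as } n \to \infty .
```

Put loosely, for large n the sample average behaves approximately like a normal variable with mean μ and variance σ²/n, whatever the shape of the original distribution.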

How can it be?

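The key to this theorem is the whole s²/n part of the formula. Here s² is the variance of the individual data points, and s²/n is the variance of the sample averages. The variance of the process itself does not change, but as n, the sample size, increases, s²/n shrinks, so the averages cluster more tightly around the true mean. At the same time, averaging washes out the skew of the original distribution, which is why the averages look more and more bell shaped.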
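Where does the division by n come from? Here is a short derivation, assuming the n data points are independent and share the same variance. The symbol σ² is notation introduced here for the true process variance that s² estimates.

```latex
\operatorname{Var}\!\left(\bar{X}_n\right)
  = \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
  = \frac{1}{n^{2}}\sum_{i=1}^{n}\operatorname{Var}(X_i)
  = \frac{n\,\sigma^{2}}{n^{2}}
  = \frac{\sigma^{2}}{n},
\qquad
\operatorname{SD}\!\left(\bar{X}_n\right) = \frac{\sigma}{\sqrt{n}} .
```

So the variance of the average is the process variance divided by n, and the standard deviation of the average is the process standard deviation divided by the square root of n.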

Prove it to me

Remember, I told you that you will be able to teach others about this concept. So here are some teacher’s notes. Around this time in your explanation the student or students may want to give up. That’s good. You have them right where you want them, as you are slowly setting the hook. Once they bite, and they will, you will then reel them in with ease. Let’s press on.

Time to Simulate

I came across this sweet little Java Applet tool that allows you to perfectly demonstrate the CLT. This tool was developed by some folks at Seton Hall and, from what I can tell, is free for anyone to use.

Fun with the Weibull Distribution

So here is the situation. Let’s assume we have a process that follows a Weibull distribution. Its data would fail to pass as “normal” because it is skewed to the right, which means we cannot apply the usual “parametric” hypothesis tests that assume normality directly to the individual values. We often see a Weibull distribution with reliability/failure analysis data.

Let’s now pretend we send someone out on 50 different occasions (trials) to collect data from this process we know to follow a Weibull distribution. Let’s also assume we tell this person to collect only “1” data point per trip/trial. After the 50th trial we take the 50 data points and study their shape (Top Figure).

The blue bars are our data and the yellow outline is what a typical Weibull distribution looks like. As you can see our distribution looks pretty Weibull-ish.

Now we tell the person to go back 50 more times. Only this time we ask them to collect 5 data points instead of 1 on each trip out. We will take the 5 data points from each trial and average them together. After the 50th trial we take the 50 data points (remember, each data point is now an average of 5 numbers) and study their shape (Middle/Right Figure).

Notice how the distribution is beginning to behave a bit more normal like although still maintaining a little Weibullishness? Yes, that’s a word… at least in my dictionary.

Ok, we now attempt to really push our luck as we ask the person to go back out one last time. This time we ask them to collect 25 samples during each trial. We also buy them lunch at this point as we are beginning to whip them pretty good!

So, they go back out and collect 25 data points per trial. Again, we average these 25 data points, just as we averaged 5 data points in the second round. After the 50th trial we take the 50 averages and study their shape (Bottom Figure).

Now we clearly see a normal, bell shaped, distribution beginning to appear. And all we did was increase the sample size, n, from 1 to 5 and finally to 25.
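If you don’t have the applet handy, the same experiment can be roughly sketched in a few lines of Python. This is not the Seton Hall tool, just a stand-in under stated assumptions: the function names are made up for this example, and the Weibull shape of 1.5 and scale of 1.0 are arbitrary choices picked to give a right-skewed process, since the parameters behind the figures are not given. Skewness is used as a rough measure of “Weibullishness”: it sits near 0 for a symmetric, bell-shaped distribution and is positive for a right-skewed one.

```python
import random
import statistics


def sample_means(n, trials=50, shape=1.5, scale=1.0, seed=None):
    """Return `trials` averages, each taken over `n` draws from a Weibull process.

    shape=1.5 and scale=1.0 are assumed values for illustration only.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(trials):
        # random.weibullvariate takes (scale, shape), in that order.
        draws = [rng.weibullvariate(scale, shape) for _ in range(n)]
        means.append(sum(draws) / n)
    return means


def skewness(values):
    """Sample skewness: near 0 when symmetric, positive when skewed right."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


def text_histogram(values, bins=10):
    """Draw a crude sideways histogram using '#' characters."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    return "\n".join(
        f"{lo + i * width:6.2f} | {'#' * c}" for i, c in enumerate(counts)
    )


if __name__ == "__main__":
    for n in (1, 5, 25):  # data points collected per trial
        means = sample_means(n, trials=50, seed=n)
        print(f"n = {n}: skewness = {skewness(means):.2f}")
        print(text_histogram(means))
        print()
```

With only 50 trials the histograms are lumpy and will differ from run to run, just like the figures. Raise `trials` to a few thousand and the drop in skewness from n = 1 to n = 25 becomes much steadier.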

When you teach people this, this is the point where you reel them in. Also, turn the simulation tool over to them so they can play around with it. Tell them to prove the theorem wrong if they can. No worries, they can’t.

Break out the dice

Another fun way to demonstrate the CLT is with fair dice. Simply have someone roll 1 die 50 times, noting the result after each roll. When they graph this, the distribution will be very flat. Then give them 2 dice and have them roll both at the same time 50 times (averaging the results of each roll). Finally, give them 5 dice and repeat. You will see the distribution of the averages become more and more normal as the sample size, n, increases.
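No dice on hand? The dice exercise can be approximated with a short Python sketch. The function names here are invented for this example, and the computer’s random number generator stands in for fair dice.

```python
import random
import statistics
from collections import Counter


def average_rolls(num_dice, rolls=50, seed=None):
    """Roll `num_dice` fair dice together `rolls` times; return the average of each roll."""
    rng = random.Random(seed)
    return [
        sum(rng.randint(1, 6) for _ in range(num_dice)) / num_dice
        for _ in range(rolls)
    ]


def tally(averages):
    """Draw one row of '#' marks for each distinct average that was rolled."""
    counts = Counter(averages)
    return "\n".join(
        f"{value:4.1f} | {'#' * counts[value]}" for value in sorted(counts)
    )


if __name__ == "__main__":
    for num_dice in (1, 2, 5):
        averages = average_rolls(num_dice, rolls=50, seed=num_dice)
        spread = statistics.pstdev(averages)
        print(f"{num_dice} dice per roll, standard deviation of averages = {spread:.2f}")
        print(tally(averages))
        print()
```

With 1 die the tally should come out roughly flat, while with 5 dice the marks should pile up around 3.5 and thin out toward 1 and 6. Expect some lumpiness with only 50 rolls.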

Update: In case you don’t read the comments, Rob, from LearnSigma, shared a link to the coolest dice game Applet. So check it out. Thanks Rob.

Meikah Delid
July 16, 2007 - 11:35 pm

Good work, Ron! I understand CLT even if I strain myself most of the time to understand. Your last example finally got to me. 🙂
Ron Pereira
July 17, 2007 - 6:04 am

Thanks Meikah. I find that you must “show” people the theorem in action which is why a simulator is so useful.

But if you can do the dice that is even better as it is a bit more interactive… and who doesn’t like to throw dice around!
Rob
July 17, 2007 - 6:46 am

Ron – good explanation of an often confusing concept.

Here’s a nice simulation: http://tinyurl.com/23ssq3
Ron Pereira
July 17, 2007 - 7:10 am

Excellent tool Rob! Thanks for sharing. I hope everyone uses this dice game.
George Tete
January 24, 2008 - 6:46 am

Why do you need to study and understand the Central Limit Theorem, Exponential Density Function and Analysis of Variance?
George Tete
January 24, 2008 - 6:52 am

Why do you study and understand the Central Limit Theorem, Exponential Density Function and Analysis of Variance?
Ron Pereira
January 24, 2008 - 8:59 am

Hi George, can you please elaborate more on your question?
Anonymous
May 19, 2009 - 9:19 pm

Hi Ron,

I think I understood the theorem, but I want to know WHY this happens.
kumar sharma
May 16, 2010 - 1:56 am

Gr8888 explanation… it’s mind blowing, students will become my fans if I teach this lol
Lean Education
January 14, 2011 - 2:45 am

It was famously quipped that there were three ways to avoid telling the truth: lies, damned lies, and statistics. The joke works because statistics frequently seem like a black box: it can be difficult to understand how statistical theorems make it possible to draw conclusions from data that, on their face, defy easy analysis…

Anonymous
March 14, 2011 - 8:05 pm

The most confusing part (and for which one will search in vain for an answer) is why the average of the means is just the mean but the average of the variances is the variance divided by the square root of N. Sure, you can find thousands of sites that give you the formula, but try to find just one site that explains the part about the variance.

alireza emadin
April 11, 2011 - 5:42 am

wanted to say thank you !
Andrew Anonymous
November 14, 2011 - 7:15 pm

I think the last example got me. Thank you.
simon
August 15, 2012 - 1:56 am

Since my first year of study until today I never understood the CLT like this, thank you so much. This is my last year in statistics, and I think now I can claim that I’m a statistician since I know in detail the idea behind the central limit theorem.
Zubair
April 10, 2013 - 10:34 am

very easy to understand, thanks 🙂
Zubair
Anonymous
November 8, 2013 - 10:53 am

I am taking Stats right now and this article really helped me !
Thanks 🙂
