Dealing with Non-Normal Data

Robin, over on the iSixSigma blog, had an interesting post about hypothesis testing. Specifically, the question posed was how to deal with non-normal data.

Most Six Sigma practitioners are taught to use “nonparametric” tests such as Mood’s median test when dealing with non-normal data, instead of “parametric” tests such as ANOVA and the 2-sample t-test. I want to touch on this because I have some opinions to share.

The Technical Issue

The main question here is whether using parametric tests (ANOVA, etc.) with non-normal data will lead us in the wrong direction. One of the underlying issues (though not the only one) is the use of the standard deviation in the calculations. If our data are skewed, for example, the median is the recommended measure of center, and that choice affects the measure of dispersion we should use, because the standard deviation measures spread around the mean rather than the median. Why does this matter? Let me use an example to explain.

Rich Dad Messes Things Up

In most neighborhoods, the super-wealthy folks at the end of the subdivision are not representative of the rest of us regular Joes. Their $1.5 million homes can badly skew the mean (average) home price of the neighborhood. That, in turn, distorts the standard deviation (which uses the mean in its formula), making prices look far more spread out than they are for most homes. And since the mean and standard deviation feed into most parametric tests, the problems really begin to compound. Statistics are sort of like dominoes, I suppose.
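To put rough numbers on the rich-dad effect, here is a minimal sketch in standard-library Python. The home prices are invented for illustration, and the median absolute deviation (MAD) is my choice of a median-based spread measure, not something prescribed in the discussion above. Adding a single $1.5 million house moves the mean by roughly $126,000 and inflates the standard deviation by a factor of about twenty, while the median moves by only $2,500 and the MAD does not move at all.

```python
import statistics

# Hypothetical home prices (USD) for a small subdivision. These numbers are
# invented purely for illustration.
regular_homes = [210_000, 225_000, 230_000, 240_000, 245_000,
                 250_000, 255_000, 260_000, 270_000]
# Add the "rich dad" house at the end of the street.
with_mansion = regular_homes + [1_500_000]


def median_absolute_deviation(values):
    """Median of absolute deviations from the median: a robust spread measure."""
    med = statistics.median(values)
    return statistics.median(abs(v - med) for v in values)


def summarize(label, values):
    """Print and return (mean, median, sample stdev, MAD) for one data set."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    stdev = statistics.stdev(values)  # sample standard deviation, built on the mean
    mad = median_absolute_deviation(values)
    print(f"{label:>15}: mean={mean:>10,.0f} median={median:>10,.0f} "
          f"stdev={stdev:>10,.0f} MAD={mad:>8,.0f}")
    return mean, median, stdev, mad


before = summarize("without mansion", regular_homes)
after = summarize("with mansion", with_mansion)
```

The mean and standard deviation react strongly to one extreme value, while the median-based pair barely notices it. This is why skewed data tend to push analysts toward median-based methods.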

What to do?

So what, you may ask, is a person to do when faced with non-normal data? My personal approach is to study the data using both parametric AND nonparametric tests. The funny thing is that in most cases the results tell the same story.

So instead of debating and poring over the mind-numbing statistics books on my desk, I choose to be as quick as possible while still making sure I am confident in my conclusion. Spending the extra 45 seconds it takes to run both tests beats debating and wondering what to do.
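The “run both and compare” habit can be sketched in a few lines of standard-library Python. Everything here is an assumption for illustration: the two shifts’ cycle times are simulated from skewed (lognormal) distributions, Welch’s t-test stands in for the parametric side, and its p-value uses a large-sample normal approximation rather than the exact t distribution. In practice you would more likely call scipy.stats.ttest_ind(a, b, equal_var=False) and scipy.stats.median_test(a, b). Note that SciPy’s median test applies Yates’ continuity correction by default for two samples, so its numbers will differ slightly from this uncorrected version.

```python
import math
import random
import statistics


def welch_t(a, b):
    """Welch's t statistic and an approximate two-sided p-value.

    The p-value uses a standard normal approximation, which is only reasonable
    for fairly large samples; an exact version would use the t distribution.
    """
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    t = (statistics.mean(a) - statistics.mean(b)) / se
    p = math.erfc(abs(t) / math.sqrt(2))  # P(|Z| > |t|) for a standard normal Z
    return t, p


def moods_median_test(a, b):
    """Mood's median test for two groups (Pearson chi-square, no continuity correction).

    Values equal to the grand median are counted as 'not above'.
    """
    grand_median = statistics.median(a + b)
    above = [sum(v > grand_median for v in g) for g in (a, b)]
    not_above = [len(g) - k for g, k in zip((a, b), above)]
    table = [above, not_above]  # rows: above / not above; columns: group a / group b
    row_totals = [sum(row) for row in table]
    col_totals = [len(a), len(b)]
    n = len(a) + len(b)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    p = math.erfc(math.sqrt(chi2 / 2))  # chi-square survival function, 1 degree of freedom
    return chi2, p


# Hypothetical, right-skewed cycle times (minutes) for two shifts, drawn from
# lognormal distributions so the data are deliberately non-normal.
rng = random.Random(42)
shift_a = [rng.lognormvariate(3.0, 0.5) for _ in range(200)]
shift_b = [rng.lognormvariate(3.5, 0.5) for _ in range(200)]

t, p_t = welch_t(shift_a, shift_b)
chi2, p_m = moods_median_test(shift_a, shift_b)
print(f"Welch t = {t:.2f} (approx. p = {p_t:.2g})")
print(f"Mood's median chi-square = {chi2:.2f} (p = {p_m:.2g})")
print("Same conclusion at 0.05?", (p_t < 0.05) == (p_m < 0.05))
```

With a difference this large between the shifts, both p-values fall far below 0.05, so the parametric and nonparametric tests tell the same story.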

Don’t get carried away

Six Sigma is often criticized for its analysis-paralysis approach to problem solving. Hypothesis testing is powerful and should be used by all continuous improvement practitioners, Lean and Six Sigma alike. That said, it’s long, drawn-out debates about which test to use in a given situation that frustrate people. So my advice is to stop debating and run both tests, then get back to the gemba and make something else better!

1 Comment

  1. Rob

    June 16, 2007 - 1:23 am

    This article [http://tinyurl.com/yrprzg] suggests five “solutions” for handling non-normal data:
    * Subgroup averaging
    * Segmenting data
    * Transforming data
    * Using different distributions
    * Non-parametric statistics

    I tend to fit a distribution first, then try transforming the data, and only then move on to non-parametric statistics. However, I agree with you: sometimes I just run the numbers and rely on my practical understanding of the process as well.
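    The “transforming data” option can be illustrated with a minimal Python sketch. It assumes strictly positive, right-skewed data (simulated here from a lognormal distribution, so the numbers are hypothetical) and uses a plain log transform. A Box-Cox transform (for example scipy.stats.boxcox) generalizes this idea. The sketch compares sample skewness before and after the transform. Keep in mind that any parametric test run on the logged values draws conclusions on the log scale.

```python
import math
import random


def skewness(values):
    """Sample skewness g1 = m3 / m2**1.5 (about 0 for symmetric data)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5


# Hypothetical right-skewed process data (e.g., repair times), simulated
# from a lognormal distribution purely for illustration.
rng = random.Random(7)
raw = [rng.lognormvariate(2.0, 0.75) for _ in range(1000)]

# Log transform: for lognormal data, the logs are normally distributed.
logged = [math.log(v) for v in raw]

skew_raw = skewness(raw)
skew_log = skewness(logged)
print(f"skewness before: {skew_raw:.2f}, after log transform: {skew_log:.2f}")
```

    The raw data show strong positive skew, while the logged data are close to symmetric. That is the situation in which running a parametric test on the transformed values becomes reasonable.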