The p-value expresses the probability of obtaining a test statistic at least as extreme as the one actually observed, assuming that the null hypothesis is true. The scientific method attempts to disprove the null hypothesis. The null hypothesis is rejected when the p-value is less than the chosen significance level, conventionally 0.05. Another way to express this may be that the p-value is the chance that a result at least this extreme would arise from randomness alone, which would be a false positive if we acted on it.
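To make the definition concrete, here is a minimal sketch of a one-sided p-value for a coin-flip experiment. The scenario (16 heads in 20 flips, with a fair coin as the null hypothesis) and the function name binomial_p_value are illustrative assumptions, not taken from any study mentioned here.

```python
from math import comb

def binomial_p_value(heads: int, flips: int, p_null: float = 0.5) -> float:
    """One-sided p-value: P(X >= heads), assuming the null hypothesis is true."""
    return sum(
        comb(flips, k) * p_null**k * (1 - p_null) ** (flips - k)
        for k in range(heads, flips + 1)
    )

ALPHA = 0.05  # conventional significance level

# Hypothetical experiment: 16 heads observed in 20 flips of a supposedly fair coin.
p = binomial_p_value(heads=16, flips=20)
print(f"p-value = {p:.4f}; reject the null at {ALPHA}: {p < ALPHA}")
```

A result this lopsided is unlikely under a fair coin, so the null hypothesis would be rejected at the 0.05 level. The calculation says nothing about how plausible an unfair coin was to begin with.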
Not only does modern science rest on hypothesis testing and the use of the p-value to assign statistical significance to test results, but the p-value is also a familiar number to practitioners of six sigma. We could say that the p-value is a standard for testing the outcomes of our experiments and validating the results of our continuous improvement efforts.
An article in the August 2011 Scientific American argues that the p-value is in fact an arbitrary standard and not always a trustworthy one. The author states:
Many scientific papers make 20 or 40 or even hundreds of comparisons. In such case, researchers who do not adjust the standard p-value threshold of 0.05 are virtually guaranteed to find “statistical significance” in what are actually statistically meaningless results.
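The arithmetic behind that claim can be sketched in a few lines. The sketch assumes the comparisons are independent and that every null hypothesis is true, which real studies rarely satisfy exactly. The Bonferroni adjustment shown is one common way to tighten the threshold, not necessarily the one the article has in mind, and the function names are illustrative.

```python
def familywise_error_rate(comparisons: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive across independent comparisons,
    assuming every null hypothesis is actually true."""
    return 1 - (1 - alpha) ** comparisons

def bonferroni_threshold(comparisons: int, alpha: float = 0.05) -> float:
    """Per-comparison threshold that keeps the overall false positive rate
    at or below alpha (Bonferroni adjustment)."""
    return alpha / comparisons

for m in (1, 20, 40, 100):
    unadjusted = familywise_error_rate(m)
    adjusted = familywise_error_rate(m, bonferroni_threshold(m))
    print(
        f"{m:>3} comparisons: "
        f"P(at least one false positive) = {unadjusted:.2f} unadjusted, "
        f"{adjusted:.3f} with threshold {bonferroni_threshold(m):.4f}"
    )
```

Under these assumptions, 20 unadjusted comparisons already give roughly a two-in-three chance of at least one spurious "significant" result, and the chance keeps climbing toward certainty as comparisons are added.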
Ronald A. Fisher, the father of modern statistics, invented the p-value as an informal measure of evidence against the null hypothesis. Although this is often overlooked, Fisher called on scientists to use other types of evidence, such as the a priori plausibility of the hypothesis and the relative strengths of results from previous studies, in combination with the p-value.
Can a dead salmon read your mind? According to statistically significant results backed by the p-value, it can. By applying skepticism based on past experience and plausibility, the researchers in the Scientific American article's example recognized that the hypothesis and their method of testing it were questionable, a p-value of 0.01 notwithstanding.
Inasmuch as lean thinking and kaizen are reliant on the scientific method, we need to follow Fisher’s advice and apply statistical (if arbitrary) standards in combination with the intuition based on common sense and experience. At the same time, the truths within lean systems run deeply counter to our intuitions and even our (mis)perceptions of experience. One at a time is faster than many at a time. A lot of safety stock creates unsafe conditions. Taking time to clean up actually saves time.
Years ago I asked a six sigma MBB how we could possibly test the hypothesis that the p-value was a valid standard. He looked at me as if I was an idiot and said, “We live in a universe governed by statistics.” The history of science is littered with universal laws which became outdated as our understanding of our world advanced. All standards are temporary and subject to improvement. If we continue to use the p-value as a standard, we are required to kaizen this standard.