In our explainer on statistical significance and statistical testing, we introduced how you go about testing a hypothesis, and what can be legitimately inferred from the results of a statistical test. In this post, we will look at a way in which this process can be abused to create misleading results. This is a technique known colloquially as ‘p-hacking’. It is a misuse of data analysis to find patterns in data that can be presented as statistically significant when in fact there is no real underlying effect.
Why it matters
Most scientists are careful and scrupulous in how they collect data and carry out statistical tests. However, there are ways in which statistical techniques can be misused to show effects which are not really there. To avoid reporting spurious results as fact and giving air to bad science, journalists must be able to recognise when such methods may be in use. This piece introduces one such technique, known as ‘p-hacking’. It is one of the most common ways in which data analysis is misused to generate statistically significant results where no real effect exists, and it is one that everyone reporting on science should stay alert to.
A foregone conclusion
In the statistical significance post, the scientist went in with a well-motivated hypothesis, which she put to the test in an experiment. This is a baseline assumption of scientific testing: that the scientist first forms a hypothesis based on theory, and only then collects data to test it.
Suppose, however, a scientist took the opposite approach. Suppose they started off with the conclusion they want to reach, and were not particularly concerned with scientific ethics. In this case, they could use statistical testing to manufacture this result through selective reporting.
To take a toy example, suppose you wanted to establish a link between chocolate and baldness. You could get a group of 10,000 men (a pretty big sample size by all accounts) to report on their consumption of M&Ms, Twix and Mars Bars over a period of time. You would also record how many men in the group go bald over the same period.
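As a rough illustration of how selective reporting could manufacture a result from a study like this one, the Python sketch below simulates the chocolate survey under the assumption that chocolate has no effect on baldness at all. It tests each of the three snacks separately and counts how often at least one test comes out ‘significant’ at the conventional 5% threshold. The baldness rate, the share of regular eaters, the function names and the simple two-proportion z-test are all assumptions made for illustration; none of them comes from a real study.

```python
import math
import random


def two_proportion_p_value(bald_a, n_a, bald_b, n_b):
    """Two-sided p-value for a difference in proportions (normal approximation)."""
    if n_a == 0 or n_b == 0:
        return 1.0
    pooled = (bald_a + bald_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (bald_a / n_a - bald_b / n_b) / se
    # For a standard normal Z, P(|Z| > |z|) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


def run_null_study(rng, n_men=10_000, snacks=("M&Ms", "Twix", "Mars Bars"),
                   bald_rate=0.3, eater_rate=0.5):
    """Simulate one survey in which chocolate has NO effect on baldness.

    bald_rate and eater_rate are made-up illustrative values.
    """
    bald = [rng.random() < bald_rate for _ in range(n_men)]
    total_bald = sum(bald)
    p_values = {}
    for snack in snacks:
        # Eating habits are drawn independently of baldness: no real link.
        eats = [rng.random() < eater_rate for _ in range(n_men)]
        n_eat = sum(eats)
        bald_eat = sum(1 for b, e in zip(bald, eats) if b and e)
        p_values[snack] = two_proportion_p_value(
            bald_eat, n_eat, total_bald - bald_eat, n_men - n_eat
        )
    return p_values


def share_with_false_positive(n_studies=200, n_men=10_000, alpha=0.05, seed=0):
    """Fraction of null studies where at least one snack looks 'significant'."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        p_values = run_null_study(rng, n_men=n_men)
        if min(p_values.values()) < alpha:
            hits += 1
    return hits / n_studies
```

If the three tests were fully independent, the chance of at least one false positive would be 1 - 0.95^3, or about 14%, rather than the 5% that a single test allows. Calling `share_with_false_positive()` should give a figure in that region. A researcher who reports only the ‘significant’ snack and stays quiet about the other two is p-hacking.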