Cause =/= Effect

Animals like us are fantastic at spotting patterns in things - it's an inherent part of our biology, involving parts of our brains that are especially good at it. Unfortunately, while some patterns are significant, many are not, and our brains are not terribly good at differentiating between the two - in fact, we tend to overemphasize matches, believing them to be especially significant, meaningful and, in a sense, real.

It could be argued that both pattern-recognition and overemphasis on matches are the result of natural selection over millennia since, in the wild, anything that helps us quickly identify and respond to possible attacks by predators, even when there are none, is likely to improve our chances of survival (within reason, anyway). Arguably, this is what makes wild animals 'alert', 'nervous' or 'jumpy'. It's a fail-safe mechanism. It's also the root of the fear we feel when we think we are in a dangerous situation, such as walking down a dark alleyway in an unfamiliar city at night. The sense of physical danger heightens our senses and primes our fight-or-flight instincts with a boost of adrenaline. Running away screaming from a harmless vagrant is safer than ignoring potential threats.

However, what I've just done in that paragraph is invent a vaguely plausible scenario and outline it briefly, and some of you now believe it to be true, based on nothing more than its apparent plausibility and my credibility (such as it is). I mentioned running away screaming to stimulate a visceral reaction in you: the strong emotions that situation evokes add even more emphasis to the story. It 'makes sense'. In fact, there are many other plausible scenarios or reasons why pattern-recognition and overemphasis might or might not be linked to anything but, having described one particular pattern, I have probably locked it into your brain and perhaps given it special significance or meaning.

To illustrate my point, look at pattern-recognition from the predator's perspective: predators need to recognize possible prey and respond ahead of competing predators ... but distinguishing edible prey from everything else (including other predators, animals with poisonous or otherwise dangerous defenses, and rocks) is a critical part of the predator's biology. Attacking anything and everything would be a fail-unsafe approach, the exact opposite of the prey's fail-safe jumpiness. In reality, there are very few 'pure' predators or prey: even prey animals need to eat, while apex predators at the very top of the food chain may have a fear of cannibalism or of prey that successfully fights back, so the real world is far more complex than my simplistic description implies.

OK, with that in mind, take a look at this graph:


Sure looks like the red and black lines are related, doesn't it? They track each other, on the whole. Their patterns match quite closely over the 13-year period shown, implying that they are somehow linked. In that specific case, statistical analysis tells us that the two variables have a correlation coefficient of just under 0.79 (79%), on a scale where 1 (100%) means a perfect straight-line relationship between them and 0 means no linear relationship whatsoever. 79% is a pretty high value, so it is entirely possible that the two variables are indeed linked.
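For anyone curious where a figure like 79% comes from, the usual measure is Pearson's correlation coefficient, r. Here's a minimal Python sketch of the calculation; the function name pearson_r and the sample numbers are purely illustrative, not the data behind the graph.

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equal-length sequences.

    Returns a value between -1 and +1: +1 means a perfect rising straight-line
    relationship, -1 a perfect falling one, and 0 no linear relationship.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2 items)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of each pair's deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations, used to scale the result into -1..+1
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant sequence has no defined correlation")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical numbers, just to show the calculation
print(round(pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]), 2))  # 0.77
```

Notice that nothing in the formula knows or cares what the numbers mean: r measures how closely the two lines move together, not whether one causes the other.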

So, at this point we think we've found a link between <ahem> the annual number of non-commercial space launches globally and the annual number of sociology doctorates awarded in the US - for those are the numbers graphed! Hmmmm.

Yes, you might be able to come up with some vaguely credible reasoning to explain that apparent linkage but, be honest, it would be a stretch of the imagination and would take considerable effort to find, effort you might be willing to put in if you felt the pattern-match was somehow significant (!). Far more likely, we've simply found a matching pattern: a sheer coincidence, a fluke. If we have enough data available and keep on searching, we can probably find other variables that appear to correlate with either of those two, including some with even higher correlation coefficients ...
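To see how easily chance alone throws up impressive-looking matches, here's a rough Python simulation. It's a sketch under stated assumptions: every series is a made-up random walk (a running total of random steps, loosely mimicking figures that drift up and down over the years), each is 13 points long to match the graph, and the function names are hypothetical. None of the series has anything to do with any other, yet searching tens of thousands of them against one 'target' typically turns up at least one correlation stronger than the 79% in the graph.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson's correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def random_walk(rng, years):
    """A trending but entirely meaningless series: each year adds a random step."""
    value, series = 0.0, []
    for _ in range(years):
        value += rng.gauss(0, 1)
        series.append(value)
    return series

def best_spurious_match(years=13, candidates=30_000, seed=1):
    """Correlate one random 'target' series against many unrelated random
    series and return the strongest correlation found (by absolute value)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    target = random_walk(rng, years)
    best = 0.0
    for _ in range(candidates):
        r = pearson_r(target, random_walk(rng, years))
        if abs(r) > abs(best):
            best = r
    return best

print(round(best_spurious_match(), 3))
```

With a fixed seed, raising candidates can only keep or strengthen the best match found, which is the whole point: the harder you search, the more impressive your 'discoveries' become.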

... which I guess is pretty much what someone has done, using automated statistical techniques to find correlations between published data. Have a browse through these spurious correlations for some 29,999 other examples along these lines, and remember all this the next time you see a graph or a description that appears to indicate cause-and-effect linkages between things. We humans desperately want to see matches. We find them almost irresistible and especially significant, almost magical, verging on real. Unfortunately, we are easily deluded.

From that point, it is but a short hop to 'lies, damned lies, and statistics'. Anyone with an axe to grind, sufficient data and a basic grasp of statistics can probably find correlations between things that appear to bolster their claims, and a substantial proportion of their audience will be swayed by them, hijacked by their own biology. I rather suspect that civil servants, politicians and managers are pretty good at that.
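A little arithmetic shows why this works so well for the axe-grinder. Suppose, purely for illustration, that someone tests 20 genuinely unrelated variables against their pet theory, each at the common 5% significance threshold, and that the tests are independent of each other. The chance that at least one comes up 'significant' by sheer luck is 1 - 0.95^20, roughly 64%. The function name below is hypothetical.

```python
def chance_of_a_fluke(tests, alpha=0.05):
    """Probability that at least one of `tests` independent comparisons of
    genuinely unrelated data passes a significance threshold of `alpha`
    purely by chance (assumes the tests are independent)."""
    return 1 - (1 - alpha) ** tests

# How the odds of a 'finding' grow with the number of things tried
for n in (1, 5, 20, 100):
    print(n, f"{chance_of_a_fluke(n):.0%}")
```

The independence assumption is a simplification: real variables are often related to each other, which changes the exact figures but not the basic problem.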

By the way, although I recognise the bias, I am far from immune to it. I try to hold back from claiming causal links purely on the basis of patterns in the numbers, and phrase things carefully to leave an element of doubt, but it's hard to fight against my own physiology.

Think on.
Gary.

PS. Finding spurious matches in large data sets is an illustration of the birthday paradox: there is a surprisingly high probability that two non-twin students in an ordinary class share a birthday (the same day of the year), better than even in any class of 23 or more.
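The arithmetic behind the birthday paradox is short enough to sketch. This assumes 365 equally likely birthdays and ignores leap years and twins, which real data don't quite match; shared_birthday_probability is just an illustrative name.

```python
def shared_birthday_probability(people, days=365):
    """Probability that at least two of `people` share a birthday, assuming
    each of `days` possible birthdays is equally likely."""
    all_different = 1.0
    for k in range(people):
        # Person k+1 must avoid the k birthdays already taken
        all_different *= (days - k) / days
    return 1 - all_different

for size in (10, 23, 30, 50):
    print(size, f"{shared_birthday_probability(size):.1%}")
```

The link to data mining is the number of pairs: 23 people make 253 possible pairs (23 × 22 / 2), and in the same way every extra variable in a data set can be paired with all the others, so the opportunities for chance matches grow much faster than the data.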

PPS. The 79% correlation in the example above is only a fraction beneath the 'magical' 80% level. According to Pareto's Principle (I'm paraphrasing), 80% of stuff is caused by 20% of things. It's a rule of thumb that seems to apply in some cases, hence we subconsciously believe it can be generalized, and before you know it, it's accepted as truth. The fact that 80% + 20% = 100% seems somehow 'special', but it's just another obvious yet entirely spurious pattern.