Human error stats

Within our next awareness module on "Mistakes", we would quite like to use some headline statistics to emphasize the importance of human error in information security, both to illustrate and to inform.

So what numbers should we use? 

Finding numbers is the easy part - all it takes is a simple Google search. However, it soon becomes apparent that many of the numbers in circulation are worthless. So far, I've seen figures ranging from 30 to 90% for the proportion of incidents caused by human error, and I've little reason to trust those limits!

Not surprisingly, the approach favored by marketers is to pick the most dramatic figure supporting whatever it is they are promoting. Many such figures appear either to have been plucked out of thin air (with little if any detail about the survey methods) or generated by nonscientific studies deliberately constructed to support a foregone conclusion. I imagine "What do you want us to prove?" is one of the most important questions some market survey companies ask of their clients.

To make matters worse, there is a further systemic bias towards large numbers. I hinted at this above when I mentioned 'emphasize the importance' using 'headline statistics': headlines sell, hence eye candy is the name of the game. If a survey finds 51% of something, it doesn't take much for that to become "more than half", then "a majority", then "most", then, well, whatever. As these little nuggets of information pass through the Net, the language becomes ever more dramatic and eye-catching at each step. It's a ratchet effect that quite often ends up in "infographics": not only are the numbers themselves dubious, but they are also deliberately overemphasized visually. Impact trumps fact.

So long as there is, or was once (allegedly), a grain of fact in there, proponents claim to be speaking The Truth, which brings up another factor: the credibility of the information sources. Through bitter experience over several years, I am so cynical about one particular highly self-promotional market survey company that I simply distrust and ignore anything they claim: that simple filter (OK, prejudice!) knocks out about one third of the statistics in circulation. Tightening my filter (narrowing my blinkers) further to exclude other commercial/vendor-sponsored surveys removes another third. At a stroke, I've substantially reduced the number of figures under consideration.

Focusing now on the remainder, it takes effort to evaluate the statistics. Comparing and contrasting different studies, for instance, is tricky since they use different methods and samples (usually hard to determine), and their wording is often ambiguous. "Cyber" and "breach" are common examples. What exactly is "cybersecurity" or a "cyber threat"? You tell me! To some, "breach" implies "privacy breach", "breach of the defensive controls" or "breach of the defensive perimeter", while to others it implies "incidents with a deliberate cause" ... which would exclude errors.

For example, the Cyber Security Breaches Survey 2018 tells us:
"It is important to note that the survey specifically covers breaches or attacks, so figures reported here also include cyber security attacks that did not necessarily get past an organisation’s defences (but attempted to do so)."

Some hours after setting out to locate a few credible statistics for awareness purposes, I'm on the point of either giving up on my quest, choosing between several remaining options (perhaps the 'least bad'), lamely offering a range of values (hopefully not as broad as 30 to 90%!), or taking a different route to our goal.

It occurs to me that the situation I'm describing illustrates the very issue of human error quite nicely. I could so easily have gone with that 90% figure, which might then have become "almost all" or even "all". I'm not joking: there is a strong case to argue that human failings are the root cause of all our incidents. But to misuse the statistics in that way, without explanation, would have been a mistake.