Little boxes, little boxes

In preparation for a forthcoming security awareness and training module on business continuity, I'm re-reading The Power of Resilience by Yossi Sheffi (one of the top ten books I blogged about the other day).

It's a fascinating, well-written and thought-provoking book. Yossi uses numerous case studies based on companies with relatively mature approaches to business continuity to illustrate how they are dealing with the practical issues that arise from today's complex and dynamic supply chains - or rather supply networks or meshes.

Risk assessment is of course an important part of business continuity management, for example:

Identifying weak, unreliable or vulnerable parts of the massive global 'system' needed to manufacture and supply, say, aircraft or PCs;
Determining what if anything can be done to strengthen or bolster them; and
Putting in place the necessary arrangements (controls) to make the extended system as a whole more resilient.

Yossi covers the probability plus impact approach to risk analysis that I've described several times on this blog, with (on page 34) a version of the classic Probability Impact Graph (PIG):

The dotted lines divide the example PIG into quadrants forming the dreaded 2x2 matrix much overused by consultants and politicians. He discusses more involved versions including the 5x5 matrix used by 'a large beverage company' with numbers arbitrarily assigned to each axis - not the obvious 1,2,3,4,5 linear sequence but (for some barely credible reason) 1,3,7,15 and 31 along the impact axis and 1,2,4,7 and 11 for likelihood or probability, with the implication that they then multiply the values to generate their risk scores.

That appears straightforward but is in fact an inappropriate application of mathematics since the numbers are not cardinal numbers or percentages denoting specific quantities but category labels (ordinals). The axes on the 2x2 matrix could have been labeled green and red or Freda and Fred: it makes no sense to multiply them together ... but that's exactly what happens, often.
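To make that concrete, here is a minimal Python sketch. It takes the axis labels quoted above for the beverage company's 5x5 matrix and assumes, as the book implies but does not confirm, that the score for each cell is simply the product of its two labels. It then repeats the exercise with the plain 1 to 5 labels, which describe exactly the same categories in exactly the same order. The function names (scores, disagreements) are my own inventions for illustration.

```python
from itertools import product

# Axis labels quoted in the post for the beverage company's 5x5 matrix.
IMPACT = [1, 3, 7, 15, 31]
LIKELIHOOD = [1, 2, 4, 7, 11]
# The "obvious" linear labels: same five categories, same order.
LINEAR = [1, 2, 3, 4, 5]


def scores(impact_labels, likelihood_labels):
    """Score each (impact category, likelihood category) cell as the
    product of its two labels (the multiplication is an assumption)."""
    return {
        (i, l): impact_labels[i] * likelihood_labels[l]
        for i, l in product(range(5), repeat=2)
    }


def disagreements(a, b):
    """List the pairs of cells that the two schemes rank in opposite order."""
    cells = sorted(a)
    return [
        (c1, c2)
        for idx, c1 in enumerate(cells)
        for c2 in cells[idx + 1:]
        if (a[c1] - a[c2]) * (b[c1] - b[c2]) < 0
    ]


company = scores(IMPACT, LIKELIHOOD)
linear = scores(LINEAR, LINEAR)

# Cells are (impact index, likelihood index), counting from 0.
rare_catastrophe = (4, 0)  # highest impact, lowest likelihood
middling = (2, 2)          # middle impact, middle likelihood

print(company[rare_catastrophe], company[middling])  # 31 28
print(linear[rare_catastrophe], linear[middling])    # 5 9
print(len(disagreements(company, linear)) > 0)       # True
```

With the company's labels the rare catastrophe outranks the middling risk (31 against 28); with the linear labels it is the other way around (5 against 9). Nothing about the risks has changed, only the names on the boxes, which is the giveaway that the product is not measuring anything.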

Yossi's example PIG above demonstrates another problem with the approach: "Earthquake" is shown across the middle of the impact axis, spanning the Light and Severe categories. So which is it? If it must be in a box, which box?

The obvious response is either to shift "Earthquake" away from the boundary, arbitrarily, or to add another central category, dividing that axis into three ... which simply perpetuates the issue, since the risks are spread across the PIG and there are few clear gaps between them in which to draw the column lines. Likewise with the rows.

What's more, earthquakes vary from barely detectable up to totally devastating in impact, way more range than the PIG shows. Those barely-detectable quakes happen much more frequently than the devastating ones (fortunately!), hence a more accurate representation would be a long diagonal shape (a line? An oval? A banana? Some irregular fluffy cloud maybe?) mostly sloping down from left to right, crossing two or three of the four quadrants and extending beyond the graph area to the left and right. A single risk score is inappropriate in this case, and in almost all cases in fact, since most risks show the same effect: more significant and damaging incidents typically occur less often than relatively minor ones. We can't accurately determine where they fall on the PIG since the boundaries are indistinct. We seldom have reliable data, especially for infrequent incidents or those that often remain somewhat hidden and perhaps totally unrecognized as such (e.g. frauds).
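A rough Python sketch of the "one risk, several boxes" point follows. Everything in it is invented for illustration: the four scenarios, the 0 to 1 scales, the 0.5 dividing lines and the High/Low names on the probability axis are all my assumptions, not figures from the book. Only the Light/Severe impact labels come from the PIG discussed above.

```python
# Hypothetical scenarios for a single risk ("Earthquake"), each given as
# (likelihood, impact) on an arbitrary 0-1 scale. Invented numbers.
EARTHQUAKE = [
    (0.90, 0.05),  # barely detectable tremor: frequent, trivial
    (0.60, 0.30),
    (0.30, 0.60),
    (0.05, 0.95),  # devastating quake: rare, catastrophic
]


def quadrant(likelihood, impact, threshold=0.5):
    """Name the 2x2 box that a single (likelihood, impact) point falls into.
    The threshold of 0.5 is an arbitrary assumption."""
    prob = "High" if likelihood >= threshold else "Low"
    severity = "Severe" if impact >= threshold else "Light"
    return f"{prob} probability / {severity} impact"


# Collect the distinct boxes touched by this one risk.
boxes = sorted({quadrant(p, i) for p, i in EARTHQUAKE})
for box in boxes:
    print(box)
```

Even with just four made-up points, the one risk lands in two different boxes, so any single box or single score for "Earthquake" has to throw away part of the picture.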

As if that's not enough already, the whole situation is dynamic. The PIG is a snapshot representing our understanding at a single point in time ... but some of the risks may have materially changed since then, or could materially change in an instant. Others 'evolve' gradually, while some vary unpredictably over the time horizons typical in business. Some of them may be related or linked, perhaps even inter-dependent (e.g. "Computer virus", or more accurately "Malware", is one of many causes of "IT system failure", hence is it appropriate to show those as two distinct, separated risks on the PIG?).

The possibility of cascading failures is one of Yossi's core messages: it is not sufficient or appropriate to consider individual parts of a complex system in isolation. Think of "the straw that broke the camel's back" or "the butterfly effect": a seemingly insignificant issue in some obscure part of a complex system may trigger a cascade that substantially magnifies the resulting impact. System-level thinking is required, a wholly different conceptual basis.

Given all the above complexity, and more, it makes sense (I think) to dispense with the categories and quadrants, the dodgy mathematics and the pretense at being objective or scientific, using the PIG instead as a tool for subjective analysis, discussion and hopefully agreement among people who understand and are affected by the issues at hand. An obvious yet very worthwhile purpose is to focus attention first and foremost on the "significant" risks towards the top right of the PIG plus those across the diagonal from top left to bottom right, while downplaying (but not totally ignoring!) those towards the bottom left.

That's the reason our PIGs have no specific values on the axes and no little boxes, just text in a variety of sizes and shapes indicating the risks, overlaid on a background simplistically but highly effectively colored red-amber-green. We're not ignoring the complexities - far from it: we're consciously and deliberately simplifying things down to the point that experts and ordinary people (managers, mostly) can consider, discuss and decide stuff, especially those red and amber zone risks. Are they 'about right'? What have we missed here? Are there any linkages or common factors that we ought to consider? It's a pragmatic approach that works very well in practice, thank you, as both an awareness and a risk management tool.

I commend it to the house.