Five characteristics of effective metrics


What makes a security metric effective or good?  What makes one ineffective or bad?  Can we spot shining stars among the duds, short of actually firing them off to management for a few months and watching the fallout?  

It's an interesting question that goes to the heart of our understanding of metrics.

Naturally, Krag and I believe we know the answers, but we're not the only ones to have expressed an opinion on this.

[Before you read on, what do you think makes a good security metric?  Take a moment to mull it over.  It's OK, you don't need to tell anyone.  It's your little secret.]

Following a conference presentation by Gartner's Jeffrey Wheatman, Tripwire's Dwayne Melancon wrote up what he described as "a really good list of 'Five characteristics of effective metrics'" that had been presented by Wheatman:
  1. Effective metrics must support the business’s goals, and the connection to those goals should be clear.
  2. Effective metrics must be controllable. (In other words, don’t report on the number of vulnerabilities in your environment, since you can’t control that.  Instead, report on the % of “Critical” systems patched within 72 hours, which you can control).
  3. Effective metrics must be quantitative.
  4. Effective metrics must be easy to collect and analyze. (Wheatman says “If it takes 3 weeks to gather data that you report on monthly, you should find an easier metric to track.”)
  5. Effective metrics are subject to trending.  (Tracking progress and setting targets is vital to get people to pay attention)
I agree to an extent with the first characteristic (along with Jaquith's fifth criterion - see below), but Wheatman's phrasing, as reported by Melancon, is open to differing interpretations.  If a security metric only partly supports the business's goals, does that necessarily mean it is not effective?  What if there simply is no better metric?  Effectiveness is a comparative rather than an absolute quality, and sometimes we have to settle for metrics that are good enough rather than perfect.  That said, it does make sense to clarify the connections or associations between metrics and organizational objectives, values, strategies etc., and ideally to start out from those very objectives when designing or selecting suitable metrics.  Clearly specifying the requirements is a great way to start anything!

Wheatman's second characteristic is almost, but not quite, right.  I would agree that effective metrics usually measure activities, situations, systems etc. that can be directly controlled or influenced to some extent, but not always.  Sometimes, raw knowledge about a situation is valuable, even if there is no obvious, straightforward way to use it at that point.  "DEFCON" is an example: it is a generalized metric, used more as an awareness or alerting tool than as a way to switch certain behaviors and activities on or off (although some anticipated behaviors and activities are no doubt specified in the military procedures and training manuals).

[I'm sure we could have an interesting panel discussion about the remainder of Wheatman's second statement too: any information security pro would challenge his assertion that you cannot control the number of vulnerabilities - most of the time we are doing exactly that.  I can envisage situations in which 'number of vulnerabilities' could be a valid and worthwhile metric, particularly with a small change to the wording along the lines of 'number of identified or known or confirmed vulnerabilities' (for example in relation to system security testing).  I could also challenge the implied suitability of '% of "Critical" systems patched within 72 hours'.]
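
To make that point concrete, here is a minimal sketch (in Python, using entirely hypothetical data and field names) of how those two candidate metrics - the count of identified vulnerabilities and the percentage of critical systems patched within 72 hours - might be computed from an asset inventory and a set of scan results:

  from datetime import timedelta

  # Hypothetical records for the critical systems in scope: how long each one
  # took to patch after the patch was released (None = not yet patched).
  critical_systems = [
      {"name": "db01", "patch_lag": timedelta(hours=30)},
      {"name": "web01", "patch_lag": timedelta(hours=96)},
      {"name": "app01", "patch_lag": None},
  ]

  # Hypothetical scan results: vulnerabilities identified per system.
  identified_vulnerabilities = {"db01": 4, "web01": 11, "app01": 2}

  # Metric 1: % of critical systems patched within the 72-hour target.
  sla = timedelta(hours=72)
  patched_in_sla = sum(
      1 for s in critical_systems
      if s["patch_lag"] is not None and s["patch_lag"] <= sla
  )
  pct_patched = 100.0 * patched_in_sla / len(critical_systems)

  # Metric 2: number of identified (known, confirmed) vulnerabilities.
  total_identified = sum(identified_vulnerabilities.values())

  print(f"Critical systems patched within 72h: {pct_patched:.0f}%")
  print(f"Identified vulnerabilities: {total_identified}")

Neither calculation is difficult; the real debate is about which of them, if either, is worth reporting to management.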

Characteristics 3 and 4 are distinctly reminiscent of "the definition of a good metric" by Andrew Jaquith in his book Security Metrics: Replacing Fear, Uncertainty and Doubt.  According to Jaquith, a good security metric should be:

  • Consistently measured, without subjective criteria;
  • Cheap to gather, preferably in an automated way;
  • Expressed as a cardinal number or percentage, not with qualitative labels like "high", "medium", and "low";
  • Expressed using at least one unit of measure, such as "defects", "hours", or "dollars"; and
  • Contextually specific - relevant enough to decision-makers so that they can take action.

Jaquith's characteristics have been widely circulated for more than five years, at least since the book was published in 2007, but I have seen little critical discussion of them.  It's as if people are simply quoting them without, perhaps, understanding or challenging the implicit assumptions.

Take Jaquith's first criterion, for instance: "Consistently measured" seems fair enough (one could certainly argue that consistency is a useful property, depending on how one defines it), but the subsidiary clause "without subjective criteria" raises a different issue entirely.  One can measure things consistently using subjective criteria, just as one can measure things inconsistently using objective criteria.  Jaquith is confusingly blending two distinct considerations, one of which is quite misleading, into the same criterion.

Jaquith's second criterion is also heavily loaded by the subsidiary phrase.  Equating cheapness with automation is inaccurate and, again, misleading.  It reflects a strong bias towards the use of automated data sources throughout IT, and implies that manually collected metrics are junk.  Furthermore, and even more importantly, there are many situations in which the metric's cost is almost irrelevant provided the information and insight it generates is sufficiently valuable - in other words, the issue is not cheapness per se but the metric's cost-effectiveness.  Some security metrics are most certainly worth the investment.  Some cheap security metrics are indeed nasty.

Wheatman's third point, plus Jaquith's third and fourth criteria, are distinctly troubling.  They strongly imply that qualitative measures are totally worthless - that assigning measurement values to categories such as high/medium/low is innately wrong.  This is a curiously prejudicial view, expressed at some length by Jaquith in Security Metrics and elsewhere.  There are legitimate mathematical concerns about categorization, in particular the misuse of simple arithmetic to manipulate category labels that happen to be numeric.  For instance, a risk categorized as level 1 is not necessarily "half as risky" as one at level 2.  Two level 1 risks are not necessarily equivalent to one level 2 risk.  It is not appropriate to manipulate or interpret category labels in this way (which, if anything, is an argument in favor of textual labels such as high/medium/low or red/amber/green).  However, that does not mean that it is inherently wrong to use categories, nor that metrics absolutely must be expressed as cardinal numbers or percentages, which is how Jaquith's criteria are commonly interpreted, even if that is not quite what he means.
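
The arithmetic pitfall is easy to demonstrate.  Here is a small, self-contained illustration (hypothetical ratings, on an assumed ordinal scale of 1=low, 2=medium, 3=high) of how treating category labels as quantities can mislead, and of the summaries that remain defensible for ordinal data:

  from statistics import mean, median
  from collections import Counter

  # Hypothetical risk ratings for ten findings on an ordinal 1-3 scale
  # (1=low, 2=medium, 3=high).  The numbers are labels, not amounts.
  ratings = [1, 1, 1, 1, 1, 1, 1, 3, 3, 3]

  # The mean (1.6) invites the false reading "somewhere above low", as if two
  # level 1 risks could add up to one level 2 risk.
  print("mean  :", mean(ratings))

  # The median and the per-category counts respect the ordering without
  # pretending the labels are quantities, so both remain valid summaries.
  print("median:", median(ratings))
  print("counts:", Counter(ratings))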

There are perfectly legitimate, valid, mathematically accurate and scientifically sound reasons for using metrics that involve categories and/or qualitative values.  One of the most useful is prioritization, or comparative analysis.  Isn't it better to tell management that option A is "much riskier" than option B (based on your subjective analysis of the available evidence, 'you' being an experienced information security/risk management professional), than to withhold that information purely because "riskiness" cannot be expressed as a cardinal number or percentage?  Isn't it just as misleading, biased or wrong to insist, dogmatically, on cardinal numbers or percentages?
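
As a final illustration of that point, here is a minimal sketch (hypothetical options and ratings) showing that an ordered qualitative scale is perfectly sufficient for comparative analysis and prioritization, without ever converting "riskiness" into a cardinal number:

  # An ordinal scale, ordered from least to most risky (assumed for illustration).
  RISK_ORDER = ["low", "medium", "high", "very high"]

  # Hypothetical, subjective expert ratings of three options.
  options = {
      "Option A": "very high",
      "Option B": "medium",
      "Option C": "high",
  }

  # Rank the options from most to least risky using only the ordering of the labels.
  ranked = sorted(options, key=lambda o: RISK_ORDER.index(options[o]), reverse=True)
  print("Priority order (riskiest first):", ranked)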