Website attack exposure metric

Page Rank versus Internet attack rate - a worked example of the metrics selection/design process

In theory, organizations that establish a substantial web presence for marketing reasons are also, potentially, setting themselves up as high-profile targets for Internet-based attacks.   But is the premise true?  Are organizations that maintain high-profile websites attacked more often from the Internet than those that maintain a low profile?  Let's try to address the question by developing a metric using the PRAGMATIC approach.

Most websites are routinely scored or ranked by Internet search engines and social networking sites.   Although some popularity measures are distinctly dubious, Google Page Rank is a simple, widely-used and well-respected measure of the visibility or popularity of a website, so that part of the equation is easy enough.

To measure Internet-based attacks, we might count up the number of web-related information security incidents such as website hacks, worms, social engineering/frauds etc. over a suitable period (e.g. an hour, a day, a week or a month), using log and alert data from the organization's firewall, intrusion detection system, incident reporting/management system and so on.   Although this information will take more effort to gather than Page Rank, it's feasible provided the numbers are readily available from the respective security systems.
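
As a rough illustration of the counting step, here is a minimal Python sketch that tallies incidents per week.  It assumes the events have already been exported from the various systems as (date, source) pairs; the source names and the sample records are invented for the example, not taken from any real product.

```python
from collections import Counter
from datetime import date


def weekly_counts(events):
    """Count incidents per ISO week.

    `events` is an iterable of (day, source) pairs, where `day` is a
    datetime.date and `source` is a free-text label (hypothetical format).
    Returns a Counter keyed by (ISO year, ISO week number).
    """
    counts = Counter()
    for day, _source in events:
        iso = day.isocalendar()          # (ISO year, ISO week, ISO weekday)
        counts[(iso[0], iso[1])] += 1
    return counts


# Made-up sample records from three hypothetical sources.
sample_events = [
    (date(2024, 1, 1), "firewall"),
    (date(2024, 1, 3), "ids"),
    (date(2024, 1, 8), "incident-desk"),
]

per_week = weekly_counts(sample_events)
```

Note that simply adding up records from different systems risks counting the same incident twice, so a real feed would probably need some de-duplication first.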

So, if we now divide Page Rank by weekly attack count, we have our metric ... but in order to answer our rhetorical question we need to compare organizations on a like-for-like basis.  Here we run into two problems: 
  1. Although the algorithms are proprietary to Google, we understand that Page Rank is determined mechanistically from Google's search data, hence different websites can presumably be compared directly by their Page Ranks.  The rate of Internet-based attacks, however, is not well defined.  Even if we standardize the measurement period or adjust the rates accordingly, different organizations have different firewalls, different working definitions of Internet attacks, different detection sensitivity and so on.   It's going to be tough to standardize or normalize all these elements.
  2. Page Ranks are available in the public domain, but Internet-based attack rates aren't.  Most organizations consider information security-related data sensitive, if not secret.  The information might be available under some sort of non-disclosure agreement, or via a trusted intermediary such as the bodies that routinely survey and compare organizations' security for benchmarking purposes (e.g. the Information Security Forum).  Organizations that choose not to share their information cannot be used by others as comparators.
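
Setting the comparability problems aside for a moment, the arithmetic itself is trivial.  The Python sketch below scales a raw attack count to a weekly rate and then divides Page Rank by it.  The `Observation` structure, the function names and the figures are all hypothetical, and returning `None` when no attacks were recorded is our own assumption about how to handle a zero denominator.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Observation:
    """One organization's figures for one measurement period (hypothetical)."""
    page_rank: int       # Page Rank of the home page
    attack_count: int    # incidents counted during the period
    period_days: float   # length of the measurement period in days


def weekly_attack_rate(obs: Observation) -> float:
    """Scale the raw count to a 7-day rate so different periods line up."""
    if obs.period_days <= 0:
        raise ValueError("period_days must be positive")
    return obs.attack_count * 7.0 / obs.period_days


def exposure_metric(obs: Observation) -> Optional[float]:
    """Page Rank divided by weekly attack count.

    Returns None when no attacks were recorded, since the ratio is
    undefined in that case (an assumption, not part of the metric as stated).
    """
    rate = weekly_attack_rate(obs)
    if rate == 0:
        return None
    return obs.page_rank / rate


# Made-up figures: PR6 home page, 120 incidents logged over 30 days.
example = Observation(page_rank=6, attack_count=120, period_days=30)
rate = weekly_attack_rate(example)   # 28.0 attacks per week
metric = exposure_metric(example)    # 6 / 28, roughly 0.21
```

Adjusting for the measurement period is the easy bit.  Nothing in this sketch addresses the harder differences in firewalls, definitions and detection sensitivity noted above.
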
At this point, let's see how the metric rates using PRAGMATIC:
  • Predictiveness: provided the premise is true, an increase in Page Rank will probably increase the rate of Internet-based attacks, and an organization with a lower PR than a peer might anticipate a lower rate of Internet-based attacks.  However, the premise is unproven and a significant proportion of Internet-based attacks are non-specific and undirected - automated scans for vulnerable software, worms, spam etc.  So let's give it 45% for this criterion.
  • Relevance:  although Internet attack rate is clearly of interest to those managing, directing and funding Internet-related information security, we don't actually know whether the Page Rank element of this metric is relevant to information security or not: that's the rhetorical question we set ourselves earlier.  We think it is, but we're not sure.  Score 65%.
  • Actionability: here we have a BIG problem.  What are we supposed to do with this metric?  Let's say the number is 'very high' or 'very low' - are we expected to do something either way, and if so what?  There's not a lot we can do about either our Page Rank or the rate of Internet-based attacks against us.  We could tweak the web marketing budget, maybe, and increase or decrease the deterrents against attacking us, but to be honest neither one is very effective.  Score: 20%
  • Genuineness: how hard would it be to fake or manipulate this metric?  The Page Rank part is pretty much beyond our control, but we already noted that there are various elements to Internet attack rate, so there is some interpretation required there and the potential for someone so inclined to be deliberately selective when generating the numbers.  Score: 75%
  • Meaningfulness:  this criterion specifically concerns the perspective of the intended audience of the metric.  Oh oh!  Up to this point, we haven't really considered who that might be.  Who might be interested in the number?  Who might use the information?  The CISO or Head of Marketing maybe?  It's not entirely obvious.  Given that it has taken us so many words to explain this metric to you, and given that you, as a reader of an infosec metrics blog, are probably more clued-up than the audience for the metric, this metric clearly does not qualify as "self-evident" either.  Score 25%
  • Accuracy: whereas the attack-rate part of the metric is very granular, the publicly reported Page Rank is a whole number from 1 to 10, giving just ten possible values, eleven if there is such a thing as a PR0 page.  Page Rank applies to individual pages not whole websites, so we could check the PR of several pages and generate a mean, I guess, but I doubt it is worth the effort.  The PR of the home page is probably "close enough for Government work" but perhaps you ought to check if you decide to go ahead with this metric.  Score 45%
  • Timeliness: determining the attack rate involves collating statistics from various sources, but it's not unreasonable to assume that most if not all of them are automated, hence once the appropriate data feeds are configured, the metric should be available pretty much immediately at the end of the reporting period.  Furthermore if we adjust something as a result of the metric, we would expect to see a change in the numbers within a few reporting periods.  Score 80%
  • Independence: Page Rank comes from Google, and can be verified easily.  The other figures come from the security systems and could also be verified with a bit more work, but there is a residual risk that someone might manipulate the systems, the figures or the calculations for their own purpose (whatever that might be!).  Score 80%
  • Cost-effectiveness: the metric is fairly cheap to generate, once the security systems have been correctly configured.  Changes to the systems would require changes to the data collection and calculations, and probably some effort to re-base prior periods if trends are important, but again these are not especially costly.  However, the other part of this criterion is the business value of the metric, and there we struggle.  It's not clear who would be the audience for the metric, nor how they would use it, consequently the business value is hard to fathom.  Score 20%
The overall PRAGMATIC score, then, is a simple unweighted mean: 51%.  The metric as currently described has little to commend it.  However, the analysis has pointed out a number of issues that might be addressed during the metrics design process, in particular clarifying its audience and purpose.  Unless we make genuine headway on those factors, there are probably other more PRAGMATIC metrics to keep us occupied.
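
For anyone who wants to check the arithmetic or plug in their own ratings, here is a minimal Python sketch of the unweighted mean.  The ratings are the nine scores given above; the variable and function names are just illustrative.

```python
from statistics import mean

# The nine PRAGMATIC ratings from the worked example, in percent.
PRAGMATIC_RATINGS = {
    "Predictiveness": 45,
    "Relevance": 65,
    "Actionability": 20,
    "Genuineness": 75,
    "Meaningfulness": 25,
    "Accuracy": 45,
    "Timeliness": 80,
    "Independence": 80,
    "Cost-effectiveness": 20,
}


def pragmatic_score(ratings):
    """Simple unweighted mean of the individual criterion ratings."""
    return mean(ratings.values())


def weakest_criteria(ratings):
    """Names of the criteria that share the lowest rating."""
    lowest = min(ratings.values())
    return [name for name, score in ratings.items() if score == lowest]


overall = pragmatic_score(PRAGMATIC_RATINGS)
print(f"Overall PRAGMATIC score: {overall:.0f}%")   # prints 51%
print("Weakest:", ", ".join(weakest_criteria(PRAGMATIC_RATINGS)))
```

The ratings sum to 455, so the mean is about 50.6%, which rounds to the 51% quoted.  Listing the weakest criteria (Actionability and Cost-effectiveness, both at 20%) is a quick way to see where the design effort should go.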

To conclude this worked example, three Hinson tips to mull over at your leisure:
  1. You might quarrel with those PRAGMATIC ratings.  In fact, I would be very disappointed if you didn't, because that would imply that you haven't fully engaged with the process.  The ratings are subjective and as such there is definitely room for interpretation, discussion, argument even.  In this context, that's a very positive thing because it forces us to think through the issues and elaborate on our requirements, concerns and thoughts, considering each other's perspective and developing our own.  The final score has some value but the thinking behind it is invaluable.  Done well, the PRAGMATIC method involves active participation and, yes, the odd laugh.
  2. Astute readers of our book, or of Lance Hayden's, may have noticed that we skipped the G in the classic GQM approach - we didn't clarify the business Goal but leaped directly into the Question and then the Metric.  Tut tut, go to the back of the class.  This is the root of the issue of the unclear audience and purpose for the metric.
  3. We selected a random graph to illustrate this blog item, but if we were serious about proposing this metric, it would be well worthwhile mocking up the metric to see how it might look if we went ahead with it.  Doing so would force us to think about the type and style of graph, the axes including the timescale, and the data points to be plotted.  It would also give us something more concrete to discuss with our potential audience, a focal point (and, yes, we'd have to discuss it with someone, so that would be our cue to decide who!).  Even better would be to discuss a small sheaf of alternative metrics, considering their PRAGMATIC scores on a relative basis and gradually figuring out their pros and cons, enabling us to shortlist and refine a single metric, or maybe a couple.  Less is more.