SMotW #20: uptime

Security Metric of the Week #20: ICT service availability ("uptime")

Uptime is a classic ICT metric that is also an information security metric, although it is seldom considered as such.

Uptime is commonly measured and reported in the context of Service Level Agreements or contracts for ICT services, but in our experience this is usually something of a farce. The IT department or company generally defines uptime narrowly, in ways that suit its own purposes rather than truly reflecting the ICT services actually provided to the business users/customers. Business people, meanwhile, don't honestly believe the numbers anyway since (a) the numbers do not reflect their experience as consumers of ICT services, and (b) the numbers are self-assessed and self-reported by the IT people, who clearly have an interest in reporting only good news. Tying internal ICT cost recovery to uptime makes things even worse from the security metrics perspective (i.e. providing genuine, fact-based data on which to make business decisions concerning information security), since it places IT and the business in diametrically opposed positions - a recipe for much more heat and smoke than light.

Being rather cynical graybeards, we note that uptime is often defined (by IT) only in relation to ICT service provision during "core service hours" (which IT determines unilaterally). Exclusions are common, particularly 'scheduled downtime' (as if the fact that IT has decided when to take ICT services down somehow magically allows the business to carry on using them normally), 'change and patch implementation' (because the business wanted the changes or patches, so IT can hardly be blamed for doing what they are asked to do) and backups (again, IT rationalizes this exclusion along the lines of "Backups are required by the business, so they should be bloody grateful!").
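To see how much difference those definitional choices can make, here is a minimal sketch in Python. Everything in it is an assumption made up for illustration: the `Outage` record, the one-week outage log, the 50-hour "core service hours" window and the list of excluded causes are hypothetical, not figures from any real SLA. It simply computes uptime twice from the same log, once as the business experiences it and once as IT might report it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outage:
    hours: float         # how long the service was unavailable
    cause: str           # e.g. "failure", "scheduled", "patch", "backup"
    in_core_hours: bool  # did it fall within IT's "core service hours"?


# Hypothetical exclusions of the kind described above.
EXCLUDED_CAUSES = {"scheduled", "patch", "backup"}


def uptime_percent(period_hours: float, outages: list[Outage]) -> float:
    """Percentage of the period during which the service was up."""
    down = sum(o.hours for o in outages)
    return 100.0 * (period_hours - down) / period_hours


def business_uptime(total_hours: float, outages: list[Outage]) -> float:
    """Every outage counts, whatever its cause and whenever it happened."""
    return uptime_percent(total_hours, outages)


def it_reported_uptime(core_hours: float, outages: list[Outage]) -> float:
    """Only unexcused outages inside core service hours count."""
    counted = [
        o for o in outages
        if o.in_core_hours and o.cause not in EXCLUDED_CAUSES
    ]
    return uptime_percent(core_hours, counted)


WEEK_HOURS = 7 * 24   # 168 hours: the business may need the service any time
CORE_HOURS = 5 * 10   # 50 hours: an assumed core window set by IT

# Hypothetical outage log for one week.
outages = [
    Outage(2.0, "failure", True),
    Outage(4.0, "scheduled", True),
    Outage(3.0, "patch", False),
    Outage(6.0, "backup", False),
    Outage(1.0, "failure", False),
]

print(f"Business view: {business_uptime(WEEK_HOURS, outages):.1f}%")    # 90.5%
print(f"IT-reported:   {it_reported_uptime(CORE_HOURS, outages):.1f}%")  # 96.0%
```

With these invented numbers, sixteen hours of actual unavailability shrink to two hours of "reportable" downtime, which is roughly the gap between what the business feels and what the SLA report says.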

Despite the drawbacks that we have just described, uptime turns out to have a pretty good PRAGMATIC score as an information security metric:

P	R	A	G	M	A	T	I	C	Score
84	97	66	78	94	61	79	47	89	77%
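As a quick check on the arithmetic, the overall score in the table is consistent with a simple unweighted mean of the nine ratings. The sketch below assumes that is how the score is derived; the post itself does not spell out the calculation, and the data structure is ours, not anything from the book.

```python
# Ratings from the table above, in P-R-A-G-M-A-T-I-C order.
# A list of pairs is used because the letter "A" appears twice.
ratings = [
    ("P", 84), ("R", 97), ("A", 66),
    ("G", 78), ("M", 94), ("A", 61),
    ("T", 79), ("I", 47), ("C", 89),
]

# Assumption: the overall score is the unweighted mean of the nine ratings.
overall = sum(value for _, value in ratings) / len(ratings)

print(f"Overall PRAGMATIC score: {overall:.0f}%")  # 77%
```

The ratings sum to 695, and 695 / 9 is about 77.2, which rounds to the 77% shown.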

The very high 97% rating of uptime for Relevance may come as a surprise to those who are unfamiliar with the modern interpretation of information security as 'ensuring the confidentiality, integrity and availability of information'. We rated the metric a few percent less than 100 for Relevance to account for the relatively small amount of business information which lies completely outside of IT. We completely accept that paperwork and knowledge need to be available as well as the ICT systems, networks and data. Without ICT support, however, a lot of the non-IT information is practically worthless to the business as a whole, since it cannot be communicated or processed much beyond its immediate location.

The high rating for Meaningfulness reflects the fact that, leaving aside arcane issues relating to the precise definition, uptime is a simple and familiar measure.

The high Cost-effectiveness score also reflects the metric's simplicity and familiarity: in most organizations, uptime is already being measured, analyzed and reported for purposes other than information security, so the marginal cost of including it in information security reports is negligible. However, costs can increase markedly if management decides to measure uptime independently of IT, for example by monitoring network and system availability from outside the IT department, or by independently auditing the figures.

[By the way, in discussing this metric in the book, we refer to the interesting metrics challenges presented by cloud computing when significant parts of the ICT service delivery process depend on third parties and resources well outside the organization's physical and logical boundary. The PRAGMATIC approach is just as well suited to developing, selecting and/or improving valuable, worthwhile security metrics for cloud computing as for more traditional approaches ... but I'm afraid you'll have to wait for the book to find out exactly what we mean!]