Saturday 27 February 2010

Awareness value of a US data center incident

Consonus, a US data-center/co-location facility provider that prides itself on its "highly secure and reliable data centers", suffered a rather embarrassing physical security incident at one of its data centers on Saturday, February 20th. An email from the Consonus data center manager to his customers indicates that an Inergen automated fire suppression system was accidentally triggered during a routine six-monthly inspection of the fire system. The incident somehow damaged a large number of disks in the facility - I understand from other, less reliable sources that as many as five hundred disks may have bitten the dust. Oops.

The point of this blog posting is not to poke fun at Consonus, who have clearly invested heavily in state-of-the-art controls and appear to have a comprehensive approach to information security, but rather to indicate that control failure remains a risk that we should all consider, no matter how strong we believe our controls may be.

In this incident, disk damage was evidently not the anticipated result of triggering the fire suppression system. It was an unforeseen risk, exactly the kind of thing that contingency planning is designed to mitigate. I wonder how many of Consonus' customers buy its optional disaster recovery and data protection (evidently meaning backup and archival) services, how many have their own contingency controls in place, and how many didn't but now wish they had ...

At the same time, this incident is probably not generating the kind of publicity that Consonus would welcome (although there's some truth in the saying that there's no such thing as bad publicity, so it's not all bad news!). I wonder if their customer services team has its own contingency plan for this kind of event?

This unfortunate incident would form the basis of an excellent case study for security awareness purposes, but it's far from isolated. The truth is that unpredictable and costly information security incidents happen more often than most people realize [and here I'm talking in general terms, explicitly not referring to Consonus!]. In the course of my career, I have seen many and, I'm ashamed to admit, been personally involved in a few.

Investing in high availability technologies and strong security measures still cannot guarantee that essential IT services will be 100% available under all circumstances. Testing the fire system 'outside normal office hours' reduces but does not eliminate the risks. Siting IT facilities above the anticipated '100-year flood level' is merely gazing into some weatherman's crystal ball. 'Uninterruptible power supply' is an oxymoron.
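
To put a rough number on that flood-level point: a '100-year flood level' only means roughly a 1% chance of flooding in any given year, and that small-sounding risk compounds over the life of a facility. Here's a quick back-of-envelope sketch (purely illustrative figures of mine, and assuming flood years are independent, which they may well not be):

```python
# Back-of-envelope: chance of at least one "100-year" flood over a facility's
# lifetime, assuming each year is an independent 1% risk (a simplification).
annual_probability = 0.01  # by definition, a "100-year" flood each year

for years in (10, 30, 50):
    p_at_least_one = 1 - (1 - annual_probability) ** years
    print(f"Over {years} years: {p_at_least_one:.0%} chance of at least one such flood")

# Over 10 years: 10% chance of at least one such flood
# Over 30 years: 26% chance of at least one such flood
# Over 50 years: 39% chance of at least one such flood
```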

Even if information security is truly taken to heart by an enlightened senior management, IT technologies and services are getting ever more complex, and so some types of coincident or catastrophic failure (including those caused by the very security controls we are implementing) become more, not less, likely.
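
To illustrate why complexity works against us, here is a minimal sketch of the arithmetic, assuming a service depends on n components in series, each (optimistically) 99.9% available and failing independently - both assumptions of mine for illustration, not figures from any real facility:

```python
# Illustrative only: end-to-end availability of a service depending on n
# components in series, each independently available 99.9% of the time.
# Real failures are often correlated, which tends to make matters worse.
HOURS_PER_YEAR = 365 * 24
component_availability = 0.999

for n in (1, 10, 50, 100):
    end_to_end = component_availability ** n
    downtime = (1 - end_to_end) * HOURS_PER_YEAR
    print(f"{n:3d} components: {end_to_end:.2%} available, "
          f"~{downtime:.0f} hours down per year")
```

Even with three-nines components, a hundred serial dependencies drags end-to-end availability down to roughly 90% - weeks of downtime a year.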

Contingency planning depends on contingency thinking, which starts with someone posing the inevitable "What if ...?". There's a fine art to getting managers to suspend their rather charming but somewhat dubious trust in technology just long enough to consider what might happen if things don't in fact work perfectly, while not going so far as to be accused of spreading FUD or constantly crying wolf (which is where classic "worst case scenarios" can easily lead). This is exactly where security awareness really helps: it aligns information security with business thinking, focusing everyone on the risks and controls, informed by knowledge of what can, and indeed does, go wrong in similar situations elsewhere.

And that's why case studies make such good awareness tools. Better to learn from other people's misfortunes than to suffer them yourself.