Sunday 10 January 2021

Y2k + 20: risk, COVID and "the Internet issue"


It feels like 'just the other day' to me, but do you recall "Y2k" and all that?

Some of you reading this weren't even born back then, so here's a brief, biased and somewhat cynical recap.

For a long time prior to the year 2000, a significant number of software programmers had taken the same shortcut we all did back in "the 90s". Year values were often coded with just two decimal digits: 97, 98, 99 ... then 00, "coming ready or not!".

"Oh Oh" you could say. "OOps".

When year counters went around the clock and reset to zero, simplistic arithmetic operations (such as calculating when something last happened, or should next occur) would fail, causing ... well, potentially causing issues, in some cases far more significant than others.
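To make that failure mode concrete, here's a deliberately naive sketch in Python - my own illustration, not code lifted from any real system - of the kind of two-digit-year arithmetic that fell over at the rollover:

    from datetime import date

    def years_since_last_service(serviced_yy: int, today: date) -> int:
        """Naive two-digit-year arithmetic of the sort behind many Y2k bugs."""
        current_yy = today.year % 100       # 1999 -> 99 ... but 2000 -> 0
        return current_yy - serviced_yy

    # A device last serviced in (19)97:
    print(years_since_last_service(97, date(1999, 6, 1)))   #  2  - looks sensible
    print(years_since_last_service(97, date(2000, 6, 1)))   # -97 - serviced 97 years in the future?!

What happened next depended entirely on what the surrounding code did with that nonsense answer.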

Failing coke can dispensers and the appropriately-named Hornby Dublo train sets we could have coped with but, trust me, you wouldn't want your heart pacemaker, new-fangled fly-by-wire plane or the global air traffic control system to decide that it had to pack up instantly because it was nearly 100 years past its certified safe lifetime. Power grids, water and sewerage systems, transportation signalling, all manner of communications, financial, commercial and governmental services could all have fallen in a heap if the Y2k problem wasn't resolved in time, and this was one IT project with a hard, immutable deadline, at a time when IT project slippage was expected, almost obligatory.

Tongue-in-cheek suggestions that we might shimmy smoothly into January 1st [19]9A were geekily amusing but totally impracticable.

In risk terms, the probability of Y2k incidents approached certainty and the personal or societal impacts could have been catastrophic under various credible scenarios - if (again) the Y2k monster wasn't slain before the new year's fireworks went off ... and, yes, those fancy automated ignition systems for public fireworks displays had Y2k failure modes too, along with the fire and emergency dispatch systems and vehicles. The combination of very high probability and catastrophic impact results in a risk up at the high end of a tall scale.
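For anyone who likes to see that spelled out, here's a minimal sketch of the sort of qualitative risk calculation involved - the five-point scales, thresholds and band labels below are my own illustrative assumptions, not any particular standard:

    LIKELIHOOD = {"rare": 1, "unlikely": 2, "possible": 3, "likely": 4, "almost certain": 5}
    IMPACT = {"negligible": 1, "minor": 2, "moderate": 3, "major": 4, "catastrophic": 5}

    def risk_rating(likelihood: str, impact: str) -> str:
        """Combine likelihood and impact into a coarse risk band."""
        score = LIKELIHOOD[likelihood] * IMPACT[impact]
        if score >= 20:
            return "extreme"
        if score >= 12:
            return "high"
        if score >= 6:
            return "medium"
        return "low"

    # Y2k in late 1999: near-certain incidents with potentially catastrophic impacts
    print(risk_rating("almost certain", "catastrophic"))   # -> extreme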

So, egged on by information security pros and IT auditors (me, for instance), management took the risk seriously and invested significant resources into solving "the Y2k issue".

Did you spot the subtle shift from "Y2k" to "the Y2k issue"? I'll circle back to that in just a moment. 

Individual Y2k programming updates were relatively straightforward on the whole, with some interesting exceptions, mostly due to prehistoric IT systems still in use well past their best-before dates, with insurmountable hardware, software and wetware limitations. The sheer overwhelming scale of the Y2k problem was the real issue though. Simply finding all those IT systems was an enormous global challenge, let alone testing and where necessary fixing or replacing them all. The world discovered, during '98 and '99 (there I go again!), that rather few "computers" were as obvious as the beige boxes proliferating on desktops at the time, or even the massive machines humming away in air-conditioned sanctuaries known as "the mainframe". Counting the blue IBM labels was no longer considered an adequate form of computer stock-taking. Computers and chips were "everywhere", often embedded in places that were never intended or designed to be opened once sealed in place. It was almost as if they had been deliberately hidden. Conspiracy theories proliferated almost as fast as Y2k jokes.

Flip forward 20 years and we see similar horrors unfolding today in the form of myriad IoT things and 'the cloud', so indistinct and unclear that people long ago gave up trying to draw meaningful network diagrams - only now the year encoding aspect is the least of our security problems. But I digress. Back to the plot.

From what I saw, for reasons of expediency and ignorance, the general solution to "the Y2k problem" was to treat the superficial symptoms of an underlying disease that we still suffer today. We found and corrected Y2k issues in software. I believe the world as a whole missed a golden opportunity to change our software design, development, testing and maintenance processes to prevent Y2k-like issues ever arising again. Oh sure, some organizations implemented policies on date encoding, and presumably some were far-sighted enough to generalise the issue to all counters and maybe coding shortcuts etc. but, on the whole, we were far too busy bailing out the hold to worry about where the ship was heading. Particularly during '99, we were in crisis mode, big time. I remember. I was there.

Instead of thinking of the Y2k work as an investment for a better future, it was treated as a necessary expense, a sunk cost. If you don't believe me, just ask to see your organisation's inventory containing pertinent details of every single IT device - the manufacturers, models, serial numbers, software and firmware revisions, latest test status, remediation/replacement plans and so on. We had all that back in '99. Oh wait, you have one? Really? So tell me, when was it last updated? How do you know, for sure, that it is reasonably comprehensive and accurate? Go ahead, show me the associated risk profiles and documented security architectures. Tell me about the IT devices used in your entire supply network, in your critical infrastructure, in everything your organisation depends upon.

Make my day.

Even the government and defence industries would be very hard pressed to demonstrate leadership in this area.  
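To be clear about what 'pertinent details' means in practice, here's a rough sketch of a single record from such an inventory - every field name and value is a hypothetical illustration, not a template from any real asset register:

    from dataclasses import dataclass

    @dataclass
    class DeviceRecord:
        asset_id: str
        manufacturer: str
        model: str
        serial_number: str
        software_version: str
        firmware_version: str
        last_test_status: str       # e.g. "passed", "failed", "not yet tested"
        remediation_plan: str       # fix, replace, retire ... or knowingly accept the risk
        owner: str                  # who is accountable for the device and its risks

    example = DeviceRecord(
        asset_id="A-00042",                  # hypothetical values throughout
        manufacturer="Acme",
        model="PLC-9000",
        serial_number="SN123456",
        software_version="4.2.1",
        firmware_version="1.0.7",
        last_test_status="not yet tested",
        remediation_plan="replace by Q3",
        owner="Plant engineering",
    )

Now multiply that by every device the organisation owns or depends upon, keep it current, and risk-assess the lot.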

That's not all. Following widespread relief that January 1st 2000 had not turned out to be a cataclysmic global disaster, we slipped into a lull and all too soon "the Y2k problem" was being portrayed in the media as "the Y2k debacle". Even today, two decades on, some pundits remain adamant that the whole thing was fake news created by the IT industry to fleece customers of money.

It was a no-win situation for the IT industry: if things had gone horribly wrong, IT would definitely have copped the blame. Despite the enormous amount of hard work and expense to ensure that things did not go horribly wrong, IT still cops the blame. 

Hey, welcome to the life of every information risk and security professional! If we do our jobs well, all manner of horribly costly and disruptive incidents are prevented ... which leaves our organisations, management and society at large asking themselves "What have the infosec pros ever done for us? OK, apart from identifying, and evaluating, and treating information risks ...".

For what it's worth, I'm very happy to acknowledge the effort that went into mounting an almost unbelievably successful Y2k rescue mission - and yet, at the same time, we were saved from a disaster of our own making, a sorry tale from history that we are destined to repeat unless things change.

As I mentioned, two major areas of risk have come to the fore in the past decade, namely the information risks associated with IoT and cloud computing. They are both global in scope and potentially disastrous in nature, and worse still they are both linked through the Internet - the big daddy of all information risks facing the planet right now. 

The sheer scale of the Internet problem is the real issue. Simply finding all those Internet connections and dependencies is an enormous global challenge, let alone testing and where necessary securing or isolating them all.

You do have a comprehensive, risk-assessed, supply-chain-end-to-end inventory of all your Internet dependencies, including everyone now working from home under COVID lockdown, right? Yeah, right.

If you don't see the parallel with Y2k, then you really aren't looking ... and that's another thing: how come "the Internet issue|problem|risk|crisis ..." isn't all over the news?

Yes, obviously I appreciate that COVID-19 is dominating the headlines, another global incident with massive impacts. The probability and impact of global pandemics have been increasing steadily for decades in line with the rise of global travel, increasing mobility and cultural blending. Although the risk was known, we failed to prevent a major incident ... and yet, strangely, the health industry isn't in the firing line, possibly because we are utterly dependent on them to dig us out of the cesspit, despite the very real personal risks they face every day. They are heroes. IT and infosec pros aren't. I get it. Too bad.

OK, that's enough of a rant for today. I will expand on "the Internet issue|problem|risk|crisis" in a future episode. Meanwhile, I'll click the Publish button in just a moment, while it still works.