As organisations around the world continue to recover, the CrowdStrike software glitch serves as a wake-up call to keep businesses secure against unforeseen IT failures, says Puneet Kukreja
An estimated 8.5 million Windows devices, across 674,620 direct customers in 1,200 unique industries, were affected by a flaw in a routine update to a piece of cybersecurity software.
It was not a cyberattack or breach. However, the outage has triggered warnings from cybersecurity experts about a surge in hacking attempts exploiting the IT disruption.
In scale, the disruption on 19 July 2024 dwarfs the WannaCry ransomware attack of 2017, which infected around 230,000 computers across 150 countries before a kill switch was identified.
The widespread impact of the global IT outage was quite alarming for those directly affected. People were not able to withdraw money from bank accounts, supermarkets were forced to close, airline fleets were grounded, and congestion built up at major ports across the world.
Global IT outage exposes critical fault lines
The outage brings organisations like major software vendors and IT infrastructure providers into the realm of critical infrastructure, underscoring their importance to our daily lives as well as their broad socio-economic significance.
It also brings into focus the question of trust. Just as people turn on the tap in their homes to get clean water that they don’t need to test before consuming, they turn on their computers with the same level of trust, not expecting to get a “blue screen of death” because of a routine update from a trusted provider.
There is a significant element of concentration risk at play. The vast majority of the world’s IT systems run on a handful of providers. Should any of them experience an outage, the results could be catastrophic, extending far beyond mere inconvenience. Such an event could compromise public health and safety, and even put lives at risk.
Minimising risk
One way to reduce concentration risk is to diversify. However, the interconnectedness of the technology provider ecosystem means that this may not be very practical.
The question of trust will arise for many of the organisations affected by the recent outage, and at least some of them may be considering switching providers. This is not necessarily a wise course of action, though. It would risk further disruption with no guarantee that the new solution would be as effective.
The fact remains that the likely cause of the outage was human error, and this does happen from time to time, even in the very best organisations.
This puts the focus back on the affected organisations.
Every organisation must take responsibility for its ability to function and provide services to its customers, even in the most trying of circumstances.
It matters little to your customers if an IT outage was caused by a cyberattack or a flawed software update – all they care about is that they are not disrupted.
This increases the importance of IT resilience and robust business continuity plans (BCPs). IT resilience has now become a fundamental aspect of business operations, enabling organisations to quickly recover and maintain continuity in the face of unforeseen disruptions such as that caused by the global outage.
By embedding IT resilience into their core strategies, businesses can ensure that they remain operational and competitive, and continue to serve their customers even amidst the growing complexities and vulnerabilities of the digital landscape.
Building better resilience
The introduction of regulatory frameworks such as the NIS2 Directive and Digital Operational Resilience Act (DORA) makes IT resilience and BCPs even more important.
Article 21 of the NIS2 Directive requires essential and important entities to implement cybersecurity risk-management measures, including incident handling, business continuity and supply chain security. It also calls for policies and procedures to assess the effectiveness of those measures.
DORA, on the other hand, emphasises operational resilience in the financial sector, with Article 11 requiring ICT response and recovery plans as part of a business continuity policy, and Articles 24 to 27 setting out the requirements for digital operational resilience testing.
Organisations must foster a culture of resilience through regular employee training across critical systems, ensuring quick recovery from disruptions.
By adhering to NIS2 and DORA, businesses can enhance their resilience, ensuring they remain operational and competitive amidst evolving digital threats and not just those related to cybersecurity.
In this respect, businesses should know their:
- BCPs well and test them regularly;
- resilience gaps and identify corresponding workarounds;
- third- and fourth-party technology ecosystems;
- recovery strategies and establish a clear tiering system; and
- limits around “stretch capability” partners through consistent testing.
Armed with these five “knows”, organisations will be able to recover quickly and continue to operate even during times of extreme disruption.
Puneet Kukreja is Cyber Security Leader at EY UK & Ireland