Ten steps to mitigate system outage risk


IT system failure can strike at any time, yet many companies remain blissfully ignorant of the harm that even a short outage can cause to their business.


There are few better examples of how the cost of a seemingly innocuous IT system outage can spiral than the crash of British Airways’ (BA) global network, reportedly caused by the accidental disconnection of a power supply in April this year.

BA’s system went down for just 15 minutes, yet the fallout led to flight delays and cancellations for three days, affecting some 75,000 customers, with BA’s passenger compensation bill under EU Regulation 261/2004 reportedly close to £80m.

It’s not just airlines that can suffer devastating losses as a result of system outage. Online businesses rely wholly on data feeds for their income. Banks could face crime losses if payment systems are compromised. System failure at a utilities or energy provider would have wide-reaching financial consequences, while manufacturing supply chains can grind to a halt if just one supplier in the chain is forced to halt production.

“Any company that is highly dependent on technology is at risk,” says Sarah Stephens, head of cyber, technology and media errors and omissions at JLT. “Those that can’t easily revert to manual back-ups – the check-in and boarding process at an airport, for example – or those whose systems are dependent on one another, creating a domino effect, have huge vulnerabilities.”

The BA debacle highlighted how airlines are particularly vulnerable. Flight disruption is hugely expensive due to refunds, missed onboard sales revenues and passenger compensation costs. Airlines also have multiple points of vulnerability – not just their own IT systems but those of global distribution systems, travel agents, ground handlers and the airports themselves.

“All it takes is one link in that chain to go down due to system failure and everybody will be affected, particularly the airlines, which will have to cancel or delay flights,” says Jamie Monck-Mason, executive director in Willis Towers Watson’s (WTW) cyber team.

When it comes to malicious attacks, every sector should consider itself a target. Hackers do not discriminate and will seek to cause disruption where they see monetary value. However, while such attacks tend to dominate media coverage and have historically been the focus of the cyber insurance market, the ever-present risk of human error and technical system malfunction is just as dangerous.

“Given some of the big outage events we’ve seen recently, it’s astonishing that people are still so focused on cyber attacks and are overlooking the risk of system failure,” says Monck-Mason, noting that last year, transport sector participants in WTW’s Transportation Risk Index – many of whom were from airlines – identified the failure of critical IT infrastructure as their biggest concern.

Assessing the risk

The first step in mitigating any risk is to identify and quantify it within your organisation. In the case of system outage risk, this requires a thorough audit of internal computer networks to establish every potential point of exposure. The audit should also extend to third-party IT vendors and key suppliers who may be unable to supply essential components or services if their own systems go down.

“We’re seeing awareness and responsibility for cyber business interruption increasing at board level, but it is still not widely understood,” says Matt Webb, group head of cyber security at underwriter Hiscox.

“Know your supply chain and where your critical points of failure are. Run your own scenarios to work out how long you can afford to be down for. This should help identify where you may need to build redundancies into certain parts of the process. It is crucial you get that right,” he says.

Two key steps are to identify which parts of the operation and supply chain would have the biggest financial impact in the event of an outage, and to establish a recovery time objective that would keep losses to a reasonable level, says JLT’s Stephens.

“Sometimes there’s a natural division by business unit, and sometimes by specific systems,” she explains. “List which critical systems you rely on in each business unit. What’s the functionality of each of these systems? Do you earn revenue from this system? How long does an interruption have to be before you start losing money rather than just delaying that revenue?”
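The questions Stephens poses can be turned into a rough calculation. The sketch below is illustrative only: the system name, revenue figure and grace period are hypothetical, and it assumes revenue is lost at a flat hourly rate once an outage outlasts the period in which sales are merely delayed. Real losses would also include compensation, recovery costs and reputational damage.

```python
from dataclasses import dataclass


@dataclass
class CriticalSystem:
    name: str
    business_unit: str
    revenue_per_hour: float  # hypothetical figure, in pounds
    grace_hours: float       # outage length that only delays revenue


def estimated_loss(system: CriticalSystem, outage_hours: float) -> float:
    """Revenue lost once the outage outlasts the grace period (flat-rate assumption)."""
    losing_hours = max(0.0, outage_hours - system.grace_hours)
    return losing_hours * system.revenue_per_hour


def max_tolerable_outage(system: CriticalSystem, loss_limit: float) -> float:
    """Longest outage, in hours, that keeps the estimated loss within loss_limit."""
    if system.revenue_per_hour <= 0:
        return float("inf")  # a system that earns nothing never breaches the limit
    return system.grace_hours + loss_limit / system.revenue_per_hour


# Hypothetical example: an online booking system in the sales unit.
booking = CriticalSystem("online booking", "sales", 50_000.0, 0.5)
print(estimated_loss(booking, 4.0))            # 175000.0
print(max_tolerable_outage(booking, 100_000))  # 2.5
```

The second figure is one way to arrive at a candidate recovery time objective: the longest downtime the business could absorb before losses pass a limit it has agreed in advance.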

Stephens advises risk managers to map out redundancies and dependencies within the system. “If one system goes down, is it in a vacuum, does it trigger other systems, or vice versa? Once you’ve applied any mitigating factors to the interruption calculations and mapped out any dependencies, you start to get a better picture of how bad a loss can be.”
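One way to picture the domino effect is to record which systems rely on which, then trace everything that falls over when a single system fails. The sketch below is a minimal illustration: the system names and the dependency map are invented for the example, and a real audit would draw them from the organisation's own inventory.

```python
from collections import deque

# Hypothetical map: each system lists the systems it depends on.
DEPENDS_ON = {
    "check-in": ["reservations", "network"],
    "boarding": ["check-in"],
    "reservations": ["network"],
    "payments": ["network"],
    "network": [],
    "payroll": [],
}


def affected_by(failed: str, depends_on: dict[str, list[str]]) -> set[str]:
    """Return every system that stops working, directly or indirectly, if `failed` goes down."""
    # Invert the map so each system points at the systems that rely on it.
    dependants: dict[str, list[str]] = {name: [] for name in depends_on}
    for system, needs in depends_on.items():
        for need in needs:
            dependants.setdefault(need, []).append(system)

    hit: set[str] = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for dependant in dependants.get(current, []):
            if dependant != failed and dependant not in hit:
                hit.add(dependant)
                queue.append(dependant)
    return hit


print(sorted(affected_by("network", DEPENDS_ON)))
# ['boarding', 'check-in', 'payments', 'reservations']
print(sorted(affected_by("payroll", DEPENDS_ON)))
# []
```

In this made-up map, the network is a single point of failure that takes four other systems with it, while payroll fails "in a vacuum", which is the kind of isolation Stephens describes.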

According to Stephens, cyber underwriters prefer to see isolated systems where possible and minimal dependencies when assessing an organisation’s risk. They will also assess the extent to which control over systems is outsourced to third parties.

Redundancies and infrastructure

“Underwriters will want to know what alternatives you have for each of the systems and what technical safeguards underpin your recovery time,” says Stephens. “If you tell them you can get your system back up and running in four hours, they will want you to explain how.”

Having a plan B is of course vital, whether in the form of internal redundancies or the use of external back-up systems such as standby IT vendors or payment processors. Companies should maintain back-ups of key data and where possible have alternate feeds in place to ensure business operations can get back online as soon as possible.

The cost of installing, running or outsourcing back-up systems in this way may appear prohibitive to some – until disaster strikes, leading to a damaging loss that could have been avoided.

“Redundancies come with a price tag, and this will ultimately come down to the appetite of senior board members,” says Dean Chapman, risk management executive – cyber risk at WTW.

“The cost of investment isn’t always clear to see, but the fact that you have avoided a system outage or strain of ransomware should be evidence enough of a return on investment.”

Another cost that causes trepidation in many CFOs is updating legacy IT infrastructure. Many organisations in both the public and private sectors still use old systems, relying on patching to keep pace with the latest processes and security updates.

Webb says: “[Global ransomware attack] WannaCry showed the amount of Windows XP machines still being run is absolutely huge. Updating legacy systems is not sexy compared to investing in shiny new modern operating systems, but events like this show how crucial it is.”

However, there is no silver bullet when it comes to technology. Simply throwing money at the shiniest new IT system will only go some way towards offering adequate protection.

“It’s always a combination of people, process and technology,” says Stephens. “Can one person pull a plug at a data vendor and cause a system failure, or do you have safeguards in place?”

A safeguard, she says, could mean requiring two people to disconnect a power line, for example, or in the case of heavy industry, having a physical override that kicks in if a power line is unplugged or a system hacked.

Various cyber risk mitigation standards can be used to help guide improvements to internal safeguards and processes, from the US National Institute of Standards and Technology’s (NIST) Cybersecurity Framework to the UK’s IASME standard for SMEs.

However, experts agree that the first and best defence against system outage is to equip staff with the knowledge and training to minimise the risk of human error.

The human factor

Implementing comprehensive, documented IT system and cyber-security training is a crucial first step in reducing human risk factors and creating a culture of awareness, and should be a priority for senior managers across all business sectors.

“If you have an effective cyber culture driven from the top of the organisation and take the time to train and educate your workforce, you are far less vulnerable to outages and also cyber attack,” says Monck-Mason.

“One of the quickest risk mitigation wins, yet one that is overwhelmingly overlooked, is the human factor,” adds Chapman.

“Millions are spent on high-end technical defences, but these are rendered useless if the humans operating those networks are unaware of the risks they face. Education is absolutely key.”

Third-party vendors

External data vendors present a serious outage risk as they often have direct control over business-critical data and systems. Selecting the right vendor and conducting due diligence and risk assessments is essential, as is securing contract certainty so that the vendor assumes an agreed level of financial liability in the event of an outage.

While major network vendors have been historically inflexible in their terms, JLT’s Sarah Stephens notes a softening of approach. She advises companies to enter these relationships with their “eyes open” and clearly outline expectations if data flow is interrupted.

Mayer Brown partner Brad Peterson says companies should build contracts with clear, enforceable commitments around business continuity and back-up requirements, and should require third-party certifications such as ISO 27000 certification or ISAE 3402 audit reports, notice of data security incidents and early warnings on technical and financial risks.

Steps to mitigate system outage risk

• Conduct a thorough audit of internal and external network exposure points

• Identify business-critical systems and business units

• Quantify the potential financial impact from various outage scenarios

• Isolate systems and reduce dependencies where possible

• Consider applying redundancies and back-ups to essential systems

• Stress-test cyber defence systems and safeguards using third-party experts

• Update legacy IT infrastructure where appropriate

• Design, implement and test a thorough incident response plan

• Buy cyber insurance coverage that includes business interruption

• Train staff in system and cyber security protocols to minimise risk of error