Power failure blamed for Air NZ IT crash

UPDATE: IBM's backup generator failed to kick in for Air NZ

The IT crash that brought Air New Zealand’s computer system to its knees yesterday was caused by a power failure, IBM confirmed to NBR this afternoon.

The catastrophic failure occurred despite IBM bragging earlier this year about a new disaster recovery system it had put in place for the airline. In a May 27 update, around the time the multinational filed its local result, IBM included Air New Zealand in a customer success story round-up called "Delivering value to clients". It read:

"As part of an initiative to manage operational costs, increase flexibility and reduce risk, Air New Zealand engaged IBM to migrate its technology infrastructure to a virtualised environment, providing improved business continuity, increased reliability and performance, improved disaster recovery and a significant reduction in the total cost of ownership."

Today, IBM released the following statement: "At 9.03am on Sunday morning during a scheduled maintenance check at IBM's Newton Data Centre, the back up generator experienced a power failure, which caused a disruption to its clients.

"One client's system in particular was badly affected on Sunday, but IBM's technical support team has now reinstated all client services."

Flamed by Fyfe
In an internal email revealed today by Computerworld, Air NZ chief executive Rob Fyfe, who comes from an IT background, blasted IBM over yesterday’s mainframe crash, which crippled services and disrupted thousands of passengers. He wrote:

“In my 30-year working career, I am struggling to recall a time where I have seen a supplier so slow to react to a catastrophic system failure such as this and so unwilling to accept responsibility and apologise to its client and its client’s customers.

“We were left high and dry and this is simply unacceptable. My expectations of IBM were far higher than the amateur results that were delivered yesterday, and I have been left with no option but to ask the IT team to review the full range of options available to us to ensure we have an IT supplier whom we have confidence in and one who understands and is fully committed to our business and the needs of our customers.”

More than 10,000 customers were affected by delays yesterday when the IBM-run system faulted. Air New Zealand and IBM are holding a series of crisis talks today to determine what went wrong.

Air New Zealand apologised to customers today saying it “should never have happened.”

Air New Zealand group general manager (for short haul) Bruce Parton said: “Air New Zealand work extremely hard to provide world class service and were simply unable to do that. We are sorry customers faced delays of up to two hours as we used manual processes for check in and boarding.”

Mr Parton said the outage started at 9am and the system was running again by 3pm. “The backlog has now been cleared at most airports and the schedule is returning to normal.”

The IT crash also affected Air New Zealand’s booking system for around four hours.

More by Kelly Gregor and Chris Keall

Comments and questions
15

This silly weasel word for power failure should not be 'outage' it should be 'outrage'. There are far too many of them and Air New Zealand aren't the only victims.

IBM didn't deliver.
Telecom's Visionstream didn't deliver.
Let's stop listening to spin doctors and put in some penalty clauses that mean something; then they might actually deliver on their promises.

I work for a global telco here in NZ, and we are tied into a 10-year contract with IBM. Their ability to not deliver on a monthly, weekly and daily basis astounds me.

Behind the PowerPoint, posturing and parroting of cliché, it's all BS. I don't think any of these outsourcing deals in NZ or elsewhere have ever resulted in anything other than higher costs and lower service quality. I don't understand why Air NZ would entrust these vital systems to IBM or anyone else - I am sure they could do much better running them themselves.

Finally people can see the problems with outsourcing. Trouble is, accountants live in a different world to the people this type of thing affects.
Stop listening to the bean counters. In-house does work and doesn't need to cost any more with a loyal workforce behind it.

Who is to blame? Is the CIO at the end of the day ultimately responsible for the crash, whether or not systems are outsourced or managed by third parties? Or is IBM to blame for not having failover systems in place? Or both? A backup disaster recovery site, together with good backup and recovery solutions, can prevent something like this from happening. The ultimate price is paid by the stranded passengers. The finger pointing begins in an endless battle between the flying Kiwis and big ol' blue. It could have been prevented...

I can't believe the naivety on display here.
Air NZ should run their IT in-house? With which resources? With whose money?
Managing infrastructure via a data centre makes eminent sense, both from a cost and an expertise perspective, but ONLY if the supplier can display high levels of reliability and retrieval.
But don't forget that this is happening in New Zealand, where the CBD was without power for weeks and where power lines fall over when it rains. (Nelson last week)
A real data centre will have 3 separate sources of power, banks of batteries to provide temporary power until one of the 2 diesel generators (both kept on warm standby) kicks in and which have initial fuel supplies for 48 hours.
And you don't build data centres in the middle of a city.
And if Computerworld is right, it wasn't even IBM's own data centre
"IBM has a datacentre in Wellington but not in Auckland, though it has been using the Air New Zealand facility there, part of which is sub-let to AT&T. "
Full story here http://bit.ly/175iKV

You can't prevent outages. No provider will guarantee 100% system availability and Service Level Agreements reflect this. 97-98% is fairly standard, reflecting the need to schedule planned downtime for maintenance. However, if you're not achieving 99.x% service availability, you're not doing your job. And 99.5% is a downtime of about 7 minutes a day or roughly 3.6 hours a month.
Because of this, you schedule maintenance outside core business hours and you have contingency plans to minimise business disruption.
Simple as.
But don't expect machines never to fall over.
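
The downtime that an availability target allows can be worked out directly. The following is a minimal Python sketch of that arithmetic, not anything taken from an actual SLA; the function name `allowed_downtime_minutes` and the 30-day month are assumptions made for illustration.

```python
def allowed_downtime_minutes(availability_percent: float, period_hours: float) -> float:
    """Downtime, in minutes, permitted by an availability target over a period.

    Hypothetical helper for illustration only; real SLAs often exclude
    planned maintenance windows, which this sketch does not model.
    """
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability_percent must be between 0 and 100")
    unavailable_fraction = (100.0 - availability_percent) / 100.0
    return unavailable_fraction * period_hours * 60.0


if __name__ == "__main__":
    HOURS_PER_DAY = 24
    HOURS_PER_MONTH = 30 * 24  # assumption: a 30-day month

    for target in (97.0, 98.0, 99.5, 99.9):
        per_day = allowed_downtime_minutes(target, HOURS_PER_DAY)
        per_month_hours = allowed_downtime_minutes(target, HOURS_PER_MONTH) / 60.0
        print(f"{target}% -> {per_day:.1f} min/day, {per_month_hours:.1f} h/month")
```

On these assumptions, a 99.5% target works out to about 7.2 minutes a day, or 3.6 hours over a 30-day month, while 97% allows roughly 43 minutes a day.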

So John, you can't believe our naivety.
You're the spin doctor's dream; got it all out of a textbook, I think.
Companies and corporations cut corners to meet KPIs, and every so often they pay the price.

I believe outsourcing (to IBM or whoever) can be one of the main contributors in this case. Defining an SLA may be helpful, and having a penalty clause may be even better, but if the system is down, it is down and customers are affected, no matter what was defined to protect the investment.
But is it possible that the internal IT guys are not doing their part to review and drill the system set up by someone else? For this kind of problem, it's an IT management problem more than a technical or contractual one.

A business-critical server for Air NZ should be duplicated in another city. If IBM Wgtn should slide into the harbour, generators and all, a server backup in Chch should seamlessly take over.

When real disasters happen they can really stuff infrastructure and arguably IT infrastructure is the most valuable we have.

Systems have to be battle tested so they are capable of sustaining multiple damage, and still fly.

I also recommend your mid-level people be let loose on a risk analysis exercise in an attempt to anticipate the Black Swan events that the experts have not thought of.

@Alan.
Actually, no. Providing hosting services to 4 of the major Star Alliance airlines at senior executive level.
Believe me, you don't get this stuff out of a textbook....

If I was CIO of a major corp, I'd probably already have reached in my pocket and paid for an out-of-region data centre to cope with loss of the primary one.

When a site power test was planned and communicated to me, I might get my staff to assess the risk and switch data centres in advance. Alternatively I would have them on site ready to instigate a site switch.

I don't think I'd whine like a little girl though. Almost certainly not.

If you're a CIO and you've outsourced your IT production to a service provider, it's not up to YOU to make sure that everything doesn't go pear-shaped if there's a power cut.

That's what the service/hosting provider's paid for.

You'll hopefully have made sure during the selection/due diligence process that the provider has the resilience to ensure that your business runs smoothly when bits of his infrastructure fall over and that you are consulted and have veto rights for maintenance scheduling.

If you've done all that (which is a mere subset of what hosting evaluation is all about) and you then hold your provider to account for catastrophic service failures that they assured you would never happen, that's not "whining like a little girl".

Mind you, you get what you pay for and if IBM is providing hosting services merely by taking over the running of Air NZ's data centre in the middle of town with no mirrors at remote locations (as appears to be the case), then that just seems a bit too risky.

Fact. Nobody outsources if all their systems are running fine and in a cost effective manner. From what I've seen clients dump their poor decisions to date on a provider and hope they will fix them. They expect 100% redundancy, something they have not provided themselves but they are not prepared to pay for it. At the end of the day the provider delivers to the contract. The outburst by Air NZ shows a lack of professionalism and understanding of what he actually signed for.

Keep in mind Air NZ own the building and IBM operate it on their behalf. The equipment within it is Air NZ's.
