IBM's backup generator failed to kick in for Air NZ

A failed oil pressure sensor on a backup generator could be the reason Air New Zealand’s computer system crashed on Sunday for six hours, IBM confirmed this morning.

The IT failure caused disruptions to more than 10,000 Air New Zealand customers, who were delayed by two hours on average. The computer failure affected the company's booking and check-in systems. Passengers had to be manually checked in, which caused further delays.

Air New Zealand apologised to its customers yesterday saying the incident should never have happened.

Air New Zealand chief executive Rob Fyfe strongly criticised IBM’s reluctance to accept responsibility for the disaster. “We were left high and dry and this is simply unacceptable,” Mr Fyfe said.

Yesterday, IBM blamed the crash on a power failure, but the underlying issue lies in its disaster recovery arrangements for clients such as Air New Zealand, which was unable to regain control of its systems because of a fault in a backup generator.

IBM has released the following statement: “The cause of yesterday's power outage at the Newton Data Centre has not been fully determined.

“IBM's primary focus was to rapidly restore services to our clients, and in particular to Air New Zealand.

“IBM immediately engaged a team of 32 local IT professionals supported by global colleagues and management to restore impacted client systems. Services to most clients were restored within an hour of the outage.

“We have already engaged an independent expert to conduct a thorough investigation into the cause of the outage.

“However, the likely cause appears to have been a failed oil pressure sensor on a backup generator. We regret any inconvenience caused to our clients or their customers.”

The computer failure affected Air New Zealand's systems from 9am to 3pm on Sunday. Air New Zealand declined to comment on IBM's latest statement due to commercial sensitivity.

More by Kelly Gregor

Comments and questions
11

I think you will find that Air New Zealand's relentless drive to reduce IT costs over the past 5 years has come home to roost. They have not been willing to pay for a proper DR system with real time data replication due to the costs - it is expensive. This is not IBM's fault. Essentially Air NZ took a business risk to save money so Rob Fyfe needs to look a lot closer to home and stop using IBM as a scapegoat. If IT is so mission critical to them (which it clearly is) then they should have a Business Continuity Plan (BCP) that involves (at some level at least) replication of key systems across to a secondary site. And BCP is more than IT. It addresses how a business will operate in a set of scenarios - failure of a primary datacentre is the most obvious one. Failure to have and failure to execute a BCP is a major failing from Air NZ's management and Rob Fyfe needs to front up and take responsibility. They pay for redundancy in aircraft systems...but they have to due to regulation. Maybe if they had a choice they would remove them too as a money saving initiative...

Air NZ had another mainframe in Auckland. Wonder what happened to that? Has that been cancelled to save five cents?

But several people have been managed carefully to move on... I wouldn't trust Big Blue to run a bath let alone a mission critical app after having dealt with their complete lack of urgency... there's something about their emphasis on the customer that changes dramatically once the cheque's been banked. Just appalling.

One would have expected IBM, as the IT provider to Air NZ, to have had in place a DR facility that would have kicked in when the Newton facility failed. Modern high speed comms enable such to be achieved.

Should the client verify that his provider has a viable DR plan and system in place? Obviously he should.

It would seem that maintenance of the Newton site standby power facility may not have been up to scratch, if the reporting can be believed. Alas, regular on-load testing of standby power plants is not the norm at too many vital IT installations.

Hi,
We used an IBM system several years ago for a small business, but because of continual problems and poor service we finally moved to another provider.

TS

How many "experts" missed the single point of failure? One oil-pressure sensor failure on the backup genset caused this? Whoever designed and signed off on this should hang their head in shame. Air NZ and IBM share the blame.

It is difficult to follow why AIR NZ Management are playing the blame game with their Service Provider IBM. The buck stops with AIR NZ who should have had an adequate and fully tested 'Emergency Response Plan' (ERP) to cover the possibility of a glitch like this. Technology based ERPs are normally very expensive to establish and sustain, therefore an ERP of this nature would predominantly rely on manual back up processes to be enacted should the ERP be required.
I had first-hand experience and was affected by the chaos after landing around 1400 Sunday at Auckland international. Finding the AIR NZ Int/Dom Transfer desk inoperable and having to get family and luggage across to domestic for connecting flights was unacceptable; with 5 International flights landing with 30 of ours, the 20 minute operated inter terminal bus would not cope with this demand. My connecting flight for Wellington scheduled for 5:30pm finally took off at 7:05pm, not the experience one needs after 16 hours travel.
I would like to acknowledge the AIR NZ ground staff who were placed very much in the front line to deal with angry and frustrated customers. I hope that AIR NZ recognises the efforts of these people and rewards them accordingly. From the obvious chaos, the efforts and toil of the regular and called out AIR NZ staff were thwarted by inadequate back up and processes to deploy in such an emergency.

IBM is responsible for maintaining system availability. They're paid to do it. Period.

That said, every airline has an ERP.
It works thus:
Suspend check-in and wait for 5 minutes in the hope that things will come right.
Duty officers to frantically try to make sense out of the service provider and establish a retrieval time.
If estimated retrieval time >15 minutes, go manual.
Which involves:
Doing a print dump of all the bookings for all the flights from your station for the next couple of hours (if the service provider supports it, which it should)
Checking in passengers manually, not knowing how many (and who) have already checked in and if you have connecting passengers (and how many)
Trying to get more resources in to manage the chaos
Accommodating disrupted passengers on later flights, not knowing how many (and who) have already checked in and if you have connecting passengers (and how many)
Doing weight and balance sheets manually (and believe me, it doesn't take long to get rusty when you don't do them every day)
And so on

To summarise:
No manual ERP can hope to work as smoothly as an IT-based business process. How can it?
You're instantly reverting to the days when you could only check in at the airport, you chose your seat assignment from a board shaped like an aeroplane and there were 20 check-in positions.
These days, you run your operation with minimal staff with skill sets that equip them to follow a menu-driven process on a screen.

If Air NZ gets you onto a flight with only a 95 minute delay in this environment, they've done pretty well.
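
The go-manual rule described in the comment above (wait 5 minutes, then switch to manual processing if the provider's estimated retrieval time exceeds 15 minutes) can be sketched roughly as follows. This is only an illustration of that decision rule, not any airline's actual procedure: the function name, the action labels and the "chase the provider" step for a missing estimate are all hypothetical.

```python
from typing import Optional

# Thresholds taken from the comment above; everything else is assumed.
WAIT_MINUTES = 5               # initial wait in the hope things come right
MANUAL_THRESHOLD_MINUTES = 15  # go manual if the estimate exceeds this


def next_action(minutes_since_outage: float,
                estimated_retrieval_minutes: Optional[float]) -> str:
    """Return a (hypothetical) action label for the duty officer."""
    # Step 1: suspend check-in and wait out the initial grace period.
    if minutes_since_outage < WAIT_MINUTES:
        return "suspend_check_in_and_wait"
    # Step 2: no estimate yet, so keep pressing the service provider.
    if estimated_retrieval_minutes is None:
        return "chase_provider_for_estimate"
    # Step 3: estimate is too long, so switch to manual check-in.
    if estimated_retrieval_minutes > MANUAL_THRESHOLD_MINUTES:
        return "go_manual"
    # Otherwise the system should be back soon enough to keep waiting.
    return "keep_waiting"
```

With a six-hour outage like the one reported, any realistic estimate would put the station in the "go_manual" branch almost immediately.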

What does this have to do with the IBM systems... the power went out. IBM does not manufacture backup generators as far as I know. Seems to me this is primarily the airline's fault and they are looking for a scapegoat to save face.

Air NZ signed a contract with IBM for IBM to run its data centre.

If the scope of the agreement is anything like the one IBM has with Cathay Pacific, it'll cover:

* operation of the airline's mainframe computers
* systems management
* asset management
* technical services, and
* business recovery services

Air NZ pays IBM to do this.

IBM makes sure that the airline can focus on its core business - getting passengers and cargo from A to B at a profit to Air NZ.

IBM screwed up, especially on the last point.

That's it.

This is typical of the multi-nationals operating in NZ. They are generalists and average at best when it comes to service. Stick to making servers and leave the services game to those designed to deliver!

DC
