Xero taken offline by massive US data centre failure
One of the drawbacks of cloud computing was dramatically illustrated today when Rackspace - one of the world's largest web hosts - went offline for 45 minutes. New Zealand’s Xero was one of many SaaS (software-as-a-service) providers knocked out by the failure, with glitches continuing for hours.
The accounting software provider went offline around 8.30am as Rackspace, which hosts all of Xero's data, was hit by a still-unexplained, catastrophic failure.
All Xero servers were back up and running by 9.10am, chief operations officer Alistair Grigg told NBR.
Some customers were still reporting problems logging on through the morning and early afternoon, as recorded on Xero’s blog. Mr Grigg says these were cookie and DNS (domain name system) issues, which were resolved by asking customers to restart.
Rackspace’s catastrophic outage has shocked the computer community worldwide.
The fault was caused by a power failure at the US company’s giant data centre in Dallas.
But with Rackspace maintaining server farms at nine locations across the US, UK and Hong Kong, it’s not clear why a failure at one facility took its systems completely offline.
Mr Grigg says Rackspace has multiple utility providers, backup generators and uninterruptible power supply systems at its Dallas facility, as would be expected from one of the world’s largest data hosts.
So far, no official explanation has been forthcoming.
The power fault also took out Rackspace's own website and help centre, adding to the confusion. It was left to the company's Twitter account to relay the disaster to the outside world.
When Rackspace came back online, it was running on a mix of utility and backup power, Mr Grigg notes.
The COO speculates that “there must have been some pretty significant component failure possibly at the point where maintenance work was being done”.
The outage marks Xero’s first ever hosting failure during its first two and a half years of operation. Mr Grigg says only a handful of UK and Australian customers were affected, due to the timing of the outage.
Has his faith in the cloud been shaken?
No. Mr Grigg says that after 27 months, Xero still has a record of more than 99.99% uptime.
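The uptime figure can be sanity-checked with some rough arithmetic. The sketch below assumes that the 45-minute outage is the only downtime in the 27 months, and that a month averages about 30.4 days; both are simplifications for illustration, not figures supplied by Xero.

```python
# Rough uptime check. The average month length and the single
# 45-minute outage are assumptions made for illustration only.
MINUTES_PER_DAY = 24 * 60
AVG_DAYS_PER_MONTH = 365.25 / 12  # assumed average month length


def uptime_percent(months_in_service: float, downtime_minutes: float) -> float:
    """Return uptime as a percentage of total minutes in service."""
    total_minutes = months_in_service * AVG_DAYS_PER_MONTH * MINUTES_PER_DAY
    return 100.0 * (1.0 - downtime_minutes / total_minutes)


uptime = uptime_percent(27, 45)
print(f"{uptime:.4f}%")
```

On those assumptions the result is roughly 99.996%, which is consistent with the "more than 99.99%" claim. Several hours of downtime over the same period would push the figure below that mark.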
He said the company could look at a second cloud host, but that such a disaster recovery switch typically took several hours, whereas Rackspace got Xero back online within 45 minutes (some customers with more complex systems took longer).
Xero customers could take some comfort in the fact that they were among millions hit - Rackspace hosts sites and services for more than 62,000 companies - with US singer Justin Timberlake leading complaints via, what else, Twitter.