AWS blames human error, typo, for Wednesday's massive internet outage

Xero chief executive Rod Drury

UPDATE March 3: Amazon has blamed human error and a typo for a glitch with its giant Amazon Web Services cloud-hosting platform, which caused problems for many websites and apps over 11 hours on Wednesday.

In a post-mortem, the company says a staffer was investigating an issue with a billing system when they entered a command to take a limited number of AWS servers offline. The person mistyped the command, taking many more servers offline than intended.

EARLIER: Xero, Instagram, other services hit by widespread AWS outage

March 1: The giant Amazon Web Services (AWS) is reporting high error rates with its S3 hosting service — meaning interrupted service for AWS clients from Xero to Instagram.

The problems first occurred around 8.30am NZT and have so far lasted around an hour.

NBR would poke fun at Xero boss Rod Drury, but the free version of NBR Radio is hosted by AWS client SoundCloud and is also misbehaving.

AWS, owned by Amazon (the e-tailer), has emerged as the world's largest cloud hosting platform.

A technical glitch in June last year affected a wide range of AWS-hosted services in New Zealand, from Westpac online banking to Domino's online pizza ordering to Spark's Lightbox streaming video on-demand service.

Despite the odd spill, AWS's client list keeps growing. Last week, NZX-listed Orion Health said it was migrating 110 million patient records to Amazon's cloud.

Mighty River Power and Spark Ventures are among AWS's other hero customers.

Amazon has yet to give an indicative time for resolving its technical problems, which are hitting sites and services intermittently rather than taking them offline altogether.

The June 2016 outage, which lasted around half a day for some AWS-hosted services, was blamed on backup and redundancy systems failing to kick in after an Amazon data centre in Sydney was hit by a power failure during a major storm.



11 Comments & Questions


So would you rely on a cloud system?
About time we all got back to basics and managed our own affairs instead of relying on the so-called cloud.
The bigger they are, the harder they fall.


Amazon is always going to get an avalanche of publicity around each outage, because they affect so many businesses at once.

But it doesn't happen much. For most organisations, cloud hosting means more uptime than when they used to host on servers in-house.


Uptime for AWS is much better than hosting your own stuff (not to mention cheaper and more scalable).
Plus you can blame a 3rd party if it does go down!


I think that debate is done and dusted. You'd be hard pressed to find a credible vendor investing in product to help you do that even if you wanted to.


I've been hosting in the cloud for years. Far easier, more reliable and lower TCO than managing my own infrastructure. I've got some sites affected by this outage, but only on S3, which I use primarily for backups. Despite that - I'd rather explain that Amazon is down but they are working on it, than explain that somebody cut the fibre to my in-house server room (for example).


Actually Chris, you are wrong in asserting that the cloud provides more up-time than that available using in-house systems.
By the mid-1980s, 99.999% (5 nines) uptime was the standard for important systems, and I have worked on a lot of these. Trading rooms, retail banks, equity managers and communications companies are examples of organisations that are never likely to commit important parts of their infrastructure to the cloud or even third-party hosting.
Cloud is great for configurable infrastructure and cost savings but it will never get to 5 nines because of the way it is managed.


Interesting that Xero did not have any fail-over systems in place.

There's a reason why people have on-premise and this is exactly it!

For the number of customers Xero has, surely they can spend some $ and go on-premise; that way all our data will be in NZ too.


I can only imagine how that human feels having to face up to Bezos on this error.


Why on earth doesn't Bezos use those friendly Microsoft-certified technicians from India who keep calling me to say they have been notified of an error in my PC by their servers? Sounds like just the ticket for AWS.


Every silver cloud has a dark lining


The biggest issue everyone has missed here is that if a "User Error" can bring down the biggest cloud platform in the world, what happens when someone actually wants to bring it down? There are some nasty people out there with hidden agendas, and they're smart too!

