Survived an AWS outage? Don’t gloat!

Remember a few weeks ago when the Great EBS Outage took place? Everyone became a software scalability architect overnight. Lots of posts were written about how most startups using EBS didn't know what they were doing. Netflix in particular was the talk of the town. Their service wasn't affected at all because they didn't rely on EBS. While sites like Foursquare, Reddit and Quora were down, Netflix boasted that:

For Netflix, the short answer is that our systems are designed explicitly for these sorts of failures. When we re-designed for the cloud this Amazon failure was exactly the sort of issue that we wanted to be resilient to. Our architecture avoids using EBS as our main data storage service, and the SimpleDB, S3 and Cassandra services that we do depend upon were not affected by the outage.

Awesome. So last night, all those other sites were up but Netflix was down. What happened? SimpleDB was out cold for three hours.

To Netflix's credit, the tone of their post back then was measured. Other companies, which essentially lucked out to varying degrees, were not so restrained. Lesson learned: if your service survives a major outage, don't rub it in the faces of startups making the best of their scrappy budgets!
