The Amazon Web Services outage earlier this week, which took down popular websites like Reddit, Airbnb, and Foursquare as well as many start-ups, has people questioning the wisdom of putting their faith in “the Cloud”. It has not helped that some very well-capitalized companies, rather than accepting any responsibility for the outage, have chosen instead to blame their infrastructure provider. We see that as analogous to getting stranded with a flat tire and then blaming the auto manufacturer because you decided to forgo the optional spare.
The fact of the matter is that while some companies suffered service degradation and a few were down hard, many other companies carried on just fine. The difference? The services that avoided a major disruption ran on loosely coupled, distributed architectures, while those that went down had one or more single points of failure in the affected area. Keep in mind that every data center, no matter how robust, can and likely will suffer outages from time to time. How services are architected on top of the physical infrastructure determines whether a data center outage means a small loss of capacity or extended downtime. Amazon and other Cloud infrastructure providers offer a number of tools, such as multiple Availability Zones and regions, to help their customers avoid single points of failure. It is up to their customers, however, to take full advantage of them.
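To make the idea of avoiding a single point of failure concrete, here is a minimal Python sketch of a client that does not depend on any one zone: it tries a list of replicas, each tagged with a hypothetical zone name, and falls back to the next one when a call fails. The zone names, handler functions, and `call_with_failover` helper are illustrative stand-ins, not AWS APIs; a real deployment would also need health checks, timeouts, and replicated data behind each endpoint.

```python
from typing import Callable, Sequence, Tuple


class AllReplicasFailed(Exception):
    """Raised when every replica in every zone has failed."""


def call_with_failover(
    replicas: Sequence[Tuple[str, Callable[[], str]]]
) -> Tuple[str, str]:
    """Try each (zone, handler) pair in order and return (zone, result)
    from the first replica that succeeds."""
    errors = []
    for zone, handler in replicas:
        try:
            return zone, handler()
        except Exception as exc:  # treat any error as "this replica is unavailable"
            errors.append(f"{zone}: {exc}")
    # Only reached if no replica answered: this is the outage we wanted to avoid.
    raise AllReplicasFailed("; ".join(errors))


# Hypothetical replicas: zone-a is suffering an outage, zone-b is healthy.
def zone_a() -> str:
    raise ConnectionError("zone-a outage")


def zone_b() -> str:
    return "served from zone-b"


if __name__ == "__main__":
    zone, result = call_with_failover([("zone-a", zone_a), ("zone-b", zone_b)])
    print(zone, result)
```

With both replicas listed, the zone-a failure costs only some capacity and the request is answered by zone-b; pass only `("zone-a", zone_a)` and the same outage becomes a hard failure, which is the single-point-of-failure case described above.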
While not every service disruption is preventable, it is important for companies to realize that they are ultimately responsible for any disruptions that do occur. Netflix, to its credit, has acknowledged lessons learned and noted areas where it can improve. This is not to let Amazon off the hook, but rather a reminder that the Cloud is not in and of itself redundant. It is, however, a great platform on which to build redundant services.