Amazon’s Cloud Outage Catches Most Clients Off Guard
The recent Amazon cloud outage at its Northern Virginia data center, from 5 am Thursday, April 21, 2011 to roughly 5 am Friday, April 22, has shaken the confidence of some executives in public cloud computing. Most notably, FourSquare, HootSuite, Reddit, and Quora suffered publicly visible performance issues. The industry’s past reassurances about uptime and massive redundancy, combined with widespread corporate adoption, had everyone believing that public clouds were bulletproof. As calmer heads prevail, most CIOs, business leaders, and analysts realize that:
- Cloud outages are rare but can happen. While most organizations cannot deliver 90% uptime on their own, let alone 99.5%, disruptions in the cloud can and will happen. The massive impact on so many organizations last week highlights the potential vulnerabilities of betting 100% of capacity on the cloud. More importantly, it showed that broad adoption does not equate to bulletproof reliability. Most organizations lacked a contingency plan.
- Cost-benefit ratios still favor cloud deployments. For most organizations, deploying in the cloud remains a factor of 10 cheaper than moving back to the traditional data center or even a private cloud. Capital costs for equipment, labor for managing the data center, excess software capacity, and the time required to stand up a server all give cloud deployments a significant cost advantage.
- Current service level agreements lack teeth and should be improved. Most cloud/SaaS contracts lack teeth when it comes to addressing service level agreement failures. Despite all backups and contingency plans, clients should consider scenarios where core business systems go down. What remedies are appropriate? What contingencies for system backup are in place? Who is responsible for disaster recovery? Will the vendor accept liability, and for what?
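
To put the uptime percentages and the SLA questions above in concrete terms, the short Python sketch below converts an uptime target into permitted hours of downtime and shows what a single 24-hour outage does to the yearly figure. It assumes a one-year measurement window, which this article does not specify and which real contracts may define differently; the function names are illustrative only.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years (assumed window)


def allowed_downtime_hours(uptime_pct: float, period_hours: float = HOURS_PER_YEAR) -> float:
    """Hours of downtime permitted over the period at a given uptime percentage."""
    return period_hours * (1 - uptime_pct / 100)


def achieved_uptime_pct(outage_hours: float, period_hours: float = HOURS_PER_YEAR) -> float:
    """Uptime percentage for the period after a given number of outage hours."""
    return 100 * (1 - outage_hours / period_hours)


if __name__ == "__main__":
    for target in (90.0, 99.5):
        hours = allowed_downtime_hours(target)
        print(f"{target}% uptime allows {hours:.1f} hours of downtime per year")

    # The outage described above lasted roughly 24 hours.
    print(f"A 24-hour outage leaves {achieved_uptime_pct(24):.2f}% uptime for the year")
```

Under these assumptions, a 99.5% annual target still permits about 43.8 hours of downtime, so a single day-long outage would not by itself breach it. That is one reason to ask how an agreement measures availability and what remedies apply when core systems are down.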