23 Oct 2012

Amazon EBS degradation shuts down many web sites

Posted by iwgcr

On Monday 22 October, Amazon EBS (Elastic Block Store) experienced elevated API failure rates and delays in launching, updating and deleting resources in one Availability Zone of the US-EAST-1 Region. EBS is a cloud-based block storage service that lets users store large amounts of data and is used in conjunction with AWS’s Elastic Compute Cloud (EC2).

The issues started at 10:38 AM PDT and affected many major services, including Reddit, Foursquare, Minecraft, Heroku, GitHub, imgur, Pocket, HipChat, Coursera, Fast Company, Flipboard and Payvment. At 1:02 PM PDT, Amazon advised customers to launch replacement instances in the unaffected Availability Zones. At 2:20 PM PDT, traffic was shifted away from the affected Availability Zone for customers using multi-AZ Elastic Load Balancers (ELBs).

At 10:54 PM PDT, the AWS status page reported:

ELB has now completed recovery of nearly all affected load balancers. We will continue to work to restore IO for the remainder of volumes and will reach out via email to affected customers that own those volumes should action be required on their part. Volumes affected earlier in the day are continuing to re-mirror (which we expect will take several more hours) and while this process continues, customers may notice increased volume IO latency.

(source1, source2, source3)