8 Aug 2012

Configuration error outage Windows Azure

Posted by iwgcr

Microsoft public cloud application hosting and development platform was unavailable for about two and a half hours, Microsoft didn’t say how many customers were impacted.

Issue was a “safety valve” mechanism in the Azure network infrastructure designed to prevent cascading network failures  by capping the number of connections that network hardware devices accept.

Prior to this incident, we added new capacity to the West Europe sub-region in response to increased demand.
[…]
increased management traffic in turn triggered bugs in some of the cluster’s hardware devices, causing these to reach 100% CPU utilisation impacting data traffic

Source

Date

Service

Duration

Critical Data Lost

2012-08-03 Windows Azure 0.5 hours No