2 Jan 2012

VMware Cloud Foundry: 2 Outages

Posted by iwgcr

Cloud Foundry is an open platform as a service, providing a choice of clouds, developer frameworks and application services. VMware’s open source service suffered two bouts of downtime on April 25 and April 26. SiliconAngle reported on it under the headline “Shaky Start for VMware Cloud Foundry: 2 Outages”. The first downtime incident, detected at 5:45 a.m. on April 25, was caused by a power outage in the supply for a storage cabinet. Applications remained online, but developers weren’t able to perform basic tasks such as logging in or creating new applications. The outage lasted nearly 10 hours and was fixed by the afternoon. Charles Babcock covered the power supply problem on InformationWeek.com.

VMware officials accidentally caused a second outage while developing an early detection plan to prevent the kind of problem that hit the service the previous day, and this second outage was far more serious. On the Tech.SlashDot.org forum, readers commented on it under the title “VMware Causes Second Outage While Recovering From First”.

Dekel Tankel, one of the primary builders and managers of CloudFoundry.org, posted an analysis of the serious downtime on the Cloud Foundry support website:

“This was to be a paper only, hands off the keyboards exercise until the playbook was reviewed. Unfortunately, at 10:15am PDT, one of the operations engineers developing the playbook touched the keyboard. This resulted in a full outage of the network infrastructure sitting in front of Cloud Foundry. This took out all load balancers, routers, and firewalls; caused a partial outage of portions of our internal DNS infrastructure; and resulted in a complete external loss of connectivity to Cloud Foundry.”

Date        Service               Duration  Critical Data Lost
2011-04-24  VMware Cloud Foundry  10 hours  No