23 May 2013

3 difficult days for Rackspace Cloud Load Balancers

Posted by iwgcr

The Cloud Load Balancers of Rackspace has suffered many issues early this week.

On May 19, an incident “Cloud Load Balancers is currently experiencing a degradation of service.” is reported at 04:32 PM EDT and closed 10 minutes later. The next day, at 09:47 AM EDT same report but not closed before 12:34 PM EDT.

Rackspace Cloud Load Balancer engineers has given more information about this issues:

Load Balancer Nodes ztn-n09 and ztn-n10 in our ORD1 data center […] experienced a rare capacity issue from a combination of active load balancers, new provisioning requests, and overall traffic. This caused both of the affected nodes (ztn-n09 and ztn-n10) to attempt to shift their traffic to the failover node (ztn-n12) simultaneously, which in turn affected network connectivity for the instances supported by ztn-n09 and ztn-n10. After several attempts to restart services for ztn-n09 and ztn-n10, engineers determined that a reboot of all four nodes in the cluster was required to restore services.

After rebooting one of the nodes, engineers discovered that the previous node failures had corrupted the global configuration files. This contributed to the inability to add the ztn-n09 and ztn-n10 back into the cluster. The corruption was corrected and the original two problem nodes were restarted and began to take traffic from ztn-n12.

On May 21, two more incident concerning Cloud Load Balancers was reported. The fist at 07:29 AM EDT on the node ztn-n05 for 30 minutes and the second at 02:51 PM EDT on global Cloud Load balancers nodes for 15 minutes.

Date	Service	Duration	Critical Data Lost
2013-05-21	Rackspace	15 minutes	no
2013-05-21	Rackspace	0,5 hours	no
2013-05-20	Rackspace	3 hours	no
2013-05-19	Rackspace	10 minutes	no

Tags: downtime , Load Balancer , Rackspace

Browse
Follow
Follow @iwgcr
Blogroll
Categories
- Disasters
- Downtime
- Corporations
- News
- About IWGCR
Data Center Outages from GDCN

RSS Error: WP HTTP Error: SSL certificate problem, verify that the CA cert is OK. Details: error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed
Recent Posts
Recent Comments
- dubai motels * where to get the best offers on Zoho cloud apps service disrupted
- Tx Seo company on The Franco-American Cedexis joined IWGCR
- Certificate Expirty Leads to Total Outage For Microsoft Azure Secured Storage on Azure SQL Reporting Services outage: 5 days and counting
- Understanding the December 2012 Microsoft Azure Outage | Cloud Computing Today on Windows Azure: Storage at South Central US goes down for 77 hours
- iwgcr on Amazon ESB degradation shutdown many web sites
Archives

3 difficult days for Rackspace Cloud Load Balancers

Browse

Follow

Blogroll

Categories

Data Center Outages from GDCN

Recent Posts

Recent Comments

Archives