5 Mar 2013

Juniper Routers take down CloudFlare

Posted by iwgcr

On Sunday, March 3, at 09:47 UTC, CloudFlare dropped off the Internet. The outage affected all of CloudFlare’s services, including DNS and every service that relies on its web proxy. During the outage, anyone accessing CloudFlare.com or any site on CloudFlare’s network received a DNS error, and pings and traceroutes to CloudFlare’s network returned a “No Route to Host” error.

Matthew Prince, co-founder and CEO of CloudFlare, said:

The cause of the outage was a system-wide failure of our edge routers. CloudFlare currently runs 23 data centers worldwide. These data centers are connected to the rest of the Internet using routers. These routers announce the path that, from any point on the Internet, packets should use to reach our network. When a router goes down, the routes to the network that sits behind the router are withdrawn from the rest of the Internet.
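As a rough illustration of why crashed edge routers surface as “No Route to Host”, the toy model below keeps a single routing table in which each prefix is announced by an edge router and withdrawn when that router goes down. It is a simplification under stated assumptions: real BGP propagates announcements and withdrawals between many independent routers, and the prefixes (taken from the documentation ranges 198.51.100.0/24 and 203.0.113.0/24), router names and the `RoutingTable` class are hypothetical, not CloudFlare’s setup.

```python
import ipaddress


class RoutingTable:
    """Hypothetical single-view routing table: prefix -> announcing router."""

    def __init__(self):
        self.routes = {}  # ipaddress.IPv4Network -> router name

    def announce(self, prefix, router):
        # A router advertises a path to the network sitting behind it.
        self.routes[ipaddress.ip_network(prefix)] = router

    def withdraw_router(self, router):
        # When a router goes down, every prefix it announced is withdrawn.
        self.routes = {net: r for net, r in self.routes.items() if r != router}

    def lookup(self, address):
        # Longest-prefix match; None stands for "No Route to Host".
        ip = ipaddress.ip_address(address)
        matches = [net for net in self.routes if ip in net]
        if not matches:
            return None
        return self.routes[max(matches, key=lambda net: net.prefixlen)]


table = RoutingTable()
table.announce("198.51.100.0/24", "edge-router-a")
table.announce("203.0.113.0/24", "edge-router-b")

print(table.lookup("198.51.100.7"))  # edge-router-a
table.withdraw_router("edge-router-a")
print(table.lookup("198.51.100.7"))  # None
```

When every edge router crashes at once, every prefix is withdrawn and no lookup succeeds, which matches the system-wide failure described above.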

CloudFlare uses Juniper routers and propagates its router rules using Flowspec, an extension to BGP. In response to an attack on the DNS servers of one of CloudFlare’s customers, a member of the CloudFlare operations team pushed a discard rule matching the attack profile exactly: packets between 99,971 and 99,985 bytes long.
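RFC 5575 defines a Flowspec packet-length component (type 10) that matches the total IP packet length with a list of operator/value pairs. The sketch below assumes that encoding and uses hypothetical helper names: it builds the 99,971 to 99,985 byte match and then checks it against every possible IPv4 packet length. Because the IPv4 Total Length field is 16 bits wide, no IPv4 packet can exceed 65,535 bytes, so the range can never match. It does not model Juniper’s implementation or the memory exhaustion that followed.

```python
# Sketch of an RFC 5575 packet-length component (type 10); helper names are hypothetical.
PACKET_LENGTH = 10             # component type: total IP packet length
END, AND = 0x80, 0x40          # e bit (last pair) and a bit (AND with previous pair)
LT, GT, EQ = 0x04, 0x02, 0x01  # comparison bits
LEN_4_BYTES = 0x20             # len field = 2, so the value is (1 << 2) = 4 bytes


def encode_length_range(low, high):
    """Encode 'low <= length <= high' as a packet-length component."""
    first = bytes([GT | EQ | LEN_4_BYTES]) + low.to_bytes(4, "big")
    second = bytes([END | AND | LT | EQ | LEN_4_BYTES]) + high.to_bytes(4, "big")
    return bytes([PACKET_LENGTH]) + first + second


def matches(low, high, length):
    """The predicate the encoded component expresses."""
    return low <= length <= high


# The IPv4 Total Length field is 16 bits, so 65,535 bytes is the hard ceiling.
MAX_IPV4_TOTAL_LENGTH = 2**16 - 1

LOW, HIGH = 99_971, 99_985
rule = encode_length_range(LOW, HIGH)
print(rule.hex(" "))  # 0a 23 00 01 86 83 e5 00 01 86 91

hits = [n for n in range(MAX_IPV4_TOTAL_LENGTH + 1) if matches(LOW, HIGH, n)]
print(hits)  # []: no valid IPv4 packet length falls in the range
```

Both bounds need a 4-byte value field, since neither fits in the 16 bits an IPv4 packet length can occupy.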

Matthew Prince explains:

Flowspec accepted the rule and relayed it to our edge network. What should have happened is that no packet should have matched that rule because no packet was actually that large. What happened instead is that the routers encountered the rule and then proceeded to consume all their RAM until they crashed.

[…]

We have already reached out to Juniper to see if this is a known bug or something unique to our setup and the kind of traffic we were seeing at the time.

Source: CloudFlare post-mortem explanation

Date	Service	Duration	Critical Data Lost
2013-03-03	CloudFlare	1 hour	no

Tags: CloudFlare, DNS, downtime, routers, software bug
