
23 Sep 2013

Google: Dual network failure causes Gmail outage

Posted by iwgcr.

On Monday, 23 September 2013, Google’s Gmail service suffered delivery delays and a backlog of messages due to the failure of two unrelated network paths. The outage began at 5:54 am PST, and the backlog was cleared by 4:00 pm PST, Google said in a blog post. The company said that 71 percent of messages had no delay, and that the average delay for the 29 percent of affected messages was 2.6 seconds.

Significant delays of more than two hours occurred for 1.5 percent of messages, and Google said that “users who attempted to download large attachments on affected messages encountered errors”.

| Date | Service | Duration | Critical Data Lost |
|---|---|---|---|
| 2013-09-23 | Google Gmail service | 10 hours 6 minutes | no |

References:

http://www.zdnet.com/google-docs-gmail-hit-by-service-disruption-company-says-its-investigating-7000020999/

http://www.zdnet.com/google-dual-network-failure-behind-mondays-gmail-outage-7000021182/

18 Sep 2013

Amazon: Increased API Error Rates and Latencies

Posted by iwgcr.

Between 9:00 AM and 10:16 AM PDT on September 18, Amazon experienced increased error rates and latencies for RunInstances, DescribeInstances and VPC-related API requests in the US-EAST-1 Region.

| Date | Service | Duration | Critical Data Lost |
|---|---|---|---|
| 2013-09-18 | Amazon Cloud | 1 hour 16 minutes | no |

Reference:

http://status.aws.amazon.com/

16 Sep 2013

Amazon: Network Connectivity / Increased API Latencies

Posted by iwgcr.

On September 16, 2013, Amazon experienced two separate problems: network connectivity issues and increased API latencies.

Connectivity issues between two Availability Zones affected a small number of instances in the US-EAST-1 Region between 7:40 AM and 9:49 AM PDT.

Later that day, between 1:12 PM and 3:21 PM PDT, Amazon experienced delays in returning DescribeInstances results for newly launched instances in VPCs in the US-EAST-1 region. During this time new instances in a VPC successfully launched but were not reported by DescribeInstances API calls. Also during this time some customers may have experienced increased latencies or error rates when calling the DescribeInstances API, including calls for instances that are not in a VPC.
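The duration recorded for this date (4 hours 18 minutes) appears to be the sum of the two separate windows described above, not the span from the first start to the last end. The following is a minimal Python sketch of that arithmetic, assuming both windows fall on the same day in PDT and do not overlap; the `WINDOWS` list and the helper names are illustrative, not taken from any Amazon report.

```python
from datetime import datetime, timedelta

# Hypothetical encoding of the two 2013-09-16 windows (PDT, 24-hour clock).
WINDOWS = [
    ("07:40", "09:49"),  # connectivity issues between two Availability Zones
    ("13:12", "15:21"),  # DescribeInstances delays for new VPC instances
]

def window_length(start: str, end: str) -> timedelta:
    """Length of a same-day window given HH:MM start and end times."""
    fmt = "%H:%M"
    return datetime.strptime(end, fmt) - datetime.strptime(start, fmt)

def format_duration(delta: timedelta) -> str:
    """Render a timedelta as 'H hours M minutes'."""
    total_minutes = int(delta.total_seconds()) // 60
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours} hours {minutes} minutes"

# Sum the non-overlapping windows, starting from a zero timedelta.
total = sum((window_length(s, e) for s, e in WINDOWS), timedelta())
print(format_duration(total))  # 4 hours 18 minutes
```

Each window lasts 2 hours 9 minutes, so the combined impact is 4 hours 18 minutes, while the wall-clock span from 7:40 AM to 3:21 PM would be much longer.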

| Date | Service | Duration | Critical Data Lost |
|---|---|---|---|
| 2013-09-16 | Amazon Cloud | 4 hours 18 minutes | no |

Reference:

http://status.aws.amazon.com/


13 Sep 2013

Network Issues Cause Amazon Cloud Outage

Posted by iwgcr.

On Friday, September 13, 2013, the Amazon Web Services cloud computing platform had trouble in its US-EAST-1 Region between 7:04 a.m. and 8:54 a.m. Pacific Time. Connectivity issues affected a portion of the instances in a single Availability Zone.

Affected services included the mainstay EC2 compute service, load balancers, the Redshift data warehouse, the Relational Database Service, and the Simple Email Service. Affected customers included Heroku, Salesforce’s platform-as-a-service cloud, and GitHub.

Amazon posted a status at 9:46am Pacific Time on the AWS status page: “Between 7:04 AM and 8:54 AM we experienced network connectivity issues affecting a portion of the instances in a single Availability Zone in the US-EAST-1 Region. Impacted instances were unreachable via public IP addresses, but were able to communicate to other instances in the same Availability Zone using private IP addresses. Impacted instances may have also had difficulty reaching other AWS services. We identified the root cause of the issue and full network connectivity had been restored.”

The Northern Virginia data center is Amazon’s oldest public-cloud facility and has had numerous problems, including Elastic Block Store failures, a huge generator failure, and a massive general outage in summer 2012.

| Date | Service | Duration | Critical Data Lost |
|---|---|---|---|
| 2013-09-13 | Amazon Web Services | 1 hour 50 minutes | no |

References:

http://www.datacenterknowledge.com/archives/2013/09/13/network-issues-cause-amazon-cloud-outage/

http://www.theregister.co.uk/2013/09/13/amazon_cloud_problem_north_virginia/

http://status.aws.amazon.com/

12 Sep 2013

Verizon mail service outage

Posted by iwgcr.

On Thursday morning, September 12, 2013, Verizon experienced a service disruption that affected email accounts. At 12:58 PM, Verizon published a post explaining that sending and receiving email had been restored, but that users still could not reach their message folders:

“On Thursday morning, September 12, 2013, Verizon experienced a service disruption that affected your email account.  You now are able to send and receive email normally and will continue to have access to your calendar and address book. 

We are diligently working to recover any of your email folders that cannot be seen at this time. When the issue is fully resolved, email previously sent or received and any personal folders you created prior to this interruption will be available.

While Verizon is working to resolve this issue, you can view updates as new information is available on verizon.com/outage.

If you regularly use a POP email client such as Microsoft Outlook, Yahoo or Mac Mail you may not have noticed any adverse impact outside of the temporary inability earlier today to connect to your servers. 

We sincerely apologize for any inconvenience this disruption has caused.”

On September 16, 2013, Verizon posted this update on its restoration progress:

“Users that have accessed their email within the last 30 days have been completely restored. Engineers have been working around the clock and will continue to do so until all email folders for customers who were affected have been restored. We are making progress in the restoration, however, this may take several days before all customers have been completely restored. When the issue is fully resolved, email previously sent or received and any personal folders that customers created prior to this interruption will be available.”

| Date | Service | Duration | Critical Data Lost |
|---|---|---|---|
| 2013-09-12 | Verizon mail service | 5 days | no |

References:

http://forums.verizon.com/t5/Verizon-net-Email/September-Email-Service-Issue-Daily-Updates/td-p/623397

5 Sep 2013

Yahoo Mail service down for some users

Posted by iwgcr.

On Thursday, September 5, 2013, Yahoo Mail went down for some users. Yahoo Customer Care’s Twitter account confirmed that the problem began as early as 6:30 AM PDT, and several sites that monitor the status of Web services also reported outages. Yahoo representatives had not returned requests for comment by press time.

It’s unclear how many users were affected by the outage; unlike Google and Microsoft, Yahoo doesn’t provide a services dashboard to update users on problems. Instead, sites like DownDetector and DownRightNow track tweets and complaints to gauge the severity of the outage.

| Date | Service | Duration | Critical Data Lost |
|---|---|---|---|
| 2013-09-05 | Yahoo Mail Service | 12 hours | no |

References:

http://www.techhive.com/article/2048215/yahoo-mail-service-goes-down-for-some-users.html

http://downdetector.com/status/yahoo-mail/news/2264-problems-at-yahoo-mail-2


4 Sep 2013

Twitter suffers an outage

Posted by iwgcr.

For an interminable 40 minutes on Wednesday, September 4, 2013, some tweeters had to hit the pause button on their urge to share their thoughts.

Twitter said it resolved a brief outage that made it difficult for some users to access the micro-blogging service. The company, in a blog post, blamed the issue on a code-related error, which caused a number of Web servers to go down for about a half hour. Some users were unable to access the Twitter site during that time.

Access to Twitter via mobile apps was not affected.

A Twitter spokeswoman declined to comment beyond the content of the company’s blog post.

| Date | Service | Duration | Critical Data Lost |
|---|---|---|---|
| 2013-09-04 | Twitter | 40 minutes | no |

References:

http://blogs.wsj.com/digits/2013/09/04/twitter-suffers-brief-outage/

http://status.twitter.com/post/60295900552/twitter-site-issue

2 Sep 2013

Amazon: Increased Instance Launch Error Rates

Posted by iwgcr.

Between 7:03 AM and 10:04 AM PDT on September 2nd, Amazon experienced increased launch error rates and delayed updates for VPC instances in the US-EAST-1 region. Delayed updates impacted VPC configuration changes.

| Date | Service | Duration | Critical Data Lost |
|---|---|---|---|
| 2013-09-02 | Amazon Cloud | 3 hours 1 minute | no |

Reference:

http://status.aws.amazon.com/


30 Aug 2013

No Routing at OVH

Posted by iwgcr.

OVH, Europe’s largest hosting provider, experienced more than 5 hours of downtime because of poor quality assurance in the upgrade of 88 routers at one of its largest data centers.

| Date | Service | Duration | Critical Data Lost |
|---|---|---|---|
| 2013-08-29 | OVH | 5 hours | no |

Reference:

http://travaux.ovh.net/?do=details&id=9224

27 Aug 2013

Twitter image service issue

Posted by iwgcr.

Twitter’s DNS registrar experienced an issue on August 27, 2013, in which DNS records for various organizations were modified, including those of twimg.com, one of Twitter’s domains for serving images. Viewing of images and photos was sporadically affected from 20:49 to 22:29.

| Date | Service | Duration | Critical Data Lost |
|---|---|---|---|
| 2013-08-27 | Twitter | 1 hour 40 minutes | no |

References:

http://status.twitter.com/post/59528478030/twitter-service-issue