
10 Jan 2014

Dropbox goes down for 2 days

Posted by iwgcr. No Comments

The file storage and sharing site went offline on Friday, January 10, 2014, and continued to suffer problems even after returning to life over the weekend. One issue, for example, prevented Dropbox users from sharing folders. The remaining issues were corrected and the core service was restored as of Sunday 4:40 PM PT, according to Dropbox.

“On Friday at 5:30 PM PT, we had a planned maintenance scheduled to upgrade the OS on some of our machines,” Dropbox’s head of infrastructure, Akhil Gupta, said in a blog post published on Sunday. “During this process, the upgrade script checks to make sure there is no active data on the machine before installing the new OS. A subtle bug in the script caused the command to reinstall a small number of active machines. Unfortunately, some master-slave pairs were impacted, which resulted in the site going down.”
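The kind of pre-install check Gupta describes can be sketched roughly as follows. This is only an illustration of the idea and not Dropbox's actual script: the `Machine` record, its fields, and the hostnames are all hypothetical, and the source does not say how the real check was implemented or what the bug was.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Machine:
    """Hypothetical inventory record; every field here is an assumption."""
    hostname: str
    active_data_bytes: int        # data still being served from this machine
    is_replication_member: bool   # part of a live master-slave pair


def safe_to_reinstall(machine: Machine) -> bool:
    """Return True only if the machine holds no active data."""
    return machine.active_data_bytes == 0 and not machine.is_replication_member


def plan_reinstall(machines: List[Machine]) -> Tuple[List[str], List[str]]:
    """Split hostnames into those to reinstall and those to skip."""
    reinstall, skipped = [], []
    for m in machines:
        (reinstall if safe_to_reinstall(m) else skipped).append(m.hostname)
    return reinstall, skipped


# Illustrative fleet: one idle machine and one active master-slave pair.
fleet = [
    Machine("idle-01", 0, False),
    Machine("db-master-01", 512 * 1024**3, True),
    Machine("db-slave-01", 512 * 1024**3, True),
]
to_reinstall, to_skip = plan_reinstall(fleet)
print("reinstall:", to_reinstall)
print("skip:", to_skip)
```

Keeping the decision in a separate planning step means the list of machines to be wiped can be reviewed before anything destructive runs.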

Date | Service | Duration | Critical Data Lost
2014-01-10 | Dropbox | 47 hours | no

Resources:

http://techcrunch.com/2014/01/12/dropbox-outage-internal-maintenance/

http://news.cnet.com/8301-1023_3-57617116-93/dropbox-outage-sparked-by-buggy-server-upgrade/


6 Jan 2014

Telstra cloud console goes offline for two days

Posted by iwgcr. No Comments

Telstra has confirmed that the management console for its corporate cloud platform went offline for some of its customers for two days starting January 6, 2014, in the second demonstration in less than a year that the company’s cloud computing environment may not yet be as stable as the company would like customers to believe.

The telco’s Cloud Services Management Console had been intermittently experiencing major outages since the morning of 6 January. The issue did not cause a loss of service for the virtual machines and other cloud infrastructure which Telstra runs for customers in its datacenters, but did block access to management utilities and details about those machines, as well as the ability to purchase new or redistribute existing services.

The issue was serious enough that Telstra’s senior executive team, including the company’s chief executive David Thodey, was being updated regularly on it. Telstra is believed to have called in external assistance from at least one of its vendor suppliers.

Telstra issued the following statement on the issue: “There were no connectivity issues to virtual servers or internet connections, but some Cloud customers may have had difficulty accessing the Cloud Services management console to view or modify services. The matter was identified on 6 Jan and rectified yesterday (8 Jan) evening. Overnight monitoring and subsequent checks have confirmed functionality has been fully restored. The incident has not affected our 99.9 per cent platform uptime commitment as customers could still access cloud environments throughout this time.”
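To see why it matters whether the console counts toward the 99.9 per cent commitment, here is a rough back-of-the-envelope calculation. It assumes a 30-day measurement window, which is an assumption made for illustration; the source does not state how Telstra measures uptime or what its SLA terms are.

```python
HOURS_PER_DAY = 24
WINDOW_DAYS = 30  # assumption: a 30-day measurement window (not from the source)


def allowed_downtime_minutes(uptime_percent, window_days=WINDOW_DAYS):
    """Minutes of downtime permitted by an uptime commitment over the window."""
    window_minutes = window_days * HOURS_PER_DAY * 60
    return window_minutes * (100 - uptime_percent) / 100


def uptime_percent_after(outage_hours, window_days=WINDOW_DAYS):
    """Uptime percentage over the window if outage_hours counted as downtime."""
    window_hours = window_days * HOURS_PER_DAY
    return 100 * (window_hours - outage_hours) / window_hours


budget = allowed_downtime_minutes(99.9)
console_uptime = uptime_percent_after(48)
print(f"{budget:.1f} min allowed; console uptime {console_uptime:.2f}%")
```

Under that assumed window, a 99.9 per cent commitment leaves roughly 43 minutes of downtime, while a 48-hour outage, had it been counted, would correspond to about 93.33 per cent uptime.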

 

Date | Service | Duration | Critical Data Lost
2014-01-06 | Telstra cloud services | 48 hours | no

 

Resources:

http://delimiter.com.au/2014/01/13/telstra-cloud-console-goes-offline-two-days/

6 Jan 2014

Fiber line cut cuts off a portion of Amazon Web Services

Posted by iwgcr. No Comments

Starting around 12:30 a.m. on Jan. 6, several AWS customers found their services disrupted, beginning with laggy performance and glitches before the services went offline.

By 2:45 a.m., more than two hours later, the services started coming back online, although it still took some time for Amazon to get all of the cloud services and applications back online.

But according to Amazon, and as confirmed by its Service Health Dashboard, the public cloud leader did not suffer an outage. Instead, users were unable to connect to various AWS-hosted cloud services because an East Coast Internet service provider experienced a cut fiber line on January 2.

 

Date | Service | Duration | Critical Data Lost
2014-01-06 | AWS | 2 hours 15 minutes | no

 

Resources:

http://talkincloud.com/public-cloud/010714/east-coast-blizzard-takes-down-amazon-customers

 

31 Dec 2013

Bluehost, HostGator Among EIG Brands Hit by Massive New Year’s Eve Outage

Posted by iwgcr. No Comments

Endurance International Group brands Bluehost, HostMonster, HostGator, and JustHost have ended 2013 with a massive outage affecting customers worldwide.

The outage struck Bluehost, HostGator, HostMonster and JustHost around 1:37 p.m. EST and the impact was felt around the world. Thousands of customers were left without access to their emails or websites. Customers took to Twitter to express their outrage with Bluehost and the other affected hosting companies.

Bluehost released a statement earlier in the day stating, “Our Provo UT data center is currently experiencing some outbound network traffic connectivity issues. Admins are on site and looking into the issue.”

Services were finally restored for the majority of customers at 4:50 pm ET, according to EIG. However, no official statement was issued on what caused the outage.
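The duration recorded for this incident can be checked with a few lines of Python. This is a minimal sketch that assumes the reported 1:37 start time was in the afternoon (the source does not say a.m. or p.m.) and that both times are in the same Eastern time zone; the helper name `outage_duration` is made up for this example.

```python
from datetime import datetime


def outage_duration(start, end):
    """Return (hours, minutes) between two datetimes in the same time zone."""
    total_minutes = int((end - start).total_seconds() // 60)
    return divmod(total_minutes, 60)


start = datetime(2013, 12, 31, 13, 37)  # assumption: 1:37 p.m. Eastern
end = datetime(2013, 12, 31, 16, 50)    # 4:50 p.m. Eastern, per EIG
hours, minutes = outage_duration(start, end)
print(f"{hours} hours {minutes} minutes")
```

With those inputs the result is 3 hours 13 minutes, which matches the duration listed for this entry.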

Date | Service | Duration | Critical Data Lost
2013-12-31 | HostGator, Bluehost, HostMonster, JustHost | 3 hours 13 minutes | no

 

Resources:

http://dailyglobe.com/34425/bluehost-hostgator-hit-new-years-eve-outage/

http://www.thewhir.com/web-hosting-news/bluehost-hostgator-among-eig-brands-hit-massive-new-years-eve-outage

http://americanlivewire.com/2013-12-31-2013-12-31-bluehost-and-hostgator-crash/

 

26 Dec 2013

A week-long Myer website crash due to ‘communication breakdown’: IBM

Posted by iwgcr. No Comments

The website of major Australian retailer Myer was brought down for over a week during the post-Christmas sales due to a “communication breakdown”.

“An IBM team of local and global experts worked around the clock with Myer to resolve the issue with its online store. The technical issue was caused by a communication breakdown between internet servers and a software application,” according to an IBM spokesperson.

“IBM and Myer will work together to conduct a thorough review to ensure this issue does not reoccur. IBM is committed to supporting Myer to continue to provide high-quality service to its customers.”

After Myer advertised “Australia’s biggest stocktake sale” in the lead-up to Boxing Day, visitors to its website on December 26 discovered that the site had been taken down, with the company saying on its Facebook page that it was “experiencing some technical difficulties” on its site.

It wasn’t until January 2, a full week later, that the website was back up and running. The retail giant said it would be monitoring the site and offering free shipping as compensation for failing to fulfil its advertised availability during the Christmas period.

However, many customers have continued to be unable to shop online, with Myer stating as late as Sunday that people were “still experiencing difficulties with the site”.

Myer CEO Bernie Brookes placed the monetary loss on the sales at “less than 1 percent of our business, which translates to AU$31 million”.

Date | Service | Duration | Critical Data Lost
2013-12-26 | IBM | 1 week | no

 

Resources:

http://www.zdnet.com/myer-website-crash-due-to-communication-breakdown-ibm-7000024799/

 

12 Dec 2013

After Yahoo’s Mail outage, Flickr crashes too

Posted by iwgcr. No Comments

Some users reported that photo-hosting site Flickr was down on December 12, 2013. Reports indicated that the outage appeared to have started around 8 a.m. PT.

In a statement to CNET, Yahoo apologized for the outage. “We are aware that Flickr is currently unavailable and we are working quickly to fix the issue. We apologize to affected users,” a spokesperson said.

Date | Service | Duration | Critical Data Lost
2013-12-12 | Flickr | 1 hour | no

 

Resources:

http://news.cnet.com/8301-10811_3-57615441/after-yahoos-mail-outage-flickr-crashed-too/

http://techcrunch.com/2013/12/12/yahoos-bad-week-continues-flickr-crashed-today-too/


9 Dec 2013

‘Hardware problem’ causes partial outage for Yahoo Mail

Posted by iwgcr. No Comments

On December 9, 2013, many Yahoo Mail users started complaining about issues when trying to access the service. Soon after that, the company confirmed that one of its Mail servers was acting up, and that the issue had proved to be much harder to fix than anticipated. Yahoo assured users that it had teams working on it round the clock to bring everything back up. Once the Mail issues appeared to have been taken care of, Yahoo CEO Marissa Mayer took to the company’s official blog to apologize for the prolonged outage.

She also detailed exactly what had happened. Mayer writes that there was a specific hardware outage in one of Yahoo’s storage systems that serves 1 percent of its users, but the “problem was a particularly rare one.” Since different users were affected in different ways, the resolution for affected accounts was nuanced. However, during the week, Yahoo’s team worked “around the clock” to restore access as well as all messages to Mail inboxes. IMAP access was also improved for users who use clients such as Apple Mail or Outlook to access Yahoo Mail. Since then, Mayer says, access has been restored to “almost everyone” and the backlog of messages has been delivered.

 

Date | Service | Duration | Critical Data Lost
2013-12-09 | Yahoo mail service | 3 days | no

 

Resources:

http://www.ubergizmo.com/2013/12/yahoo-ceo-marissa-mayer-apologizes-for-mail-outage/

http://news.cnet.com/8301-10811_3-57615298/hardware-problem-causes-partial-outage-for-yahoo-mail/

 


25 Nov 2013

Amazon: Increased API error rates

Posted by iwgcr. No Comments

Between 1:51 PM and 2:37 PM PST, Amazon experienced increased error rates and latencies for the EC2 APIs in the US-EAST-1 Region.

Date | Service | Duration | Critical Data Lost
2013-11-25 | Amazon Cloud | 46 minutes | no

 

 

 

Resources:

http://status.aws.amazon.com/


21 Nov 2013

Amazon Instances unavailable in a single Availability Zone

Posted by iwgcr. No Comments

Between 8:10 PM and 10:25 PM PST some instances experienced connectivity issues in a single Availability Zone in the US-WEST-2 Region. The issue has been resolved and the service is operating normally.

Date | Service | Duration | Critical Data Lost
2013-11-21 | Amazon Elastic Compute Cloud (Oregon) | 2 hours 15 minutes | no

 

 

 

Resources:

http://status.aws.amazon.com/


21 Nov 2013

Xbox.com, other Microsoft sites hit by Microsoft Azure service disruption

Posted by iwgcr. No Comments

A handful of Microsoft’s online services, including Xbox.com, appeared to be experiencing a global outage on the afternoon of Thursday, November 21.

The outage, which Twitter users began reporting around 3 p.m., also affected Outlook.com and Office365.com, other online services hosted by Microsoft’s Azure cloud computing service. Error messages generated during attempts to visit the sites indicated it might be a DNS issue:

 “Service Unavailable – DNS failure

The server is temporarily unable to service your request. Please try again later.

Reference #11.83b302cc.1385076207.1118e9”

The degree to which sites were affected appears to be varied. At one point Thursday afternoon, every region around the world was experiencing “a full service interruption” to its storage services, according to Microsoft’s Windows Azure Services Dashboard. But a later update on the dashboard indicated that the services were back to normal operation in each region.

The outage to Xbox.com came just hours before the midnight launch of the long-awaited Xbox One, the $499 successor to the Xbox 360 gaming console.

The issue was solved at 4 p.m.

Date | Service | Duration | Critical Data Lost
2013-11-21 | Windows Azure | 1 hour | no

 

Resources:

http://news.cnet.com/8301-10805_3-57613414-75/xbox.com-other-microsoft-sites-hit-by-service-disruption/

 
