18 Nov 2013

Rackspace Inc’s server hosting outage affects Pipedrive Inc

Posted by iwgcr

Pipedrive Inc, a maker of CRM and pipeline management software, experienced an outage on November 18th, 2013. An update posted on Pipedrive Inc’s blog stated:

“We had an outage of our core services (application and REST API) between Mon 00:35 AM – 2:35 AM UTC / Sun 4:35 PM – 6:35 PM PST time. Customers who tried to use Pipedrive received a 502 error message. The service is now restored, and I am very sorry for the trouble this caused.

We worked hard on restoring the service, and also got back to those of you reaching out to us on customer support. In case you have any questions or concerns, feel free to get in touch.

There is no loss or damage to your data caused by this outage. This is because the problem occurred in one of our central accounts databases which contains only basic information about the accounts — but none of the actual content. Plus, this database is thrice backed up, as is all of our data.

For those interested in tech details — the issue started with a seemingly complete crash of the database server process. However none of the open connections got dropped, so not all of the alarms started ringing the moment the problem occurred. It took about the amount of connection timeout to start the alarm bells, and only after that we identified the problem had occurred somewhere much deeper than just the regular database service layer as the VM stopped responding after forcing a restart — in fact, the problems seem to have occurred as deep as on the KVM virtualization layer inside OpenStack. The exact cause is unknown for us at the moment but we escalated the issue to our service provider at Rackspace, and their senior technicians are investigating this. Meanwhile, we spinned up our services using the real-time backup that we have of this database, and things are working properly again.”
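The delayed alarm described in the quote is a common failure mode: a monitor that only watches for dropped connections stays quiet while a server hangs with its sockets still open. A minimal sketch of the alternative, an active probe with its own deadline, is shown below. This is not Pipedrive's monitoring code; the function name `probe` and the stand-in checks are hypothetical, and in practice `check` would run a trivial query against the database.

```python
import threading
import time


def probe(check, timeout_s):
    """Return True only if check() finishes without error within timeout_s seconds.

    `check` is a hypothetical callable, e.g. one that runs a trivial query.
    """
    result = {"ok": False}

    def run():
        try:
            check()
            result["ok"] = True
        except Exception:
            # Any error from the check counts as a failed probe.
            pass

    # A daemon thread lets the caller move on even if the check hangs forever.
    worker = threading.Thread(target=run, daemon=True)
    worker.start()
    worker.join(timeout_s)
    return result["ok"] and not worker.is_alive()


def healthy_check():
    # Stand-in for a query that returns immediately.
    return None


def hung_check():
    # Stand-in for a server that accepts the connection but never answers in time.
    time.sleep(0.5)


def failing_check():
    # Stand-in for a server that refuses the query outright.
    raise ConnectionError("refused")
```

With a probe like this, the alert delay is bounded by the probe interval plus `timeout_s`, rather than by the connection timeout of whatever clients happen to be connected.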




Date         Provider    Duration   Critical Data Lost
2013-11-18   Rackspace   2 hours    no