8 Nov 2013

Microsoft’s Office 365 experiences mail delays

Posted by iwgcr

Office 365 experienced an issue on November 8th, 2013, that resulted in prolonged mail flow delays. A post on the Office 365 blog explained the problem:

“The first event occurred on November 8th from 11:24AM to 7:25PM PST.  This service incident resulted in prolonged mail flow delays for many of our customers in North and South America.  Office 365 utilizes multiple anti-virus engines to identify and clean virus messages from our customers’ inboxes. One of these multiple engines identified a virus being sent to customers, but the engine started to exhibit a lot of latency even as it handled the messages.  To compound the issue, our service was configured to allow too many retries and provide too long of a timeout for these messages.  Given the flood of these specific emails to some of our service capacity, this improper handling caused a significant backlog of valid email message throughput in these units.  We resolved the issue by deploying an interceptor fix to deal with the offending messages and send them directly to quarantine.  Going forward, we are instituting multiple further levels of defense. In addition to fixing the engine handling, we now have instituted more aggressive thresholds for deferring problem messages.  We have also built and implemented better recovery tools that allow us to remediate these situations much faster, and we are also adding some additional architectural safeguards that automatically remediate issues of this general nature.”
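The retry and timeout problem described in the quote can be illustrated with a small simulation. This is only a rough sketch under stated assumptions, not Microsoft's implementation: the names `ScanPolicy`, `Router`, `simulated_latency` and `process`, and all of the numbers, are hypothetical. Engine latency is modeled as a plain number rather than measured. A message whose scan exceeds the per-attempt timeout is retried up to a cap and then sent to quarantine, loosely mirroring the "more aggressive thresholds" mentioned in the post.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ScanPolicy:
    """Hypothetical scan settings; the values are illustrative only."""
    max_attempts: int       # how many times a slow scan is retried
    timeout_seconds: float  # latency budget for each attempt


@dataclass
class Router:
    """Routes messages and tracks how much scan time they consumed."""
    policy: ScanPolicy
    delivered: List[str] = field(default_factory=list)
    quarantined: List[str] = field(default_factory=list)
    seconds_spent: float = 0.0

    def handle(self, message: str, scan_latency: Callable[[str], float]) -> None:
        for _ in range(self.policy.max_attempts):
            latency = scan_latency(message)
            if latency <= self.policy.timeout_seconds:
                # The scan finished inside the budget: deliver the message.
                self.seconds_spent += latency
                self.delivered.append(message)
                return
            # The attempt timed out, so it cost the full timeout.
            self.seconds_spent += self.policy.timeout_seconds
        # Attempts exhausted: stop retrying and quarantine the message.
        self.quarantined.append(message)


def simulated_latency(message: str) -> float:
    """Assumed engine behavior: 'slow' messages take 30 s, others 0.1 s."""
    return 30.0 if message.startswith("slow") else 0.1


def process(messages: List[str], policy: ScanPolicy) -> Router:
    router = Router(policy)
    for message in messages:
        router.handle(message, simulated_latency)
    return router


messages = ["slow-1", "ok-1", "slow-2", "ok-2"]
lenient = process(messages, ScanPolicy(max_attempts=5, timeout_seconds=20.0))
strict = process(messages, ScanPolicy(max_attempts=2, timeout_seconds=1.0))
print(lenient.seconds_spent, strict.seconds_spent)
```

Both policies deliver and quarantine the same messages in this toy run, but the lenient policy spends far more scan time on the slow messages, which is the kind of backlog the post attributes to too many retries and too long a timeout.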




Date: 2013-11-08
Provider: Microsoft
Service: Office 365
Duration: 8 hours 1 minute
Critical data lost: no