A New Trend?
All of a sudden, everyone seems to want to be out of service. What with Amazon's problems, Sony's PSN catastrophe (still rolling on after many weeks) and Blogger.com's 48-hour vanishing act (so far), it seems only fair that Microsoft joins in the fun.
Exchange Online and Web Access services were hit by a spate of problems that started earlier in the week, limiting access to both. Of the issues, which struck late on Tuesday, Microsoft said:
The BPOS-S Exchange service experienced an issue with one of the hub components due to malformed email traffic on the service. Exchange has the built-in capability to handle such traffic, but encountered an obscure case where that capability did not work correctly. The result was a growing backlog of email. By 12:00am PDT, the malformed traffic was isolated and the mail queues cleared. The delays encountered by customers varied, on the order of 6-9 hours. Short term mitigation was implemented and a fix was under development.
All in all, a neat response to an isolated problem that was dealt with professionally. However, the issue struck again the next day, although it was fixed faster and resulted in shorter delays. Unfortunately, a related incident kicked off shortly after that one was solved, delaying around 1.5 million emails. That obviously took rather a long time to clear, which brings us to today, about which Microsoft states:
In an unrelated incident, starting at 1:04am PDT, service monitoring detected a failure in the Domain Name Service (DNS) hosting the http://mail.microsoftonline.com domain. This failure, prevented users from accessing Outlook Web Access hosted in the Americas, and partially impacted some functionality of Microsoft Outlook and Microsoft Exchange ActiveSync devices. The team diagnosed, and fixed, an underlying problem in the servers hosting Domain Name Service (DNS) for the http://mail.microsoftonline.com domain, and restored service at 4:52am PDT. The team identified a number of improvements in our handling of problems associated with DNS, and will provide a full post mortem of this incident available through Microsoft Support.
So, while Microsoft gets away with causing some minor inconvenience, anyone currently touting the cloud as the future (hello, Google) really needs to think about how all of these outages will affect the way businesses and consumers perceive it.
Would you really put your faith (and data) solely in the hands of a single cloud provider? And which service will make a big name for itself during one of these events by offering a split-cloud approach that works around these kinds of issues, saving some major face, money and effort?