Rackspace CEO: We Screwed Up During Cloud Reboot

Rackspace's CEO didn't mince words in an email to customers yesterday. He admitted the company made communication mistakes as it worked this week to patch a security vulnerability affecting certain versions of XenServer, a popular open-source hypervisor.

Taylor Rhodes, CEO and president of the San Antonio, Texas-based public and private cloud hosting provider, said the problem ultimately forced a reboot for about a quarter of Rackspace's 200,000 customers.

"In the course of it, we dropped a few balls," Rhodes said. "Some of our reboots, for example, took much longer than they should. And some of our notifications were not as clear as they should have been. We are making changes to address those mistakes. And we welcome your feedback on how we can better serve you." 

'Short Notice' Maintenance

Rackspace posted an "urgent notice" on its website early Saturday notifying customers of cloud server reboots in light of a potential problem with its public cloud environment.

The news came around the same time the company promised a 99.99 percent OpenStack API uptime guarantee for the new release of its private cloud software, which is built on OpenStack, the open source cloud computing platform Rackspace helped create.

In a letter to customers this week, obtained by CMSWire through Rackspace's media relations team, Rhodes said that Rackspace, like other major cloud providers, was forced to reboot some of its customers’ servers. "This maintenance was especially difficult for many of you because it had to be performed on short notice, and over the weekend," he said.

Rackspace cloud customers told CMSWire they didn't expect downtime lasting multiple hours, and some complained that the reboots were poorly timed.

The issue has been "fully remediated without any reports of compromised data among our customers," Rhodes said in the letter. As a result, the Xen community has lifted its embargo on discussing the vulnerability.

Tough Choices

When alerted to the issue, Rackspace faced a "balancing act": being transparent with customers versus opening the door to potential cyber attacks.

"We want to be as transparent as possible with you, our customers, so you can join us in taking actions to secure your data," Rhodes said. "But we don’t want to advertise the vulnerability before it’s fixed — lest we, in effect, ring a dinner bell for the world’s cyber criminals."

Issues such as the Xen bug must be fixed "swiftly and quietly," Rhodes said. 

"This particular vulnerability could have allowed bad actors who followed a certain series of memory commands to read snippets of data belonging to other customers, or to crash the host server," the CEO said. "We wanted to flag the issue as quickly as possible to those of you using our Standard, Performance 1, and Performance 2 Cloud Servers, and our Hadoop Cloud Big Data service. But we didn’t want to do so until we had a software patch in place to address the vulnerability."

Reboot Plan

Rackspace engineers learned of the security issue early last week and set out to develop and test a patch. The patch was ready last Friday night, and the technical details of the vulnerability were scheduled to be publicly released today. Rhodes explained:

"We were faced with the difficult decision of whether to start our reboots over the weekend, with short notice to our customers, or postpone it until Monday. The latter course would not allow us to sufficiently stagger the reboots. It would jeopardize our ability to fully patch all the affected servers before the vulnerability became public, thus exposing our customers to heightened risk. We decided the lesser evil was to proceed immediately, at which time we notified you, and our partners in the Xen community, of the need for an urgent server reboot."

Commitment to Customers

Rhodes referred in his letter to Rackspace's traditionally "strong record of open, timely communication." He added:

"We reach out to you whenever there’s an issue. We answer the phone whenever you call. We do everything we can to find a solution. This past weekend, our engineers worked tirelessly with customers and partners to remediate the Xen vulnerability. As a veteran Racker who is proud of our commitment to our customers and their businesses, I am personally sorry for any inconvenience or downtime that we caused you during this incident."

Title image by Chaval Brasil (Flickr) via a CC BY-NC-SA 2.0 license. Second image by Scott Beale / Laughing Squid (Flickr) via a CC BY-NC-SA 2.0 license.