Business continuity plans are boring, but only until a major disaster such as #NA14 happens.
As I write this, it's been over eight days since Salesforce instance 14 in North America went down for over 20 hours, and up until two days ago the service was still experiencing performance issues. The outage impacted numerous organizations across the country, many of them dependent on Salesforce as a mission-critical service.
'It's Just Someone Else's Computer'
Even though we know very little about the root cause of the issue, the outage offers some relevant takeaways for every large-scale organization:
1. 'There is no cloud - it’s just someone else’s computer'
This internet meme is painfully true in the context of #NA14.
Many cloud vendors have managed to convince their customers that the cloud is somehow detached from the physical hosting equipment. This perception is very far from reality. The outage at Salesforce demonstrates that even the cloud market leaders don’t always provide enough redundancy, and that their physical data centers are dedicated to specific customers and/or geographies.
2. A single hosting vendor means a single point of failure
Even if your vendor has multiple data centers with full redundancy, you remain dependent on that vendor’s processes and corporate culture, as well as on the financial and political risks that may impact their business. All of those factors can cause unplanned downtime of your service.
3. SLAs matter
Even though Salesforce brags about its 99.9 percent measured availability, which for what it’s worth translates to roughly 43 minutes of downtime in a 30-day month, it doesn’t give any uptime guarantees. This might be acceptable if you’re running a knitting blog or a cat video website, but it is bizarre for a platform that drives its customers’ sales activity and other mission-critical business processes.
4. Disaster recovery is not enough
Even though it is better than nothing, disaster recovery isn’t enough if your revenue stream is on the line.
Recovery takes time, and it’s painful, costly and difficult to test. The end result doesn’t always prevent data loss, which was also the case with #NA14, where four hours’ worth of customer data was lost. That data could include multi-million dollar sales leads that are now gone.
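The availability figure in point 3 is easy to sanity-check. Here is a minimal sketch of the arithmetic, assuming a 30-day month; the function name and the default month length are my own choices for illustration, not anything Salesforce publishes.

```python
def allowed_downtime_minutes(availability_percent: float, days: float = 30.0) -> float:
    """Minutes of downtime implied by an availability percentage over `days` days.

    Assumes a 30-day month by default (an illustrative choice, not a vendor definition).
    """
    total_minutes = days * 24 * 60  # 43,200 minutes in a 30-day month
    return total_minutes * (1 - availability_percent / 100)


if __name__ == "__main__":
    # 99.9 percent availability leaves about 43.2 minutes of downtime per month.
    print(f"{allowed_downtime_minutes(99.9):.1f} minutes")
```

By the same arithmetic, a 20-hour outage is 1,200 minutes, which is more than two years' worth of that monthly allowance.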
Always Sign the Prenup
#NA14 highlighted a few faults inherent to any cloud platform:
- Cloud ties you to a single point of failure by making you dependent on a single vendor. This might or might not be a risk you can cope with, but I guarantee no space mission will ever run its systems in the cloud.
- Even if you’re a Fortune 100 company, you’re still a small fish in a big fish tank as far as your cloud vendor is concerned. Restoring your service may not land at the top of their list if they have 50 other customers of your size on the phone.
- Pulling out is very costly and time consuming. Once you get on board with cloud software, it’s extremely difficult to switch vendors, because your platform is your vendor. Switching vendors implies switching platforms, starting a new software selection process, doing an implementation from scratch, etc. In the enterprise world that usually translates to a project lasting at least two to three years, with a budget in the millions of dollars.
A relationship with your SaaS or managed service vendor is like a marriage: it’s a great thing if it goes well, but it can be a complete disaster if it doesn’t.
To take it one step further, tying yourself to a single vendor with no backup plan or an easy way out is the equivalent of a millionaire getting married without a prenup.
Enter George Clooney and his character from "Intolerable Cruelty." If you haven’t seen the movie, I highly recommend it, but if you have, you know what George would say: always sign the prenup!
How does this translate into the technology world? Make sure you can get out of the relationship with your vendor without a disruption to your service and at a reasonable cost.
One way to achieve this is to own your software instead of going to a public cloud. This also allows you to put more risk protections in place and remove the single point of failure by working with multiple data center vendors and keeping multiple instances of your platform synchronized and active at all times. That means that at any given time, multiple data centers are up and serving the same data. When one of your instances goes down, your traffic is automatically rerouted to the remaining ones.
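To make the rerouting idea concrete, here is a minimal sketch of round-robin routing that skips instances marked as down. Everything in it is hypothetical: the `Instance` and `FailoverRouter` names, the data center labels and the `healthy` flag are illustrative only. A real setup would rely on health checks plus DNS or load balancer failover, and would also need the data synchronization that this sketch ignores.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Instance:
    """One active copy of the platform in some data center (hypothetical)."""
    name: str
    healthy: bool = True


class FailoverRouter:
    """Round-robin over instances, skipping any that are marked unhealthy."""

    def __init__(self, instances: List[Instance]) -> None:
        self.instances = list(instances)
        self._next = 0  # index where the next search starts

    def route(self) -> Instance:
        """Return the next healthy instance, or raise if every instance is down."""
        count = len(self.instances)
        for offset in range(count):
            index = (self._next + offset) % count
            candidate = self.instances[index]
            if candidate.healthy:
                self._next = (index + 1) % count
                return candidate
        raise RuntimeError("no healthy instances available")


if __name__ == "__main__":
    # Three instances hosted with different (made-up) data center vendors.
    router = FailoverRouter([Instance("dc-a"), Instance("dc-b"), Instance("dc-c")])
    router.instances[1].healthy = False  # simulate an outage at dc-b
    # Traffic now alternates between the two remaining instances.
    print([router.route().name for _ in range(4)])
```

The point of the design is that no single instance is special: losing one reduces capacity, but requests keep flowing to whichever instances are still up.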
Having multiple locations usually comes at an additional cost, but this can be offset by eliminating the need to use a Content Delivery Network and effectively becoming your own CDN.
The bottom line is this: the next time you’re selecting technology, ask your vendor the question that George Clooney’s character would ask, “To cut to the chase, forensically speaking, is there a prenup?”