As a Software-as-a-Service (SaaS) company, a big part of the user experience (UX) for our customers is keeping our software up and running 24/7.
Yes, design, ease of use, user interface and many other pieces of the puzzle are important as well. However, when the whole system goes down, it causes problems for everyone.
Flash back to Tuesday, Feb. 28, 2017 … Amazon Web Services' Simple Storage Service (S3) experienced a three-hour, 39-minute disruption that rippled across other Amazon cloud services and the many internet sites that rely on the popular cloud platform.
My company, PandaDoc, was unfortunately one of the many victims of this outage, and our customers were without service for several critical hours.
Yes, disruptions, errors and outages are a fact of life in the cloud. And there’s usually no reason to panic when these things happen.
However, the Feb. 28 outage is a wake-up call to make sure cloud-based applications, including PandaDoc, are ready for the next time the cloud hiccups.
Preparing for an Outage
Here are four tips for preparing yourself for a cloud outage:
1. Spread the Risk
Don’t rely on one service for everything. If you deploy an application or a piece of data to a single point in the cloud, that point becomes a single point of failure, and the application will not be very fault tolerant.
How available you want your application to be determines how many points you spread your application or data across. The ultimate protection is to deploy the application across multiple providers, for example using Microsoft Azure, Google Cloud Platform or some internal or hosted infrastructure resource as a backup.
One key to responding to a cloud failure is knowing when one happens, then ensuring that plans B and C are ready to pick up where plan A dropped off.
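To make the plan A, B and C idea concrete, here is a minimal sketch in Python. It assumes each provider can be wrapped in a simple fetch function; the names `fetch_with_failover`, `plan_a` and `plan_b` are hypothetical and stand in for real provider clients, which are not shown here.

```python
from __future__ import annotations

from typing import Callable, Sequence


class AllBackendsFailed(Exception):
    """Raised when every configured backend has failed."""


def fetch_with_failover(
    key: str,
    backends: Sequence[tuple[str, Callable[[str], bytes]]],
) -> tuple[str, bytes]:
    """Try each backend in order and return (backend name, data) from the first that works."""
    errors = []
    for name, fetch in backends:
        try:
            return name, fetch(key)
        except Exception as exc:  # a real system would catch narrower errors
            # Knowing that a failure happened is half the battle: record it.
            errors.append(f"{name}: {exc}")
    raise AllBackendsFailed("; ".join(errors))


# Hypothetical stand-ins for two providers (assumptions for illustration only).
def plan_a(key: str) -> bytes:
    raise ConnectionError("primary storage unavailable")


def plan_b(key: str) -> bytes:
    return b"contents of " + key.encode()


if __name__ == "__main__":
    served_by, data = fetch_with_failover(
        "proposal.pdf", [("plan_a", plan_a), ("plan_b", plan_b)]
    )
    print(served_by, data)
```

The order of the list is the order of preference, so adding a plan C is a matter of appending one more entry.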
2. Build Redundant Systems
It is very difficult to respond to an outage in real time if you don’t have a plan B or C already in place. Preparation before the outage will save you when it inevitably comes. And it will come. That is certain.
3. Back Up Your Data
It’s one thing to have redundant systems; it’s another to back up your data. This was especially important in the Feb. 28 disruption, because the outage initially impacted Amazon’s most popular storage service.
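A backup is only useful if the copy is intact, so it is worth verifying each one as it is made. The following is a minimal sketch in Python, not a production tool. It assumes the backup target is a second directory (in practice this would be a different provider or region), and the helper names `sha256_of` and `back_up` are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source: Path, backup_dir: Path) -> Path:
    """Copy source into backup_dir and verify the copy by checksum."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / source.name
    shutil.copy2(source, target)  # copies contents and file metadata
    if sha256_of(source) != sha256_of(target):
        raise IOError(f"backup of {source} failed verification")
    return target
```

Calling `back_up(Path("contracts.db"), Path("/mnt/second-provider"))` (both paths are placeholders) would copy the file and raise an error if the copy does not match the original.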
4. Test (and Retest) Your System
Don’t wait for an outage to occur to see if your system is resilient to failure. Test it ahead of time, and then test it again.
It may sound like too much work, but the best cloud architects are willing to kill whole nodes, services and regions to see if their application can withstand it. You should constantly be pushing your own system, working to find vulnerabilities.
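The idea of deliberately killing nodes can be tried on a small scale first. Below is a toy simulation in Python, offered as a sketch under stated assumptions rather than a real chaos-testing tool: `Node`, `Cluster` and `survives_node_loss` are hypothetical names, and "killing" a node here only flips a flag in memory.

```python
import random


class Node:
    """A simulated server that can be switched off."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.alive = True

    def handle(self, request: str) -> str:
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} served {request}"


class Cluster:
    """Routes each request to the first node that responds."""

    def __init__(self, nodes) -> None:
        self.nodes = list(nodes)

    def handle(self, request: str) -> str:
        for node in self.nodes:
            try:
                return node.handle(request)
            except ConnectionError:
                continue
        raise RuntimeError("no healthy nodes left")


def survives_node_loss(cluster: Cluster, kills: int, rng: random.Random) -> bool:
    """Kill `kills` random nodes, check the cluster still answers, then revive them."""
    victims = rng.sample(cluster.nodes, kills)
    for node in victims:
        node.alive = False
    try:
        cluster.handle("health-check")
        return True
    except RuntimeError:
        return False
    finally:
        for node in victims:
            node.alive = True
```

With a three-node cluster, `survives_node_loss(cluster, 2, random.Random())` returns `True` and killing all three returns `False`, which is the kind of limit you want to discover in a test rather than during a real outage.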
In the end, UX for SaaS companies is only as good as the services that power the software. So the best thing you can do to keep those customers satisfied is prepare, prepare, and then prepare again for the next unforeseen disruption.