Joshua Liberman has a warning for those who have their heads—and business applications—in the clouds. If cloud services or the connectivity to them are down, then so are your users. And since either a service or broadband failure alone can bring productivity to a halt, the perceived availability rate even for well-managed cloud offerings can effectively be 99 percent or less. That translates to days of downtime a year.
“That will improve. It’s already dramatically better. But it’s not a utility. It’s not anywhere near that reliable,” says Liberman, president of system builder and network integrator Net Sciences Inc., based in Albuquerque, N.M.
Even if a public cloud computing solution delivers the 99.9 percent, or “three nines,” availability it promises in its service-level agreement, a typical user can still expect up to 8.76 hours of downtime a year. Murphy’s Law ensures that outage will occur precisely when a deadline looms.
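The arithmetic behind those figures is easy to verify. The short Python sketch below uses illustrative availability numbers rather than any particular provider’s SLA; it converts an availability percentage into annual downtime and shows how a cloud service and the broadband circuit needed to reach it combine into a lower perceived availability:

```python
# Back-of-the-envelope downtime math: annual downtime implied by an
# availability figure, and the combined availability of serial
# dependencies (a cloud service AND the broadband link used to reach it).

HOURS_PER_YEAR = 365 * 24  # 8,760

def annual_downtime_hours(availability: float) -> float:
    """Hours per year a service at the given availability can be down."""
    return HOURS_PER_YEAR * (1 - availability)

def combined_availability(*components: float) -> float:
    """Perceived availability when every component must be up at once."""
    result = 1.0
    for a in components:
        result *= a
    return result

print(annual_downtime_hours(0.999))  # "three nines" -> 8.76 hours/year
print(annual_downtime_hours(0.99))   # 99 percent   -> 87.6 hours (~3.65 days)

# A 99.9% cloud service reached over a 99.5% broadband circuit (illustrative):
perceived = combined_availability(0.999, 0.995)
print(perceived, annual_downtime_hours(perceived))  # ~0.994 -> ~52 hours/year
```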
What, then, can channel pros do to keep the cloud-based systems their clients rely on continuously up and running?
Benchmarking and Monitoring
Straightforward solutions exist for some aspects of the problem. For example, deploying multiple circuits can significantly reduce a company’s exposure to connectivity failures. That strategy comes with important caveats, however. First, it’s only feasible in areas with access to multiple broadband options.
Second, redundancy alone won’t solve your clients’ connectivity issues. Download and upload speeds, as well as how many users share the same bandwidth, matter too. A transfer rate that is too low relative to the amount of data being moved can render a cloud-based application too sluggish to be useful.
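A quick back-of-the-envelope check makes the point. The sketch below uses made-up figures for data volume, link speed, and user count to estimate how long a given transfer takes once a circuit is shared:

```python
# Rough transfer-time check: how long moving a given amount of data takes
# once a circuit is shared among several users. The figures below are
# illustrative, not drawn from the article.

def transfer_hours(data_gb: float, link_mbps: float, users_sharing: int = 1) -> float:
    """Hours to move data_gb gigabytes at the per-user share of link_mbps."""
    effective_mbps = link_mbps / users_sharing
    data_megabits = data_gb * 8 * 1000  # GB -> megabits (decimal units)
    return data_megabits / effective_mbps / 3600

# A 50 GB nightly sync over a 20 Mbps uplink shared by 10 users:
print(round(transfer_hours(50, 20, users_sharing=10), 1), "hours")  # ~55.6 hours
```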
As for safeguarding customers from cloud service failures, Liberman’s advice is to evaluate service providers primarily on their performance and track record rather than price. Those variables are better at indicating the likelihood that a provider will be around in the future, Liberman says, and if a provider goes out of business the availability of its services will be no nines at all.
Cloud provider performance information, along with a range of other metrics, is available from cloud benchmarking services such as CloudHarmony, from the Monarch Beach, Calif. company of the same name, and Detroit-based Compuware Corp.’s CloudSleuth. CloudHarmony, for example, collects and publishes roughly a hundred metrics from several dozen cloud providers.
Nasuni Corp., which makes next-generation enterprise storage solutions combining local and cloud-based repositories, is a good source of storage-related performance data. The Natick, Mass.-based company stress-tests cloud storage services for file reading and writing speeds, response time, and availability, and makes the results available on its website.
“We are saying, ‘Here are the component makers and here is how they rank in the market, according to the use cases that you need,’” says Andres Rodriguez, Nasuni’s CEO. “If you need it for deep archiving, all of these vendors are fine. If you need it for high-performance data synchronization, the list narrows.”
Cloud monitoring services can be helpful in measuring performance too. For example, myCloudWatcher, of Granite Bay, Calif., gathers data at regular intervals from multiple monitoring points. Drawing on those figures, technicians can not only gauge a cloud service’s availability, but also track transaction times for tasks like completing a purchase on an e-commerce site.
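What such probes boil down to can be sketched in a few lines. The example below is a generic illustration, not myCloudWatcher’s actual tooling; the endpoint URL, polling interval, and sample count are placeholders:

```python
# Minimal sketch of the kind of probe a cloud-monitoring service runs:
# hit an endpoint at a fixed interval, record whether it answered and how
# long the round trip took. A real monitor would run indefinitely from
# several geographic locations.
import time
import requests

ENDPOINT = "https://shop.example.com/checkout/health"  # hypothetical URL
INTERVAL_SECONDS = 60

def probe(url: str) -> dict:
    start = time.monotonic()
    try:
        response = requests.get(url, timeout=10)
        up = response.status_code == 200
    except requests.RequestException:
        up = False
    return {"up": up, "latency_s": time.monotonic() - start, "ts": time.time()}

samples = []
for _ in range(3):  # small sample purely for illustration
    samples.append(probe(ENDPOINT))
    time.sleep(INTERVAL_SECONDS)

availability = sum(s["up"] for s in samples) / len(samples)
print(f"availability over the window: {availability:.1%}")
```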
Going Multiple
Still, even a service provider with excellent performance specs can experience service failures. While the failure rate of the service as a whole may be low, there could still be a dozen or more physical server failures each month, notes James Staten, a vice president and principal analyst with Forrester Research Inc. That can be a problem for organizations whose only instance of an application happens to be running on one of those unlucky machines.
To counteract that danger, Staten recommends utilizing multiple application instances and dispersing them geographically. Most large public clouds are composed of multiple independent zones, he notes. “Customers that want to be protected against a zone-wide outage, which is a major effect, need to make sure that they’re spread across multiple zones,” Staten says.
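In practice, that spreading can be as simple as round-robining instances across zones at deployment time. The sketch below is provider-neutral; the zone names and the deploy() helper stand in for whatever launch API a given cloud exposes:

```python
# Generic sketch of spreading application instances across availability
# zones so a single zone-wide outage doesn't take out every copy.
from itertools import cycle

ZONES = ["zone-a", "zone-b", "zone-c"]  # hypothetical zone names

def deploy(instance_name: str, zone: str) -> None:
    # Stand-in for the provider-specific launch call.
    print(f"launching {instance_name} in {zone}")

def spread_instances(count: int) -> None:
    """Round-robin instances across zones so no single zone holds them all."""
    zone_cycle = cycle(ZONES)
    for i in range(count):
        deploy(f"app-{i}", next(zone_cycle))

spread_instances(6)  # two instances land in each zone
```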
In addition, make sure the cloud service providers you do business with offer rolling updates, advises Joe Brown, president of Accelera Solutions Inc., of Fairfax, Va. This allows technicians to take down component instances one at a time, patch them, and then bring them back online, thereby eliminating what would otherwise be maintenance-related downtime. “You’re never taking the whole segment of the solution stack offline,” Brown notes.
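The pattern Brown describes looks roughly like the loop below. The instance list and helper functions are placeholders standing in for a real load balancer and patch process:

```python
# Sketch of a rolling update: pull one instance out of rotation at a time,
# patch it, verify its health, and return it to service before touching
# the next one, so the stack as a whole never goes offline.

INSTANCES = ["app-0", "app-1", "app-2"]

def remove_from_rotation(instance): print(f"draining {instance}")
def apply_patch(instance):          print(f"patching {instance}")
def health_check(instance) -> bool: return True  # stand-in for a real check
def return_to_rotation(instance):   print(f"{instance} back in service")

def rolling_update(instances):
    for instance in instances:
        remove_from_rotation(instance)   # the rest keep serving traffic
        apply_patch(instance)
        if not health_check(instance):
            raise RuntimeError(f"{instance} failed post-patch health check")
        return_to_rotation(instance)     # only then move on to the next one

rolling_update(INSTANCES)
```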
Of course, even businesses with multiple instances supported by rolling updates are vulnerable to partial service failures, such as when an application instance on a heavily loaded server is functional but operating at glacial speeds. Launching multiple application instances and then killing the slowest of them is one possible response. Caching instances in memory can help as well, especially if you use multiple host machines.
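The launch-several, keep-the-fastest tactic can be approximated with ordinary concurrency primitives. The sketch below simulates variable instance response times and returns whichever copy answers first; the call_instance() worker is a stand-in for a real request:

```python
# Sketch of "launch several, keep the fastest": send the same request to
# multiple instances and take whichever answers first, so one copy stuck
# on an overloaded host doesn't stall the whole operation.
import random
import time
from concurrent.futures import ThreadPoolExecutor, FIRST_COMPLETED, wait

def call_instance(instance: str) -> str:
    time.sleep(random.uniform(0.1, 2.0))  # simulated, variable response time
    return f"result from {instance}"

def fastest_of(instances):
    pool = ThreadPoolExecutor(max_workers=len(instances))
    futures = [pool.submit(call_instance, i) for i in instances]
    done, pending = wait(futures, return_when=FIRST_COMPLETED)
    pool.shutdown(wait=False)  # stragglers finish in the background; caller isn't blocked
    return next(iter(done)).result()

print(fastest_of(["instance-a", "instance-b", "instance-c"]))
```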
Liberman cites another effective way to prevent cloud service outages from impacting your clients: Avoid using the cloud for applications that demand nonstop uptime. His firm, for example, relies on the cloud chiefly for tasks like archiving that usually aren’t time critical.
And when all else fails, Liberman adds, consider using a less intuitive strategy for minimizing cloud service downtime: Go local. The cloud-based archiving service he uses is located miles away from Net Sciences’ home office, so if something goes wrong and a terabyte of data has to be replaced quickly, he can always turn to a solution that works regardless of bandwidth limitations and connectivity failures.
“You can drive across town and pick up a drive,” Liberman says.