AS EXCITING AS “GOING CLOUD” can be for SMBs, for some, only on-premises infrastructure and applications will do, whether that’s because of bandwidth constraints, data set size, regulations, or simply customer choice. We have devised an alternative known as the “premises cloud,” based on a fast, flexible, and scalable network design that is hosted on premises at a cost that is within the reach of larger SMBs. The guiding principle here, as Einstein famously said, is: “Everything should be made as simple as possible, but not simpler.”
Hyperconvergence describes the integration of computing, storage, and networking into a software-defined design, often by means of clustered servers and shared storage. In the SMB space, this means a storage-area network (SAN), perhaps a chassis switch, and a clustered server design. Back in 2009, when Intel created the Intel Modular Server System, we saw an early incarnation of this technology. It offered six compute modules, an integrated SAN, and up to 20 Layer 3 switch ports, along with redundant power and storage controllers, all in a 6U chassis. This was well ahead of its time, but represented “Premises Cloud 1.0” well.
With the advent of VMware and Hyper-V, clustering and failover came into the SMB space, and the concept of hyperconvergence and the premises cloud came into its own. When paired with a fault-tolerant SAN providing Cluster Shared Volumes (CSV) storage, redundant 10Gb connections, and a rock-solid, dual-power, chassis-based switch, the reliability of clustering is within reach. Of course, trying to provide this on an SMB budget is just one challenge. Working with potentially limited space and conserving power are others. Let’s look at how you can bring this premises cloud to your SMB practice.
Steps to a Premises Cloud
Form Factor. Most SMBs that have grown large enough to embrace rack-mounted gear have at least 25U to work with, so space can get tight. Let’s start from the top of one of these designs and work our way down. We’ll figure on 2U for two firewalls in high-availability (HA) mode. Next in our design come the server nodes—we’ll assume three 1U compute nodes with dual power supplies, hot-swap drives, and dual CPUs, which will add 3U more to that footprint. Let’s add another 1U on either side of this compute stack and then move on to the chassis switch.
The HP chassis switches are 7U, and with another 1U on either side, we are now up to about 16U. If you factor in another 2U for a rack-mounted BDR appliance and another 6U for a large, rack-mount UPS and external battery pack, you’ve hit 24U with a bit of space to spare. These are just suggestions, and if you have more room to work with, use it for airflow and cable routing. But the point is, 25U will suffice. Incidentally, other benefits of this design are reasonable requirements for power and cooling, but more on those later.
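As a quick sanity check, the rack budget above can be tallied in a few lines. The U values are the article’s suggested allocations plus 1U airflow gaps; adjust them to match your actual hardware:

```python
# Tally the rack-unit budget described above. All U values follow the
# article's suggested allocations; the gaps are for airflow and cabling.
rack_plan = [
    ("HA firewall pair", 2),
    ("airflow gap", 1),
    ("compute nodes, 3 x 1U", 3),
    ("airflow gap", 1),
    ("chassis switch", 7),
    ("airflow gap", 1),
    ("BDR appliance", 2),
    ("airflow gap", 1),
    ("UPS + external battery pack", 6),
]

total_u = sum(u for _, u in rack_plan)
print(f"Total: {total_u}U of 25U, {25 - total_u}U to spare")
assert total_u <= 25, "design does not fit a 25U rack"
```

A trivial calculation, but keeping it written down makes it easy to re-check the budget each time a component changes.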
Compute Nodes. The Hyper-V hosts, which are just compute nodes with minimal storage needs, are the “brains” of the design. We generally go with 1U nodes, as dual-CPU models now offer 24 cores or more and can easily support 256GB of memory at reasonable cost using 16GB sticks. They also offer dual power supplies, out-of-band management, and up to eight 2.5-inch drive bays, though generally speaking we don’t use more than three. Remember, storage simply isn’t an issue on these compute nodes, as that is handled by the SAN. That’s why we call them compute nodes; they’re all about compute and memory.
We typically use midrange, dual Intel Xeon processors with eight to 12 cores each and 128GB to 256GB of memory. All of this is completely dependent upon the number of virtual machines you’ll have across the cluster and the nature of their utilization, so build to suit. Most of these designs will support up to 512GB of memory. We also use only enterprise-grade SSDs, typically two in a mirror or three in a stripe, managed by a dedicated RAID controller.
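To make the “build to suit” point concrete, here is a rough sizing sketch that checks whether a cluster can still host all of its virtual machines when one node fails. The VM demand and overhead figures are illustrative assumptions, not numbers from the text:

```python
# Rough cluster memory sizing: can the surviving hosts absorb every VM
# if one node fails (N-1 capacity)? All figures here are illustrative.
node_count = 3
memory_per_node_gb = 256
vm_memory_demand_gb = 400      # total memory assigned to all cluster VMs
hypervisor_reserve_gb = 16     # per-node reserve for the host OS itself

usable_per_node = memory_per_node_gb - hypervisor_reserve_gb
capacity_n_minus_1 = (node_count - 1) * usable_per_node

print(f"N-1 capacity: {capacity_n_minus_1}GB for {vm_memory_demand_gb}GB of VMs")
assert vm_memory_demand_gb <= capacity_n_minus_1, "cluster cannot survive a node failure"
```

The same check applies to CPU: failover only helps if the remaining nodes can actually carry the full load.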
Storage Design. With the advent of Windows Server 2012 R2 and 2016, it has become possible to avoid the expense and complexity of both an external domain controller (DC) and the traditional type of shared storage that an iSCSI or Fibre Channel SAN provides. Microsoft calls this SAN-less architecture, or “shared nothing.” Despite these changes, most network architects still prefer to deliver clusters with an external (out-of-cluster) DC and traditional SAN storage, and we deliver our solutions exclusively in that manner.
The ideal SAN for the solution is one that is fast, expandable, and highly available. In a 2U form factor, you have several options, but these specs are what we consider the opening bid for modern SANs: 24 (2.5-inch) drive bays, a fault-tolerant (dual-container) design, dual power, and dual integrated 10Gb ports. After this you can look for extras, such as the ability to do storage tiering (think caching with SSDs), daisy-chaining additional units together, and additional 10Gb ports. We check all those boxes with the Lenovo S3200 SAN in our designs.
Chassis Switch. In the past, interswitch bandwidth was always a limiting factor of discrete switches. Even with multiple bonded 1Gb ports or expensive dedicated 10Gb stacking switches, there was just no practical way to deliver backbone-speed performance once 48 ports were exceeded. Chassis switch solutions change all that, and can save you money because only the chassis needs dual power rather than each switch; the backplane also provides more internal bandwidth than stacking discrete switches ever could.
In our design, we were able to take a 10Gb connection out from all three nodes to the SAN, and a 10Gb connection from each node to the chassis switch. With the right SAN and chassis switch, you can easily provide 16 or more 10Gb connections and hundreds of terabytes of storage in your design, as well as the ability to stack switches and SANs for even more storage and throughput. Adding compute nodes to an existing Hyper-V cluster for performance or fault tolerance is almost as easy as adding switch modules to your chassis switch.
VLAN Design. No discussion of the network would be complete without touching on the Virtual LAN, or VLAN, architecture. While VLANs, aka network segmentation, are often introduced into smaller environments as part of the security design, it is important to consider their performance and traffic management advantages as well. In our design both the security and throughput aspects of VLAN design are critical, and as always, we’ve tried to keep the design as simple as possible.
From the start, we made sure to segment our production, management, and SAN-to-node traffic. We also made sure we provided separate data paths for remote access and wireless services. In this network, we have broken out and set up separate VLANs for management; production; web/DMZ; iSCSI/SAN; voice; WAP management; production Wi-Fi; guest Wi-Fi; and WAN circuits 1, 2, 3, and 4; for a total of a dozen VLANs. This may seem complicated, but it really is as simple as it should be; any simpler would not be nearly as capable.
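For reference, the dozen segments can be captured as a simple VLAN table like the one below. The VLAN IDs are placeholders of our own choosing; the text does not specify them:

```python
# The article's twelve network segments expressed as a VLAN plan.
# IDs are illustrative placeholders; assign your own numbering scheme.
vlan_plan = {
    10: "management",
    20: "production",
    30: "web/DMZ",
    40: "iSCSI/SAN",
    50: "voice",
    60: "WAP management",
    70: "production Wi-Fi",
    80: "guest Wi-Fi",
    91: "WAN circuit 1",
    92: "WAN circuit 2",
    93: "WAN circuit 3",
    94: "WAN circuit 4",
}

assert len(vlan_plan) == 12  # "a dozen VLANs"
```

Keeping the plan in a machine-readable form also makes it easy to paste into documentation or generate switch configuration from it later.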
Data Protection/Replication. How do you back up and provide business continuity and cloud replication for this beast? We deployed a state-of-the-art, image-based solution. Start with a 2U server with dual Xeon processors, 128GB of memory, dual SSDs to run the Linux-based OS, and the bulletproof Solaris-derived ZFS file system across 20TB of redundant local storage. This device acts as a backup target for snapshot agents on each server, enabling you to snapshot all servers hourly or more frequently, as well as automate cloud replication of those snapshots.
Fortunately, this backup and replication can be accomplished using a single solution from Norwalk, Conn.-based Datto Inc. The company’s SIRIS product line offers image-based backup, business continuity, and, with automated off-site replication, true disaster recovery. It also provides the backstop of outstanding 24/7 support. If you plan to create your own offering, you will need local storage for fast file recovery and server failover, and replication off-site for cloud backup and disaster recovery. By the way, remember to test, test, and test.
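One way to reason about the cadences above is in terms of recovery point objective (RPO): the local snapshot interval bounds data loss for a single server failure, while the off-site replication interval bounds it for a full site loss. The hourly figure follows the text; the daily replication interval is an assumption for illustration:

```python
# Back-of-envelope RPO figures for the backup design described above.
# Hourly snapshots follow the article; daily replication is an assumption.
local_snapshot_interval_min = 60          # hourly local snapshots
offsite_replication_interval_min = 24 * 60  # assumed nightly cloud replication

# Worst-case data loss if a server fails but the backup appliance survives:
rpo_local = local_snapshot_interval_min
# Worst-case data loss if the entire site is lost and you restore from cloud:
rpo_site_loss = offsite_replication_interval_min

print(f"Local failure RPO: {rpo_local} min; site loss RPO: {rpo_site_loss} min")
```

Testing restores, as the text urges, is what turns these theoretical numbers into ones you can actually promise a client.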
Perimeter Security. Security is no place to cut corners, and they weren’t cut here. The site is protected by a pair of firewalls capable of 1Gb of throughput, set up in failover mode. Even with deep packet inspection (DPI) of SSL traffic enabled, the firewalls can handle 500Mb of throughput. The firewalls are set up in an active/passive configuration, which means you pay for support on only one firewall. We are using SonicWall 5600s here, from Santa Clara, Calif.-based SonicWall Inc., which have the added advantage of supporting the wireless infrastructure as well.
Another interesting aspect of this design is the physical separation of the firewalls. This site happens to have a “warm spare” building about 150 meters across the compound, where the high-availability firewall and Datto device reside, adding another dimension of fault tolerance. The firewalls are linked by a fiber line between buildings that keeps them in stateful sync. And of course, we employ all of the usual lockdowns—from geo/botnet filtering and content filtering to full DPI scanning of both standard and encrypted traffic, and more.
Secure Remote Access. Providing secure remote access from anywhere, at any time, is a must today. Once again, you have many options here, including SSL VPN in the firewall, a dedicated secure mobile access device (the SMA series, in SonicWall parlance), or a hosted service. We have gone with firewall-integrated SSL VPN, which allows for both secure access and (with additional software, in the SonicWall world) the ability to track and document user connectivity.
Going with a dedicated remote-access device makes good sense from the standpoint of added functionality: more advanced endpoint vetting, more sophisticated endpoint remediation options, access levels based on vetting, high-availability operation of the remote-access devices, and more. But there are significant cost and complexity issues to tackle, and with fewer than a dozen remote users at this site, dedicated SMA didn’t make sense in our case. Your mileage may vary.
Secure Wireless Access. Secure wireless access can be accomplished in one of two ways: integrated into the network design or provisioned and secured separately. In the first case, you probably already know that all the major firewall vendors offer managed wireless access points as an option. SonicWall offers very good access points that can handle scores of simultaneous associations and perform wireless handoffs; the vendor also offers multiple virtual SSID support, wireless guest services, and more.
Going with a dedicated wireless access solution from Sunnyvale, Calif.-based Ruckus Wireless Inc., for example, provides significant advantages not found in the SonicWall solution we chose, however. For example, if you need hundreds of wireless devices per access point, coverage of large outdoor areas, or the ability to handle large numbers of guest users and control their traffic in a granular fashion, firewall-integrated solutions just aren’t going to cut it. Going down the dedicated wireless solution path wasn’t necessary here, but again, your mileage may vary.
Power Protection and Environmental Monitoring. Power protection and environmental monitoring should be equally sound, and we’ve cut no corners here either. This installation is supported by an array of batteries, with load balancing and failover. Think of this as a RAID array with batteries. We selected the Symmetra solution from APC by Schneider Electric, based in West Kingston, R.I., along with the company’s NetBotz environmental monitoring solution, for its powerful combination of online power filtering, remote management, and scalable runtime. We also installed and documented switched power distribution units, so we can reboot devices remotely.
At the very least, you are going to want to provide adequate coverage for the power load of all your devices, sufficient runtime for meaningful power failover, and time for orderly shutdown of everything in one fully managed solution. If going to an APC Symmetra doesn’t fit your budget, consider at least managed batteries, switched PDUs, and basic temperature sensors attached to the UPS network interface controllers. Tying this together and configuring appropriate thresholds and alerts will take some time as well.
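A back-of-envelope power budget like the following can tell you quickly whether a given UPS has both the capacity and the runtime you need. All wattages and capacities here are illustrative placeholders, not measurements from this installation:

```python
# Quick UPS power-budget check: total draw versus capacity, plus a crude
# runtime estimate. All figures below are illustrative placeholders.
loads_watts = {
    "firewall pair": 120,
    "compute nodes (3)": 900,
    "SAN": 400,
    "chassis switch": 350,
    "BDR appliance": 250,
}
ups_capacity_watts = 4000      # assumed UPS output capacity
battery_watt_hours = 3000      # assumed installed battery capacity

total_draw = sum(loads_watts.values())
runtime_minutes = battery_watt_hours / total_draw * 60

print(f"Draw: {total_draw}W ({total_draw / ups_capacity_watts:.0%} of capacity), "
      f"~{runtime_minutes:.0f} min estimated runtime")
assert total_draw <= 0.8 * ups_capacity_watts, "leave headroom on the UPS"
```

Real runtime is worse than this linear estimate at high loads, so measure actual draw with the PDUs and leave generous headroom.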
Tying It Together
The final step is to document and manage your network. We work with a variety of tools to document installations, including hosted documentation from IT Glue, in Vancouver, British Columbia, along with basics such as Microsoft Excel and Microsoft Visio. We set up remote monitoring and management with Durham, N.C.-based SolarWinds MSP, outsourced email filtering from Greenville, S.C.-based Mailprotector, and DNS filtering with Cisco Umbrella from San Jose, Calif.-based Cisco Systems Inc. We then tie it all together with full, 24/7 managed services. If we are to support a network of this complexity, the client must commit to a complete managed services package.
There is a lot to learn, including how to manage Hyper-V clusters with Cluster-Aware Updating and Cluster Shared Volumes. You’ll need to learn SAN concepts and SAN performance tuning. And you’ll need to learn how to design and troubleshoot more highly segmented networks. The good news is that many of those tools are free or very reasonably priced, and are easy to learn. The end result will be the sort of performance, resilience, and manageability that no other network design can touch.
The promise of premises cloud for your practice is the ability to deliver four 9s of uptime and a flexible, scalable, on-site network to those sites that simply cannot or will not go to the cloud. There is a lot of expertise and commitment involved in designing a network this sophisticated, but once you sell and implement an installation like this, you will own it and the ongoing revenue it provides.
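It’s worth translating “four 9s” into its actual downtime budget, since that is the number you will be held to:

```python
# What "four 9s" (99.99% uptime) actually allows per year.
availability = 0.9999
minutes_per_year = 365 * 24 * 60   # 525,600 minutes

downtime_minutes = (1 - availability) * minutes_per_year
print(f"99.99% uptime permits ~{downtime_minutes:.1f} minutes of downtime per year")
```

That works out to roughly 52 minutes a year, about a minute a week; that is the margin this entire design has to operate within.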
Just keep it as simple as possible, but not simpler.
- A “PREMISES CLOUD” solution brings the flexibility, efficiency, and scalability of cloud computing to SMBs that can’t or won’t go online.
- TO PARAPHRASE ALBERT EINSTEIN, a premises cloud should be as simple as possible in design, but no simpler.
- THE IDEAL SAN for the solution is one that is fast, expandable, and highly available.
- SECURITY IS NO PLACE TO CUT CORNERS. Build in ample firewall capacity and secure remote access and wireless networking functionality.
- DOCUMENT AND DIAGRAM everything you do to make managing the system easier.
JOSHUA LIBERMAN is president of Net Sciences Inc., an Albuquerque, N.M.-based systems integration firm and a long-time leader in sophisticated network design and support.