Today we have a guest blog post from fellow PCFS Team member David Wallraff with contributions from Josh Kruck and David Malone.
TL;DR: If your cloud deployment relies on infrastructure that is not part of your cloud, regional scaling and datacenter loss are going to be more painful than you think.
A hybrid-cloud strategy is a great idea for agile companies in an agile world. This means having both a public cloud and a private, on-site (“on-prem”) cloud underpinning your platform, because each has its own merits. A public cloud allows for dynamic provisioning of resources, multi-region scalability, etc., while an on-prem cloud allows for total ownership of your resources, physically secured backend services, etc. A hybrid cloud architecture gives you the best of both: coupled cloud deployments give your group access to on-prem, backend services within a responsive, provisionable cloud. That said, like just about any technological choice, tightly-coupled hybrid cloud architectures come with caveats.
These concerns can be broken down into two main categories:

- Dependence on on-prem resources
- Shared cloud resources

Dependence on on-prem resources
Some of the most common on-prem resources an enterprise might want to use from the cloud are DNS, NTP, and auth services (LDAP, AD, etc.). Some are nice-to-have services (internal DNS resolution) and some are essential services (internal auth). The easiest, most common connection setup is a dedicated VPN tunnel linking the cloud to the on-prem datacenter. Unfortunately, this leaves vital services dependent on a single point of failure: should the VPN connection terminate due to an alien incursion, ISP failure, evil cats with lasers, and the like, your customers are left with 404s. You can add redundant tunnels, but the single point of failure isn’t just the connection, it’s the datacenter those resources live in.
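A rough back-of-the-envelope calculation makes the point. The availability numbers below are illustrative assumptions, not measurements, but the shape of the math holds regardless:

```python
# Rough availability math: redundant VPN tunnels improve the link,
# but every path still terminates in the same on-prem datacenter.
# All availability figures here are illustrative assumptions.

def parallel_availability(paths):
    """Availability of N independent redundant paths: 1 - product of failure rates."""
    unavailability = 1.0
    for a in paths:
        unavailability *= (1.0 - a)
    return 1.0 - unavailability

tunnel = 0.99       # assumed availability of one VPN tunnel
datacenter = 0.999  # assumed availability of the on-prem datacenter itself

one_tunnel = tunnel * datacenter                              # link and DC in series
two_tunnels = parallel_availability([tunnel, tunnel]) * datacenter

print(f"one tunnel:  {one_tunnel:.5f}")
print(f"two tunnels: {two_tunnels:.5f}")
# Adding tunnels pushes the link toward perfect, but the combined
# availability can never exceed the datacenter's own 0.999:
# the datacenter remains the single point of failure.
```

No amount of link redundancy lifts you above the availability of the datacenter every tunnel lands in.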
The best way to handle this problem is to create strategic redundancy with local resources in the cloud, syncing where possible and being authoritative when not. Cloud-based authentication services can federate to your on-prem authentication services; a cloud-based DNS solution can host your cloud DNS zone (DNS forwarding only lasts as long as the TTL and is, at best, a band-aid, not a solution); and cloud-based DBs can replicate to their on-prem brethren. Taking these steps to ensure your cloud deployment can survive on its own, with its own infrastructure, means that if, and when, that VPN connection goes down, your cloud deployment (and its apps!) can continue to function.
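The TTL point is worth making concrete. A toy forwarding cache (hypothetical names and TTLs, simulated clock) shows exactly how long a forwarded answer survives a VPN outage:

```python
# Toy DNS forwarding cache illustrating why forwarding across the VPN
# only "works" for as long as the TTL: once the cached record expires,
# the next lookup needs the on-prem resolver -- and the VPN -- again.
# The name, address, and TTL below are hypothetical.

class ForwardingCache:
    def __init__(self):
        self._cache = {}  # name -> (ip, expires_at)

    def resolve(self, name, now, vpn_up, upstream):
        entry = self._cache.get(name)
        if entry and entry[1] > now:   # cache hit, TTL not yet expired
            return entry[0]
        if not vpn_up:                 # cache miss + VPN down = outage
            raise LookupError(f"cannot reach on-prem resolver for {name}")
        ip, ttl = upstream[name]       # forward the query over the VPN
        self._cache[name] = (ip, now + ttl)
        return ip

upstream = {"db.corp.example": ("10.1.2.3", 300)}  # 5-minute TTL
cache = ForwardingCache()

cache.resolve("db.corp.example", now=0, vpn_up=True, upstream=upstream)
# The VPN drops at t=100; the cached answer still works until the TTL runs out...
print(cache.resolve("db.corp.example", now=100, vpn_up=False, upstream=upstream))
# ...but at t=301 the record has expired and resolution fails outright.
try:
    cache.resolve("db.corp.example", now=301, vpn_up=False, upstream=upstream)
except LookupError as e:
    print(e)
```

An authoritative cloud-hosted zone has no such countdown, which is why hosting the zone locally beats forwarding.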
Shared cloud resources
We all know we’re supposed to share our toys. But in the cloud, we can just push a button and voila! New copies of our toys. Lots of copies. This is one of those things that makes the cloud, The Cloud™. On-demand provisioning and scaling enables us to say “No problem!” when we realize we need to double capacity or establish a presence in South Africa. But when cloud deployments are sharing resources, like IP space, or are being forced through shared resources like a single outbound proxy, your growth is bound by these self-imposed bottlenecks. Have a shared network across your cloud deploys? RFC 1918 addresses top out at 17,891,328. That may seem like a lot, but when you start adding multiple deploys in US-East, US-Central, US-West, South America, Europe, APJ, China North, China South, Lagrange Point 1, … well you get the idea.
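That 17,891,328 figure is just the sum of the three RFC 1918 private ranges, which Python’s `ipaddress` module can confirm, along with how quickly per-region carve-outs eat through it:

```python
import ipaddress

# The three RFC 1918 private ranges and their sizes.
private = [
    ipaddress.ip_network("10.0.0.0/8"),      # 16,777,216 addresses
    ipaddress.ip_network("172.16.0.0/12"),   #  1,048,576 addresses
    ipaddress.ip_network("192.168.0.0/16"),  #     65,536 addresses
]

total = sum(net.num_addresses for net in private)
print(f"total RFC 1918 addresses: {total:,}")  # 17,891,328

# If every deployment on a shared network takes a /16 (65,536 IPs),
# the hard ceiling on deployments sharing that space is:
per_deploy = ipaddress.ip_network("10.0.0.0/16").num_addresses
print(f"max /16 deployments: {total // per_deploy}")  # 273
```

273 deployments sounds roomy until you remember that a single region often needs multiple /16s for separate environments, and the count only ever goes down.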
This is one of those things that’s easier to fix during design, and much harder after execution. By limiting yourself to VPNs for specific applications (back-end data services, federated auth servers, on-prem DNS zone transfers) rather than a wide site-to-site VPN tunnel, you can reuse the same IP space in each cloud deployment, giving every deployment the full 17,891,328 RFC 1918 addresses to itself (at least until IPv6 is more widely utilized in the cloud). Utilizing the cloud primitives for a specific region’s outbound proxy (or maintaining your own) and syncing datasets/rules lets you grow without worrying about an outbound proxy sitting on a single pipe (you are planning on growing enough to saturate a single pipe, right?).
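The contrast between the two designs can be sketched with `ipaddress` overlap checks. The per-region ranges below are hypothetical:

```python
import ipaddress

# On a single shared site-to-site network, every deployment's range
# must be disjoint from every other's, so each new region shrinks the pool.
# These per-region allocations are hypothetical examples.
shared = {
    "us-east":    ipaddress.ip_network("10.0.0.0/16"),
    "us-west":    ipaddress.ip_network("10.1.0.0/16"),
    "eu-central": ipaddress.ip_network("10.2.0.0/16"),
}
nets = list(shared.values())
assert not any(
    a.overlaps(b) for i, a in enumerate(nets) for b in nets[i + 1:]
), "a shared network requires non-overlapping ranges"

# With only narrow, application-specific VPNs (zone transfers, DB
# replication, federated auth), the deployments never route to each
# other, so every one of them can reuse the identical range:
isolated = {region: ipaddress.ip_network("10.0.0.0/16") for region in shared}
print({region: str(net) for region, net in isolated.items()})
```

The spreadsheet of who-owns-which-/16 simply disappears in the second design.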
Public cloud dependent on on-prem resources
Templated public cloud with own resources
So what then?
Why do it differently? Isolation and repeatability. Running local resources in each cloud and not sharing infrastructure across your deployments allows for greater flexibility and protection when it comes time to cut the cord. You’ll no longer be limited by shared IP space and the need to search through that spreadsheet you have (because it’s always a spreadsheet) for enough IP space to deploy to another region (seeing a company run out of RFC 1918 IP addresses was not a fun day). With in-the-cloud resources, you can have true DR/BC (disaster recovery and business continuity), where the failure of one deployment doesn’t take down any of the others. By using this pattern for hybrid-cloud deployments, you’ll be better able to stand up new deployments, or withstand the loss of one.