Monday, February 23, 2015

Planning for vSphere 6: VMCA considerations

With the imminent release of vSphere 6, I've been doing prep work for upgrades and new installs. There's a lot of information out there (just check out the vSphere 6 Link-O-Rama at Eric Siebert's vSphere Land for an idea of the breadth & depth of what's already written), but not as much as I'd like for making good decisions that future-proof the setup.

I'm sure I join lots of VMware admins in looking forward to the new features in vSphere 6—long-distance vMotion, cross-datacenter & cross-vCenter vMotion, multi-vCPU fault tolerance (FT), etc.—but along with these features come some foundational changes in the way vSphere management & security are architected.

Hopefully, you've already heard about the new PSC (Platform Services Controller), the functional descendant of the SSO service introduced in vSphere 5.1. SSO still exists as a component of the PSC, and the PSC can be co-installed ("embedded") on the same system as vCenter, or it can be independent. Like SSO on vSphere 5.5, the PSC has its own internal, dedicated database which it replicates with peer nodes, similar to what we've come to know and expect from Microsoft Active Directory.

This replication feature not only helps geographically-distributed enterprises (allowing a single security authority for multiple datacenters) but also provides high availability in a single datacenter through the use of 2 (or more) PSCs behind a load balancer. Note the emphasis on the load balancer: you will end up with the abstraction of the PSC with a DNS name pointing at an IP address on your load balancing solution, rather than the name/IP of a PSC itself.
PSC in load-balanced HA configuration

This abstraction means you must plan for load balancing ahead of time; it's really not the sort of thing that you can "shim" into the environment after implementing a single PSC.

Joining SSO in the PSC "black box" are several old and some brand new services: identity management, licensing...and a new Certificate Authority, aka VMCA (not to be confused with vCMA, the vCenter Mobile Access fling).

It's that last item—the Certificate Authority—that should make you very nervous in your planning for vSphere 6. The documentation so far indicates that you have upwards of four different modes for your CA implementation that are independent of your PSC implementation choices:
  • Root (the default). Source of all certs for dependent services, this CA's public key/cert must be distributed and placed in your trusted CA store.
  • Intermediate. Source of all certs for dependent services, but the CA itself gets its trust from a parent CA, which in turn must have its public key/cert distributed. In the case of corporate/Enterprise CA/PKI infrastructure, this will already be in place and will be my go-to configuration.
  • None/External-only. All services receive their certs from a different CA altogether. This model is equivalent to removing all the self-signed certificates in pre-6 and replacing them with signed certificates. With the proliferation of services, each using its own certificate, this model is becoming untenable.
  • Hybrid. In the hybrid model, the VMCA provides certificates to services that provide internal communication (either service-to-service or client-to-service) while public CA-signed certs are used in the specific places where 3rd-party clients will interact. In this model, the VMCA may act as either root or intermediate CA.
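One way to keep the four modes straight is to ask which root certificate(s) your clients end up having to trust in each case. The following is a minimal Python sketch of that idea, not a VMware API: the names `VmcaMode` and `trust_anchors`, and the string labels for the roots, are all made up for illustration.

```python
from enum import Enum


class VmcaMode(Enum):
    ROOT = "root"
    INTERMEDIATE = "intermediate"
    EXTERNAL_ONLY = "external-only"
    HYBRID = "hybrid"


def trust_anchors(mode, vmca_is_intermediate=False):
    """Return the set of root certs clients must trust for a given mode.

    The labels are illustrative placeholders, not real certificate names:
      "vmca-root"        the VMCA's own root certificate
      "parent-ca-root"   the root of the CA that signs the VMCA
      "external-ca-root" the root of whatever CA issues every cert directly
      "public-ca-root"   the root of the public CA used for 3rd-party clients
    """
    if mode is VmcaMode.ROOT:
        return {"vmca-root"}
    if mode is VmcaMode.INTERMEDIATE:
        return {"parent-ca-root"}
    if mode is VmcaMode.EXTERNAL_ONLY:
        return {"external-ca-root"}
    if mode is VmcaMode.HYBRID:
        # In hybrid mode the VMCA may itself be a root or an intermediate.
        internal = "parent-ca-root" if vmca_is_intermediate else "vmca-root"
        return {internal, "public-ca-root"}
    raise ValueError(f"unknown mode: {mode!r}")
```

For example, `trust_anchors(VmcaMode.HYBRID)` returns two roots to track, which is part of why the hybrid model takes more planning than either pure model.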
Confused? Just wait: it gets more complicated...

Migrating from one model to another will have risks & difficulties associated with it. The default installer will set you up with a root CA; you will have the option to make it an intermediate at time of install. As near as I can tell from the available documentation, you will need to reinstall the PSC if you start with it as a root CA and decide you want it instead to be an intermediate (or vice-versa). This is consistent with other CA types (eg, Microsoft Windows), so there's no surprise there; however, it's not clear what other replicated services will be impacted when trying to swap CA modes, as it will require previously-issued certificates to be revoked and new certificates to be issued.

You can switch some or all of the certificates it manages with 3rd-party (or Enterprise) signed certs, but once you do, you will have to deal with re-issue & expiration on your own. I can't find anything documenting whether this is handled gracefully & automatically with VMCA-signed certs & services, similar to the centralized/automated certificate management that Windows enjoys in an Enterprise CA environment.

There isn't any documentation on switching from 3rd-party (or Enterprise) back to a VMCA-signed certificate. Presumably, it'll be some CLI-based witchcraft...if it's allowed at all.

Finally, keep in mind that DNS names factor heavily into certificate trust. Successfully changing name and/or IP address of an SSO server—depending on which was used for service registration—can be challenging enough. Doing the same with a system that is also a certificate authority will be doubly so.
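To see why a rename hurts, here is a simplified Python sketch of the name check a client performs against the DNS names in a certificate. It is an illustration only: `name_matches` is a hypothetical helper, the hostnames are invented, and real TLS libraries apply more rules than this (and handle IP addresses separately).

```python
def name_matches(hostname, san_dns_names):
    """Return True if hostname is covered by any DNS name in the certificate.

    Simplified: case-insensitive exact match, plus a wildcard that stands in
    for exactly one leftmost label ("*.corp.example").
    """
    host = hostname.lower().rstrip(".")
    for pattern in san_dns_names:
        pattern = pattern.lower().rstrip(".")
        if pattern == host:
            return True
        if pattern.startswith("*."):
            # Drop the first label of the host and compare the remainder.
            _, _, host_rest = host.partition(".")
            if host_rest and host_rest == pattern[2:]:
                return True
    return False


# A cert issued for the original node name stops matching after a rename
# (or after moving the service name to a load balancer).
issued_for = ["psc01.corp.example"]
print(name_matches("psc01.corp.example", issued_for))
print(name_matches("psc-lb.corp.example", issued_for))
```

The second check fails, so every certificate carrying the old name would need to be re-issued, and on a CA that includes the CA's own identity.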

So: what's the architect going to do?

For small, non-complex environments, I'm going to recommend what VMware and other bloggers recommend: stick with the single, combined PSC and vCenter server. Use the VCSA (vCenter Server Appliance) to save on the Windows license if you must, but I personally still prefer the Windows version: I'm still far more comfortable with managing the Windows OS environment & database than I am with Linux. Additionally, you're going to want Update Manager—still a Windows service—so I find it easier to just keep them all together.

This also suggests using the VMCA as a root CA, and I'll stick with that recommendation unless you have an Enterprise CA already. If you have the Enterprise CA, why not make the VMCA an intermediate? At a minimum, you'll eliminate the need for yet another root certificate to distribute. More important, however, is that it's vastly easier to replace an intermediate CA, even through the pain of re-issuing certificates, than a root CA.

What constitutes small, non-complex? For starters, any environment that exists with one (and only one) vCenter server. You can look up the maximums yourself, but we're talking about a single datacenter with one or two clusters of hosts, so fewer than 65 hosts for vSphere 5.5; in practice, we're really talking about environments with 20 or fewer hosts, but I have seen larger ones that would still meet this category because, other than basic guest management (eg, HA & DRS), they aren't really using vCenter for anything. If it were to die a horrible death and be redeployed, the business might not even notice!

Even if you have a small environment by those standards, however, "complex" enters the equation as soon as you implement a feature that is significantly dependent on vCenter services: Distributed Virtual Switch, Horizon View non-persistent desktops, vRealize Automation, etc. At this point, you now need vCenter to be alive & well pretty much all the time.
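That rule of thumb can be written down as a quick check. This is a rough Python sketch under my own assumptions: the function name and feature labels are invented, the feature set only covers the examples named above (the "etc." is not modeled), and host count is deliberately left out because it isn't the deciding factor.

```python
# Features that make vCenter uptime matter; labels are illustrative only.
VCENTER_DEPENDENT_FEATURES = {
    "distributed-virtual-switch",
    "horizon-view-non-persistent",
    "vrealize-automation",
}


def is_small_noncomplex(vcenter_count, features_in_use):
    """Apply the rule of thumb: exactly one vCenter, nothing leaning on it."""
    if vcenter_count != 1:
        return False
    # Any overlap with the dependent features tips the environment to "complex".
    return not (set(features_in_use) & VCENTER_DEPENDENT_FEATURES)


print(is_small_noncomplex(1, ["ha", "drs"]))
print(is_small_noncomplex(1, ["ha", "drs", "distributed-virtual-switch"]))
```

Anything that fails this check belongs in the "complex" bucket for planning purposes.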

In these environments, I was already counseling the use of a full SQL database instance, not SQL Express with all of its limitations. Even when you're inside the "performance bubble" for that RDBMS, there are a host of other administrative features you must do without that can compromise uptime. With vSphere 6, I'm continuing the recommendation, but taking it a step further: use AlwaysOn Availability Groups for that database as soon as it's certified. It's far easier to resurrect a cratered vCenter server with a valid copy of the database than to rebuild everything from scratch; I know VMware wants us all to treat the VCSA as this tidy little "black box," but I've already been on troubleshooting calls where major rework was required because no maintenance of the internal PostgreSQL database was ever done, and the whole-VM backup was found wanting...

Once you've got your database with high availability, split out the PSC from vCenter and set up at least two of them, the same way you'd set up at least two Active Directory domain controllers. This is going to be the hub of your environment, as both vCenter and other services will rely on it. Using a pair will also require a load balancing solution; although there aren't any throughput data available, I'd guess that the traffic generated for the PSC will be lower than the 10Mbps limit of the free and excellent Citrix NetScaler VPX Express. I've written about it before, and will be using it in my own environment.

Add single and/or paired PSCs in geographically distant locations, but don't go crazy: I've seen blogs indicating that the replication domain for the PSC database is limited to 8 nodes. If you're a global enterprise with many geographically-diverse datacenters, consider a pair in your primary, most critical datacenter and single nodes in up to 6 additional datacenters. Have more than 7 datacenters? Consider the resiliency of your intranet connectivity and place the nodes where they will provide needed coverage based on latency and reliability. If you're stumped, give your local Active Directory maven a call; he/she has probably dealt with this exact problem already (albeit on a different platform) and may have insight or quantitative data that may help you make your decision.
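The node budget is easy to work out by hand, but here is a small Python sketch of the arithmetic. Treat it as an assumption-laden illustration: `plan_psc_nodes` is a hypothetical helper, the 8-node limit comes from blog reports rather than official documentation, and the order of the list stands in for whatever latency and reliability ranking you come up with.

```python
def plan_psc_nodes(datacenters, max_nodes=8):
    """Pair in the first (primary) datacenter, single nodes elsewhere.

    `datacenters` is ordered by priority, primary first. `max_nodes` is the
    reported replication-domain limit (an assumption, not a documented value).
    Returns (plan, uncovered): PSC count per site, and the sites left without
    a local PSC once the limit is reached.
    """
    if not datacenters:
        return {}, []
    primary, *others = datacenters
    plan = {primary: min(2, max_nodes)}
    remaining = max_nodes - plan[primary]
    uncovered = []
    for dc in others:
        if remaining > 0:
            plan[dc] = 1
            remaining -= 1
        else:
            uncovered.append(dc)
    return plan, uncovered


# Nine hypothetical sites: the last two must rely on a remote PSC.
sites = [f"dc{i}" for i in range(1, 10)]
plan, uncovered = plan_psc_nodes(sites)
print(plan)
print(uncovered)
```

With nine sites, the pair plus six singles uses all 8 nodes, and the two lowest-priority sites end up in `uncovered`; those are the ones whose connectivity to a neighboring PSC you need to look at closely.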

Finally, I'm waiting with anticipation on an official announcement for FT support of vCenter Server, which will eliminate the need for more-complex clustering solutions in environments that can support it (from both a storage & network standpoint: FT in vSphere 6 is completely different from FT in previous versions!). Until then, the vCenter Server gets uptime & redundancy more through keeping its database reliable than anything else: HA for host failure; good, tested backups for VM corruption.