Saturday, April 15, 2017

Upgrading to vSphere 6.5 with NSX already installed

This has been a slow journey: I have so many different moving parts in my lab environment (all the better for testing myriad VMware products) that migrating to vSphere 6.5 was taking forever. First I had to wait for Veeam Backup & Replication to support it (can't live without backups!), then NSX, then I had to decide whether to discard vCloud Director (yes, I'm still using it; it's still a great multitenancy solution) or get my company to give me access to their Service Provider version...

I finally (finally! after over a year of waiting and waiting) got access to the SP version of vCD, so it was time to plan my upgrade...

My environment supports v6.5 from the hardware side; no ancient NICs or other hardware anymore. I was already running Horizon 7, so I had two major systems to upgrade prior to moving vSphere from 6.0U2 to 6.5a:

  • vCloud Director: 5.5.5-->8.0.2-->8.20.0 (two-step upgrade required)
  • NSX: 6.2.2-->6.3.1
There was one hiccup with those upgrades, and I'm sure it will be familiar to people with small labs: the NSX VIBs didn't install without "manual assistance." In short, I had to manually place each host into maintenance mode, kick off the "reinstall" to push the VIBs into the boot block, then restart the host. This wouldn't happen in a larger production cluster, but because mine is a 3-node VSAN cluster, hosts don't automatically/cleanly go into Maintenance Mode.

Moving on...

Some time ago, I switched from an embedded PSC to an external, so I upgraded that first. No problems.

Upgrading the stand-alone vCenter required a couple of tweaks: I uninstalled Update Manager from its server (instead of running the migration assistant: I didn't have anything worth saving), and I reset the console password for the appliance (yes, I'd missed turning off the expiration, and I guess it had expired). Other than those items? Smooth sailing.

With a new vCenter in place, I could use the embedded Update Manager to upgrade the first host. I had to tweak some of the 3rd-party drivers to make them compatible, but then I was "off to the races."

After the first host was upgraded, I'd planned on migrating some low-priority VMs to it in order to "burn in" the new host and see if some additional steps would be needed (e.g., removing VIBs for unneeded drivers that have caused PSODs in other environments I've upgraded). But I couldn't.

When I tried to vMotion running machines to the new host, I encountered network errors: "VM requires Network X which is not available." Uh oh.

I also discovered that one of the two DVS (Distributed Virtual Switch) for the host was "out of sync" with vCenter. And no "resync" option that would normally have been there...

Honestly, I flailed around a bit, trying my Google-fu and experimenting with moving VMs around, both powered-on and off, as well as migrating to different vSwitch portgroups. All failing.

Finally, something inspired me to look at my VXLAN status; it came to me after realizing I couldn't ping the vmknics for the VTEPs the usual way: they sit on a completely independent IP stack, so a plain vmkping can't use a VTEP as its source interface (it needs the ++netstack=vxlan option).

Bingo!

The command esxcli network vswitch dvs vmware vxlan list resulted in no data for that host, but valid config information for the other hosts.
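If you have more than a handful of hosts, the "no data" check can be scripted. Here's a minimal sketch in Python, assuming you've already captured the command's output as text (over SSH or however you like) and that the output is a header row, a dashed separator row, then one row per VDS. The function names and the sample values are made up for illustration; they are not real output from my lab.

```python
from typing import List


def vxlan_rows(esxcli_output: str) -> List[str]:
    """Return the data rows that follow the dashed header separator."""
    lines = [ln for ln in esxcli_output.splitlines() if ln.strip()]
    for i, ln in enumerate(lines):
        # Assumption: the separator row contains only dashes and spaces.
        if set(ln.strip()) <= {"-", " "}:
            return lines[i + 1:]
    return []


def has_vxlan_config(esxcli_output: str) -> bool:
    """True if the captured output lists at least one VXLAN-enabled VDS."""
    return len(vxlan_rows(esxcli_output)) > 0


# Illustrative sample only: every value below is a placeholder.
HEALTHY_SAMPLE = """\
VDS ID       VDS Name     MTU   Segment ID  Gateway IP  Gateway MAC        Network Count  Vmknic Count
-----------  -----------  ----  ----------  ----------  -----------------  -------------  ------------
dvs-example  Compute_VDS  1600  192.0.2.0   192.0.2.1   00:00:00:00:00:01  4              1
"""

# What I saw on the freshly upgraded host: nothing at all.
BROKEN_SAMPLE = ""
```

A host where has_vxlan_config() returns False while its neighbors return True is the one to look at.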

A quick look at NSX Host Preparation confirmed it, and a quick look at the VIBs on the host nailed it down: esx-vsip and esx-vxlan were still running 6.0.0 versions.
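The same VIB check can be done against the text of "esxcli software vib list". This is a rough sketch, assuming the first two whitespace-separated columns are the VIB name and version, and assuming (as I saw here) that the NSX VIB version string starts with the ESXi release it was built for. The helper name and the build numbers in the sample are placeholders, not real values.

```python
# The two NSX VIBs mentioned above.
NSX_VIBS = ("esx-vsip", "esx-vxlan")


def stale_nsx_vibs(vib_list_output, esxi_release="6.5"):
    """Return {name: version} for NSX VIBs not built for esxi_release."""
    stale = {}
    for line in vib_list_output.splitlines():
        fields = line.split()
        # Assumption: column 1 is the VIB name, column 2 is the version.
        if len(fields) >= 2 and fields[0] in NSX_VIBS:
            name, version = fields[0], fields[1]
            if not version.startswith(esxi_release + "."):
                stale[name] = version
    return stale


# Illustrative sample only: build numbers and dates are placeholders.
SAMPLE = """\
Name       Version            Vendor  Acceptance Level  Install Date
---------  -----------------  ------  ----------------  ------------
esx-vsip   6.0.0-0.0.0000000  VMware  VMwareCertified   2017-01-01
esx-vxlan  6.0.0-0.0.0000000  VMware  VMwareCertified   2017-01-01
esx-base   6.5.0-0.0.0000000  VMware  VMwareCertified   2017-04-01
"""

print(stale_nsx_vibs(SAMPLE))
```

An empty dict means both VIBs match the host's release; anything else means NSX Manager hasn't replaced them yet.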

I went back through the process I'd used for upgrading NSX in the first place, and when the host came back up, DVS showed "in sync", NSX showed "green" install status and—most important of all—VMs could vMotion to the host and they'd stay connected!

UPDATE: The trick, it seems, is to allow the NSX Manager an opportunity to install the new VIBs for ESXi v6.5 before taking the host out of maintenance mode. By manually entering Maintenance Mode prior to upgrading, VUM will not take the host out of Maintenance, giving the Manager an opportunity to replace the VIBs. Once the Manager shows all hosts upgraded and green-checked, you can safely remove the host from Maintenance and all networking will work.
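To make the ordering explicit, here's the sequence written out as a tiny decision helper. It's only a sketch of the logic described above: it doesn't talk to vCenter or NSX Manager, and HostState and next_step are names I made up for illustration. You'd fill in the three flags by looking at the host in vCenter and at the Host Preparation status in NSX.

```python
from dataclasses import dataclass


@dataclass
class HostState:
    name: str
    in_maintenance: bool    # host is currently in Maintenance Mode
    esxi_upgraded: bool     # Update Manager has finished the 6.5 upgrade
    nsx_status_green: bool  # NSX Host Preparation shows the host as green


def next_step(host: HostState) -> str:
    """Return the next manual action for one host, in the order described above."""
    if not host.esxi_upgraded:
        if not host.in_maintenance:
            # Entering manually keeps VUM from exiting Maintenance afterwards.
            return "enter maintenance mode manually"
        return "upgrade with Update Manager"
    if not host.nsx_status_green:
        if not host.in_maintenance:
            # This is the situation I ended up in the first time around.
            return "re-enter maintenance mode so NSX Manager can replace the VIBs"
        return "wait for NSX Manager to replace the VIBs"
    if host.in_maintenance:
        return "exit maintenance mode"
    return "done"


print(next_step(HostState("esx01", True, True, False)))
```

The point of the helper is the middle branch: an upgraded host stays in Maintenance until NSX shows green.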