Sunday, June 19, 2011

SQL Server Failover Cluster -- Add-a-node gotcha

This isn't the sort of thing someone does all the time, so you're bound to be surprised by one thing or another...

We run instances of Microsoft SQL Server 2005 on hardware failover clusters (I'd love to run them all on VMware, but there are a number of reasons why we don't). We needed to replace the hardware for the nodes, and an associate and I were trying to get this accomplished.

Putting the two new nodes into the cluster was smooth as butter. That left me with a 5-node cluster (we were ultimately going to evict the three older nodes), and I set out to add the new nodes to the first SQL instance on my list.

But I kept getting stopped by the "remote task will not start" error. I googled for a solution, and every result pointed to remote access issues: in essence, make sure you aren't using Remote Desktop to connect to the new passive node. I wasn't using RDP on any of the nodes, so I couldn't understand why it kept failing. I tried just about every combination of reboots and node selection. I made sure I was logged out of the new nodes. Still, no joy.

In final desperation, I logged out of all the nodes, not just the ones I was adding, then logged back in to the instance's active node and ran the setup from there. Voila! I was able to add a new node to the instance. For whatever strange reason, you cannot be logged into ANY of the nodes other than the one from which you're installing. Period. Not just over RDP, which is the case they document, but also not over VNC or even at the actual system console. NONE.
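
If I had to do this again, I'd want a quick pre-flight check before launching setup. Here is a rough Python sketch of that idea, not anything from the SQL Server installer itself. The node names, the user names, and the nodes_blocking_setup helper are all made up for illustration, and the sketch assumes you have already collected the list of logged-in users per node by some other means (on Windows, running qwinsta /server:<node> by hand is one option).

```python
def nodes_blocking_setup(sessions_by_node, install_node):
    """Return {node: [users]} for every node, other than the one we are
    installing from, that still has someone logged in.

    sessions_by_node is a hypothetical dict of node name -> list of user
    names with a session on that node (RDP, VNC, or console all count).
    """
    blocking = {}
    for node, users in sessions_by_node.items():
        # Sessions on the node we run setup from are expected and fine.
        if node.lower() == install_node.lower():
            continue
        if users:
            blocking[node] = sorted(users)
    return blocking


# Hypothetical 5-node cluster; SQLNODE1 is the instance's active node.
sessions = {
    "SQLNODE1": ["dbadmin"],
    "SQLNODE2": [],
    "SQLNODE3": ["dbadmin"],
    "SQLNODE4": [],
    "SQLNODE5": ["associate"],
}

blocking = nodes_blocking_setup(sessions, install_node="SQLNODE1")
for node, users in sorted(blocking.items()):
    print(f"{node}: log off {', '.join(users)} before running setup")
```

An empty result means the only sessions left are on the node you're installing from, which is the state that finally worked for me.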