Sunday, February 1, 2015

Homelab Upgrades: stay tuned

I'm forever "messing" with my home lab. My latest round of updates is driven by a plan to move from all-gigabit networking to 10Gbps for inter-host communication.

If I only had two hosts, it'd be fairly straightforward—especially if my hosts had on-board 10Gbase-T: build some Cat6 crossover cables and link the machines directly together. But I have three hosts (from my early experimentation with VSAN) and none of them have 10Gbps ports.

Why the move to 10Gb Ethernet?
Dual RJ-45 10Gbps NIC

My experimentation with VMware VSAN and PernixData FVP has led me to the conclusion that, while they certainly function in a 1Gbps environment, they are seriously limited by it (FVP especially so, as it cannot perform any multi-NIC bindings in its current incarnation).

With the growing prevalence of SSD in the datacenter—and the dropping price-per-gigabyte making it approachable for the homelab builder—the bandwidth and latency limitations in gigabit networks make 10Gbps networks almost a necessity as soon as you drop in that first SSD. Anything less, and you don't get full value for that dollar.
The same applies to your older 2/4Gbps Fibre Channel storage networks, but FC is pretty much unattainable for most homelab builders. That said: if you're spending top dollar on an SSD FC array in your enterprise, don't hobble it with a slow network; for that matter, 8Gbps might even be too slow... Plus, with Ethernet upgrades, you get more bang for the buck: in addition to providing a boost to your storage network performance—especially NFS if your filer has 10Gbps—you can run vMotion and FT over it; an upgraded FC network only gives you a boost in block storage performance.
In my professional life, I currently work as a "delivery boy" for a soup-to-nuts value-added reseller. I spend a lot of time in client datacenters, performing all sorts of installations and upgrades. The upgrades I've participated in that took 1Gbps backbones to 10Gbps (or better; 40Gbps is common even in non-InfiniBand implementations that take advantage of QSFP+ ports) delivered amazing jumps in performance. In many ways, it brings back memories of putting in my first Fast Ethernet (100Mbps) switches for backbone services while the fan-out to the client workstations was all 10Mbps 10Base-2 "thin-net" coaxial. But I digress...

So my requirement is straightforward: I am working towards a 10Gbps solution for three hosts, and I'm satisfied if it only provides host-to-host intercommunication. That means that guest networking (and management) can still go over existing 1Gbps (simply because the demand is being fulfilled by 1Gbps, making 10Gbps overkill).

That meant, at a minimum, three single-port PCIe host adapters, a 10Gbps switch, and the appropriate interconnect cabling. But I wasn't going to be satisfied with "the minimum," even for my lab: I really, really want to have redundant 10Gbps connections for each host. You see, I don't rebuild everything every 60-90 days like some folks. This environment is fairly stable, supports a number of higher-level lab & training projects (like VDI and vCloud Director), and pretty much runs like production. It just does it in my basement instead of in a datacenter. With some specific VMware-related exceptions, I do most of my truly experimental work inside an Organizational Virtual Datacenter (OvDC) provisioned by vCloud Director; in that "bubble," I can even spin up virtual ESXi systems. So my requirements are a little more involved: dual 10Gbps in the hosts with multiple 1Gbps as fallback; this means a single 10Gbps switch would be acceptable, but not single ports in the hosts.

Netgear ProSafe XS708E
Depending on how you approach the problem, you can end up with several different solutions. If you start with the switch—and the popular 10Gbps switch for homelab use these days seems to be the Netgear ProSafe XS708E, with eight unmanaged copper (10Gbase-T) ports—then you need adapters & cabling that will work with it. Conversely, if you start with the adapters, you'll need a switch (and cabling) that will work with them. Many of the folks in the VMware community have been building their hosts with 10Gbps copper ports, so for them, the copper switch made sense. I originally started down that path and came up with the following bill of materials (BOM):

  • 10Gbps Switch (Netgear XS708E): $850-$900
  • 3 x 2-port 10Gbps PCIe adapters (QLogic QLE3442-CU): $1000-$1200
  • 6 x 6ft Cat6 (Monoprice): $10
  • Total: $1860-2100 US (tax, shipping not included)

That seemed like a pretty expensive option, so I hunted around and really didn't find many other deals. Variations exist when you check eBay, but you're still in the $2000 price range. Lesson 1: if you don't already have 10Gbps ports in your hosts, figure on roughly $350/port as a low-end estimate for the price of entry (about $2,100 spread across the six host ports in the BOM above).

I also chose to approach it from the other direction: get a line on the least-expensive compatible adapters, and build out from there.

Broadcom BCM57712A dual-SFP+
After finding some Broadcom-based dual-port SFP+ adapters for $70 apiece, I started to research SFP+ transceivers. Although RJ45/copper modules exist for 1Gbps SFP, you can't get them for SFP+/10Gbps; the SFP+ specification simply doesn't provide sufficient power for UTP/STP cabling. That meant I'd have to get SFP+ transceivers (for both adapter and switch) as well as fiber cabling—or twinax direct-connect cables—to make this work. Plus a switch with SFP+ ports.

FiberStore SFP+ transceivers
As it turns out, the former problem (transceivers & cables) is readily solved: a China-based manufacturer/reseller (FiberStore) has compatible products at what I'd call "Monoprice Prices" for both their transceivers ($18/ea) and their twinax direct-connect cables ($22 for a 2M).
As much as I'd prefer the direct-connect option for this project, I was a little worried about Cisco/Broadcom compatibility; yes, FiberStore has satisfaction guarantees, but shipping back-and-forth to mainland China from the US would be a big pain if they didn't work the first time. Or the second. Or the third. I also didn't know until after purchasing them that the BCM57712A cards were OEM'd for Cisco.
Unless you're single-sourcing both the switch and the endpoint—no matter what they say about standards—you can still end up with a direct-connect cable that won't work where transceivers would have been fine. So I went the "spendy" route of selecting guaranteed-compatible 850nm multimode LC transceivers & fiber optic patch cables.
The big sticking point, then, is finding a switch. Unfortunately, high-count SFP+ ports aren't typically found in consumer-grade gear, which makes the search more challenging: under typical sales models, SFP+ transceivers will set you back ~$100 apiece, a daunting financial proposition for a lab setup even if the switch itself were cheap. After researching several product lines, I was about to give up on the idea when I came across references to Cisco's "SG" line of small business switches. I've had the SG300-28 in my lab for a long time, so I know it well and like the product line. The SG500XG-8T8F, with eight SFP+ and eight copper 100M/1G/10Gbps ports, looked like a winner, but the prices (~$2500 street) were ridiculous compared to the copper-only Netgear. I found some alternative candidates from a variety of manufacturers on eBay, but some models were 10Gbps-only (so I'd still need something to downlink them to 1Gbps if I wanted to reach the 1Gbps-based storage in my environment) and others had "Buy It Now" pricing in excess of $2500. Still too spendy.

But then I came across Cisco's $900 SG500X-24. After doing a bunch of reading—reviews, product guides, user manuals—I decided that this could be what I was after. In addition to having 24 copper 10M/100M/1Gbps ports, it also boasted 4x 10Gbps ports for uplinking (one pair of which would work as 5Gbps downlink ports when stacking—true, multi-chassis stacking—with other 1Gbps models in the Cisco SG500 line). Two variants existed, one with SFP+ ports, the other with 10Gbase-T. Alone, this switch wouldn't fit my requirement for dual host ports, but a pair of them—with one SFP+ port used to link the switches together—would fit the bill. Would it make budget?

  • 2 x SG500X-24: $1800
  • 3 x BCM57712A: $210
  • 6 x SFP+ transceivers: $252
  • 6 x 6ft LC patch cables: $36
  • 1 x 1ft LC patch cable: $3
  • Total: $2300

Holy. Smokes. For a ~25% increase, I could build a dual-switch configuration, one that would afford me some additional redundancy that wouldn't exist in the single-switch setup. I checked with my CTO and got approval: as long as it stays under $2500, I have the green light.

Go. Go! Go Baby Go!!!

UPDATE:
It turns out that those BCM57712A adapters I found were supplied with low-profile brackets and were originally designed to go inside Cisco's rackmount UCS servers. There were no full-height brackets to be had: I checked with Broadcom (no answer), a local manufacturer rep listed on the Broadcom site ("check with the distributor"), a local distributor suggested by the manufacturer rep ("check with Broadcom"), and the vendor from whom I purchased them ("check with Broadcom"). I struck out everywhere. On top of that, the only way I was going to get the cards' firmware updated was to put them into a compatible Cisco server and run the Host Upgrade Utility (HUU) on the system. That wasn't critical (ESXi 5.5 recognized the NIC and worked fine with the out-of-date firmware), but it was still very desirable, since my research suggested there were good fixes in newer versions. It's my lab: if I'd been able to update the firmware or source the full-height brackets, I'd have moved forward with them. Instead: time for Plan B.

The next-cheapest dual-SFP+ adapters I'd found during my original research were Mellanox-based HP adapters at ~$130 apiece. This was at the edge of my budget, and if I couldn't recoup the $210+ I'd already spent on the Broadcom adapters, I'd be even deeper in the hole (not to mention barreling right through the $2500 budget), but I'm going forward with this little experiment. I'll try to unload the other adapters on eBay, although this could be "throwing good money after bad" as easily as "in for a penny, in for a pound." We'll see.

UPDATE:
The Mellanox adapters (HP G2 [516937-B21]) arrived with full-height brackets. The firmware was out of date, but not to the point that the hardware and ESXi didn't recognize the card. Thanks to the "Windows To Go" functionality in Windows 8.1, I was able to boot one of my hosts into Windows—with the help of a big USB drive—without screwing up my ESXi installation, and update the firmware on all three cards. ESXi had no problem recognizing the NIC (although this model was originally sold new in the vSphere 4.0 days) with both the as-received firmware and the latest-greatest.

I'd also gone ahead and acquired the transceivers & cables after verifying that the Broadcom NICs were recognized by ESXi. With those in hand, I was able to do some basic point-to-point hardware checks for the adapters: the transceivers worked just as well in the HP cards as in the Cisco cards. (Enterprise guys: keep that in mind when you're looking at transceiver prices from the OEM. There are enormous margins attached to these little gizmos, and purchasing from an original manufacturer could save your organization a ton of money.) At this point, I'm in for less than $1000, but I have a go/no-go decision to make on the switches. If I could get a single compatible switch for $1000, I'd be right where I was with the copper-based solution. Unfortunately, even eBay was a dry well: nothing for less than $1800.

Go. No-Go. Go. No-Go.

I'm almost ready to pull the plug: I haven't ordered the switches yet (they're the single largest line items on the BOM) because, while they would be new and all but guaranteed to work as expected, the Frankenstein's monster of adapters, SFP+ transceivers and fiber cables carried no such guarantee. My project plan called for starting small and proving each bit of infrastructure before growing into the big-ticket items. At this run rate, however, I'm going to exceed my budget; I should quit here and chalk it up to a learning experience. But I also wouldn't be able to do much with the existing hardware without a switch. Sure, I could try to create some weird full-mesh setup with multiple vMotion networks, but that would require a ton of manual overrides whenever I wanted to do anything, which in turn would make DRS essentially useless.

So: do I take the loss (or sit on the hardware and watch for dropping prices or a surprise on eBay) or push through and go over budget?

UPDATE:
Screw it. I want this so bad I can taste it. All I need is the switches and I can make this work. Time to get crackin'.

UPDATE:
Nuts. I can only get one of the two switches I need for $900 from the supplier I found back in my original research. The next-cheapest price is an additional $60. I am again faced with the dilemma: do I stop here (this will get me the sub-optimal but fully-functional single 10Gbps port on each host), wait until another becomes available at the same price, or pay the additional cost to get the second switch?

UPDATE:
Screw it, part 2. I'm going ahead with the higher-price switch. This is going to work, and it's going to be awesome. I can feel it. I am going to save myself a few bucks, however, and select the "free" option on the shipping. Almost seems silly at this point...

UPDATE:
The first switch arrived before expected (Friday instead of the following Monday), so I get to start putting things together this weekend.

UPDATE:
Okay, I'm disappointed. All the documentation reads like you can use one to four of the 10Gbps ports for stacking—and you can—but what it doesn't say is that you must configure either two or four ports for stacking. My original plan to use three ports for the hosts and one port for the multi-chassis stack has been foiled. Fortunately, I was only about 50% certain that it would work, so I've set the switch to "standalone" mode and moved forward with getting things cabled & configured. This is purely temporary until I get the second switch in and do some more validation; at some point there will be "the great migration" so that I can evacuate the 1Gbps switches I'm currently using.

Once I finish, I'll be ejecting a 24-port dumb, unmanaged 10/100/1000 switch (in favor of my old SG300) and a pair of 28-port managed 10/100/1000 switches from the environment. This will reduce the port count on each host from 9 (which includes the iDRAC) down to 7, with two of those being the 10Gbps ports and their much thinner fiber cabling. The runs back to the switches will also be a bit more svelte: two fiber cables take much less room in a wire loom than two (much less four) Cat6 cables.

The hosts each have four gigabit ports on the motherboard, so I'm going to keep those in service for management and guest data networks.

The 10Gbps network will serve iSCSI (both host & guest use), NFS (I've hacked my Synology arrays to do trunked VLANs on bonded adapters) and vMotion. At the moment, I've reconfigured my in-host storage for use by HP StoreVirtual VSAs; this means I can't run VSAN anymore (it wasn't as robust as I'd like with my consumer-grade SSDs, and the Storage Profiles Service kept 'going sideways' and making vCD unusable). I was also able to source, for a song, three H710 adapters—along with SAS cables and the backup battery for the cache memory—for use in my hosts. This should give me not only RAID5 for the spinning disks (something unavailable in the simple SAS HBA I'd been running before), but a more powerful HBA for managing both the SSD and the spinning disks.

For the same reason, I don't have an FVP cluster using SSD; luckily, however, I'm running v2, so I'm playing with the RAM-based acceleration and looking forward to seeing improvements once I get its network running on the 10G. My long-term hope is that I can find a cost-effective PCIe-based flash board that can serve as FVP's acceleration media; that, along with the 10Gbps network, should make a big difference while simultaneously giving me back all the RAM that I'm currently utilizing for FVP.

Tuesday, December 16, 2014

Remote Switchport Identification for ESXi

I was working remotely, trying to complete some work in a client's VMware environment, when I discovered that one of the hosts didn't have the proper trunking on its network adapters. I had access to the managed switch, but for one reason or another, the ports weren't identified in the switch. Had the switch been a Cisco, the host itself could've told me what I needed: ESXi supports CDP on the standard virtual switch & uplinks.
But this was an HP switch.
Luckily, I had three things going for me:

  1. The HP switch supported LLDP
  2. I had access to temporary Enterprise Plus licensing
  3. The host had redundant links for the virtual switch.
How did that help? 

While the standard switch will only support CDP, the VMware Distributed Switch (VDS) supports either CDP or LLDP.

Here's how I managed to get my port assignments (a shell-side cross-check is sketched after the list):
  1. Create a VDS instance
  2. Modify the distributed virtual switch (DVS) to use LLDP instead of CDP (the default)
  3. Update host licensing to temporary Enterprise Plus
  4. Add one (1) adapter to the DVS uplink group
  5. After 30 seconds, click on the "information" link for the adapter to retrieve switchport details
  6. Return adapter to the original standard switch
  7. Repeat steps 4-6 for additional adapters
  8. Remove host from DVS
  9. Return host licensing back to original license
  10. Repeat steps 3-9 for remaining hosts
  11. Remove DVS from environment
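If you have SSH access to the host, you can also pull much of the same discovery data from the ESXi shell instead of clicking through the client. This is only a rough sketch, assuming the uplink of interest is vmnic0 and a 5.x-era host; double-check the exact command syntax on your build:

  # Show the standard vSwitches, including their current CDP mode (the default is "listen")
  esxcli network vswitch standard list

  # Dump the discovery "hints" for one physical NIC. Against a Cisco switch this
  # includes the CDP neighbor details; once the vmnic is an uplink on an
  # LLDP-enabled VDS, the same call should return the LLDP neighbor info as well.
  vim-cmd hostsvc/net/query_networkhint --pnic-name=vmnic0

In this case the HP switch wasn't speaking CDP, which is why the temporary VDS-and-LLDP dance above was worth the licensing shuffle.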

Saturday, November 8, 2014

Use Synology as a Veeam B&R "Linux Repository"

I posted a fix earlier today for adding back the key exchange & cipher sets that Veeam needs when connecting to a Synology NAS running DSM 5.1 as a Linux host for use as a backup repository. As it turns out, some folks with Synology devices didn't know that using them as a "native Linux repository" was possible. This post will document the process I used to get it going originally on DSM 5.0; it wasn't a lot of trial-and-error, thanks to the work done by others and posted to the Veeam forums.

Caveat: I have no clue if this will work on DSM 4.x; I was already running 5.0 by the time I started working on it.

  1. Create a shared folder on your device. Mine is /volume1/veeam
  2. Install Perl in the Synology package center.
  3. If running DSM 5.1 or later, update the /etc/ssh/sshd_config file as documented in my other post
  4. Enable SSH (Control Panel --> System --> Terminal & SNMP)
  5. Enable the User Home Service (Control Panel --> User --> Advanced); a quick sanity check from the shell is sketched after this list
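Before handing the device over to B&R, it doesn't hurt to confirm those pieces from another machine. A minimal sanity check, where the address and account are placeholders for whatever you plan to register in B&R:

  # 'admin' and the IP are placeholders; substitute your own
  # Confirm SSH answers and that Perl is on the path (Veeam's repository scripts are Perl)
  ssh admin@192.168.1.50 'which perl && perl -v | head -2'

  # Confirm the shared folder exists and has the free space you expect
  ssh admin@192.168.1.50 'df -h /volume1/veeam'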
Once this much is done, Veeam B&R will successfully create a Linux-style repository using that path. However, it will not be able to correctly recognize free space without an additional tweak, and for that tweak, you need to understand how B&R works with a Linux repository...

When integrating a Linux repository, B&R does not install software on the Linux host. Here's how it works: 
  1. connects to the host over SSH
  2. transmits a "tarball" (veeam_soap.tar)
  3. extracts the tarball into temporary memory
  4. runs some Perl scripts found in the tarball
It does this Every. Time. It. Connects.

One of the files in this bundle (lib/Esx/System/Filesystem/Mount.pm) uses arguments with the Linux 'df' command that the Synology's BusyBox shell doesn't understand. To get Veeam to correctly recognize the space available on the Synology volume, you'll need to edit Mount.pm to remove the invalid "-x vmfs" argument (line 72 in my version). That edited file must then be replaced within the tarball, since the tarball gets re-sent to the Synology every time B&R connects. It also means every Linux repository will pick up the change (in general, this shouldn't be an issue, because a typical Linux host won't have a native VMFS volume to ignore).

Requests in the Veeam forum have been made to build in some more real intelligence for the Perl module so that it will properly recognize when the '-x' argument is valid and when it isn't.

So how does one complete this last step? First task: finding the tarball. On my backup server running Windows Server 2012R2 and Veeam B&R 7, it's in c:\program files\veeam\backup and replication\backup. If you used a non-default install directory or have a different version of B&R, you might have to look elsewhere.

Second, I used a combination of 7-Zip and Notepad++ to manage the file edit on my Windows systems. Use whatever tools suit you, but do not use an editor that doesn't respect *nix-style text file conventions (like the end-of-line character).

Once you edit the file and re-save the tarball, a rescan of the Linux repository that uses your Synology should result in valid space available results.
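If you'd rather script the change than click through 7-Zip, the same edit can be made from any *nix-style shell (Cygwin, Git Bash, or a Linux box with the tarball copied over). This is just a sketch under a couple of assumptions: the default install path from above, and that the "-x vmfs" switch appears exactly that way in Mount.pm. Back up the original first.

  cd "/c/Program Files/Veeam/Backup and Replication/Backup"   # Git Bash/Cygwin view of the default path
  cp veeam_soap.tar veeam_soap.tar.bak                        # keep a fallback copy
  mkdir -p /tmp/veeam_soap
  tar -xf veeam_soap.tar -C /tmp/veeam_soap
  # Strip the "-x vmfs" argument from the df invocation
  sed -i 's/ -x vmfs//' /tmp/veeam_soap/lib/Esx/System/Filesystem/Mount.pm
  # Repack the bundle in place of the original
  tar -cf veeam_soap.tar -C /tmp/veeam_soap .

Replacing the single file inside the archive (the 7-Zip route) avoids any chance of disturbing the archive layout, which is why I'd still call the repack approach a sketch rather than a recommendation.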

One final note: why do it this way? The Veeam forums have several posts suggesting that using an iSCSI target on the Synology--especially in conjunction with Windows 2012R2's NTFS dedupe capability--is a superior solution to using it as a Linux Repository. And I ran it that way for a long time: guest initiator in the backup host, direct attached to an iSCSI target. But I also ran into space issues on the target, and there aren't good ways to shrink things back down once you've consumed that space--even when thin provisioning for the target is enabled. No, it's been my experience that, while it's not as space-efficient, there are other benefits to using the Synology as a Linux repo. Your mileage may vary.

Repair Synology DSM5.1 for use as a Linux backup repository.

After updating my Synology to DSM 5.1-5004, the following morning I was greeted by a rash of error messages from my Veeam B&R 7 backup jobs: "Error: Server does not support diffie-hellman-group1-sha1 for keyexchange"

I logged into the backup host and re-ran the repository resync process, to be greeted by the same error.
Synology DSM 5.1 error
The version of SSH on the Synology was OpenSSH 6.6p2.

As it turns out, this version of SSH doesn't enable the required key exchange algorithm by default; luckily, that's an easy edit of the /etc/ssh/sshd_config file. To play it safe, I listed not only the needed key exchange algorithm but also the published defaults.
KexAlgorithms diffie-hellman-group1-sha1,curve25519-sha256@libssh.org,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group14-sha1
After restarting SSH in the DSM control panel and re-scanning the repository, all was still not quite fixed: this time the rescan complained about ciphers instead of key exchange.

Back to the man page for sshd_config...

The list of supported ciphers is impressive, but rather than add all of them into the list, I thought it would be useful to get a log entry from the daemon itself as it negotiated the connection with the client. Unfortunately, it wasn't clear where it was logging, so it took some trial-and-error with the config settings before I found a useful set of parameters:
SyslogFacility USER
LogLevel DEBUG
At that point, performing a rescan produced an entry in /var/log/messages showing exactly which ciphers the Veeam SSH client was offering.
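If you want to watch that negotiation live while the rescan runs, something like this from a telnet (or local console) session on the Synology does the trick; treat it as a sketch, since the exact wording of the debug lines varies with the OpenSSH build:

  # Follow the system log while the Veeam server retries the connection; with
  # LogLevel DEBUG, sshd logs the kex algorithms and ciphers each side offers.
  tail -f /var/log/messages | grep -i ssh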
Armed with that entry, I could add a Ciphers line to sshd_config, combining the options offered by the Veeam SSH client with the defaults available in this version of sshd:
Ciphers aes128-cbc,blowfish-cbc,3des-cbc,aes128-ctr,aes192-ctr,aes256-ctr,aes128-gcm@openssh.com,aes256-gcm@openssh.com,chacha20-poly1305@openssh.com
One more rescan, and all was well, making it possible to retry the failed jobs.

Follow Up

There have been responses of both successes and failures from people using this post to get their repository back on line. I'm not sure what's going on, but I'll throw in these additional tips for editing sshd_config:
  1. Each of these entries (KexAlgorithms and Ciphers) is a single-line entry. You must have the keyword (case-sensitive), followed by a single space, followed by the comma-separated values with no whitespace or line breaks.
  2. There's a spot in the default sshd_config that "looks" like the right place to put these entries; that's where I put them. It's under the heading labelled "# Ciphers and keying." Just drop them into the space before the Logging section. In the snippet after this list, you can see how there's no wrap and no stray whitespace. This works for me.
  3. Restart the SSH service. You can use the command line (I recommend using telnet during this operation, or you'll lose your SSH connection as the daemon cycles) or the GUI control panel. If using the latter, uncheck SSH, save, then re-check SSH.
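For reference, here's roughly how that stretch of my sshd_config reads once both lines are in place. The two long lines are the exact entries from above; the comment headings come from the stock file and may differ slightly on your DSM build, and the rest of the file continues unchanged below the Logging heading:

  # Ciphers and keying
  KexAlgorithms diffie-hellman-group1-sha1,curve25519-sha256@libssh.org,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group14-sha1
  Ciphers aes128-cbc,blowfish-cbc,3des-cbc,aes128-ctr,aes192-ctr,aes256-ctr,aes128-gcm@openssh.com,aes256-gcm@openssh.com,chacha20-poly1305@openssh.com

  # Logging
  # (SyslogFacility USER and LogLevel DEBUG only belong here while troubleshooting)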