Showing posts with label Lab. Show all posts

Sunday, March 8, 2015

Fix vShield Manager after modifying vSphere VDS uplinks

If you've been following my posts about upgrading my home lab, you know that I removed the add-in 1Gbps NICs and consolidated the motherboard-based 1Gbps NICs on one DVS (distributed virtual switch) in order to add 10Gbps support to my hosts. In that process, I not only rearranged the physical NICs for the uplinks, I also updated the uplink names in order to keep my environment self-documenting.

Things pretty much went as planned, but I didn't expect vShield Manager (vSM) to choke on the changes: when updating the uplink names for the DVS that provided the VXLAN port group, I expected vSM to recognize the changes and handle creation of new VXLAN networks without issue. I was wrong.

The first symptom that I had an issue was the inability of vCloud Director (vCD) to create a new Organizational Network on a deployed Edge device:
So: something is off with the teaming policy. Time to look at vSM to determine whether vCD is sending a bad request to vSM, or if vSM itself is the source of the issue. The easiest way to check is to manually create a new network in vSM; if it succeeds, vCD is sending a bad request, otherwise I need to troubleshoot vSM--and possibly vCenter, too.
Boom: the problem is reproduced even for a test directly in vSM. Time to verify the teaming in the base portgroup in vCenter.
Oops. I hadn't updated the portgroup for VXLAN after moving the uplinks around, although I had done so for the other portgroups on the DVS.
Unfortunately, updating the portgroup to use all the available uplinks didn't help. However, in the process, I discovered an unexpected error in vCenter itself:
vSM was making an API call to vCenter that included one of the old uplink names, one which no longer existed on the DVS. To test the theory, I added a couple of additional uplink ports to the DVS and renamed one to match the missing port. It worked, but not as expected:
vSM was able to send a proper API call to vCenter, but the portgroup had sub-optimal uplink settings: of the two active uplinks, only one had an actual, physical uplink associated with it. This was not a redundant connection, even though it looked like it.

Time to restart vSM to get it to re-read the vCenter DVS config, right? Wrong. Even with a restart & re-entering the vCenter credentials, the state persisted.

At this point, my Google-fu failed me: no useful hits on a variety of search terms. Time to hit the VMware Community Forums with a question. Luckily, I received a promising answer in just a day or two.

I learned that one can use the REST API for vSM to reconfigure it, which can get it back in line with reality. But how do you work with arbitrary REST calls? It turns out, there's a REST client plug-in for Firefox, written to troubleshoot and debug REST APIs. It works a treat:
  1. Set up the client for authenticated headers
  2. Retrieve the DVS configuration as an XML blob in the body of a GET call
  3. Modify the XML blob so that it has the correct properties
  4. PUT the revised XML blob back to vSM.
Voila! Everything works.

Specifics:
1) Use an Authenticated GET on the switches API
2) Using the objectId of the desired DVS, get the specific switch data
3) Update the XML blob with the correct uplink names
4) PUT the revised XML blob
As soon as this blob was accepted with a 200 OK response, I re-ran my test in vSM: success! vCD was then able to create the desired portgroup, too.
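For anyone who would rather script the four steps than click through a browser plug-in, here is a minimal Python sketch of the same GET/modify/PUT round trip. Treat the details as assumptions: the `/api/2.0/vdn/switches` path, the host name, and the uplink names are placeholders to check against the vShield API guide for your version, and the helper names are mine. The rename helper deliberately makes no claim about the XML schema; it only swaps element text that exactly matches an old uplink name.

```python
import base64
import urllib.request
import xml.etree.ElementTree as ET

# Assumed values: replace with your own vSM address and verify the path
# against the vShield API documentation for your release.
VSM_BASE = "https://vsm.example.local"
SWITCHES_PATH = "/api/2.0/vdn/switches"


def basic_auth_header(user, password):
    """Build the authenticated header used for every call (step 1)."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return {"Authorization": f"Basic {token}"}


def rest_get(url, headers):
    """GET a resource and return the XML body as text (steps 1 and 2)."""
    req = urllib.request.Request(url, headers=headers, method="GET")
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode()


def rename_uplinks(xml_text, renames):
    """Replace element text that exactly matches an old uplink name (step 3)."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        if elem.text and elem.text.strip() in renames:
            elem.text = renames[elem.text.strip()]
    return ET.tostring(root, encoding="unicode")


def rest_put(url, headers, body):
    """PUT the revised XML back and return the HTTP status (step 4)."""
    put_headers = {**headers, "Content-Type": "application/xml"}
    req = urllib.request.Request(
        url, data=body.encode(), headers=put_headers, method="PUT"
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


def fix_switch(object_id, user, password, renames):
    """Run the whole round trip for one DVS objectId."""
    headers = basic_auth_header(user, password)
    url = f"{VSM_BASE}{SWITCHES_PATH}/{object_id}"
    revised = rename_uplinks(rest_get(url, headers), renames)
    return rest_put(url, headers, revised)
```

Nothing touches the network until `fix_switch` is called, so the rename step can be dry-run against a saved copy of the XML first. An appliance with a self-signed certificate would also need an SSL context passed to `urlopen`, which is left out here.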

Key takeaways:
  1. REST Client for Firefox is awesome for arbitrary interaction with a REST API
  2. Sometimes, the only way to accomplish a goal is through the API; a GUI or CLI command may not exist to fix your problem.
  3. This particular fix allows you to arbitrarily rename your uplinks without having to reset the vShield Manager database and completely reinstall it to get VXLAN working again.

Tuesday, February 10, 2015

HP StoreVirtual VSA: The Gold Standard

HP has owned the LeftHand storage system since late 2008, and has made steady improvements since then. The product had already been officially supported on a VM; not only did the acquisition not destroy that option, but HP has embraced the product as a cornerstone of their "software-defined storage" marketing message.

Although other products existed back in 2008, a virtualized LeftHand node was one of the first virtual storage appliances (VSA) available with support for production workloads.

Fast-forward to August 2012: HP elects to rebrand the LeftHand product as StoreVirtual, renaming SAN/iQ to LeftHand OS in order to preserve its heritage. The 10.0 version update was tied to the rebranding, and the VSA arm of the portfolio (HP never stopped producing "bare-metal" arrays based on their 2U DL380 server chassis) promised to bring additional enhancements like increased capacity (10TB instead of 2TB) and better performance (2 vCPUs instead of 1) along with price drops.

The 11.0 version was released with even more features (11.5 is the production/shipping version for both bare-metal and VSA), chief of which—in my opinion—is Adaptive Optimization (AO), the ability for node-attached storage to be characterized in one of two tiers.

Note that this isn't a Flash/SSD-specific feature! Yes, it works with solid state as one of the tiers—and is the preferred architecture—but any two performance-vs-capacity tiers can be configured for a node: a pair of 15K RPM SAS drives as Tier 0 performance with 4-8 NL SAS drives as Tier 1 capacity is just as legitimate. HP cautions the architect, however, not to mix nodes with varying AO characteristics in the same way it cautions against mixing single-tier nodes in one cluster.

Personally, I've played with the StoreVirtual VSA off and on over the years. The original hold-back for getting deeply into it was the trial duration: 30 to 60 days is insufficient to "live with" a product and really get to know it. In early 2013, however, HP offered NFR licensing to qualified members of the VMware vExpert community, and those licenses had year-long duration associated with them.

Unfortunately, the hosts I was running at home were pretty unsuited to supporting the VSA: limited RAM and 2-4 grossly inferior desktop-class SATA hard drives in each of 2 hosts. I'd still load up the VSA for test purposes; not for performance, but to better understand the LeftHand OS, how failures are handled, how configurations are managed, and how the product interacts with other software like Veeam Backup & Replication. But then I'd also tear down the cluster when I finished with it in order to regain consumed resources.

When PernixData FVP was still in pre-GA beta, I was able to make some system upgrades to add SSD to newer hosts—still with essentially zero local capacity, however—and was able to prove to myself that a) solid state works very effectively at increasing the performance of storage and b) there is a place for storage in the local host.

With the release of the first VMware Virtual SAN beta, I decided it was time to make some additional investments into my lab, and I was able to not only add a third host (the minimum for supported VSAN deployment) but also provision them all with a second SSD and enterprise SATA disks for the experiment. In that configuration, I was able to use one SSD for iSCSI-based performance acceleration (using the now-GA FVP product) and a second SSD for VSAN's solid state tier. My hosts remained limited in the number of "spinning disk" drives that could be installed (four), but in aggregate across three hosts, the total seemed not only reasonable, but seemed to work in practice.

Unfortunately, I was plagued by hardware issues in this configuration: rarely a week went by without either FVP or VSAN complaining about a drive going offline or being in "permanent failure," and it seemed like the weeks when that didn't occur, the Profile Driven Storage service of vCenter—which is critical to making use of VSAN in other products like vCloud Director or Horizon View—would need to be restarted. Getting FVP or VSAN working correctly would usually require rebooting the host reporting an issue; in some cases, VMs would need to be evacuated from VSAN to provide the necessary free space to retain "availability."

In short, the lab environment with my 1Gbps networking and consumer-grade disk & HBA made VSAN and FVP a little too much work.

But I still had that VSA license... If I could get a better HBA (one that would perform true hardware-based RAID and have a deeper queue, not to mention other enterprise SATA/SAS capabilities), I'd be able to leverage the existing disk investment with the VSA and have a better experience.

I was able to source a set of Dell PERC H700 adapters, cables and cache batteries from eBay; these were pulled from R610 systems, so dropping them into mine was trivial and the set cost considerably less than a single kit from Dell. Although I could have rebuilt the VSAN and FVP environments on the new HBA—each disk in the system would need to be set up as a single-spindle RAID0 'virtual volume'—I went with a RAID1 set for the pair of SSD and a RAID5 for the spindles. I would be able to continue leveraging PernixData for acceleration using the RAM-backed function, but I was done messing with VSAN for now.

Setting up the v11.5 VSA initially gave me pause: I was booting from SD card, so I could use 100% of the SSD/HDD for it, but how to do it? If the LeftHand OS had drivers for the PERC array (possible: the core silicon of the H700 is an LSI/Symbios product which might be supported in spite of being a Dell OEM), I could do a DirectPath I/O if there was another datastore available on which to run the VSA. A second, similar alternative would be to manually create Physical RDM mappings for the RAID volumes, but that still left the problem of a datastore for the VSA. Yes, I could run the VSA on another array, but if the host ever had issues with that array, then I'd also end up with issues on my LeftHand cluster. Not a good idea!

My final solution is a hybrid: The HDD-based RAID group is formatted as a VMFS5 datastore, and the VSA is the only VM using it. A large, 1.25TB 'traditional' VMDK is presented using the same datastore (leaving ~100GB free for the VSA boot drive and files); the SSD-based RAID group is presented as Physical RDM. This configuration permitted me to enable AO on each node, and get an SSD performance boost along with some deep storage from the collection of drives across all three nodes.

In practice, this array has been more trouble-free than my VSAN implementation on (essentially) identical hardware. A key difference, however, has been the performance with respect to inter-node communication: With VSAN, up to four interfaces can be simultaneously configured for inter-node communication, increasing bandwidth and lowering latency. Even with the lower performance characteristics of the disks and HBA in each host, saturating two of the four gigabit interconnects I had configured was possible with VSAN (when performing sequential reads & writes, e.g., backups & Storage vMotion), so the single gigabit connection available to VSA was very noticeable.

I have since migrated my network environment to use 10Gbps Ethernet for my back-haul network connectivity (iSCSI, NAS, vMotion) and have subjective evidence of improved performance of the LeftHand array. I'll be updating this post with objective test results when the opportunity presents itself.

Citrix NetScaler UI changes

Which is worse?

  • Searching for the solution to a problem and not being able to find it
— or —
  • Finding the exact solution to a problem in a blog, but discovering that it's an older post using out-dated products and documenting an API or UI that no longer exists?

This question comes from some feedback I received on a series of posts I put together that documents my use of the Citrix NetScaler VPX Express virtual appliance as a reverse proxy.

Citrix is doing the right thing: they're rebuilding the GUI in the NetScaler to eliminate Java (as much as possible). It has been a slow-going process, starting with the 10.0 version (as of this writing, 10.5 is current, and there are still one or two places that use a Java module), and one of the drawbacks is that the new HTML-only UI elements can't duplicate the Java UI—so things are...different.
HA Setup, v10.0 & earlier
HA Setup, v10.5
In the screencaps above, you see the older Java-based dialog box and the newer HTML page. They have some of the same data, but they are neither identical, nor are they found in the same exact place from the principal UI.

How does a blogger serve his/her audience? Does one ignore the past and soldier on, or does one revisit the old posts and update them for a new generation of software? If I had positioned myself as a NetScaler expert, that answer is obvious: UI changes in and of themselves would be post-worthy, and revisiting old functions to make them clear under the new UI would make perfect sense.

In this case, however, I have only had a couple of requests for revised instructions using the equivalent UI; I'm not a NetScaler guru, and to be perfectly frank, I haven't the time needed to redo the series. If I get a lot more feedback that this series needs to be updated, I'll think about a second edition using the new UI, but as of now it's going to stay the way it is.

Homelab 2015: Hello 10Gbps!

New year, new post documenting the home lab. I've accomplished a number of upgrades/updates since my last full roundup of the lab, so rather than posting this as another delta, I'm doing this as a full re-documentation of the environment.

Compute: VMware vSphere 5.5

  • 3 x Dell R610, each spec'd as follows:
    • (2) Intel Xeon E5540 @ 2.53GHz
    • Hyperthreading enabled for 16 logical CPUs per host.
    • 96GiB RAM
    • Boot from 4 or 8GB SD card
    • Dell PERC H700 6Gbps SAS/SATA HBA
      • (4) 500GB Seagate Constellation.2 (ST9500620NS) SATA
        • RAID5
        • Formatted as local vmfs5 datastore
      • (2) 240GB Intel 530 SATA SSD
        • RAID1
        • RDM for StoreVirtual VSA (see below)
    • Quad port Broadcom BCM5709 Gigabit Copper (embedded)
    • Dual port Mellanox MT26448 10GigE (8-lane PCIe), 850nm SFP+ optics
    • iDRAC 6 Enterprise
    • Redundant power

Storage: IP-based

  • iomega StorCenter ix2-200 "Cloud Edition"
    • (2) 1TB Seagate Barracuda (ST1000DM003) 7200RPM SATA
    • RAID1
    • (1) 1000Base-T
    • LifeLine OS v3.2.10.30101
    • NFS export for VMs
  • 2 x Lenovo (iomega/EMC) px6-300d
    • (6) 2TB Hitachi Deskstar (HDS723020BLA642) 7200RPM SATA
    • RAID5
    • (2) 1000Base-T, bonded, multiple VLANs
    • LifeLine OS v4.1.104.31360
    • 2TB iSCSI Target for VMs
  • Synology DS2413+
    • (12) 2TB Seagate Barracuda (ST2000DM001) 7200RPM SATA
    • RAID1/0
    • (2) 1000Base-T, bonded, multiple VLANs (CLI-added)
    • DSM 5.1-5022 Update 2
    • NFS exports:
      • ISOs (readonly, managed by SMB)
      • Logs
      • VMs
  • Synology DS1813+
    • (8) 256GB Plextor PX-256M6S SSD
    • RAID5
    • (4) 1000Base-T
      • (1) Management network
      • (2) iSCSI network (multi-homed, not bonded)
    • ~1.6TB iSCSI Target (block mode)
  • HP StoreVirtual "LeftHand OS"
    • (3) ESXi Virtual Appliances, 1 on each host
      • 1280GB VMDK on local storage; tier 1
      • 223GB RDM on SSD volume; tier 0
    • 4486.49GB Raw, 2243GB RAID1
    • (2) 1TB volumes for VMs
      • Thin provisioning
      • Adaptive Optimization
    • iSCSI network: 10GbE
    • Management: 1GbE

Networking:

  • (2) Cisco SG500X-24
    • (4) 850nm SFP+ optics for 10GbE
    • (24) 1000Base-T MDI/MDI-X
    • Primary ISL: 10GbE
    • Backup ISL: (1) 2x1GbE LACP LAG
    • STP Priority: 16384
  • Cisco SG300-28
    • (28) 1000Base-T MDI/MDI-X
    • (2) 2x1GbE LACP LAG for link to SG500X-24
    • STP Priority: 32768
  • Google Fiber (mk.1)
    • "network box"
    • "storage box"
    • "fiber box"
  • Various "dumb" (non-managed) 1GbE switches
  • Apple Airport Extreme (mk.4)/Express (mk.2)

Miscellaneous:

  • (4) APC BackUPS XS1500
  • Internet HTTP/SSL redirection via Citrix NetScaler VPX (HA pair)
  • Remote access via:
    • TeamViewer 10
    • Microsoft RDS Gateway
    • VMware Horizon View
    • Citrix XenApp

Connectivity Diagrams:

Host Configuration

Environment

Sunday, February 1, 2015

Homelab Upgrades: stay tuned

I'm forever "messing" with my home lab. My latest set of updates will be based on a plan to get myself upgraded from all-gigabit to using 10Gbps for inter-host communication.

If I only had two hosts, it'd be fairly straightforward—especially if my hosts had on-board 10Gbase-T: build some Cat6 crossover cables and link the machines directly together. But I have three hosts (from my early experimentation with VSAN) and none of them have 10Gbps ports.

Why the move to 10Gb Ethernet?
Dual RJ-45 10Gbps NIC

My experimentation with VMware VSAN and PernixData FVP has led me to the conclusion that, while they certainly function in a 1Gbps environment, they are seriously limited by it (FVP especially so, as it cannot perform any multi-NIC bindings in its current incarnation).

With the growing prevalence of SSD in the datacenter—and the dropping price-per-gigabyte making it approachable for the homelab builder—the bandwidth and latency limitations in gigabit networks make 10Gbps networks almost a necessity as soon as you drop in that first SSD. Anything less, and you don't get full value for that dollar.
The same applies to your older 2/4Gbps Fibre Channel storage networks, but FC is pretty much unattainable by most homelab builders. That said: If you're spending top-dollar on an SSD FC array in your enterprise, don't hobble it with a slow network. For that matter, 8Gbps might even be too slow... Plus, with Ethernet upgrades, you get more bang for the buck: in addition to providing a boost to your storage network performance—especially NFS if your filer has 10Gbps—you can run vMotion and FT over it; an upgraded FC network only gives you a boost in block storage performance.
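To put rough numbers on the gigabit bottleneck described above, here is a small back-of-the-envelope sketch. It uses theoretical line rates only (no protocol overhead), and the 500 MB/s SSD figure is an assumed round number for a SATA SSD, not a measurement from my lab.

```python
def link_mbytes_per_sec(gbps):
    """Theoretical line rate in MB/s, ignoring all protocol overhead."""
    return gbps * 1000 / 8


def transfer_seconds(size_gb, gbps):
    """Best-case time to move size_gb gigabytes across a link of gbps."""
    return size_gb * 1000 / link_mbytes_per_sec(gbps)


ASSUMED_SSD_MBPS = 500  # assumption: sequential rate of a single SATA SSD

for gbps in (1, 10):
    rate = link_mbytes_per_sec(gbps)
    limit = "network" if rate < ASSUMED_SSD_MBPS else "SSD"
    secs = transfer_seconds(100, gbps)
    print(f"{gbps:>2} Gbps: {rate:6.0f} MB/s, 100 GB in {secs:5.0f} s, limited by the {limit}")
```

Under those assumptions a single gigabit link tops out at a quarter of what one SSD can deliver, while at 10Gbps the drive becomes the limit instead of the wire.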
In my professional life, I currently work as a "delivery boy" for a soup-to-nuts value-added reseller. I spend a lot of time in client datacenters, performing all sorts of installations and upgrades. The ones I've participated in that involve upgrades from 1Gbps backbones to 10Gbps (or better, with 40Gbps being the common number even in non-Infiniband implementations by taking advantage of QSFP+ ports) were just amazing in the performance jump. In many ways, it brings back memories of putting in my first Fast Ethernet (100Mbps) switches for backbone services while the fan-out to the client workstations was all 10Mbps 10Base-2 "thin-net" coaxial. But I digress...

So my requirement is straightforward: I am working towards a 10Gbps solution for three hosts, and I'm satisfied if it only provides host-to-host intercommunication. That means that guest networking (and management) can still go over existing 1Gbps (simply because the demand is being fulfilled by 1Gbps, making 10Gbps overkill).

That meant, at a minimum, three single-port PCIe host adapters, a 10Gbps switch, and the appropriate interconnect cabling. But I wasn't going to be satisfied with "the minimum," even for my lab: I really, really want to have redundant 10Gbps connections for each host. You see, I don't rebuild everything every 60-90 days like some folks. This environment is fairly stable, supports a number of higher-level lab & training projects (like VDI and vCloud Director), and pretty much runs like production. It just does it in my basement instead of in a datacenter. With some specific VMware-related exceptions, I do most of my truly experimental work inside an Organizational Virtual Datacenter (OvDC) provisioned by vCloud Director; in that "bubble," I can even spin up virtual ESXi systems. So my requirements are a little more involved: dual 10Gbps in the hosts with multiple 1Gbps as fallback; this means a single 10Gbps switch would be acceptable, but not single ports in the hosts.

Netgear ProSafe XS708E
Depending on how you approach the problem, you can end up with several different solutions. If you start with the switch—and the popular 10Gbps switch for homelab use these days seems to be the Netgear ProSafe XS708E, with eight unmanaged copper (10Gbase-T) ports—then you need adapters & cabling that will work with it. Conversely, if you start with the adapters, you'll need a switch (and cabling) that will work with them. Many of the folks in the VMware community have been building their hosts with 10Gbps copper ports so for them, the copper switch made sense. I originally started down that path, and came up with the following bill-of-materials (BOM):

  • 10Gbps Switch (Netgear XS708E): $850-$900
  • 3 x 2-port 10Gbps PCIe adapters (QLogic QLE3442-CU): $1000-$1200
  • 6 x 6ft Cat6 (Monoprice): $10
  • Total: $1860-$2110 US (tax, shipping not included)

That seemed like a pretty expensive option, so I hunted around and really didn't find many other deals. Variations exist when you check with eBay, but you're still in the $2000 price range. Lesson 1: if you don't already have existing 10Gbps ports in your hosts, you're looking at $350/port as a reasonable estimate for the low end for the price of entry.

I also chose to approach it from the other direction: get a line on the least-expensive compatible adapters, and build out from there.

Broadcom BCM57712A dual-SFP+
After finding some Broadcom-based dual-port SFP+ adapters for $70 apiece, I started to research SFP+ transceivers. Although they exist as 1Gbps SFP modules, you cannot get RJ45/Copper modules for SFP+/10Gbps; the specification for SFP+ simply doesn't provide sufficient power for UTP/STP cabling. That meant I'd have to get SFP+ transceivers (for both adapter and switch) as well as fiber cabling—or twinax direct-connect cables—to make this work. Plus a switch with SFP+ ports.

FiberStore SFP+ transceivers
As it turns out, the former problem (transceivers & cables) is readily solved: a China-based manufacturer/reseller (FiberStore) has compatible products at what I'd call "Monoprice Prices" for both their transceivers ($18/ea) and their twinax direct-connect cables ($22 for a 2M).
As much as I'd prefer the direct-connect option for this project, I was a little worried about Cisco/Broadcom compatibility; yes, FiberStore has satisfaction guarantees, but shipping back-and-forth to mainland China from the US would be a big pain if they didn't work the first time. Or the second. Or the third. I also didn't know until after purchasing them that the BCM57712A cards were OEM'd for Cisco.
Unless you're working with single-source for both the switch and endpoint—no matter what they say about standards—you can still have a direct-connect cable that won't work while transceivers would have worked fine. So I went the "spendy" route of selecting guaranteed-compatible 850nm multimode LC transceivers & fiber optic patch cables.
The big stickler, then, is finding a switch. Unfortunately, high-count SFP+ ports aren't typically found in consumer-grade gear, making the search more challenging: under typical sales models, SFP+ transceivers will set you back ~$100 apiece, which makes it a daunting financial proposition for a lab setup, even if the switch itself was cheap. After researching several lines of products, I was about to give up on the idea when I came across references to Cisco's "SG" line of small business switches. I've had the SG300-28 in my lab for a long time, so I know it well and like the product line. The SG500XG-8T8F with eight SFP+ and eight copper 100M/1G/10Gbps ports looked like a winner, but the prices (~$2500 street) were ridiculous compared to the copper-only Netgear. I found some alternative candidates from a variety of manufacturers on eBay, but some models were 10Gbps only (so I'd still need something to downlink them to 1Gbps if I wanted to access the 1Gbps-based storage in my environment) and others had "Buy Now" pricing in excess of $2500. Still too spendy.

But then I came across Cisco's $900 SG500X-24. After doing a bunch of reading—reviews, product guides, user manuals—I decided that this could be what I was after. In addition to having 24 copper 10M/100M/1Gbps ports, it also boasted 4x 10Gbps ports for uplinking (one pair of which would work as 5Gbps downlink ports when stacking—true, multi-chassis stacking—with other 1Gbps models in the Cisco SG500 line). Two variants existed, one with SFP+ ports, the other with 10Gbase-T. Alone, this switch wouldn't fit my requirement for dual host ports, but a pair of them—with one SFP+ port used to link the switches together—would fit the bill. Would it make budget?

2 x SG500X-24: $1800
3 x BCM57712A: $210
6 x SFP+ Transceivers: $252
6 x 6ft LC patch cables: $36
1 x 1ft LC patch cable: $3
Total: $2300

Holy. Smokes. For a ~25% increase, I could build a dual switch configuration; one that would afford me some additional redundancy that wouldn't exist in the single switch setup. I checked with my CTO and got approval: as long as it stays under $2500, I have the green light.
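For anyone who wants to check my math, here is a quick sketch that totals both bills of materials using the line-item prices quoted in this post. The dictionary and function names are just for illustration; tax and shipping are left out, as above.

```python
# Line-item prices as quoted in this post (US dollars, low/high where given).
copper_bom = {
    "Netgear XS708E": (850, 900),
    "3 x QLE3442-CU adapters": (1000, 1200),
    "6 x 6ft Cat6": (10, 10),
}

fiber_bom = {
    "2 x SG500X-24": 1800,
    "3 x BCM57712A": 210,
    "6 x SFP+ transceivers": 252,
    "6 x 6ft LC patch cables": 36,
    "1 x 1ft LC patch cable": 3,
}


def copper_totals(bom):
    """Return the (low, high) totals for a BOM with price ranges."""
    low = sum(lo for lo, _ in bom.values())
    high = sum(hi for _, hi in bom.values())
    return low, high


def percent_increase(base, new):
    """Percentage by which new exceeds base."""
    return (new - base) / base * 100


copper_low, copper_high = copper_totals(copper_bom)
fiber_total = sum(fiber_bom.values())

print(f"Copper, single switch: ${copper_low}-${copper_high}")
print(f"Fiber, dual switch:    ${fiber_total}")
print(f"Increase over copper low end: {percent_increase(copper_low, fiber_total):.0f}%")
```

The dual-switch fiber build comes to $2301, which is roughly a quarter more than the low end of the copper build and still under the $2500 ceiling.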

Go. Go! Go Baby Go!!!

UPDATE:
It turns out that those BCM57712A adapters I found were supplied with low-profile brackets and were originally designed to go inside Cisco's rackmount UCS servers. No full-height brackets were available: I checked with Broadcom (no answer), a local manufacturer rep listed on the Broadcom site ("check with the distributor"), a local distributor suggested by the manufacturer rep ("check with Broadcom") and the vendor from whom I purchased them ("check with Broadcom"), and struck out every time. On top of that, the only way I was going to get the cards' firmware updated was to put them into a compatible Cisco server and run the Host Update Utility (HUU) on the system. The update wasn't critical, because ESXi 5.5 recognized the NIC and worked fine with the out-of-date firmware, but it was still very desirable, as research suggested that there were good fixes in newer versions. It's my lab: if I'd been able to update the firmware or source the full-height brackets, I'd have moved forward with them. Instead: Time for Plan B.

The next-cheapest dual-SFP+ adapters I'd found when doing my original research were Mellanox-based HP adapters for ~$130 apiece. This was at the edge of my budget, and if I couldn't recoup the $210+ I'd already spent on the Broadcom adapters, I'd be even deeper in the hole (not to mention barreling right through the $2500 budget), but I'm going forward with this little experiment. I'll try and unload the other adapters on eBay, although this could be "throwing good money after bad" as easily as "in for a penny, in for a pound." We'll see.

UPDATE:
The Mellanox adapters (HP G2 [516937-B21]) arrived with full-height brackets. The firmware was out-of-date, but not to the point that the hardware and ESXi didn't recognize the card. Thanks to "Windows To Go" functionality in Windows 8.1, I was able to boot one of my hosts into Windows (with the help of a big USB drive) without screwing up my ESXi installation and update the firmware on all 3 cards. ESXi had no problem recognizing the NIC (although this model was originally sold new in the vSphere 4.0 days) with either the as-received firmware or the latest-greatest. I'd also gone ahead and acquired the transceivers & cables after verifying that the Broadcom NICs were recognized by ESXi. With those in hand, I was able to do some basic point-to-point hardware checks for the adapters: the transceivers worked just as well in the HP cards as the Cisco cards (Enterprise guys: keep that in mind when you're looking at the transceiver prices from the OEM. There are enormous margins attached to these little gizmos, and purchasing from an original manufacturer could save your organization a ton of money). At this point, I'm in for less than $1000, but have a go/no-go on the switches. If I could get a single compatible switch for $1000, I'd be right where I was with the copper-based solution. Unfortunately, even eBay was a dry well: nothing for less than $1800.

Go. No-Go. Go. No-Go.

I am almost ready to pull the plug: I haven't ordered the switches yet (they're the single largest line-items on the BOM) because while they would be new and pretty sure to work as expected, the Frankenstein's monster of the adapters, SFP+ transceivers and fiber cables had no such guarantee. My project plan required me to start with the small and grow to the large as I proved each bit of infrastructure would work as envisioned. At this run rate, however, I'm going to exceed my budget; I should quit here and chalk it up to a learning experience. But I also wouldn't be able to do much with the existing infrastructure without a switch. Sure, I could try and create some weird full-mesh setup with multiple vMotion networks, but that would require a ton of manual overrides whenever I wanted to do anything. That, in turn, would make DRS essentially useless.

So: do I take the loss (or sit on the hardware and watch for dropping prices or a surprise on eBay) or push through and go over budget?

UPDATE:
Screw it. I want this so bad I can taste it. All I need is the switches and I can make this work. Time to get crackin'.

UPDATE:
Nuts. I can only get one of the two switches I need for $900 from the supplier I found back in my original research. The next-cheapest price is an additional $60. I am again faced with the dilemma: do I stop here (this will get me the sub-optimal but fully-functional single 10Gbps port on each host), wait until another becomes available at the same price, or pay the additional cost to get the second switch?

UPDATE:
Screw it, part 2. I'm going ahead with the higher-price switch. This is going to work, and it's going to be awesome. I can feel it. I am going to save myself a few bucks, however, and select the "free" option on the shipping. Almost seems silly at this point...

UPDATE:
The first switch arrived before expected (Friday instead of the following Monday), so I get to start putting things together this weekend.

UPDATE:
Okay, I'm disappointed. All the documentation seems to read like you can use one to four of the 10Gbps ports for stacking, and you can, but what they don't say is that you must configure two or four ports for stacking. My original plan to use 3 ports for the hosts and 1 port for the multi-chassis stack has been foiled. Fortunately, I was only about 50% certain that it would work, so I've set the switch in "standalone" mode and moved forward with getting things cabled & configured. This is purely temporary until I get the second switch in and do some more validation; at some point, there will be "the great migration" so that I can evacuate the 1Gbps switches I'm currently using.

Once I finish, I'll be ejecting a 24-port dumb, unmanaged 10/100/1000 switch (in favor of my old SG300) and a pair of 28-port managed 10/100/1000 switches from the environment. This will have the effect of reducing the port count on each host from 9 (which includes the iDRAC) down to 7, but with two of those being the much-thinner 10Gbps ports. The cabling back to the switches will also be a bit more svelte: two fiber cables take much less room in a wire loom than two (much less four) Cat6 cables take.

The hosts, each, have four gigabit ports on the motherboard, so I'm going to keep them in service as management and guest data networks.

The 10Gbps network will serve iSCSI (both host & guest use), NFS (I've hacked my Synology arrays to do trunked VLANs on bonded adapters) and vMotion. At the moment, I've reconfigured my in-host storage for use by HP StoreVirtual VSAs; this means I also can't have VSAN running anymore (not as robust as I'd like with my consumer-grade SSDs, and the Storage Profiles Service keeps 'going sideways' and making vCD unusable). I was able to source three of the H700 HBA adapters, along with SAS cables and backup battery for cache memory, intended for use in my hosts for a song. This should give me not only RAID5 for the spinning disks (something unavailable in the simple SAS HBA I'd been running before), but a more powerful HBA for managing both the SSD and the spinning disks.

For the same reason, I don't have an FVP cluster using SSD; luckily, however, I'm running v2, so I'm playing with the RAM-based acceleration and looking forward to seeing improvements once I get its network running on the 10G. My long-term hope is that I can find a cost-effective PCIe-based flash board that can serve as FVP's acceleration media; that, along with the 10Gbps network, should make a big difference while simultaneously giving me back all the RAM that I'm currently utilizing for FVP.

UPDATE:
The second switch has been received & I'm getting it configured with the first switch; I'm also having to make some changes to the old switches to add connectivity to the new switches. It'll be a fun migration, but it should be physically possible to dismount the old switches and lay them down next to the rack; that will free the rack space for the new switches, and I can perform a rolling conversion of the cabling for my hosts, followed by moving the arrays one-by-one (they all have bonded uplinks, so disconnecting them, one link at a time, should be safe).

UPDATE:
AARRRGGGHHH! I'm re-using one of the old switches as an aggregation switch—any device that has only one connection/link will be plugged into that switch, while anything with multiple connections will be connected to the new switches. All three switches are cross-connected, and RSTP is used to keep broadcast loops from forming. Fine so far. But I needed to update the STP priorities for my switches to make sure the new switches would prefer their 10Gbps link, and in the process I discovered that one of my cross-connect LAGs wasn't built properly. It wasn't until STP did a recalc and decided to use that LAG as the designated port—and traffic stopped flowing—that I discovered my error. And the symptoms were manifold & annoying...thank goodness the LeftHand VSA is so resilient.
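For anyone wondering why touching STP priorities can move traffic onto a different link: the bridge with the lowest bridge ID (priority first, MAC address as the tie-breaker) becomes the root, and every other switch then recalculates its ports relative to that root. Here's a minimal sketch of that election; the switch names, priorities and MAC addresses are made up for illustration and are not the values from my lab.

```python
def elect_root(bridges):
    """Return the name of the bridge with the lowest (priority, MAC) bridge ID.

    Each bridge is a dict with hypothetical keys: name, priority, mac.
    MACs are compared as lowercase strings, so they must share one format.
    """
    winner = min(bridges, key=lambda b: (b["priority"], b["mac"].lower()))
    return winner["name"]


# Illustrative values only: two new 10Gbps switches and the old aggregation switch.
bridges = [
    {"name": "new-10g-a", "priority": 4096, "mac": "00:11:22:33:44:01"},
    {"name": "new-10g-b", "priority": 8192, "mac": "00:11:22:33:44:02"},
    {"name": "old-agg", "priority": 32768, "mac": "00:11:22:33:44:03"},
]

root = elect_root(bridges)
print(root)
```

Lowering one switch's priority changes the root, which is exactly the kind of recalculation that exposed my badly built LAG.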

UPDATE:
Yay! Everything is done, including the rebuild of the LeftHand nodes to accommodate some architectural changes needed as a result of the LAG/STP outage. Looking forward to getting this new setup documented & published...which makes this the last update, and a new post in the drafts folder!!!

Thursday, February 6, 2014

SSL Reverse Proxy using Citrix NetScaler VPX Express

Part 5 in a series

This part is the final post of the series; it builds on the previous posts by adding an SSL-based content switch on top of our previously-created simple HTTP content switch.

The NetScaler does a fine job of handling SSL traffic in a manner similar to the way it handles the unencrypted HTTP traffic. The key differentiator—other than making sure to distinguish the traffic as being SSL-bound—is the inclusion of certificate handling.

Of course, the "outside" or Content Switching virtual server must have an SSL certificate; the client trying to reach your host(s) is expecting an SSL connection, so the listener responding to the particular host request must respond with a conforming certificate or the user will have to deal with certificate errors.

The "inside" server that's the target of Content Switching probably wants to communicate with its clients using SSL, too (In some special cases—known as "SSL Offload"—the inside server allows non-encrypted connections from specific hosts that are pre-configured to handle the overhead of SSL encryption; NetScaler can do this, too). In order for the NetScaler to perform as a proxy, it must have sets of SSL certificates for both the inside and the outside connections. Once you have those, you can quickly set up an SSL-based content switching configuration that mirrors the HTTP setup.

And the best part? Only the Content Switching virtual server needs to have an SSL certificate that is signed by a trusted root! (Caveat: it must be either a wildcard or multiple-SAN certificate. Remember: the DNS name must match either the CN [common name] or one of the DNS SAN [subject alternative name] entries of the host certificate.) The "inside" servers that you're putting "behind" the NetScalers can have self-signed certificates or certificates signed by an in-house CA.
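To make the name-matching caveat concrete, here's a rough Python sketch of how a client decides whether a certificate covers a hostname. It is a simplification (a wildcard is only honored as the entire left-most label), and the example.com names are placeholders, not anything from my setup.

```python
def name_matches(hostname, pattern):
    """True if hostname matches one certificate name (CN or DNS SAN entry)."""
    host_labels = hostname.lower().rstrip(".").split(".")
    pat_labels = pattern.lower().rstrip(".").split(".")
    # A wildcard stands in for exactly one label, so the label counts must agree.
    if len(host_labels) != len(pat_labels):
        return False
    if pat_labels[0] == "*":
        return host_labels[1:] == pat_labels[1:]
    return host_labels == pat_labels


def cert_covers(hostname, cert_names):
    """True if any name on the certificate matches the requested hostname."""
    return any(name_matches(hostname, name) for name in cert_names)


# Placeholder names for a wildcard certificate that also lists the bare domain.
cert_names = ["*.example.com", "example.com"]

print(cert_covers("serverA.example.com", cert_names))
print(cert_covers("serverA", cert_names))
```

The second check fails because a short name like serverA has no domain part for the wildcard to match against, which is the same reason a browser complains when you use a short name against a domain wildcard certificate.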


A little about Certificate files

The NetScaler has a ton of flexibility for working with many certificate formats—PEM and DER encoding, PKCS#12 bundles, etc.—but I find that it's easiest and most flexible when using individual, single-certificate (or key) PEM-type, Base64-encoded text files. It's easiest if you just have them ready-to-go; if you don't, you can learn about using OpenSSL, or you can simply use an online converter like SSL Shopper's Certificate Tools. Personally, I use a local copy of OpenSSL.
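If you're curious what the difference between the two encodings amounts to: a PEM certificate is just the DER bytes, Base64-encoded and wrapped in BEGIN/END lines. Python's standard ssl module has helpers for that conversion, sketched below. This only covers certificates (not private keys or PKCS#12 bundles), and the bytes used here are a stand-in, not a real certificate.

```python
import ssl


def der_to_pem(der_bytes):
    """Wrap raw DER certificate bytes as Base64 text with PEM header/footer."""
    return ssl.DER_cert_to_PEM_cert(der_bytes)


def pem_to_der(pem_text):
    """Strip the PEM header/footer and decode the Base64 body back to DER."""
    return ssl.PEM_cert_to_DER_cert(pem_text)


# Stand-in bytes for illustration only; a real .der/.cer file would be read
# from disk with open(path, "rb").read().
fake_der = bytes(range(100))

pem_text = der_to_pem(fake_der)
print(pem_text)
```

For anything beyond a simple certificate conversion, OpenSSL is still the tool I'd reach for.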

For the purpose of this tutorial, I'm going to assume you have all the certificates you need, already in PEM format.

SSL handling in the NetScaler

The SSL feature must be enabled to do any sort of SSL load balancing or proxy configuration; it is enabled in the same place that Load Balancing and Content Switching are enabled, off the System->Settings menu:

Preparing Certificates

Once that's enabled, the yellow warning symbol for the Traffic Management function disappears. The first step to managing certificates is to get certificate files uploaded to the NetScaler. Select the SSL option itself:

then "Manage Certificates / Keys /CSRs" in the Tools section of the right-hand column.
The dialog resembles a file management window because it essentially is: it's a tool that lets you upload certificate files to the NetScaler's certificate store. Click [Upload...] to load the certificate files on the NetScaler. You'll need both the certificate and its private key, plus any CA certificates—including intermediates—that were used in a signing chain.

Once you have your certificates loaded, close the file dialog, expand the SSL menu tree, and select Certificates.

Click [Install...]. This process both creates a configuration object that the NetScaler can use to bind certificates to interfaces, and it gives you the opportunity to link certificates together if they form a signing chain. Although you can also use this interface to perform the upload function, I find it works more consistently—especially when handling filenames—to upload in one step, then install.

The server certificate itself needs to be installed using both the certificate and its key file; signing CAs can be loaded with just the certificate file.

Once all the certificates in the chain are loaded, select the server certificate and click the [Action] dropdown, then the "Link..." option. 

If you've got a recognized file and the CA that signed the file is already installed, it will be pre-selected in the Link Certificate dialog. Click [OK].
Repeat with any other certificates in the chain, back to the CA root.
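The linking order is easier to see written out. This sketch walks from the server certificate back to the root and lists the "link this to that" pairs you'd create along the way. The certificate names and the issued_by mapping are hypothetical; the NetScaler works the issuer out from the certificate files themselves.

```python
def chain_order(server_cert, issued_by):
    """Return certificate names from the server cert back to the root CA.

    issued_by maps each certificate name to the name of its signer; a root
    is treated as self-signed (it maps to itself) or is absent from the map.
    """
    chain = [server_cert]
    current = server_cert
    while True:
        issuer = issued_by.get(current)
        if issuer is None or issuer == current:
            break  # reached the root
        if issuer in chain:
            raise ValueError("signing loop detected at " + issuer)
        chain.append(issuer)
        current = issuer
    return chain


def link_pairs(chain):
    """Each certificate is linked to the next one up the chain."""
    return list(zip(chain, chain[1:]))


# Hypothetical names for illustration.
issued_by = {
    "wildcard-example": "intermediate-ca",
    "intermediate-ca": "root-ca",
    "root-ca": "root-ca",
}

chain = chain_order("wildcard-example", issued_by)
print(link_pairs(chain))
```

With one intermediate, that works out to two link operations: server certificate to intermediate, then intermediate to root.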


Creating the Content Switching Configuration

With minor exceptions, we'll follow the same process for creating a standard HTTP content-switching config. Specific differences will be highlighted using italic typeface.
  1. If they don't exist already, create your server entries. Because I'm building on the work previously documented, my servers are already present.
  2. Create SSL-based services for the servers; configure https as a monitor:
  3. Create SSL-based Load Balancing Virtual Servers
    1. Set the protocol to SSL
    2. Disable "Directly Addressable"
    3. Enable the SSL-based service
    4. Switch to the SSL Tab
    5. Highlight the server certificate and click [Add >] to bind the certificate to the server
  4. Create the new Content Switching policies. We can't use the previous ones—even if they're functionally identical—because we're going to use them on a different CS Virtual Server.
  5. Create (or modify) an SSL-based Content Switching Virtual Server
    1. Set the protocol to SSL
    2. Set the IP address for the virtual server. It can be the same address as the HTTP virtual server.
    3. Insert policies and set targets to SSL-based targets
    4. Switch to the SSL Tab
    5. Highlight the server certificate and click [Add >] to bind the certificate to the server
    6. Highlight the next CA cert in the signing chain; click the drop-down arrow on the [Add >] button and select [as CA >] to add the signing cert.
    7. Repeat step 6 for all remaining certificates in the signing chain.
    8. Click [Create] when complete.
As soon as the configuration is "settled" in the innards of the NetScaler, the "State" should indicate that it is "Up" and you can again test using your HOSTS file. Note: you may still get a certificate error if your URL doesn't match the name in the certificate bound to the Content Switching virtual server (eg, a short name will not properly resolve against a domain wildcard certificate).

Parts in this series:

HTTP Reverse Proxy using Citrix NetScaler VPX Express

Part 4 in a series

So far: the first three parts of this series dealt with the introduction of a problem (multiple servers behind a NAT firewall that use the same port) and solution (Citrix NetScaler VPX Express); laying the groundwork for configuring the solution; and an overview of what we'll be configuring.

Because it is possible to set up content switching with a single host (the degenerate case), this is the method we'll begin with. While it doesn't really do much for us, simply repeating the steps for a second (and each subsequent) host will result in a working solution. Other guides lay down the steps with two hosts already in mind, and teasing apart the pieces to apply it to your situation might be more difficult.

Groundwork

Some planning must be done prior to doing this setup. The first is a set of IP addresses that you'll need to have handy. This post will use the following addresses; substitute them with your own:
Host                 IP
CS Virtual Server    192.168.106.37
Target Server A      192.168.106.38
Target Server B      192.168.106.39

Enable Features

The bare-bones install of the NetScaler has a number of features enabled, but the ones we need for content switching are disabled. Open the System configuration tree and select Settings

Select "Configure basic features" and make sure the following features are enabled (checked):
  • Load Balancing
  • Content Switching
If you selected "Traffic Management" in the left menu before and after enabling the feature, this is what you'd see:
Default, features disabled
LB and CS enabled
Begin the setup by expanding "Load Balancing" under "Traffic Management" and select "Servers":

In the center section, click [Add...] and create the server. The "Server Name" is an identifier used in the NetScaler; it does NOT have to be the FQDN or short name for the server.


Then switch to the Services option

and create a protocol-specific entry for the server, including a monitor
(I like to use http because it doesn't require any customization; a custom http-ecv monitor can be created to check for the explicit function of the target server, but that's beyond the scope of this series).

I also recommend using a naming convention that includes the type of object you're creating ('svc' for the service) and the protocol it's tied to ('http'); that will make it more obvious where a given object comes from when you see them bound in other places.

Switch to the Virtual Servers menu


and click [Add...] to build the virtual server.

Make sure you uncheck the "Directly Addressable" option, then select the service we just created. Unchecking the option eliminates the need to give the virtual server its own address (we want to give an address to the Content Switching virtual server instead).

Switch to the Content Switching menu and select "Policies"


Click [Add...] to create a policy to trigger sending the traffic based on the hostname used in the HTTP header.
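Conceptually, each policy is a "if the Host header says X, send it to Y" rule. The sketch below mimics that decision in Python so you can see what the content switch is doing for you; it is not NetScaler syntax, and the policy and virtual server names are hypothetical examples of the naming convention I mentioned.

```python
def select_target(host_header, policies, default=None):
    """Return the load balancing virtual server chosen for a Host header.

    policies is an ordered list of (hostname, target) pairs; the first match
    wins. Any port suffix in the header (e.g. "serverA:80") is ignored.
    """
    host = host_header.split(":")[0].strip().lower()
    for policy_host, target in policies:
        if host == policy_host.lower():
            return target
    return default


# Hypothetical object names, for illustration only.
policies = [
    ("serverA", "lb_vs_http_serverA"),
    ("serverB", "lb_vs_http_serverB"),
]

print(select_target("serverA", policies))
print(select_target("serverB:80", policies))
```

A request whose hostname matches no policy gets the default (None here), which is why every host you want to publish needs its own policy.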

Select the Virtual Servers option under Content Switching

and click [Add...] to create a new virtual server.
This server gets the IP address to which we'll be forwarding traffic.

Click "Insert Policy" to insert a new policy

Select the new policy from the drop-down, then pull down the list of targets, selecting the new load balancing server. You will get a warning about the "Goto Expression"

Select [Yes], then [Create] to make the server.

At this point, your setup should function for the first server you configured!

Now: go back to the step for creating the server entry and repeat each step for the second target server, except for creating a new Content Switching virtual server.




Now: open the existing Content Switching virtual server

and add another policy, using the new server's policy and LB virtual server entry:




You can test this internally by either updating your DNS server entries or adding a line to your machine's HOSTS file:
192.168.106.37 serverA serverB

Point your browser at http://serverA after you make the change, and voila!, you get to the target. Switch to http://serverB, and you get that target instead.
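The reason one HOSTS line is enough: both names resolve to the same Content Switching address, and the NetScaler sorts out the target from the hostname in the request. A small sketch that parses HOSTS-style text shows the mapping; the parser is my own simplification for illustration, not how any particular OS reads the file.

```python
def parse_hosts(text):
    """Map each hostname (lowercased) in HOSTS-style text to its IP address."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            mapping[name.lower()] = ip
    return mapping


hosts_text = "192.168.106.37 serverA serverB"
resolved = parse_hosts(hosts_text)
print(resolved)
```

Both serverA and serverB come back as 192.168.106.37, the CS Virtual Server address from the table earlier in this post.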

Once you've verified the functionality from the inside, update the forwarding on your NAT firewall and test using an outside address (eg, use a cell phone that's not on your home WiFi).

Parts in this series: