Friday, February 26, 2016

NTFS, dedupe, and the "large files" conundrum.

Microsoft did the world a huge favor when they added the deduplication feature to NTFS with the release of Windows Server 2012. We can have a discussion outside of this context on whether inline or post-process dedupe would have been better (the NTFS implementation is post-process), but the end result is something that seems to have minimal practical impact on performance but provides huge benefits in storage consumption, especially on those massive file servers that collect files like a shelf collects dust.
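For reference, enabling the feature takes only a couple of PowerShell commands. The sketch below is just that—a sketch—and assumes Windows Server 2012 R2 or later with a data volume mounted at D: (both placeholders for your own environment; the UsageType parameter isn't on the original 2012 release):

    # One-time: install the Data Deduplication role service
    Install-WindowsFeature -Name FS-Data-Deduplication

    # Enable dedupe on the data volume (UsageType values include Default, HyperV, Backup)
    Enable-DedupVolume -Volume "D:" -UsageType Default

    # Kick off an optimization job now rather than waiting for the background schedule
    Start-DedupJob -Volume "D:" -Type Optimization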

Under the hood, the dedupe engine collects the duplicate blocks into a chunk store under the hidden "System Volume Information" folder and leaves pointers in the main MFT. You can do a disk-size scan and see very little on-disk capacity attributed to a given folder, yet a ginormous amount of disk is being consumed in that hidden folder.


See that little slice of color on the far left? That's the stub of files that aren't sitting in the restricted dedupe store. The statistics tell a different story:


200GB of non-scannable data (in the restricted store) versus 510MB stored in the "regular" MFT space. Together they comprise some 140K files in 9K folders, and the net action of dedupe is saving over 50GB in capacity on that volume:
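If you'd rather pull those numbers from the command line than from a disk-scanning tool, the dedupe PowerShell cmdlets report the same savings (D: again being a placeholder for the volume in question):

    # Overall optimization status for the volume: files processed, space saved, last job results
    Get-DedupStatus -Volume "D:" | Format-List

    # Quick view of the savings rate
    Get-DedupVolume -Volume "D:" | Select-Object Volume, SavedSpace, SavingsRate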


The implementation is fairly straightforward, and I've found few instances where it didn't save the client a bunch of pain.

Except when used as a backup target.

Personally, I thought this was the perfect use case—and it is, but with the caveats discussed herein—because backup tools like Veeam can perform deduplication within a backup job, but job-to-job deduplication isn't in the cards. Moving the backup repository to a deduplicating volume would save a ton of space, giving me either room to store more data or more restore points for existing backups.

Unfortunately, I ran into issues with it after running backups for a couple of weeks. Everything would run swimmingly for a while, then suddenly backups would fail with filesystem errors. I'd wipe the backup chain and start again, only to have it happen again. Fed up, I started searching for answers...

Interestingly, the errors I was receiving (The requested operation could not be completed due to a file system limitation.) go all the way back to limitations of NTFS without deduplication, and to the early assertions by Microsoft that "defragmentation software isn't needed with NTFS because it protects itself from fragmentation." Anyone else remember that gem?!? Well, the Diskeeper folks were able to prove that NTFS volumes do, in fact, become fragmented, and a cottage industry of competing companies popped up to create defrag software. Microsoft finally relented: they not only agreed that the problem can exist on NTFS, but licensed a "lite" version of Diskeeper and included it in every version of Windows since Windows 2000. They also went so far as to add API calls to the filesystem and device manager so that defrag software could operate safely rather than "working around" the previous limitations.

I digress...

The errors and the underlying limitation have to do with the way NTFS handles file fragmentation. It has special hooks to readily locate multiple fragments across the disk (which is, in part, why Microsoft argued that a fragmented NTFS volume wouldn't suffer the same sort of performance penalty that an equivalently-fragmented FAT volume would), but the data structures that hold that information are a fixed resource. Once fragmentation reaches a certain level, those structures are exhausted and I/O for the affected file is doomed. The fix? Run a defragger on the volume to free up those data structures (every fragment consumes essentially one entry in the table, so the fewer fragments that exist, the fewer table resources are consumed, irrespective of total file size) and things start working again.
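For the record, the built-in defragmenter is enough for that fix; a minimal example from an elevated prompt, with D: standing in for the affected volume:

    # Consolidate file fragments, printing progress (/U) and verbose stats (/V)
    defrag D: /U /V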

Enter NTFS deduplication

Remember that previous description of how the dedupe engine will take duplicate blocks from the volume—whether they're within a single file or across multiple—put them in the System Volume Information folder, and then leave pointers in the main MFT so that multiple files (or the same file) can access those blocks?

Well, we just engineered a metric crapton (yes, that's a technical description) of intentional fragmentation on the volume. So when individual deduplicated files grow beyond a certain size (personal evidence says it's ~200GB, but posts I've found here and there say it's as little as 100GB, while MS says it's 500GB: https://support.microsoft.com/en-us/kb/2891967), you can't do anything with the file. Worse, defrag tools can't fix it, because this fragmentation isn't something the algorithms can "grab"; the only real fix—other than throwing away the files and starting over—is to disable dedupe. And if you're near the edge of capacity thanks to the benefit of dedupe, even that's not an option: rehydrating the file will blow past your capacity. Lose-lose.

Luckily, Microsoft identified the issue and gave us a tool for building volumes intended for deduplication: the "large files" flag in the format command. Unfortunately, as you might guess when "format" is involved, it's destructive. The structures laid down on the physical media when formatting a volume are immutable in this case; only an evacuation and reformat fixes the problem.

Given that restriction, wouldn't it be helpful to know whether your existing volumes support large files (i.e., extreme fragmentation) before you enable deduplication? Sure it would!

The filesystem command "fsutil" is your friend. From an administrative command prompt, run the following command and arguments (it's purely informational and makes no changes to the volume, but it requires administrative access to read the system information):

fsutil fsinfo ntfsinfo <drive letter>



Notice the Bytes Per FileRecord Segment value? On a volume that does not support high levels of fragmentation, you'll see the default value of 1024. You'll want to reformat that volume with the "/L" argument before enabling dedupe for big backup files on that bad boy. And no, that format option is not available in the GUI when creating a new volume; you've got to use the command line.
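If you'd like to survey every volume at once instead of checking them one at a time, a quick PowerShell loop over the same fsutil output will do it (a rough sketch that assumes every lettered volume is NTFS):

    # Report the FileRecord segment size for each volume with a drive letter
    Get-Volume | Where-Object DriveLetter | ForEach-Object {
        $frs = fsutil fsinfo ntfsinfo "$($_.DriveLetter):" |
            Select-String "Bytes Per FileRecord Segment"
        "$($_.DriveLetter): $frs"
    }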

What does it look like after you've reformatted it? Here you go:


The Bytes Per FileRecord Segment value jumps up to the new value of 4096.

You'll still want to adhere to Microsoft's dedupe best practices (https://msdn.microsoft.com/en-us/library/windows/desktop/hh769303(v=vs.85).aspx), and if you're reformatting it anyway, by all means make sure you do it with the 64K cluster size so you don't run into any brick walls if you expect to expand the volume in the future. Note that the fsutil command also shows the volume's cluster size (Bytes per Cluster) if you're wanting to check that, too.
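Putting both recommendations together, the reformat itself looks something like the line below. This is destructive, so evacuate the data first; X: is a placeholder for your target volume.

    # Quick NTFS format with large file record segments (/L) and 64K clusters (/A:64K)
    format X: /FS:NTFS /L /A:64K /Q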

Special thanks to fellow vExpert Frank Buechsel, who introduced me to using fsutil for this enquiry.

Saturday, December 19, 2015

Veeam 9 and StoreOnce Catalyst

HPE has offered their StoreOnce deduplication platform as a free 1TB virtual appliance for some time (the appliance is also available in licensed 5TB and 10TB variants). As a competitor to other dedupe backup targets, it offers similar protocols and features: virtual tape library, SMB (although they persist in calling it CIFS), NFS...and a proprietary protocol branded as Catalyst.
StoreOnce protocols
Catalyst is part of a unified protocol from HPE that ties together several different platforms, allowing "dedupe once, replicate anywhere" functionality. Like competing protocols, Catalyst also provides some performance improvements for both reads and writes as compared to "vanilla" file protocols.

Veeam has supported the StoreOnce platform since v8, but only via the SMB (err... CIFS?) protocol. With the imminent release of Veeam 9—with support for Catalyst—I decided to give the free product a try and see how it works with v8 and v9, and what the upgrade/migration process looks like.

HPE offers the StoreOnce VSA in several variants (ESXi stand-alone, vCenter-managed and Hyper-V), and it is very easy to deploy, configure and use through its integrated browser-based admin tool. Adding a storage pool is as simple as attaching a 1TB virtual disk to the VM (ideally, on a secondary HBA) before initialization.

Creating SMB shares is trivial, but if the appliance is configured to use Active Directory authentication, share access must be configured through the Windows Server Manager MMC snap-in; while functional, it's about as cumbersome as one might think. StoreOnce owners would be well-served if HPE added permission/access functionality into the administrative console. Using local authentication eliminates this annoyance, and is possibly the better answer for a dedicated backup appliance...but I digress.

StoreOnce fileshare configuration
Irrespective of the authentication method configured on the appliance, local authentication is the only option for Catalyst stores, which are also trivial to create & configure. In practice, the data stored in a Catalyst store is not visible or accessible via file or VTL protocols—and vice-versa; at least one competing platform with which I'm familiar doesn't have this restriction. This functional distinction makes it more difficult to migrate stored data from one protocol to another; among other scenarios, this is particularly germane when an existing StoreOnce+Veeam user wishes to upgrade from v8 to v9 (presuming the StoreOnce is also running a firmware version supported for Veeam's Catalyst integration) and has a significant amount of data on the file-share "side" of the StoreOnce. A secondary effect is that there is no way to utilize the Catalyst store without a Catalyst-compatible software product: in my case, ingest is only possible using Veeam, whether through one of the backup job functions or the in-console file manager.

Veeam 9 file manager
As of this writing, I have no process for performing the data migration from File to Catalyst without first transferring the data to an external storage platform that can be natively managed by Veeam's "Files" console. Anyone upgrading from Veeam 8 to Veeam 9 will see the existing "native" StoreOnce repositories converted to SMB repositories; as a side effect, file-level management of the StoreOnce share is lost. Any new Catalyst stores can be managed through the Veeam console, but the loss of file management for the "share side" means no direct transfer is possible. Data must be moved twice in order to migrate from File to Catalyst; competing platforms that provide simultaneous access via file & "proprietary" protocols allow migration through simple repository rescans.

Administrative negatives aside, the StoreOnce platform does a nice job of optimizing storage use with good dedupe ratios. Prior to implementing StoreOnce (with Veeam 8, so only SMB access), I was using Veeam-native compression & deduplication on a Linux-based NAS device. With no other changes to the backup files, migrating them from the non-dedupe NAS to StoreOnce resulted in an immediate 2x deduplication ratio; modifying the Veeam jobs to use dedupe-appliance-aware settings (e.g., no compression at the storage target) brought additional gains in dedupe efficiency. After upgrading to Veeam 9 (as a member of a partner organization, I have early access to the RTM build)—and going through the time-consuming process of migrating the folders from File to Catalyst—my current ratio is approaching 5x, giving me the feeling that dedupe performance may be superior on Catalyst stores compared to File shares. As far as I'm concerned, this is already pretty impressive dedupe performance (given that the majority of the job files are still using sub-optimal settings), and I'm looking forward to better results as the job files cycle from the old settings to dedupe-appliance-optimized ones as retention points are aged out.

Appliance performance during simultaneous read, write operations
StoreOnce appliance performance will be variable, based not only on the configuration of the VM (vCPU, memory) but also on the performance of the underlying storage platform; users of existing StoreOnce physical appliances will have a fixed level of performance based on the platform/model. Users of the virtual StoreOnce appliance can inject additional performance into the system by upgrading the underlying storage (not to mention more CPU or memory, as dictated by the capacity of the appliance) to a higher performance tier.

Note: Veeam's deduplication appliance support—which is required for Catalyst—is only available with Enterprise (or Enterprise Plus) licensing. The 60-day trial license includes all Enterprise Plus features and can be used in conjunction with the free 1TB StoreOnce appliance license to evaluate this functionality in your environment, whether you are a current Veeam licensee or not.

Update

With the official release of Veeam B&R v9, Catalyst and StoreOnce are now available to those of you holding Enterprise B&R licenses. I will caution you, however, to use a different method of converting from shares to Catalyst than I used. Moving the files does work, but it's not a good solution: you don't get to take advantage of the per-VM backup files that are a feature of v9 (if a backup starts with a monolithic file, it will continue to use it; only creating a new backup—or completely deleting the existing files—will allow per-VM files to be created). Per-VM is the preferred format for Catalyst, and the dedupe engine works more efficiently with per-VM files than with monolithic files; I'm sure there's a technical reason for it, but I can vouch for it in practice. Prior to switching to per-VM files, my entire backup footprint—even after cycling through the monolithic files to eliminate dedupe-unfriendly elements like job-file compression—consumed over 1TB of raw storage with a dedupe ratio that never actually reached 5:1. After discarding all those jobs and starting fresh with cloned jobs and per-VM files, I now have all of my backups & restore points on a single 1TB appliance with room to spare and a dedupe ratio currently above 5:1.


I'm still fine-tuning, but I'm very pleased with the solution.

Monday, November 23, 2015

Long-term self-signed certs

While I'm a big proponent of using an enterprise-class certificate authority—whether an internal offline-root/online-issuing setup or a public CA—there are some instances when a self-signed cert fits the bill. Unfortunately, most of the tools for creating a self-signed cert have defaults that produce less-than-stellar results: the digest algorithm is SHA-1, the cert is likely to have a 1024-bit key, and the extensions that mark the cert for server and/or client authentication are missing.

With a ton of references discoverable on The Interwebz, I spent a couple of hours trying to figure out how to generate a self-signed cert with the following characteristics:

  • 2048-bit key
  • sha256 digest
  • 10-year certificate life (because, duh, I don't want to do this every year)
  • Extended key usage: server auth, client auth
It took pulling pieces from several different resources, documented herein:

Required Software

OpenSSL (command-line software)
Text editor (to create the config file for the cert)

Steps

  1. Create a text file that specifies the "innards" of the cert:
    [req]
    default_bits = 2048
    encrypt_key = no
    distinguished_name = req_dn
    prompt = no

    [ req_dn ]
    CN={replace with server fqdn}
    OU={replace with department}
    O={replace with company name}
    L={replace with city name}
    ST={replace with state name}
    C={replace with 2-letter country code}

    [ exts ]
    extendedKeyUsage = serverAuth,clientAuth
  2. Run the following openssl command (all one line) to create the new private key & certificate:
    openssl req -x509 -config {replace with name of config file created above} -extensions "exts" -sha256 -nodes -days 3652 -newkey rsa:2048 -keyout host.rsa -out host.cer
  3. Run the following openssl command to package the key & cert into a bundle that can be imported into Windows (see the import sketch after these steps):
    openssl pkcs12 -export -out host.pfx -inkey host.rsa -in host.cer
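To get that bundle into the Windows certificate store, something like the following works from an elevated PowerShell prompt (a sketch; the file name and store location simply match the example above):

    # Prompt for the export password you set when creating host.pfx
    $pfxPassword = Read-Host -Prompt "PFX password" -AsSecureString

    # Import the key + cert into the local machine's Personal store
    Import-PfxCertificate -FilePath .\host.pfx -CertStoreLocation Cert:\LocalMachine\My -Password $pfxPassword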

What's happening

The text file sets up a number of configuration items that you'd either be unable to specify at all (the extensions) or would have to manually input during creation (the distinguished name details).

The request in the second step creates a 2048-bit private key (host.rsa) and a self-signed certificate (host.cer) with a 10-year lifetime (3652 days), the necessary extended key usage flags, and a SHA-256 digest.
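If you want to sanity-check the result before importing it anywhere, openssl will dump the certificate so you can confirm the key size, digest, validity period, and extended key usage (this reads the file and changes nothing):

    # Human-readable dump of the self-signed certificate
    openssl x509 -in host.cer -noout -text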