
Friday, February 26, 2016

NTFS, dedupe, and the "large files" conundrum.

Microsoft did the world a huge favor when they added the deduplication feature to NTFS with the release of Windows Server 2012. We can have a discussion outside of this context on whether inline or post-process dedupe would have been better (the NTFS implementation is post-process), but the end result is something that seems to have minimal practical impact on performance but provides huge benefits in storage consumption, especially on those massive file servers that collect files like a shelf collects dust.

Under the hood, the dedupe engine collects the duplicate blocks, hides them under the hidden "System Volume Information" folder, and leaves pointers in the main MFT. You can do a disk size scan and see very little on-disk capacity taken by a given folder, yet a ginormous amount of disk is being consumed in that hidden folder.


See that little slice of color on the far left? That's the stub of files that aren't sitting in the restricted dedupe store. The statistics tell a different story:


200GB of non-scannable data (in the restricted store) versus 510MB stored in the "regular" MFT space. Together they comprise some 140K files in 9K folders, and the net action of dedupe is saving over 50GB in capacity on that volume:


The implementation is fairly straightforward, and I've found few instances where it didn't save the client a bunch of pain.
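
For reference, here's roughly what enabling it looks like from PowerShell on Server 2012 or later. This is a sketch rather than a tuned configuration; the E: drive letter is purely illustrative, and it assumes the Data Deduplication feature is available on the server:

# Install the dedupe feature if it isn't already present
Install-WindowsFeature FS-Data-Deduplication

# Enable post-process dedupe on the volume
Enable-DedupVolume -Volume "E:"

# Kick off an optimization pass now rather than waiting for the schedule
Start-DedupJob -Volume "E:" -Type Optimization

# See how much space the engine is reclaiming
Get-DedupStatus -Volume "E:" | Format-List SavedSpace, FreeSpace, OptimizedFilesCount, InPolicyFilesCount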

Except when used as a backup target.

Personally, I thought this was the perfect use case—and it is, but with the caveats discussed herein—because backup tools like Veeam can perform deduplication within a backup job, but job-to-job deduplication isn't in the cards. Moving the backup repository to a deduplicating volume would save a ton of space, giving me either room to store more data or more restore points for existing backups.

Unfortunately, I ran into issues with it after running backups for a couple of weeks. Everything would run swimmingly for a while, then suddenly backups would fail with filesystem errors. I'd wipe the backup chain and start again, only to have it happen again. Fed up, I started searching for answers...

Interestingly, the errors I was receiving (The requested operation could not be completed due to a file system limitation.) go all the way back to limitations of NTFS without deduplication, and to Microsoft's early assertion that "defragmentation software isn't needed with NTFS because it protects itself from fragmentation." Anyone else remember that gem?!? Well, the Diskeeper folks were able to prove that NTFS volumes do, in fact, become fragmented, and a cottage industry of competing companies popped up to create defrag software. Microsoft finally relented: not only did they agree that the problem can exist on NTFS, but they licensed a "lite" version of Diskeeper and have included it in every version of Windows since Windows 2000. They also went so far as to add API calls to the filesystem and device manager so that defrag software could operate safely instead of "working around" the previous limitations.

I digress...

The errors and the underlying limitation have to do with the way NTFS handles file fragmentation. It has special hooks to readily locate a file's many fragments across the disk (which is, in part, why Microsoft argued that a fragmented NTFS volume wouldn't suffer the same sort of performance penalty that an equivalently-fragmented FAT volume would experience), but the data structures that hold that mapping information are a fixed resource. Once a file's fragmentation reaches a certain level, the data structures are exhausted and I/O for the affected file is doomed. The fix? Run a defragger on the volume to free up those data structures (every fragment consumes essentially one entry in the table, so the fewer fragments that exist, the fewer table resources are consumed, irrespective of total file size) and things start working again.
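
If you suspect you're hitting the plain (non-dedupe) flavor of this problem, the built-in defrag tool can confirm and fix it; something like the following from an elevated PowerShell prompt, with E: just a placeholder:

# Analyze only; reports fragmentation statistics without changing anything
defrag E: /A /V

# Defragment the volume, which also frees up those per-file mapping entries
defrag E: /U /V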

Enter NTFS deduplication

Remember that earlier description of how the dedupe engine takes duplicate blocks from the volume—whether they're within a single file or across multiple files—puts them in the System Volume Information folder, and leaves pointers in the main MFT that give multiple files (or the same file) access to those blocks?

Well, we just deliberately engineered a metric crapton (yes, that's a technical term) of fragmentation on the volume. So when individual deduplicated files grow beyond a certain size (personal evidence says it's ~200GB, but posts I've found here and there say it's as little as 100GB, while Microsoft says it's 500GB: https://support.microsoft.com/en-us/kb/2891967), you can't do anything with the file. Worse, defrag tools can't fix it, because this fragmentation isn't something the algorithms can "grab"; the only real fix—other than throwing away the files and starting over—is to disable dedupe. And if you're near the edge of capacity thanks to the benefit of dedupe, even that's not an option: rehydrating the file will blow past your capacity. Lose-lose.

Luckily, Microsoft identified the issue and gave us a tool to use when building volumes intended for deduplication: the "large files" flag in the format command. Unfortunately, as you might guess when referring to "format," it's destructive. The structures that are laid down on the physical media when formatting a volume are immutable in this case; only evacuating the data and reformatting fixes the problem.

Given that restriction, wouldn't it be helpful to know whether your existing volumes support large files (i.e., extreme fragmentation) before you enable deduplication? Sure it would!

The filesystem command "fsutil" is your friend. From an administrative command prompt, run the following command and arguments (it's informational only and makes no changes to the volume, but it requires administrative access to read the system information):

fsutil fsinfo ntfsinfo <drive letter>



Notice the Bytes Per FileRecord Segment value? On a volume that does not support high levels of fragmentation, you'll see the default value of 1024. You'll want to reformat that volume with the "/L" argument before enabling dedupe for big backup files on that bad boy. And no, that format option is not available in the GUI when creating a new volume; you've got to use the command line.
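
If you're auditing a pile of volumes, you can trim that output down to the one line that matters; the drive letter here is just an example:

fsutil fsinfo ntfsinfo D: | Select-String "FileRecord"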

What does it look like after you've reformatted it? Here you go:


The Bytes Per FileRecord Segment value jumps up to the new value of 4096.

You'll still want to adhere to Microsoft's dedupe best practices (https://msdn.microsoft.com/en-us/library/windows/desktop/hh769303(v=vs.85).aspx), and if you're reformatting it anyway, by all means make sure you do it with the 64K cluster size so you don't run into any brick walls if you expect to expand the volume in the future. Note that the fsutil command also shows the volume's cluster size (Bytes per Cluster) if you're wanting to check that, too.
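
Putting those recommendations together, the (destructive!) reformat of an evacuated backup-repository volume might look something like this; D: is purely illustrative, and the Format-Volume alternative assumes a version of the storage cmdlets that includes the -UseLargeFRS switch:

# Classic command-line format: 64K clusters plus large file record segments
format D: /FS:NTFS /A:64K /L /Q

# Roughly the same thing via the newer storage cmdlets
Format-Volume -DriveLetter D -FileSystem NTFS -AllocationUnitSize 65536 -UseLargeFRS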

Special thanks to fellow vExpert Frank Buechsel, who introduced me to using fsutil for this enquiry.

Wednesday, June 3, 2015

Maximum NTFS Volume Expansion

A peer recently had an issue when working on a client system: After adding a second shelf of SAS-attached drives to a physical Windows Storage Server and doubling the available capacity of the environment from ~20TB to ~40TB, he was unable to extend the existing NTFS volume after extending the SAS array group.

The error was "The volume cannot be extended because the number of clusters will exceed the maximum number of clusters supported by the filesystem."
The original volume was reportedly formatted "using the defaults," which under most circumstances would mean it was using 4K clusters. Why wouldn't it allow extending the volume?

Because NTFS (as currently implemented) has a cluster limit of 2^32 - 1 (4,294,967,295) clusters per volume.

When you "do the math," that cluster limit imposes a hard ceiling on the size of an NTFS volume, irrespective of the actual drive space that is available to it. And trying to use tricks like dynamic disks and software RAID won't help: those tricks modify the underlying disk structure, not the NTFS filesystem that "rides" on top of it.
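
The arithmetic itself is simple: maximum volume size = cluster size x (2^32 - 1) clusters. With the common 4K default, that works out to 4,096 x 4,294,967,295 = 17,592,186,040,320 bytes, or just shy of 16TB, which is exactly why a volume formatted "with the defaults" hits a wall well before 20TB. The full breakdown: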

Max NTFS Volume by Cluster Size

Cluster size (B)    Bytes                    KB                 MB             GB        TB
512                 2,199,023,255,040        2,147,483,648      2,097,152      2,048     2
1024                4,398,046,510,080        4,294,967,295      4,194,304      4,096     4
2048                8,796,093,020,160        8,589,934,590      8,388,608      8,192     8
4096                17,592,186,040,320       17,179,869,180     16,777,216     16,384    16
8192                35,184,372,080,640       34,359,738,360     33,554,432     32,768    32
16384               70,368,744,161,280       68,719,476,720     67,108,864     65,536    64
32768               140,737,488,322,560      137,438,953,440    134,217,728    131,072   128
65536               281,474,976,645,120      274,877,906,880    268,435,456    262,144   256

We knew that we had a functioning 20TB volume, so we used DISKPART's FILESYSTEMS command to verify my theory that the volume had actually been formatted with 8K clusters (the smallest cluster size that would support 20TB). Sure enough: 8192 was the cluster size.
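
For anyone who wants to run the same check, the DISKPART session looks roughly like this (the volume number will differ; FILESYSTEMS reports the allocation unit size of the currently-selected volume):

diskpart
DISKPART> list volume
DISKPART> select volume 2
DISKPART> filesystems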

We gave the client several options for addressing the issue, including the purchase of software that could "live adjust" the cluster sizing. In the end, the client chose the "migrate->reformat->migrate" option, and while it took a long time to perform (20TB is a lot of data!), it was successful.

Friday, August 24, 2012

Remove security warning from Internet-sourced files


Ever been setting up or managing a system and run into a prompt like this:

It’s probably because you grabbed the original executable from an Internet site.

Using a modern browser to grab the file will typically result in a special NTFS alternate data stream (Zone.Identifier) being added to the originally-downloaded file (e.g., bginfo.zip), and that stream gets propagated to the executable you're trying to run.

This can be a good thing when you're trying out software, but how do you fix it when you know you can trust the file? This sort of thing can become quite annoying if it's tied to a Startup item like BGInfo.

The best solution is to “unblock” the file you download; that keeps the stream from being added to the extracted file(s). But what if you’ve already extracted them?

Same solution, but you apply it to the executable instead of the download. Right-click on the file you want to unblock, then select Properties. You should see something a bit like this:
Note the [Unblock] button at the bottom. If you click that and save the properties, the NTFS stream metadata is removed from the file, and you won’t get the popup message whenever the app is run.

When I'm retrieving trusted files from my own web servers, I've simply gotten into the habit of unblocking files as soon as I download them; if the ZIP or installer file doesn't have that metadata, the extracted files won't inherit it.

Also: there's no way to mass-unblock files from the Properties dialog; if you select a group of files and choose Properties, you don't get the Unblock option. If you're downloading a zip file full of executables (like the SysInternals Suite), you definitely want to unblock the ZIP file before extracting it, or you'll have to unblock each executable individually.
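
As an aside: if the machine happens to have PowerShell 3.0 or later, the same cleanup can be scripted. A quick sketch, with the file and folder names purely illustrative:

# See whether a downloaded file carries the Zone.Identifier stream
Get-Item .\bginfo.zip -Stream *

# Remove the mark from a single file
Unblock-File .\bginfo.zip

# Or sweep a folder of already-extracted executables
Get-ChildItem C:\Tools\SysinternalsSuite -Recurse | Unblock-File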

Friday, April 15, 2011

Booting an NTFS-formatted thumbdrive

There are times when you'd like to boot from a thumbdrive that is formatted for NTFS. You dutifully follow instructions for copying bootable code to the drive, but it refuses to boot. What's wrong?
Unless you set the boot sector to make the drive bootable, no amount of "diddling" the data on the drive will make a difference.

So: how do you get the boot sector set?

Check your workstation to see if you have a copy of "bootsect.exe" on your system. If you do, you're already set; if you don't, you'll need to grab a copy from your installation media.

Once you have it, format your thumbdrive with NTFS, then issue the following command on the drive letter for the thumbdrive:
bootsect /nt60 x:
where "x:" is the letter for the thumbdrive:


Add your custom files, and the thumbdrive should behave as you'd expect.
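
For completeness, the end-to-end preparation might look something like this; it's a sketch that assumes a BIOS/MBR-style boot, that the thumbdrive shows up as disk 1, and that you want it assigned X: (double-check the disk number, because the clean step wipes it):

diskpart
DISKPART> select disk 1
DISKPART> clean
DISKPART> create partition primary
DISKPART> active
DISKPART> format fs=ntfs quick
DISKPART> assign letter=X
DISKPART> exit

bootsect /nt60 X: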