I have a very large RAID 6 array that I use to store movies, TV shows, personal files, and various other things. Its formatted capacity is about 36TB. Believe it or not, it’s pretty much full. It currently consists of 20 x 2TB hard drives, and I really don’t want to add any more drives to it in its current form. Later this year I’m planning on building a new array to replace it, using fewer 6TB or 8TB drives. The server that manages the array had Server 2008 R2 installed. After getting down to the last few gigs of free space, it dawned on me: why not install Server 2012 R2 and set up data deduplication? I’ve read some pretty impressive articles online where people were able to reclaim up to 60% of their storage using the dedup mechanism in Server 2012. So, I went ahead and upgraded. I started poking around, and it wasn’t very obvious how to enable dedup, so I put this guide together to help you get started.
Enabling Deduplication in Server 2012 R2
First, we need to install the Data Deduplication service. It’s part of File and Storage Services. Open Server Manager, select Local Server in the left side pane, then go to the Add Roles and Features wizard, under Manage.
Go through the first few windows, and when you get to Server Roles, expand File and Storage Services, then File and iSCSI Services, and make sure Data Deduplication is selected, at minimum. This is also a good opportunity to install any other roles or services you might be interested in.
After you finish the wizard and the services are installed, I suggest you go ahead and reboot. Once you’ve rebooted, you will see a new option in Server Manager called File and Storage Services.
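If you’d rather skip the wizard, the same role service can be installed from an elevated PowerShell prompt. This is a minimal sketch, assuming the Server 2012 R2 feature name FS-Data-Deduplication; the second command just confirms that it shows up as installed.
PS C:\Users\Administrator> Install-WindowsFeature -Name FS-Data-Deduplication
PS C:\Users\Administrator> Get-WindowsFeature -Name FS-Data-Deduplication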
Go ahead and select File and Storage Services, then go to Volumes. Right click on the Volume / Drive you want to enable dedup on, and select “Configure Data Deduplication.”
You now have a window where you can choose a few different options. Next to Data Deduplication, I suggest you select “General purpose file server” unless you have a really good reason to select VDI. You can also specify any folders you want to exclude from deduplication and specify a schedule. By default, background optimization is configured, and I suggest you leave it. Go ahead and click OK.
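The same settings can also be applied from PowerShell instead of the dialog. A rough sketch, assuming your volume is S: and that, on Server 2012 R2, the Default usage type corresponds to “General purpose file server” (HyperV would be the VDI choice). The excluded folder S:\Scratch is a made-up example, so substitute your own path or skip that line entirely.
PS C:\Users\Administrator> Enable-DedupVolume -Volume S: -UsageType Default
PS C:\Users\Administrator> Set-DedupVolume -Volume S: -ExcludeFolder "S:\Scratch"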
Now, you can sit back and wait (a couple of hours, or days if you have a large array) and eventually the stats will update in the File and Storage Services window. However, I am impatient and wanted to see what it was doing and how quickly it was moving along. I also wanted the first “Optimization” to kick off immediately. For all this, we need to open up the Windows PowerShell ISE. So, go ahead and fire it up.
Once you’re in PowerShell, there are a few commands we will be using. First and foremost, we want to kick off the Optimization so it gets started right away. To do this, use this command (substitute the drive letter of your volume for S:):
PS C:\Users\Administrator> Start-DedupJob -Volume S: -Type Optimization
This command kicks off an Optimization, which is the process that actually deduplicates the data. Now, we want to monitor the progress. To do this, we will use the Get-DedupJob command.
PS C:\Users\Administrator> Get-DedupJob
As you can see, there is quite a lot of info given, including the progress %. Mine has been running for quite some time, and I’m at 95%.
Now, I want to see how much space is being saved and maybe get a little more info. For that, we’ll use the Get-DedupStatus command.
PS C:\Users\Administrator> Get-DedupStatus
As you can see, this gives you some more high level information.
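If you only care about the headline numbers, Get-DedupVolume reports the savings per volume. A small sketch, again assuming the volume is S: and picking out just the saved space and savings rate properties:
PS C:\Users\Administrator> Get-DedupVolume -Volume S: | Format-List Volume, SavedSpace, SavingsRate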
Now you can just run those two commands periodically to see how things are moving along. If you have a large array, don’t be surprised if the first optimization takes days. Don’t worry, after the first optimization finishes, later runs will go much quicker. I was able to save about 16TB of space on a 36TB array, so the savings are most definitely worth it. As far as performance goes, I have not noticed any difference as of yet. Time will tell.
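If you get tired of re-running the progress check by hand, a simple loop can do the polling for you. This is only a sketch: it assumes the volume is S:, that Get-DedupJob returns nothing once no job is queued or running on that volume, and that a five minute interval is fine for you (the 300 seconds is an arbitrary choice). When the loop exits, run Get-DedupStatus to see the final savings.
PS C:\Users\Administrator> while (Get-DedupJob -Volume S: -ErrorAction SilentlyContinue) { Get-DedupJob -Volume S: | Format-Table Type, State, Progress -AutoSize | Out-Host; Start-Sleep -Seconds 300 }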
If you run into any problems, feel free to post in the comments and I’ll try to assist. Thanks!