What is deduplication good for?
I recently discovered a storage feature named “data deduplication”, often simply called “deduplication”.
Quoting Wikipedia:
In computing, data deduplication is a specialized data compression technique for eliminating coarse-grained redundant data, typically to improve storage utilization. In the deduplication process, duplicate data is deleted, leaving only one copy of the data to be stored, along with references to the unique copy of data. Deduplication is able to reduce the required storage capacity since only the unique data is stored.
My first thought was: “well, a video/document/… is a file, and a file is a set of 0s and 1s. Just as a ZIP archive doesn’t care whether it stores pictures or text files, I might be able to use deduplication to fit my 32GB of personal pictures into a smaller storage size… sounds great!”. That’s what I want to figure out.
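Before testing, it helps to model what deduplication actually does: the filesystem splits data into fixed-size blocks, hashes each one, and stores each distinct block only once. Here is a minimal sketch of that accounting, using the ZFS defaults of 128K blocks and SHA-256; this is an illustration of the idea, not ZFS code:

```python
import hashlib
import os

def dedup_ratio(data: bytes, block_size: int = 128 * 1024) -> float:
    """Split data into fixed-size blocks and compare total vs. unique blocks,
    the way block-level dedup accounting does."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    unique = {hashlib.sha256(b).digest() for b in blocks}
    return len(blocks) / len(unique)

payload = os.urandom(256 * 1024)       # 256K of incompressible sample data

print(dedup_ratio(payload))            # -> 1.0: nothing repeats
print(dedup_ratio(payload + payload))  # -> 2.0: an exact copy dedups perfectly
```

The second result hints at what to expect: dedup shines when whole blocks repeat byte for byte, as with full file copies.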
POC environment
I read a bunch of documentation and reports about deduplication claiming it was a great way to reduce storage space. Many of them cited virtual machine storage; some pointed to email storage. I already knew that Microsoft Exchange stores emails and attachments in a space-saving way: basically, it stores the data once and stores links to it wherever it can.
But I found nothing on using deduplication to store your 64GB of holiday HD movies on a 16GB USB stick… Doesn’t it work? Or would stick resellers rather hide this from the buying crowd?
The guys from the NexentaStor Project are kind enough to provide a VMware image of their free Community Edition. It is ready to use with VMware virtualisation products and offers “up to 12TB of storage”; much more than I require.
I downloaded the appliance, attached six 2GB SCSI virtual disks and ran it with VMware Fusion.
Depending on the test, I configured various volume sizes and layouts. The only requirement was that ZFS be used.
Note that deduplication is only available since ZFS pool version 21. This leaves out the current FreeBSD and OpenSolaris implementations. NexentaStor uses v22.
Most of the configuration will be done through the Web interface. Here, it’s available on http://192.168.12.129:2000/.
I first created a RAIDZ volume with all available disks and deduplication set to “sha256,verify”:
nmc@nexenta:/$ zpool list
NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
Stockage 11.9G 182K 11.9G 0% 1.00x ONLINE -
syspool 7.94G 2.37G 5.57G 29% 1.00x ONLINE -
nmc@nexenta:/$ zfs list
NAME USED AVAIL REFER MOUNTPOINT
Stockage 151K 9.70G 46.5K /volumes/Stockage
Then I created a volume using the whole space, the default record size of 128K and deduplication set to “sha256,verify”:
nmc@nexenta:/$ zfs list -o name,used,avail,refer,mounted,quota,dedup,compress
NAME USED AVAIL REFER MOUNTED QUOTA DEDUP COMPRESS
Stockage 222K 9.70G 48.1K yes none sha256,verify off
Stockage/Dropbox 46.5K 9.70G 46.5K yes none sha256,verify off
Finally, I enabled CIFS (Samba) sharing on this volume, which allows copying data from a remote computer. That’s the usual way to provide shared folders to Windows computers. I’ll be copying from a Snow Leopard MacBook Pro, but that shouldn’t change anything at all.
The default configuration is “Anonymous Read-Write”, but I granted full access to a dedicated user, mostly because OS X and Nexenta don’t work together out of the box with anonymous access. The shared folder gets mounted as //me@192.168.12.129/stockage_dropbox.
Deduplication and text documents
I dropped about 550MB of text documents. Those are TXT, HTML, PDF and DOC files grabbed from various technical sources:
nmc@nexenta:/$ zpool list Stockage
NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
Stockage 11.9G 662M 11.2G 5% 1.00x ONLINE -
humpf… not really convincing…
Let’s make a pure full copy of that directory on the same ZFS volume:
nmc@nexenta:/$ zpool list Stockage
NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
Stockage 11.9G 668M 11.2G 5% 1.99x ONLINE -
Looks better! Two copies of 550MB are stored using only 668MB.
That’s probably not useful in my environment… I rarely store the same file twice. But in an enterprise environment, this can make sense: people often attach a document to an email and send it to someone who ends up storing the attachment on the filer.
Let’s try with my full documentation repository (2.48GB). This adds PPT, RTFD and a bunch of other file types. Maybe deduplication works better when you store really large amounts of data…
nmc@nexenta:/$ zpool list Stockage
NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
Stockage 11.9G 2.72G 9.15G 22% 1.02x ONLINE -
I’ll consider this as a “NO”.
Deduplication and images
Let’s see what happens when I copy 8.4GB from my photo library onto the ZFS volume:
nmc@nexenta:/Stockage/Dropbox$ zpool list Stockage
NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
Stockage 11.9G 9.90G 1.98G 83% 1.00x ONLINE -
Same thing as with the text documents… After all, at the disk level, there is no way to guess whether a file is a DOC, JPEG, PPT or PNG… So we have to dig in other directions…
Deduplication and iso files
I’ve read that dedup was nice in virtual machine environments. And to setup such environments, you have to store the installation ISO files. Let’s have a look at what we can get here.
I selected a few of the ISO files (3.1GB) we use (or used) here at work: namely Windows XP 32-bit and 64-bit, Windows XP with SP3 included, and Windows Server 2003 Standard and Enterprise editions. There must be redundant things in those files…
nmc@nexenta:/Stockage/Dropbox$ zpool list Stockage
NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
Stockage 11.9G 3.49G 8.38G 29% 1.00x ONLINE -
Damn… I can’t understand how dedup doesn’t find redundant parts between a Standard and an Enterprise edition of Windows 2003… Come on… Nothing to deduplicate in those two 600MB ISO files???
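A plausible explanation is block alignment. Block-level dedup only matches whole, aligned blocks byte for byte; if the same file content sits at a different byte offset in two ISO images, every block boundary shifts and nothing lines up. A small sketch of the effect, using random bytes as a stand-in for shared ISO content:

```python
import hashlib
import os

BLOCK = 128 * 1024  # ZFS default record size

def block_hashes(data: bytes, block: int = BLOCK) -> set:
    """Hash each fixed-size, aligned block, as block-level dedup would."""
    return {hashlib.sha256(data[i:i + block]).digest()
            for i in range(0, len(data), block)}

shared = os.urandom(4 * BLOCK)        # content present in both "images"
image_a = shared
image_b = b"\x00" * 100 + shared      # same content, 100 bytes further in

# Identical content, yet zero matching blocks once the offset differs:
print(len(block_hashes(image_a) & block_hashes(image_b)))  # -> 0
```

So two Windows 2003 editions can share megabytes of files without sharing a single aligned 128K block.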
Deduplication and virtual machines
People say deduplication is great for virtual machine storage… Well, I used VMware Fusion to create two Windows Server 2003 VMs, a Windows XP VM and an OpenBSD VM, and stored their whole sets of files in the NexentaStor shared folder:
nmc@nexenta:/Stockage/Dropbox$ ls -alh
total 209
drwxr-xr-x+ 7 root root 8 Nov 11 16:50 .
drwxr-x---+ 2 root sys 3 Nov 10 15:15 .$EXTEND
drwxr-xr-x 3 root root 3 Nov 10 15:06 ..
-rwx------+ 1 jdoe staff 39K Nov 11 16:50 .DS_Store
drwx------+ 3 jdoe staff 14 Nov 11 18:19 OpenBSD #1.vmwarevm
drwx------+ 5 jdoe staff 12 Nov 11 22:17 Win2K3 #1.vmwarevm
drwx------+ 5 jdoe staff 12 Nov 11 22:14 Win2K3 #2.vmwarevm
drwx------+ 4 jdoe staff 12 Nov 11 19:30 WinXP #1.vmwarevm
nmc@nexenta:/Stockage/Dropbox$ zfs list -o name,used,avail,refer,mounted,quota,dedup,compress
NAME USED AVAIL REFER MOUNTED QUOTA DEDUP COMPRESS
Stockage 6.19G 3.80G 48.1K yes none sha256,verify off
Stockage/Dropbox 6.16G 3.80G 6.16G yes none sha256,verify off
nmc@nexenta:/Stockage/Dropbox$ zpool list Stockage
NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
Stockage 11.9G 7.10G 4.77G 59% 1.04x ONLINE -
There used to be a nice DEDUP value when the VMs were freshly created. But after I finished the installations, configured “download but do not automatically install updates” and left the VMs up for two days, the DEDUP ratio dropped to this very low value…
The fact that VMware Fusion uses special techniques to keep VM disks small may have something to do with the poor dedup ratio. The Windows Server 2003 VMs have a 3GB disk, but each whole file set takes only 2.24GB in the shared folder.
Deduplication and backup archives
Backup is usually configured with a policy that looks like:
- Full backup on Day 1
- Incremental backups on Day 2 to Day 7
- Keep n full backups
Incremental parts of backups would probably not deduplicate well; after all, they are precisely the varying bits of the data.
So I’ll look at the “archiving” part of backups: the full backups that have some bits changed but may also have a lot in common.
What I’ll do is create ZIP archives containing more and more of my personal “Documents” repository. For example, archive ZIP1 contains directories DIR1 and DIR2; archive ZIP2 contains directories DIR1, DIR2 and DIR3; and so on. Let’s see what we get when copying one archive after the other:
nmc@nexenta:/Stockage/Dropbox$ ls -alh
total 7203984
drwxr-xr-x+ 3 root root 8 Nov 12 00:10 .
drwxr-x---+ 2 root sys 3 Nov 10 15:15 .$EXTEND
drwxr-xr-x 3 root root 3 Nov 10 15:06 ..
-rwx------+ 1 jdoe staff 39K Nov 12 00:10 .DS_Store
-rwx------+ 1 jdoe staff 13M Nov 11 23:54 Archive 1.zip
-rwx------+ 1 jdoe staff 494M Nov 11 23:55 Archive 2.zip
-rwx------+ 1 jdoe staff 1.4G Nov 11 23:58 Archive 3.zip
-rwx------+ 1 jdoe staff 1.5G Nov 12 00:03 Archive 4.zip
nmc@nexenta:/Stockage/Dropbox$ zfs list -o name,used,avail,refer,mounted,quota,dedup,compress
NAME USED AVAIL REFER MOUNTED QUOTA DEDUP COMPRESS
Stockage 3.48G 6.64G 48.1K yes none sha256,verify off
Stockage/Dropbox 3.46G 6.64G 3.46G yes none sha256,verify off
nmc@nexenta:/Stockage/Dropbox$ zpool list Stockage
NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
Stockage 11.9G 3.68G 8.20G 30% 1.14x ONLINE -
Not much deduplication, considering that archives 3 and 4 each have 494MB in common with archive 2, and that archives 3 and 4 have 1.4GB in common with each other… at least from a functional point of view.
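Again, “in common functionally” is not “in common on disk”. Dedup only credits whole, aligned blocks that are byte-identical, and a ZIP writer does not guarantee that Archive 1 reappears as an exact byte prefix of Archive 2 (headers, timestamps and the central directory move around). Even in the best case, only the whole blocks covered by an identical prefix would dedup. A back-of-the-envelope sketch, where the shared-prefix length is a made-up illustration:

```python
BLOCK = 128 * 1024  # the dataset's record size

def dedupable_bytes(shared_prefix: int, block: int = BLOCK) -> int:
    """Bytes saved when two files start with an identical byte prefix:
    only the whole, aligned blocks inside that prefix can deduplicate."""
    return (shared_prefix // block) * block

# Hypothetical: 13 MB of Archive 1 reappears verbatim at the start of Archive 2.
print(dedupable_bytes(13 * 1024 * 1024))  # -> 104 whole 128K blocks saved

# A shared run just shy of one block saves nothing at all:
print(dedupable_bytes(BLOCK - 1))         # -> 0
```

Any re-created archive whose shared content shifts by even a few bytes falls into the second case for its entire length.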
Here are the results of a test with two MySQL dumps:
nmc@nexenta:/Stockage/Dropbox$ ls -alh
total 8292731
drwxr-xr-x+ 3 root root 6 Nov 12 07:36 .
drwxr-x---+ 2 root sys 3 Nov 10 15:15 .$EXTEND
drwxr-xr-x 3 root root 3 Nov 10 15:06 ..
-rwx------+ 1 jdoe staff 39K Nov 12 00:10 .DS_Store
-rwx------+ 1 jdoe staff 2.0G Sep 24 21:14 zarafa.dump
-rwx------+ 1 jdoe staff 1.9G Nov 12 07:29 zarafa.dump.old
nmc@nexenta:/Stockage/Dropbox$ zfs list -o name,used,avail,refer,mounted,quota,dedup,compress
NAME USED AVAIL REFER MOUNTED QUOTA DEDUP COMPRESS
Stockage 4.00G 5.71G 48.1K yes none sha256,verify off
Stockage/Dropbox 3.98G 5.71G 3.98G yes none sha256,verify off
nmc@nexenta:/Stockage/Dropbox$ zpool list Stockage
NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
Stockage 11.9G 4.82G 7.06G 40% 1.00x ONLINE -
Those dumps are full of e-mails (from Zarafa) and web content (from WordPress and Drupal sandboxes).
Here’s what happens when I drop in my 2010 syslog content:
nmc@nexenta:/Stockage/Dropbox$ ls -alh
total 159
drwxr-xr-x+ 11 root root 12 Nov 12 07:55 .
drwxr-x---+ 2 root sys 3 Nov 10 15:15 .$EXTEND
drwxr-xr-x 3 root root 3 Nov 10 15:06 ..
-rwx------+ 1 jdoe staff 39K Nov 12 00:10 .DS_Store
drwx------+ 10 jdoe staff 10 Mar 24 2010 10.0.0.29
drwx------+ 9 jdoe staff 9 Feb 25 2010 airport
drwx------+ 28 jdoe staff 241 Nov 12 00:05 akela
drwx------+ 23 jdoe staff 23 Jun 1 22:04 guarana
drwx------+ 18 jdoe staff 22 Nov 7 23:24 luuna
drwx------+ 17 jdoe staff 17 Mar 8 2010 pak
drwx------+ 18 jdoe staff 23 Oct 30 00:05 thundera
drwx------+ 18 jdoe staff 299 Nov 12 00:09 zarafa
nmc@nexenta:/Stockage/Dropbox$ zfs list -o name,used,avail,refer,mounted,quota,dedup,compress
NAME USED AVAIL REFER MOUNTED QUOTA DEDUP COMPRESS
Stockage 180M 9.53G 48.1K yes none sha256,verify off
Stockage/Dropbox 165M 9.53G 165M yes none sha256,verify off
nmc@nexenta:/Stockage/Dropbox$ zpool list Stockage
NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
Stockage 11.9G 222M 11.7G 1% 1.02x ONLINE -
The logs are gzipped, but there should still be a lot of redundancy in those files…
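Except that compression has already eaten that redundancy: gzip/zlib replaces repeated runs with short back-references, so the compressed files contain very little repetition for a block-level dedup to find. A quick sketch with a syslog-like, highly repetitive text (the log line is an invented sample):

```python
import zlib

# A highly redundant, syslog-like text: the same line repeated many times.
line = b"Nov 12 00:05:17 akela sshd[4242]: Accepted publickey for jdoe\n"
raw = line * 10000

packed = zlib.compress(raw)

# Compression already squeezed out the repetition; what is left is
# near-random bytes, the worst possible input for block-level dedup.
print(len(raw), len(packed))
```

Two different days of gzipped logs end up as two distinct, high-entropy streams, so they share next to no identical blocks even if the underlying text overlaps heavily.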
Out of scope
While copying the images, I had a look at I/O operations:
nmc@nexenta:/Stockage/Dropbox$ zpool iostat -v Stockage
capacity operations bandwidth
pool alloc free read write read write
---------- ----- ----- ----- ----- ----- -----
Stockage 3.94G 7.94G 5 86 174K 2.27M
raidz1 3.94G 7.94G 5 86 174K 2.27M
c2t0d0 - - 2 19 33.6K 471K
c2t1d0 - - 1 18 32.6K 470K
c2t2d0 - - 2 19 33.6K 471K
c2t3d0 - - 1 18 32.5K 470K
c2t4d0 - - 2 19 33.5K 472K
c2t5d0 - - 2 18 33.0K 470K
---------- ----- ----- ----- ----- ----- -----
It’s nice to see far more write than read operations while performing a bulk write.
- RAIDZ1 gave a 1:17 read/write ratio and 7.94GB of storage.
- RAID0 gave a 0:132 read/write ratio and 11GB of storage.
- RAID1 gave a 1:54 read/write ratio and 1.85GB of storage.
- RAID5 is not achievable with NexentaStor.
- RAID10 gave a 1:83 read/write ratio and 5.49GB of storage.
Obviously, RAID0 rocks for write operations… if you accept losing all your data on a single disk failure.
My RAID1 configuration is highly redundant… but the bandwidth is limited to
what the slowest disk can achieve.
RAID10 still looks better for database access; but half the storage is used for redundancy.
Conclusion
I couldn’t find a scenario where deduplication really helps to save storage.
Setting “deduplication” to “on” (rather than “sha256,verify”) doesn’t seem to change the dedup ratio.
Switching the record size to 4KB or 512B doesn’t seem to change the dedup ratio either.
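Shrinking the record size wouldn’t have been a free win anyway: every unique block costs an entry in the dedup table (DDT) that ZFS wants to keep in memory. Assuming the commonly cited figure of roughly 320 bytes per in-core entry (an approximation that varies across ZFS versions), the memory bill grows fast as blocks shrink:

```python
# Rough in-core cost of the ZFS dedup table (DDT): one entry per unique
# block, at an assumed ~320 bytes per entry (approximate, version-dependent).
def ddt_size_bytes(data_bytes: int, record_size: int, entry_bytes: int = 320) -> int:
    return (data_bytes // record_size) * entry_bytes

GB = 1024 ** 3
MB = 1024 ** 2

print(ddt_size_bytes(10 * GB, 128 * 1024) // MB)  # -> 25 (MB) with 128K records
print(ddt_size_bytes(10 * GB, 512) // MB)         # -> 6400 (MB) with 512B records
```

At 512-byte records, deduplicating a mere 10GB of unique data would need more DDT memory than my whole appliance has.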
There’s probably something I missed here, but I can’t guess what…
Sources
ZFS Deduplication, by Jeff Bonwick
Deduplication now in ZFS
Guide d’administration Oracle Solaris ZFS