Sunday, January 14, 2007

ZFS Compression

This post is a result of a thread in oracle-l. It is provided to show the basics of ZFS compression.

This was done on my single-core Sun Ultra 20 workstation running Solaris. The output has been slightly reformatted to make it fit the screen.


Note that this is a very simple case on a machine that is not doing any significant amount of work. Please don't use the information here as the basis of any decision you will make, especially for critical systems. Your mileage will most likely vary.

Here's a ZFS pool named p1 whose compression bit is turned on.


root@u20# zfs get compression p1
NAME PROPERTY VALUE SOURCE
p1 compression on local
root@u20#

Let's create two filesystems under this pool and name them u (for uncompressed) and c (for compressed).

root@u20# zfs create p1/u
root@u20# zfs create p1/c

By default, their compression bits are turned on because they inherit this characteristic from their parent (p1).

root@u20# zfs get compression p1/u p1/c
NAME PROPERTY VALUE SOURCE
p1/c compression on inherited from p1
p1/u compression on inherited from p1
root@u20#

Let's turn off the compression bit for the u (uncompressed) filesystem.

root@u20# zfs set compression=off p1/u

Let's verify that it was actually turned off.

root@u20# zfs get compression p1/u p1/c
NAME PROPERTY VALUE SOURCE
p1/c compression on inherited from p1
p1/u compression off local
root@u20#
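If you later want p1/u to follow its parent again, zfs inherit clears the local setting. Here is a minimal sketch of that, assuming the same dataset names as above. The run helper and the DRY_RUN variable are hypothetical conveniences of this example (they are not part of ZFS), and by default the commands are only printed, not executed.

```shell
#!/bin/sh
# Hypothetical helper: print the command when DRY_RUN=1 (the default here),
# execute it otherwise. This is not part of ZFS.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Clear the local compression setting so the dataset inherits the value
# from its parent again, then show the resulting value and its source.
reset_compression() {
    run zfs inherit compression "$1"
    run zfs get compression "$1"
}

reset_compression p1/u
```

Setting DRY_RUN=0 (as root, on a machine that really has the pool) would run the two commands instead of printing them.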

Let's get a reasonably sized test file.

root@u20# pwd
/10046/tmp
root@u20# ls -l
total 1589543
-r--r--r-- 1 jforonda staff 2962342454 Sep 11 2004 test_ora_17255_t.trc
root@u20#

There... that file is an Oracle extended SQL trace file that is around 2.8G in size.

Let's make two copies of this file. First, to the uncompressed filesystem:

root@u20# ptime cp /10046/tmp/test_ora_17255_t.trc /p1/u
real 1:20.604
user 0.002
sys 14.613
root@u20#

Then, to the compressed filesystem:

root@u20# ptime cp /10046/tmp/test_ora_17255_t.trc /p1/c
real 1:20.228
user 0.002
sys 12.637
root@u20#

Before going further, please note the output of ptime for each of the times we copied the files. There's less than one second difference between the two out of an approximate duration of 80 seconds for each copy operation. Also, note that the copy operation to the compressed filesystem was actually faster. Hmm... interesting.
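To put a number on that difference, here is a small sketch that converts the ptime values above to seconds and compares them. The to_seconds helper is just a name made up for this example, and one run per filesystem is of course not a benchmark.

```shell
#!/bin/sh
# Convert a ptime-style "m:ss.mmm" (or plain "ss.mmm") value to seconds.
to_seconds() {
    echo "$1" | awk -F: '{ if (NF == 2) printf "%.3f\n", $1 * 60 + $2; else printf "%.3f\n", $1 }'
}

u=$(to_seconds 1:20.604)   # real time, copy to the uncompressed filesystem
c=$(to_seconds 1:20.228)   # real time, copy to the compressed filesystem

# Absolute and relative difference between the two runs.
awk -v u="$u" -v c="$c" 'BEGIN { printf "diff=%.3fs (%.2f%%)\n", u - c, (u - c) / u * 100 }'
# prints: diff=0.376s (0.47%)
```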

Doing an ls on the two files will show that they have exactly the same number of bytes.

root@u20# ls -l /p1/[uc]/*
-r--r--r-- 1 root root 2962342454 Jan 13 14:23 /p1/c/test_ora_17255_t.trc
-r--r--r-- 1 root root 2962342454 Jan 13 14:21 /p1/u/test_ora_17255_t.trc
root@u20#

And in fact, the Solaris digest command tells us that they have exactly the same contents.

root@u20# digest -a md5 -v /p1/[uc]/*
md5 (/p1/c/test_ora_17255_t.trc) = 97f86fcfdfc3f21a68ffc1892a945e77
md5 (/p1/u/test_ora_17255_t.trc) = 97f86fcfdfc3f21a68ffc1892a945e77
root@u20#
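The digest command is specific to Solaris. On other systems, a byte-by-byte comparison with cmp answers the same question without computing checksums. A minimal sketch, where same_contents is a name invented for this example:

```shell
#!/bin/sh
# Return success (0) only when both files have identical contents.
# cmp -s compares byte by byte and reports the result via its exit status.
same_contents() {
    cmp -s "$1" "$2" 2>/dev/null
}

if same_contents /p1/u/test_ora_17255_t.trc /p1/c/test_ora_17255_t.trc; then
    echo "identical"
else
    echo "different (or a file is missing)"
fi
```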

But the amount of space that they occupy on disk is not the same. The file residing in the compressed filesystem takes up a little over a quarter of the space used by the file that resides in the uncompressed filesystem.

root@u20# du -sh /p1/[uc]/*
776M /p1/c/test_ora_17255_t.trc
2.8G /p1/u/test_ora_17255_t.trc
root@u20#

A better way to see the compression ratio is to use the zfs get command:

root@u20# zfs get compressratio p1/u p1/c
NAME PROPERTY VALUE SOURCE
p1/c compressratio 3.64x -
p1/u compressratio 1.00x -
root@u20#
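The same comparison can be made for a single file on any filesystem by putting its logical size next to the space it actually occupies. This is only a sketch: size_report is a made-up helper name, and du -k reports allocated kilobytes, so the figures are rounded.

```shell
#!/bin/sh
# Print the logical size (bytes) and the allocated size (KB) of a file.
# wc -c counts the bytes an application would read; du -k reports the
# blocks actually allocated on disk, which is what compression reduces.
size_report() {
    logical=$(wc -c < "$1" | tr -d ' ')
    on_disk_kb=$(du -k "$1" | awk '{ print $1 }')
    echo "$1 logical_bytes=$logical on_disk_kb=$on_disk_kb"
}

# Only report on the test file if it is actually there.
if [ -f /p1/c/test_ora_17255_t.trc ]; then
    size_report /p1/c/test_ora_17255_t.trc
fi
```

Dividing the logical size by the on-disk size gives an approximation of what compressratio reports for a dataset holding just that file.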

Applications don't have to know that a file is compressed -- ZFS does the compression and decompression on the fly. Applications should be able to read the files normally like this:

root@u20# tail -5 /p1/u/test_ora_17255_t.trc
END OF STMT
PARSE #21:c=0,e=123,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,tim=740525621053
BINDS #21:
EXEC #21:c=0,e=259,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,tim=740525621393
EXEC #3:c=0,e=3416,p=0,cr=1,cu=3,mis=0,r=1,dep=0,og=4,tim=740525622047
root@u20#

root@u20# tail -5 /p1/c/test_ora_17255_t.trc
END OF STMT
PARSE #21:c=0,e=123,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,tim=740525621053
BINDS #21:
EXEC #21:c=0,e=259,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=4,tim=740525621393
EXEC #3:c=0,e=3416,p=0,cr=1,cu=3,mis=0,r=1,dep=0,og=4,tim=740525622047
root@u20#

Now, how good is the default ZFS compression compared to something like gzip? Let's find out by compressing the file that is in the uncompressed filesystem.

root@u20# ptime gzip /p1/u/test_ora_17255_t.trc
real 1:27.206
user 1:19.614
sys 2.985
root@u20#

As expected, gzip compresses much better. Note that gzip replaced the original file in /p1/u with a .gz file.

root@u20# ls -hl /p1/[uc]/*
-r--r--r-- 1 root root 2.8G Jan 13 14:23 /p1/c/test_ora_17255_t.trc
-r--r--r-- 1 root root 99M Jan 13 14:21 /p1/u/test_ora_17255_t.trc.gz
root@u20#

root@u20# ls -l /p1/[uc]/*
-r--r--r-- 1 root root 2962342454 Jan 13 14:23 /p1/c/test_ora_17255_t.trc
-r--r--r-- 1 root root 104023266 Jan 13 14:21 /p1/u/test_ora_17255_t.trc.gz
root@u20#

root@u20# du -sh /p1/[uc]/*
776M /p1/c/test_ora_17255_t.trc
99M /p1/u/test_ora_17255_t.trc.gz
root@u20#

The above shows the following:

Size of original file: 2.8GB
Size of file compressed by ZFS: 776MB
Size of file compressed by gzip: 99MB
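
Expressed as ratios, the numbers above work out as follows. This is a rough sketch: the ratio helper is a name made up here, and the ZFS figure is derived from the rounded 776M that du reported, so it is approximate.

```shell
#!/bin/sh
# ratio ORIGINAL_BYTES COMPRESSED_BYTES -> "N.NNx saved=PP.P%"
ratio() {
    awk -v o="$1" -v c="$2" 'BEGIN { printf "%.2fx saved=%.1f%%\n", o / c, (1 - c / o) * 100 }'
}

orig=2962342454                     # ls -l size of the trace file
zfs_bytes=$((776 * 1024 * 1024))    # du reported 776M, so this is rounded
gzip_bytes=104023266                # ls -l size of the .gz file

echo "zfs (lzjb): $(ratio "$orig" "$zfs_bytes")"    # 3.64x saved=72.5%
echo "gzip:       $(ratio "$orig" "$gzip_bytes")"   # 28.48x saved=96.5%
```

The ZFS result agrees with the 3.64x that zfs get compressratio reported for p1/c.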

Why is this so? Well, programs like gzip can take their time and optimize for compression ratio at the expense of elapsed time. ZFS, on the other hand, is concerned about other things besides compression ratio, so it has to strike the right balance between speed and compression ratio. That said, the ZFS man page says:

compression=on | off | lzjb


Controls the compression algorithm used for this dataset. There is currently only one algorithm, "lzjb", though this may change in future releases.
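
The trade-off between speed and compression ratio is easy to see with gzip itself, which lets you pick a level from -1 (fastest) to -9 (best compression). The sketch below compares those two levels on made-up, trace-like sample text. Note that it compares two gzip settings only; it does not measure lzjb, and the sample data is invented for illustration.

```shell
#!/bin/sh
# Generate repetitive, trace-like sample text (made-up data, not a real trace).
sample=$(mktemp)
awk 'BEGIN {
    for (i = 1; i <= 20000; i++) {
        printf "EXEC #%d:c=0,e=%d,p=0,cr=%d,cu=0,mis=0,r=1,dep=0,og=4,tim=%d\n", i % 40, (i * 37) % 5000, i % 7, 1000000 + i * 311
    }
}' > "$sample"

raw=$(wc -c < "$sample" | tr -d ' ')
fast=$(gzip -1 -c "$sample" | wc -c | tr -d ' ')   # fastest, weakest setting
best=$(gzip -9 -c "$sample" | wc -c | tr -d ' ')   # slowest, strongest setting

echo "raw=$raw gzip-1=$fast gzip-9=$best"
rm -f "$sample"
```

Prefixing the two gzip commands with time (or ptime on Solaris) shows the other side of the trade: the higher level typically costs more CPU time.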

3 comments:

Anonymous said...

Interesting article.
You might want to look at https://www.opensolaris.org/jive/thread.jspa?threadID=26910&tstart=195. gzip compression is now supported in ZFS.

jforonda said...

jresoort,

Thanks for the comment and the link.

I subscribe to Adam Leventhal's blog so I'm aware that he has been playing with ZFS/gzip.

I also subscribe to the ZFS-discuss list but unfortunately, I am way behind in reading it, so I was not aware that it has actually been integrated into OpenSolaris. That's very good news.

Again, thanks for dropping by.

James

Anonymous said...

Hi!

A really good, to-the-point read!

Helped me in my work!

Thank you!
Ajay