ZFS

Creating a ZFS pool and dataset

zpool create INBOX /dev/loop0 /dev/loop1 /dev/loop2 /dev/loop3
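The page does not show how /dev/loop0 to /dev/loop3 were prepared. One possible way, sketched here under assumptions, is to back each loop device with a sparse file. The backing-file paths under /tmp and the 200M size are illustrative and not taken from the original setup. The script only prints the commands unless DRY_RUN=0 is set, because attaching loop devices and creating a pool requires root.

```shell
#!/bin/sh
# Dry-run helper: print the command by default, execute it only when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Assumption: backing-file paths and the 200M size are illustrative only.
for i in 0 1 2 3; do
    run truncate -s 200M "/tmp/zfs-disk$i.img"        # sparse backing file
    run losetup "/dev/loop$i" "/tmp/zfs-disk$i.img"   # attach it as a loop device
done

# Same command as on this page.
run zpool create INBOX /dev/loop0 /dev/loop1 /dev/loop2 /dev/loop3
```

Running it as root with DRY_RUN=0 executes the commands instead of printing them.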

# zpool list
NAME    SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
INBOX   780M   448K   780M     0%  1.00x  ONLINE  -

# zpool status
  pool: INBOX
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        INBOX       ONLINE       0     0     0
          loop0     ONLINE       0     0     0
          loop1     ONLINE       0     0     0
          loop2     ONLINE       0     0     0
          loop3     ONLINE       0     0     0

errors: No known data errors

A dataset named “INBOX”, after the pool, is created automatically as well. It is mounted at /INBOX:

# zfs list
NAME    USED  AVAIL  REFER  MOUNTPOINT
INBOX   400K   748M   112K  /INBOX

Mount dataset

zfs mount INBOX

Create more datasets in pool

zfs create <pool name>/<data set name>

Add a new block device (disk) to an online pool

zpool add INBOX /dev/loop4

Deduplication

zfs set dedup=on INBOX

Tests

For the tests I used three files of random data (from /dev/urandom), 16MB each: B1, B2 and B3. These three files take 38.6M on disk:

# zdb -S INBOX
Simulated DDT histogram:

bucket              allocated                       referenced
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
     1      309   38.6M   38.6M   38.6M      309   38.6M   38.6M   38.6M
 Total      309   38.6M   38.6M   38.6M      309   38.6M   38.6M   38.6M

dedup = 1.00, compress = 1.00, copies = 1.00, dedup * compress / copies = 1.00
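The test files themselves are not shown on this page. A minimal sketch of how B1, B2, B3 and the concatenated files used in the steps that follow could be produced is given here. The output file names B123, B123x2 and shifted are made up for illustration, and the files are written to a scratch directory rather than to the pool.

```shell
#!/bin/sh
set -eu
size=16777216                 # 16 MiB per file, as in the test above
workdir=$(mktemp -d)          # assumption: scratch directory, not the pool itself
cd "$workdir"

# Three files of random data.
for name in B1 B2 B3; do
    head -c "$size" /dev/urandom > "$name"
done

# Concatenations used in the later steps (file names are hypothetical).
cat B1 B2 B3 > B123                      # B1|B2|B3
cat B123 B123 > B123x2                   # B1|B2|B3|B1|B2|B3
{ printf '0'; cat B123; } > shifted      # one dummy byte (any byte will do), then B1|B2|B3

wc -c B1 B2 B3 B123 B123x2 shifted
```

Copying the files into /INBOX one at a time and running zdb -S INBOX after each copy would repeat the sequence of measurements on this page.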

Additionally, one big file with content B1|B2|B3 was added to the filesystem:

# zdb -S INBOX
Simulated DDT histogram:

bucket              allocated                       referenced
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
     2      384     48M     48M     48M      768     96M     96M     96M
 Total      384     48M     48M     48M      768     96M     96M     96M

dedup = 2.00, compress = 1.00, copies = 1.00, dedup * compress / copies = 2.00

Additionally, one big file with content B1|B2|B3|B1|B2|B3 was added to the filesystem:

# zdb -S INBOX
Simulated DDT histogram:

bucket              allocated                       referenced
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
     4      384     48M     48M     48M    1.50K    192M    192M    192M
 Total      384     48M     48M     48M    1.50K    192M    192M    192M

dedup = 4.00, compress = 1.00, copies = 1.00, dedup * compress / copies = 4.00

Next, a new file with content 0|B1|B2|B3 (one dummy byte followed by B1|B2|B3) was added:

# zdb -S INBOX
Simulated DDT histogram:

bucket              allocated                       referenced
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
     1      385   48.1M   48.1M   48.1M      385   48.1M   48.1M   48.1M
     4      384     48M     48M     48M    1.50K    192M    192M    192M
 Total      769   96.1M   96.1M   96.1M    1.88K    240M    240M    240M

dedup = 2.50, compress = 1.00, copies = 1.00, dedup * compress / copies = 2.50
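The dedup value printed by zdb -S is consistent with dividing the referenced size by the allocated size on the Total row. A quick check of the four runs above, using awk purely as a calculator (the numbers are copied from the histograms; the helper name dedup_ratio is made up for this sketch):

```shell
#!/bin/sh
# dedup ratio = referenced DSIZE / allocated DSIZE (values in M, from the Total rows above)
dedup_ratio() {
    awk -v ref="$1" -v alloc="$2" 'BEGIN { printf "%.2f\n", ref / alloc }'
}

dedup_ratio 38.6 38.6    # B1, B2 and B3 only
dedup_ratio 96   48      # plus B1|B2|B3
dedup_ratio 192  48      # plus B1|B2|B3|B1|B2|B3
dedup_ratio 240  96.1    # plus 0|B1|B2|B3
```

This prints 1.00, 2.00, 4.00 and 2.50, the same values zdb reported. The last step drops the ratio because the shifted file added 48.1M of allocated blocks that are referenced only once.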

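So ZFS cannot deduplicate shifted data! A single extra byte at the start moves the content relative to the block boundaries, so none of the 385 blocks of the new file matches an existing block.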
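The effect can be reproduced without ZFS. The sketch below is an illustration of fixed-size block matching, not of how ZFS is implemented internally. It cuts files into 128 KiB blocks (the default ZFS recordsize, which also fits the 384 blocks reported for 48M above), hashes every block with sha256sum and counts the hashes two files have in common. The file names and the 1 MiB test size are arbitrary choices.

```shell
#!/bin/sh
set -eu
LC_ALL=C; export LC_ALL
work=$(mktemp -d)
bs=131072    # assumed block size: 128 KiB

# Print one SHA-256 per fixed-size block of file $1, sorted and without duplicates.
block_hashes() {
    dir=$(mktemp -d "$work/blk.XXXXXX")
    split -b "$bs" "$1" "$dir/part."
    sha256sum "$dir"/part.* | cut -d' ' -f1 | sort -u
    rm -rf "$dir"
}

# Count the block hashes that files $1 and $2 have in common.
shared_blocks() {
    block_hashes "$1" > "$work/a.hashes"
    block_hashes "$2" > "$work/b.hashes"
    comm -12 "$work/a.hashes" "$work/b.hashes" | wc -l
}

head -c 1048576 /dev/urandom > "$work/data"            # 8 blocks of random data
cp "$work/data" "$work/aligned"                        # identical copy
{ printf '0'; cat "$work/data"; } > "$work/shifted"    # same data behind one dummy byte

aligned=$(shared_blocks "$work/data" "$work/aligned")
shifted=$(shared_blocks "$work/data" "$work/shifted")
echo "aligned copy: $aligned shared blocks, shifted copy: $shifted shared blocks"
```

With random input the aligned copy shares all 8 blocks and the shifted copy shares none, which mirrors the refcnt 1 row for the 0|B1|B2|B3 file.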

An additional simple test with two files:

0|B1|0|B2|0|B3|0
0|B1|B2|B3

Only the beginning of both files, 0|B1, was deduplicated (16MB saved).
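The 16MB figure follows from block alignment. Assuming that each 0 is a single byte and that blocks are 128 KiB (neither is stated on this page), the two files are byte-identical only over the prefix 0|B1, and only whole blocks inside that prefix can be shared:

```shell
#!/bin/sh
bs=131072                          # assumed block size: 128 KiB
blob=$((16 * 1024 * 1024))         # size of B1 in bytes
prefix=$((1 + blob))               # common prefix 0|B1: one dummy byte plus B1

shared_blocks=$((prefix / bs))     # whole blocks lying entirely inside the common prefix
shared_bytes=$((shared_blocks * bs))

echo "shared blocks: $shared_blocks"
echo "shared MiB:    $((shared_bytes / 1024 / 1024))"
```

This gives 128 shared blocks, or 16 MiB. The next block already contains the last byte of B1 followed by different content in each file, and from there on the two files are offset against each other by one or more bytes.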

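By default ZFS provides block-level deduplication based on checksums, which we get almost for free. To additionally perform a full byte-for-byte comparison of blocks whose checksums match (this guards against hash collisions; it does not catch shifted data such as e-mail attachments):

zfs set dedup=verify INBOX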

Compression

Enable compression in the parent dataset (like deduplication, it will be inherited by child datasets)

zfs set compression=on INBOX

But the new attribute applies only to newly written data.

NAME    SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
INBOX   975M   724M   251M    74%  1.00x  ONLINE  -

The same data copied again to the dataset after compression was enabled:

NAME    SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
INBOX   975M   563M   412M    57%  1.00x  ONLINE  -
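The two zpool list outputs allow a rough estimate of what compression saved for this data. The calculation below uses only the ALLOC values shown above (724M before, 563M after) and treats them as directly comparable, which is an assumption because ALLOC also includes pool metadata. The helper name savings is made up for this sketch.

```shell
#!/bin/sh
# Arguments: ALLOC before and after enabling compression, in M as printed by zpool list.
savings() {
    awk -v b="$1" -v a="$2" 'BEGIN {
        printf "compression ratio: %.2fx\n", b / a
        printf "space saved:       %.1f%%\n", (b - a) / b * 100
    }'
}

savings 724 563
```

For these numbers the result is a ratio of about 1.29x, or roughly 22% less space.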

zdb -S INBOX

zdb -b INBOX

Tests:

The filesystem was tested with 648MiB of e-mail stored in Maildir format (lots of binary attachments encoded as Base64).

SquashFS=365MB vs ZFS=563MB

Deduplication:

Deduplication at file level works on both ZFS and SquashFS (the same folder copied again).

Deduplication of 2 different 32MB blobs, with file concatenated from | 0 | blob1 | 0 | blob2 | 0 | blob1 | 0 | blob2 |0|

Deduplication of the same attachment inside different e-mails doesn't work in ZFS.

References:

http://www.funtoo.org/ZFS_Fun

http://constantin.glez.de/blog/2011/07/zfs-dedupe-or-not-dedupe

http://www.oracle.com/technetwork/articles/servers-storage-admin/o11-113-size-zfs-dedup-1354231.html
