


ZFS

Creating ZFS dataset

zpool create INBOX /dev/loop0 /dev/loop1 /dev/loop2 /dev/loop3
# zpool list
NAME    SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
INBOX   780M   448K   780M     0%  1.00x  ONLINE  -
# zpool status
  pool: INBOX
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        INBOX       ONLINE       0     0     0
          loop0     ONLINE       0     0     0
          loop1     ONLINE       0     0     0
          loop2     ONLINE       0     0     0
          loop3     ONLINE       0     0     0

errors: No known data errors

A root dataset named “INBOX” is also created automatically, taking its name from the zpool “INBOX”. It is mounted at /INBOX.

# zfs list
NAME    USED  AVAIL  REFER  MOUNTPOINT
INBOX   400K   748M   112K  /INBOX

Mount dataset

zfs mount INBOX

Create more datasets in pool

zfs create <pool name>/<data set name>

Add a new block device (disk) to an online pool

zpool add INBOX /dev/loop4

Deduplication

zfs set dedup=on INBOX

Tests

For the tests I used three files of random data (from /dev/urandom), 16MB each: B1, B2 and B3. zdb reports 38.6M on disk for these three files:

# zdb -S INBOX
Simulated DDT histogram:

bucket              allocated                       referenced
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
     1      309   38.6M   38.6M   38.6M      309   38.6M   38.6M   38.6M
 Total      309   38.6M   38.6M   38.6M      309   38.6M   38.6M   38.6M

dedup = 1.00, compress = 1.00, copies = 1.00, dedup * compress / copies = 1.00
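
The test files can be generated roughly as follows. This is a minimal sketch, not the exact commands used for the measurements: the names B123, B123x2 and shifted are hypothetical, and the files are written to a scratch directory instead of /INBOX.

```shell
#!/bin/sh
set -eu

# Work in a scratch directory; on the test system this would be /INBOX.
workdir=$(mktemp -d)
cd "$workdir"

# Three 16 MiB files of random data: B1, B2 and B3.
for name in B1 B2 B3; do
    head -c 16777216 /dev/urandom > "$name"
done

# Concatenated variants used in the following steps (hypothetical names).
cat B1 B2 B3 > B123                        # B1|B2|B3
cat B1 B2 B3 B1 B2 B3 > B123x2             # B1|B2|B3|B1|B2|B3
{ printf '0'; cat B1 B2 B3; } > shifted    # 0|B1|B2|B3, one dummy byte first

# Print the size of each file in bytes.
wc -c B1 B2 B3 B123 B123x2 shifted
```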

Additionally, one big file with the content B1|B2|B3 was added to the filesystem:

# zdb -S INBOX
Simulated DDT histogram:

bucket              allocated                       referenced
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
     2      384     48M     48M     48M      768     96M     96M     96M
 Total      384     48M     48M     48M      768     96M     96M     96M

dedup = 2.00, compress = 1.00, copies = 1.00, dedup * compress / copies = 2.00
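
The dedup figure in the summary line is the referenced size divided by the allocated size. A small helper (the function name dedup_ratio is hypothetical) reproduces it from the Total row above:

```shell
#!/bin/sh
# dedup ratio = referenced size / allocated size (both in the same unit)
dedup_ratio() {
    awk -v ref="$1" -v alloc="$2" 'BEGIN { printf "%.2f\n", ref / alloc }'
}

dedup_ratio 96 48    # referenced and allocated DSIZE in MiB, prints 2.00
```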

Additionally, one big file with the content B1|B2|B3|B1|B2|B3 was added to the filesystem:

# zdb -S INBOX
Simulated DDT histogram:

bucket              allocated                       referenced
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
     4      384     48M     48M     48M    1.50K    192M    192M    192M
 Total      384     48M     48M     48M    1.50K    192M    192M    192M

dedup = 4.00, compress = 1.00, copies = 1.00, dedup * compress / copies = 4.00

Next, a new file with the content 0|B1|B2|B3 (one dummy byte followed by B1|B2|B3) was added:

# zdb -S INBOX
Simulated DDT histogram:

bucket              allocated                       referenced
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
     1      385   48.1M   48.1M   48.1M      385   48.1M   48.1M   48.1M
     4      384     48M     48M     48M    1.50K    192M    192M    192M
 Total      769   96.1M   96.1M   96.1M    1.88K    240M    240M    240M

dedup = 2.50, compress = 1.00, copies = 1.00, dedup * compress / copies = 2.50

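So ZFS cannot deduplicate shifted data! Deduplication works on whole blocks at fixed offsets, and the single leading byte changes the content of every block in the new file, so all 385 of its blocks are stored as new (refcnt 1).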
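
The effect can be reproduced without ZFS by hashing fixed-size blocks and counting the distinct ones. This is only a rough model of block-level deduplication, under some assumptions: 128 KiB blocks (the default ZFS recordsize), SHA-256 via sha256sum from GNU coreutils, and 1 MiB stand-ins for B1, B2 and B3 to keep it fast. The helper names block_hashes and unique_blocks are hypothetical.

```shell
#!/bin/sh
set -eu

workdir=$(mktemp -d)
cd "$workdir"

block=131072                               # 128 KiB, the default ZFS recordsize

# Small stand-ins for B1, B2, B3: 1 MiB (8 blocks) each.
for name in B1 B2 B3; do
    head -c 1048576 /dev/urandom > "$name"
done
cat B1 B2 B3 > aligned                     # B1|B2|B3
{ printf '0'; cat B1 B2 B3; } > shifted    # 0|B1|B2|B3

# Print one SHA-256 digest per fixed-size block of the given file.
block_hashes() {
    split -b "$block" "$1" "chunk.$1."
    for chunk in "chunk.$1."*; do
        sha256sum < "$chunk" | cut -d' ' -f1
    done
    rm -f "chunk.$1."*
}

# Count the distinct blocks needed to store all given files.
unique_blocks() {
    for file in "$@"; do
        block_hashes "$file"
    done | sort -u | wc -l | tr -d ' '
}

echo "B1 B2 B3:                $(unique_blocks B1 B2 B3)"
echo "plus aligned B1|B2|B3:   $(unique_blocks B1 B2 B3 aligned)"
echo "plus shifted 0|B1|B2|B3: $(unique_blocks B1 B2 B3 aligned shifted)"
```

The counts should come out as 24, 24 and 49: the aligned copy adds no new blocks, while the shifted copy adds 25 (24 full blocks plus a 1-byte tail), the same pattern as the 384 and 385 blocks in the zdb output.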

An additional simple test with two files:

0|B1|0|B2|0|B3|0
0|B1|B2|B3

Only the common beginning of both files, |0|B1|, was deduplicated (16MB saved).

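By default ZFS provides block-level deduplication based on block checksums, which come almost for free. With dedup=verify, ZFS additionally compares blocks byte for byte when their checksums match, which protects against hash collisions. It still works on whole blocks, so it does not catch shifted data such as an e-mail attachment:

zfs set dedup=verify INBOX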

Compression

Enable compression on the parent dataset (like dedup, the property is inherited by child datasets)

zfs set compression=on INBOX

However, the new attribute applies only to newly written data.

NAME    SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
INBOX   975M   724M   251M    74%  1.00x  ONLINE  -

The same data copied to the dataset again after compression was enabled:

NAME    SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
INBOX   975M   563M   412M    57%  1.00x  ONLINE  -
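
Assuming the two listings show the same data without and with compression, the saving can be worked out from the ALLOC column (the function name space_saved is hypothetical):

```shell
#!/bin/sh
# Percentage of allocated space saved: (before - after) / before * 100
space_saved() {
    awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.1f%%\n", (before - after) / before * 100 }'
}

space_saved 724 563    # ALLOC in MiB without and with compression, prints 22.2%
```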

zdb -S INBOX

zdb -b INBOX

Tests:

The filesystem was tested with 648MiB of e-mail stored in Maildir format (lots of binary attachments encoded as Base64).

SquashFS=365MB vs ZFS=563MB

Deduplication:

File-level deduplication works on both ZFS and SquashFS (tested by copying the same folder again).

Deduplication of two different 32MB blobs, with a file concatenated as |0|blob1|0|blob2|0|blob1|0|blob2|0|

Deduplication of the same attachment inside different e-mails does not work in ZFS.
