ZFS
Creating ZFS dataset
zpool create INBOX /dev/loop0 /dev/loop1 /dev/loop2 /dev/loop3
# zpool list
NAME    SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
INBOX   780M   448K   780M     0%  1.00x  ONLINE  -
# zpool status
  pool: INBOX
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        INBOX       ONLINE       0     0     0
          loop0     ONLINE       0     0     0
          loop1     ONLINE       0     0     0
          loop2     ONLINE       0     0     0
          loop3     ONLINE       0     0     0

errors: No known data errors
A root dataset named “INBOX” is created automatically, taking its name from the pool “INBOX”. It is mounted at /INBOX.
# zfs list
NAME    USED  AVAIL  REFER  MOUNTPOINT
INBOX   400K   748M   112K  /INBOX
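The zpool create command above uses /dev/loop0 to /dev/loop3, but the page does not show how those loop devices were set up. The following is only a sketch of one possible way, assuming GNU coreutils (truncate) and util-linux (losetup). The function names, the directory /var/tmp/zfs-test and the 200M file size are hypothetical and were chosen for illustration only.

```shell
#!/bin/sh
# Hypothetical helpers, not taken from the original page.

# Create COUNT sparse backing files of SIZE each in DIR (no root needed).
make_backing_files() {
    dir=$1 count=$2 size=$3
    mkdir -p "$dir" || return 1
    i=0
    while [ "$i" -lt "$count" ]; do
        truncate -s "$size" "$dir/disk$i.img" || return 1
        i=$((i + 1))
    done
}

# Attach every backing file in DIR to the first free loop device.
# Needs root; losetup prints the name of each device it allocates.
attach_loop_devices() {
    for img in "$1"/disk*.img; do
        losetup -f --show "$img"
    done
}

# Example (as root; path and size are assumptions):
#   make_backing_files /var/tmp/zfs-test 4 200M
#   attach_loop_devices /var/tmp/zfs-test
```

The device names printed by losetup depend on which loop devices are already in use, so they may differ from loop0 to loop3.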
Mount dataset
zfs mount INBOX
Create more datasets in pool
zfs create <pool name>/<data set name>
Add new block device (disc) to online pool
zpool add INBOX /dev/loop4
Deduplication
zfs set dedup=on INBOX
Tests
For testing I used three files of random data (/dev/urandom), 16MB each: B1, B2 and B3.
These three files take 38.6M on disk:
# zdb -S INBOX
Simulated DDT histogram:

bucket              allocated                       referenced
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
     1      309   38.6M   38.6M   38.6M      309   38.6M   38.6M   38.6M
 Total      309   38.6M   38.6M   38.6M      309   38.6M   38.6M   38.6M

dedup = 1.00, compress = 1.00, copies = 1.00, dedup * compress / copies = 1.00
Additionally, one large file with the content B1|B2|B3 was added to the filesystem:
# zdb -S INBOX
Simulated DDT histogram:

bucket              allocated                       referenced
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
     2      384     48M     48M     48M      768     96M     96M     96M
 Total      384     48M     48M     48M      768     96M     96M     96M

dedup = 2.00, compress = 1.00, copies = 1.00, dedup * compress / copies = 2.00
Additionally, one large file with the content B1|B2|B3|B1|B2|B3 was added to the filesystem:
# zdb -S INBOX
Simulated DDT histogram:

bucket              allocated                       referenced
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
     4      384     48M     48M     48M    1.50K    192M    192M    192M
 Total      384     48M     48M     48M    1.50K    192M    192M    192M

dedup = 4.00, compress = 1.00, copies = 1.00, dedup * compress / copies = 4.00
Next, a new file with the content 0|B1|B2|B3 (one dummy byte followed by B1|B2|B3) was added:
# zdb -S INBOX
Simulated DDT histogram:

bucket              allocated                       referenced
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
     1      385   48.1M   48.1M   48.1M      385   48.1M   48.1M   48.1M
     4      384     48M     48M     48M    1.50K    192M    192M    192M
 Total      769   96.1M   96.1M   96.1M    1.88K    240M    240M    240M

dedup = 2.50, compress = 1.00, copies = 1.00, dedup * compress / copies = 2.50
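The numbers above are consistent with deduplication working on whole, aligned blocks: 48M in 384 blocks means 128K per block, and a single leading byte shifts every block boundary, so none of the 385 blocks of the last file matches an existing one. The sketch below imitates this outside ZFS by cutting files into fixed-size chunks and hashing each chunk. It is a simulation, not the ZFS implementation. It assumes GNU coreutils (sha256sum, head -c), and the function names are hypothetical. To run quickly it uses 4K blocks by default instead of 128K; the ratios depend only on the number of blocks per file (128 here, as for 16MB files with 128K blocks).

```shell
#!/bin/sh
# Simulation of block-level deduplication with fixed-size, aligned blocks.
BLOCK=${BLOCK:-4096}    # scaled down; set BLOCK=131072 for real-size files

# Print one SHA-256 hash per BLOCK-sized chunk of file $1.
block_hashes() {
    bh_tmp=$(mktemp -d) || return 1
    split -a 6 -b "$BLOCK" "$1" "$bh_tmp/chunk."
    sha256sum "$bh_tmp"/chunk.* | awk '{ print $1 }'
    rm -rf "$bh_tmp"
}

# Usage: dedup_ratio FILE...
# Prints "total unique ratio" for the blocks of all files combined.
dedup_ratio() {
    for f in "$@"; do block_hashes "$f"; done |
        sort | uniq -c |
        awk '{ total += $1; unique += 1 }
             END { printf "%d %d %.2f\n", total, unique, total / unique }'
}

# Rebuild the test files from this page and print the ratio after each step.
demo() {
    w=$(mktemp -d) || return 1
    n=${N:-128}    # blocks per random file
    for name in B1 B2 B3; do
        head -c $((BLOCK * n)) /dev/urandom > "$w/$name"
    done
    cat "$w/B1" "$w/B2" "$w/B3" > "$w/B123"
    cat "$w/B123" "$w/B123" > "$w/B123123"
    { printf '0'; cat "$w/B123"; } > "$w/0B123"    # one dummy byte in front

    echo "B1 B2 B3:      $(dedup_ratio "$w/B1" "$w/B2" "$w/B3")"
    echo "+ B123:        $(dedup_ratio "$w/B1" "$w/B2" "$w/B3" "$w/B123")"
    echo "+ B123123:     $(dedup_ratio "$w/B1" "$w/B2" "$w/B3" "$w/B123" "$w/B123123")"
    echo "+ 0|B123:      $(dedup_ratio "$w/B1" "$w/B2" "$w/B3" "$w/B123" "$w/B123123" "$w/0B123")"
    rm -rf "$w"
}

demo
```

With the default settings the last step reports 1921 blocks in total, 769 of them unique, and a ratio of 2.50, which corresponds to the 1.88K referenced and 769 allocated blocks in the last zdb output.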
Compression
Enable compression and deduplication on the parent dataset (the settings are inherited by child datasets)
zfs set compression=on INBOX
However, the new attribute applies only to newly written data.
NAME    SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
INBOX   975M   724M   251M    74%  1.00x  ONLINE  -
The same data, copied to the dataset again after compression was enabled:
NAME    SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
INBOX   975M   563M   412M    57%  1.00x  ONLINE  -
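The effect of compression can be read from the two ALLOC values above (724M before, 563M after). A small sketch of the arithmetic follows; the function names are hypothetical, and the result is only approximate because ALLOC is a pool-wide figure that also includes metadata.

```shell
#!/bin/sh
# Hypothetical helpers for comparing two ALLOC values (same unit for both).

# Percentage of space saved going from BEFORE to AFTER.
saved_percent() {
    awk -v b="$1" -v a="$2" 'BEGIN { printf "%.1f\n", (b - a) * 100 / b }'
}

# Compression ratio BEFORE / AFTER.
compress_ratio() {
    awk -v b="$1" -v a="$2" 'BEGIN { printf "%.2f\n", b / a }'
}

saved_percent 724 563     # prints 22.2
compress_ratio 724 563    # prints 1.29

# On a live pool the per-dataset figure can be read directly with:
#   zfs get compressratio INBOX
```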
zdb -S INBOX
zdb -b INBOX
Tests:
The filesystem was tested with 648MiB of e-mail stored in Maildir format (many binary attachments encoded as Base64).
Resulting size: SquashFS = 365MB vs ZFS = 563MB
Deduplication:
File-level deduplication works on both ZFS and SquashFS (tested by copying the same folder again).
Deduplication of two different 32MB blobs was tested with a file concatenated as |0|blob1|0|blob2|0|blob1|0|blob2|0|
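Deduplication of the same attachment inside different e-mails does not work in ZFS, most likely because ZFS deduplicates whole blocks and the attachment starts at a different offset in each message.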