====== ZFS ======
[[https://arstechnica.com/information-technology/2020/05/zfs-101-understanding-zfs-storage-and-performance/|ZFS 101—Understanding ZFS storage and performance]]
[[https://lists.debian.org/debian-user/2012/05/msg01026.html]]
Features:
* data pools (tanks) are abstractions that aggregate block devices (single disks, mirrors, raidz, spares, etc.)
* a dataset is created on a data pool or on another (parent) dataset
* the whole pool space is shared between datasets (no fixed-partition-size problem); the space used by a dataset (and its descendants) can be limited with a quota
* compression
* block-level deduplication (not effective for e-mails with attachments, where the same attachment is shifted to a different offset in each message)
OpenZFS 2.0.0 (Dec 2020) [[https://github.com/openzfs/zfs/releases/tag/zfs-2.0.0]]:
* Sequential resilver (rebuilds only the portions that actually hold data)
* Persistent L2ARC cache (survives reboots)
* ZSTD compression support (see the example after this list)
* Redacted replication (replication with selected data excluded)
* Unified code base for FreeBSD and Linux
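The new compressor can be tried per dataset; a minimal sketch, assuming an existing dataset named POOL/data (names are placeholders):
zfs set compression=zstd POOL/data
zfs get compression,compressratio POOL/data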
Proposed use case:
POOL created on an encrypted LUKS block device; a sketch of the matching commands follows the tree.
POOL
|-- /filer (quota)
| |- foto
| |- mp3 (dedup)
| |- movies
| +- backup (copies=2, compression)
|
|-- /home (compression, dedup, quota)
+-- /var (quota)
+- log (compression)
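A sketch of the commands that would build this layout; the backing device, mapping name and quota sizes are illustrative placeholders, not values from a real setup:
cryptsetup luksFormat /dev/sdb1
cryptsetup luksOpen /dev/sdb1 cryptpool
zpool create POOL /dev/mapper/cryptpool
zfs create -o quota=500G POOL/filer
zfs create POOL/filer/foto
zfs create -o dedup=on POOL/filer/mp3
zfs create POOL/filer/movies
zfs create -o copies=2 -o compression=on POOL/filer/backup
zfs create -o compression=on -o dedup=on -o quota=100G POOL/home
zfs create -o quota=20G POOL/var
zfs create -o compression=on POOL/var/log
By default the datasets mount under /POOL; the mountpoint property can relocate them to /filer, /home and /var.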
===== ZFS implementations =====
ZFS-Fuse 0.7 uses the old pool version 23, whereas [[http://zfsonlinux.org|ZFSonLinux]] uses pool version 28.
[[http://exitcode.de/?p=106|zfs-fuse vs. zfsonlinux]]
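Which pool versions a given implementation supports can be listed with:
zpool upgrade -v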
===== Creating a ZFS pool =====
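The pool below is backed by loop devices, which can be prepared roughly like this (backing file size and paths are only an example):
for i in 0 1 2 3; do
  dd if=/dev/zero of=/tmp/disk$i.img bs=1M count=200
  losetup /dev/loop$i /tmp/disk$i.img
done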
zpool create INBOX /dev/loop0 /dev/loop1 /dev/loop2 /dev/loop3
# zpool list
NAME    SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
INBOX   780M   448K   780M     0%  1.00x  ONLINE  -
# zpool status
  pool: INBOX
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        INBOX       ONLINE       0     0     0
          loop0     ONLINE       0     0     0
          loop1     ONLINE       0     0     0
          loop2     ONLINE       0     0     0
          loop3     ONLINE       0     0     0

errors: No known data errors
Dataset "INBOX" is also automatically created based on zpool name "INBOX". It is mounted as /INBOX
# zfs list
NAME    USED  AVAIL  REFER  MOUNTPOINT
INBOX   400K   748M   112K  /INBOX
===== Mount dataset =====
zfs mount INBOX
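The counterparts, unmounting a dataset and mounting all datasets at once, are:
zfs unmount INBOX
zfs mount -a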
===== Create more datasets in pool =====
Datasets are named hierarchically below the pool; the dataset name here is only an example:
zfs create INBOX/mail
===== Add a new block device (disk) to an online pool =====
zpool add INBOX /dev/loop4
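This adds capacity (a new top-level vdev). To add redundancy instead, a device can be attached as a mirror of an existing one (device names illustrative):
zpool attach INBOX /dev/loop0 /dev/loop5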
===== Deduplication =====
zfs set dedup=on INBOX
The new attribute applies only to newly written data.
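The current setting can be verified with:
zfs get dedup INBOX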
Tests
For the test I used three 16 MB files of random data (/dev/urandom): B1, B2 and B3, created as sketched below.
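A sketch of generating such files, assuming the pool is mounted at /INBOX:
dd if=/dev/urandom of=/INBOX/B1 bs=1M count=16
dd if=/dev/urandom of=/INBOX/B2 bs=1M count=16
dd if=/dev/urandom of=/INBOX/B3 bs=1M count=16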
The three files take 38.6 MB on disk:
# zdb -S INBOX
Simulated DDT histogram:

bucket             allocated                      referenced
______  ______________________________  ______________________________
refcnt  blocks  LSIZE  PSIZE  DSIZE  blocks  LSIZE  PSIZE  DSIZE
------  ------  -----  -----  -----  ------  -----  -----  -----
     1     309  38.6M  38.6M  38.6M     309  38.6M  38.6M  38.6M
 Total     309  38.6M  38.6M  38.6M     309  38.6M  38.6M  38.6M

dedup = 1.00, compress = 1.00, copies = 1.00, dedup * compress / copies = 1.00
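The concatenated file used in the next step can be produced with a plain cat (paths illustrative):
cat /INBOX/B1 /INBOX/B2 /INBOX/B3 > /INBOX/BIG1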
Additionally, one big file with content B1|B2|B3 was added to the filesystem:
# zdb -S INBOX
Simulated DDT histogram:

bucket             allocated                      referenced
______  ______________________________  ______________________________
refcnt  blocks  LSIZE  PSIZE  DSIZE  blocks  LSIZE  PSIZE  DSIZE
------  ------  -----  -----  -----  ------  -----  -----  -----
     2     384    48M    48M    48M     768    96M    96M    96M
 Total     384    48M    48M    48M     768    96M    96M    96M

dedup = 2.00, compress = 1.00, copies = 1.00, dedup * compress / copies = 2.00
Additionally, one big file with content B1|B2|B3|B1|B2|B3 was added to the filesystem:
# zdb -S INBOX
Simulated DDT histogram:

bucket             allocated                      referenced
______  ______________________________  ______________________________
refcnt  blocks  LSIZE  PSIZE  DSIZE  blocks  LSIZE  PSIZE  DSIZE
------  ------  -----  -----  -----  ------  -----  -----  -----
     4     384    48M    48M    48M   1.50K   192M   192M   192M
 Total     384    48M    48M    48M   1.50K   192M   192M   192M

dedup = 4.00, compress = 1.00, copies = 1.00, dedup * compress / copies = 4.00
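The shifted file used in the next step can be produced by prepending a single byte (a sketch; printf '\0' emits one NUL byte):
(printf '\0'; cat /INBOX/B1 /INBOX/B2 /INBOX/B3) > /INBOX/SHIFTED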
Next, a new file with content 0|B1|B2|B3 (one dummy byte followed by B1|B2|B3) was added:
# zdb -S INBOX
Simulated DDT histogram:

bucket             allocated                      referenced
______  ______________________________  ______________________________
refcnt  blocks  LSIZE  PSIZE  DSIZE  blocks  LSIZE  PSIZE  DSIZE
------  ------  -----  -----  -----  ------  -----  -----  -----
     1     385  48.1M  48.1M  48.1M     385  48.1M  48.1M  48.1M
     4     384    48M    48M    48M   1.50K   192M   192M   192M
 Total     769  96.1M  96.1M  96.1M   1.88K   240M   240M   240M

dedup = 2.50, compress = 1.00, copies = 1.00, dedup * compress / copies = 2.50
**So ZFS cannot match shifted data for deduplication!**
Additional simple test with two files:
|0|B1|0|B2|0|B3|0|
|0|B1|B2|B3|
Only the common beginning |0|B1| of both files was deduplicated (16 MB saved).
ZFS provides block-level deduplication based on the block checksums it already computes, so it comes almost for free.
===== Compression =====
Enable compression and deduplication in the parent dataset (they will be inherited by its children):
zfs set compression=on INBOX
Possible values: compression = on | off | lzjb | gzip | gzip-[1-9] | zle
The new attribute applies only to newly written data. As test data I used a Maildir with some huge e-mails; the results are in the table below.
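Each row of the table corresponds to a run roughly like this (dataset name illustrative):
zfs create -o compression=gzip-9 INBOX/mail
# copy the Maildir into /INBOX/mail, then:
zfs get compressratio INBOX/mail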
^ compression ^ logical size ^ physical size ^ ratio ^
| off | 702 MB | 703 MB | 1.0 |
| on = lzjb | 702 MB | 531 MB | 1.32 |
| gzip-1 | 702 MB | 374 MB | 1.87 |
| gzip = gzip-6 | 702 MB | 359 MB | 1.95 |
| gzip-9 | 702 MB | 353 MB | 1.96 |
| squashfs | | 365 MB | |
zdb -S INBOX            # simulate deduplication and print the DDT histogram
zdb -b INBOX            # traverse all blocks and show space statistics
zfs get compressratio   # show the achieved compression ratio
===== References =====
[[http://docs.oracle.com/cd/E19253-01/819-5461/6n7ht6qu6/index.html]]
[[https://wiki.freebsd.org/ZFSQuickStartGuide]]
[[http://www.funtoo.org/ZFS_Fun]]
[[http://constantin.glez.de/blog/2011/07/zfs-dedupe-or-not-dedupe]]
[[http://www.oracle.com/technetwork/articles/servers-storage-admin/o11-113-size-zfs-dedup-1354231.html]]