Differences

This shows you the differences between two versions of the page.

linux:fs:zfs:dedup [2021/02/15 10:05] niziak
linux:fs:zfs:dedup [2021/03/14 22:44] niziak
Line 2: Line 2:
  
 For deduplication, it is recommended to have an L2ARC cache size of 2-5 GB per 1 TB of disk.
 +  * For every TB of pool data, you should expect 5 GB of dedup table data, assuming an average block size of 64K.
 +  * This means you should plan for at least 20GB of system RAM per TB of pool data, if you want to keep the dedup table in RAM, plus any extra memory for other metadata, plus an extra GB for the OS.
 +
 +[[https://superuser.com/questions/1169139/zfs-dedupe-again-is-memory-usage-dependent-on-physical-deduped-compressed|ZFS dedupe (again): Is memory usage dependent on physical (deduped, compressed) data stored or on logical used?]]
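
The sizing rule above can be turned into a quick estimate. This is a minimal sketch, not an authoritative formula: ''ENTRY_BYTES=320'' is an assumption back-derived from the "5 GB per TB at 64K blocks" figure (1 TiB / 64 KiB = 16777216 blocks), ''RAM_FACTOR=4'' is simply the ratio between the 20 GB and 5 GB figures, and the function name ''ddt_estimate'' is hypothetical.

```shell
#!/bin/sh
# Rough DDT sizing sketch. Both constants are assumptions derived from the
# rule of thumb quoted above, not values read from ZFS itself.
ENTRY_BYTES=320   # assumed in-core bytes per DDT entry
RAM_FACTOR=4      # assumed ratio of recommended RAM to DDT size (20 GB / 5 GB)

# ddt_estimate POOL_TIB AVG_BLOCK_KIB
# Prints "<ddt_gib> <ram_gib>" for a pool of unique data.
ddt_estimate() {
    LC_ALL=C awk -v tib="$1" -v blk="$2" -v eb="$ENTRY_BYTES" -v rf="$RAM_FACTOR" 'BEGIN {
        blocks  = tib * 1024 * 1024 * 1024 / blk      # pool size in KiB / block size in KiB
        ddt_gib = blocks * eb / (1024 * 1024 * 1024)  # bytes -> GiB
        printf "%.1f %.1f\n", ddt_gib, ddt_gib * rf
    }'
}

ddt_estimate 1 64   # 1 TiB pool, 64K average block size
ddt_estimate 1 8    # same pool with 8K blocks
```

Smaller average block sizes mean more blocks and therefore a proportionally larger table, which is why the 64K assumption matters.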
 +
 +
 +===== THINK TWICE ! =====
 +
 +Never turn on deduplication for a whole pool. It cannot be turned off for existing data without sending the whole pool to another ZFS pool and receiving it back.
 +It is also best to have enough RAM to hold the entire DDT in RAM rather than on SSD/NVMe.
 +
 +Huge CPU usage by more than 96 ZFS kernel threads was observed with OpenZFS v0.8.6 (ZFS on Linux) when large amounts of data were deleted (automatic snapshot rotation). It is linked to deduplication being enabled and nearly freezes the system because of the high CPU usage!
 +
 +==== WARNING! ====
 +
 +The issue occurs when a large portion of data is deleted while deduplication is enabled.
 +The ZFS driver creates 96 ''z_fr_iss'' threads. The system load average immediately goes to 100 (a soft watchdog can be set to reboot the system when the load average is too high).
 +These threads saturate CPU and I/O.
 +
 +  * [[https://github.com/openzfs/zfs/issues/6783|Large Deletes & Memory Consumption ]]
 +  * [[https://github.com/openzfs/zfs/issues/118|Reduce ZFS related processes/tasks]]
 +  * [[https://github.com/openzfs/zfs/issues/3976|High CPU usage by "z_fr_iss" after deleting large files]]
 +  * [[https://github.com/openzfs/zfs/issues/4227|rm / remove / delete a large file causes high load and irresponsiveness]]
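
To spot the situation described in the warning, one option is to count the ''z_fr_iss'' threads and compare the 1-minute load average against a limit. The following is only a sketch: the helper names ''count_free_threads'' and ''load_too_high'' and the threshold of 100 are hypothetical, and the demo runs on canned input rather than a live system.

```shell
#!/bin/sh
# Sketch only: both helpers and the threshold are hypothetical, not part of ZFS.

# count_free_threads: count names on stdin that look like z_fr_iss workers.
count_free_threads() {
    grep -c 'z_fr_iss'
}

# load_too_high THRESHOLD [LOADAVG_LINE]
# Succeeds if the 1-minute load average exceeds THRESHOLD.
# LOADAVG_LINE defaults to the contents of /proc/loadavg (Linux only).
load_too_high() {
    line=${2:-$(cat /proc/loadavg)}
    LC_ALL=C awk -v t="$1" -v l="$line" 'BEGIN { split(l, f, " "); exit !(f[1] + 0 > t + 0) }'
}

# Demo on canned input, so nothing here touches a real pool.
printf 'z_fr_iss_0\nz_fr_iss_1\nbash\n' | count_free_threads
if load_too_high 100 "101.52 80.10 40.03 97/1200 4242"; then
    echo "load average above threshold"
fi
```

On a live Linux system the thread names could be fed in with ''ps -e -o comm= | count_free_threads''.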
 +
 +
 +
 +
 +
  
 ===== Turn on =====
Line 12: Line 40:
  
 Once all deduped datasets are destroyed, the dedup table is removed and the performance impact is cleared.
 +
  
 ===== status =====
Line 19: Line 48:
 zfs get dedup | egrep '(on|off)'
 </code>
 +
 +<code bash>
 +zpool list rpool
 +
 +NAME    SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
 +rpool   928G   888G  40,0G        -         -    25%    95%  1.19x    ONLINE  -
 +</code>
 +
  
 <code bash>
 zpool status -D rpool
 +zdb -DD rpool
 </code>
 <code>
-dedup: DDT entries 3755572, size 501B on disk, 161B in core
 +DDT-sha256-zap-duplicate: 550683 entries, size 474 on disk, 153 in core
 +DDT-sha256-zap-unique: 3204889 entries, size 505 on disk, 163 in core
 +
 +DDT histogram (aggregated over all DDTs):
  
 bucket              allocated                       referenced
Line 30: Line 71:
 refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
 ------   ------   -----   -----   -----   ------   -----   -----   -----
-     1    3,06M    341G    324G    325G    3,06M    341G    324G    325G
-     2     428K   44,2G   41,9G   42,1G     963K   98,7G   93,7G   94,0G
-     4    84,9K   6,34G   5,40G   5,46G     384K   27,0G   22,7G   22,9G
-     8    16,7K    245M    107M    137M     173K   2,52G   1,09G   1,40G
-    16    7,61K    144M   79,0M   88,9M     146K   2,79G   1,51G   1,69G
-    32      564   8,93M   3,35M   4,02M    22,8K    382M    146M    174M
-    64      110   1,32M    588K    776K    9,46K    113M   51,6M   67,3M
-   128       52   1,07M    670K    748K    8,78K    183M    114M    127M
-   256       37    947K    576K    652K    13,7K    389M    239M    266M
-   512        4   10,5K   10,5K     16K    3,17K   8,15M   8,15M   12,7M
-    1K        6   43,5K     15K     28K    8,34K   60,6M   21,6M   38,8M
-    2K        1   36,5K      8K      8K    2,08K   75,9M   16,6M   16,6M
- Total    3,58M    392G    372G    373G    4,75M    473G    444G    446G
 +     1    3.06M    341G    324G    325G    3.06M    341G    324G    325G
 +     2     428K   44.2G   41.9G   42.1G     963K   98.7G   93.7G   94.0G
 +     4    84.9K   6.34G   5.40G   5.46G     384K   27.0G   22.7G   22.9G
 +     8    16.7K    245M    107M    137M     173K   2.52G   1.09G   1.40G
 +    16    7.61K    144M   79.0M   88.9M     146K   2.79G   1.51G   1.69G
 +    32      564   8.93M   3.35M   4.02M    22.8K    382M    146M    174M
 +    64      110   1.32M    588K    776K    9.46K    113M   51.6M   67.3M
 +   128       52   1.07M    670K    748K    8.78K    183M    114M    127M
 +   256       37    947K    576K    652K    13.7K    389M    239M    266M
 +   512        4   10.5K   10.5K     16K    3.17K   8.15M   8.15M   12.7M
 +    1K        6   43.5K     15K     28K    8.34K   60.6M   21.6M   38.8M
 +    2K        1   36.5K      8K      8K    2.08K   75.9M   16.6M   16.6M
 + Total    3.58M    392G    372G    373G    4.75M    473G    444G    446G
 +
 +dedup = 1.20, compress = 1.07, copies = 1.00, dedup * compress / copies = 1.27
 </code>
  
 The DDT memory usage can be calculated as follows:
-  * ''echo '3755572 * 161 /1024/1024' | bc'' is 576 MB used memory
 +  * ''echo '(550683 * 153 + 3204889 * 163) /1024/1024' | bc'' is 578 MB used memory
-  * ''echo '3755572 * 501 /1024/1024' | bc'' is 1794 MB used on disk
 +  * ''echo '(550683 * 474 + 3204889 * 505) /1024/1024' | bc'' is 1792 MB used on disk
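
The same arithmetic can be done without copying numbers by hand. This is a minimal sketch that assumes the ''DDT-...: N entries, size X on disk, Y in core'' line layout shown in the output above; the function name ''ddt_usage'' is hypothetical.

```shell
#!/bin/sh
# ddt_usage: read zdb -DD style output on stdin and sum entries * size
# over every "DDT-...: N entries, size X on disk, Y in core" line.
ddt_usage() {
    LC_ALL=C awk '
        /DDT-.*entries, size/ {
            # Whitespace-split fields: $2 = entries, $5 = on-disk size, $8 = in-core size.
            # Adding 0 strips a trailing unit such as "B" if one is present.
            disk += ($2 + 0) * ($5 + 0)
            core += ($2 + 0) * ($8 + 0)
        }
        END {
            # %d truncates, matching the integer result of bc in the list above
            printf "in core: %d MiB, on disk: %d MiB\n", core / 1048576, disk / 1048576
        }'
}

# Demo with the two summary lines from the output above.
ddt_usage <<'EOF'
DDT-sha256-zap-duplicate: 550683 entries, size 474 on disk, 153 in core
DDT-sha256-zap-unique: 3204889 entries, size 505 on disk, 163 in core
EOF
```

On a live system the input would come from ''zdb -DD rpool | ddt_usage''.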
  
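-<code bash>
-zdb -DD rpool
-</code>
 +SIZES:
 +  * DSIZE: (on-disk size) space actually allocated on disk. The pool stores 446 GB of referenced data in 373 GB of allocated disk space (446 / 373 ≈ 1.196 dedup ratio).
 +  * LSIZE: (logical size) size of the data as held in memory, before compression.
 +  * PSIZE: (physical size) size of the data after compression; DSIZE adds allocation overhead on top of it.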
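
The ratios on the ''dedup = ...'' line printed by ''zdb'' can be reproduced from the Total row of the histogram. This is a sketch under the assumption that dedup is referenced DSIZE over allocated DSIZE, compress is referenced LSIZE over referenced PSIZE, and copies is referenced DSIZE over referenced PSIZE; the helper ''ratio'' is hypothetical and the inputs are the rounded GB values from the table.

```shell
#!/bin/sh
# ratio NUM DEN: print NUM / DEN rounded to two decimals.
ratio() {
    LC_ALL=C awk -v n="$1" -v d="$2" 'BEGIN { printf "%.2f\n", n / d }'
}

# Values from the Total row, in GB (rounded as printed by zdb).
alloc_dsize=373
ref_lsize=473
ref_psize=444
ref_dsize=446

echo "dedup    = $(ratio "$ref_dsize" "$alloc_dsize")"
echo "compress = $(ratio "$ref_lsize" "$ref_psize")"
echo "copies   = $(ratio "$ref_dsize" "$ref_psize")"
# dedup * compress / copies simplifies to referenced LSIZE / allocated DSIZE
echo "overall  = $(ratio "$ref_lsize" "$alloc_dsize")"
```

With these inputs the four lines come out as 1.20, 1.07, 1.00 and 1.27.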