DB usage

Check DB size and usage

ceph daemon osd.$(ls -1 /var/lib/ceph/osd/ceph-?/../ | cut -d "-" -f 2) perf dump | jq ".bluefs | {db_total_bytes, db_used_bytes}"

1st machine:

{
  "db_total_bytes": 3221217280,
  "db_used_bytes": 1162870784
}

2nd machine:

{
  "db_total_bytes": 4294959104,
  "db_used_bytes": 3879731200
}

The DB on the 2nd machine is almost full: about 3.6 GiB of 4 GiB (90%) is used, compared with about 1.1 GiB of 3 GiB (36%) on the 1st machine. This is a bit strange, because both machines have the same configuration.

A deeper look into BlueFS:
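The fill level can be worked out from the two counters directly. The sketch below is only an illustration: db_usage_pct is a hypothetical helper name, not part of Ceph, and the byte values are the ones shown in the outputs above.

```shell
# Print BlueFS DB utilisation as a percentage with one decimal place.
# Arguments: used bytes, total bytes (db_used_bytes, db_total_bytes).
# db_usage_pct is a hypothetical helper, not a Ceph command.
db_usage_pct() {
    awk -v used="$1" -v total="$2" 'BEGIN {
        if (total <= 0) { print "n/a"; exit }
        printf "%.1f\n", used * 100 / total
    }'
}

db_usage_pct 1162870784 3221217280   # 1st machine, prints 36.1
db_usage_pct 3879731200 4294959104   # 2nd machine, prints 90.3
```

On a live host the two arguments would come from the perf dump command shown earlier.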

ceph daemon osd.6 bluefs stats
ceph tell osd.\* bluefs stats

1st machine:

Usage matrix:
DEV/LEV     WAL         DB          SLOW        *           *           REAL        FILES       
LOG         0 B         195 GiB     0 B         0 B         0 B         5.4 MiB     1           
WAL         0 B         1013 MiB    0 B         0 B         0 B         1009 MiB    8           
DB          0 B         731 MiB     0 B         0 B         0 B         717 MiB     24          
SLOW        0 B         0 B         2.6 GiB     0 B         0 B         2.6 GiB     43          
TOTALS      0 B         197 GiB     2.6 GiB     0 B         0 B         0 B         76     

2nd machine:

Usage matrix:
DEV/LEV     WAL         DB          SLOW        *           *           REAL        FILES       
LOG         0 B         180 GiB     15 GiB      0 B         0 B         13 MiB      1           
WAL         0 B         733 MiB     0 B         0 B         0 B         731 MiB     5           
DB          0 B         2.9 GiB     502 MiB     0 B         0 B         3.3 GiB     72          
SLOW        0 B         0 B         0 B         0 B         0 B         0 B         0           
TOTALS      0 B         183 GiB     15 GiB      0 B         0 B         0 B         78          

Rows show the type of data, and columns show the device on which that data is actually stored. On the 2nd machine, 2.9 GiB of DB data sits on the DB device, and another 502 MiB of DB data has spilled over onto the SLOW device!
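The same check can be scripted. This is a rough sketch that assumes the column layout shown in the matrices above (it may differ between Ceph releases); db_spillover is a hypothetical helper name. It reads the stats text on stdin and looks only at the first DB row.

```shell
# Read "bluefs stats" text on stdin and report whether DB-level data
# is stored on the SLOW device.
# Assumes the layout shown in this document: every size is two
# whitespace-separated fields (value, unit), so SLOW is fields 6 and 7.
db_spillover() {
    awk '$1 == "DB" {
        if ($6 + 0 > 0) { print "spillover: " $6 " " $7 }
        else            { print "no spillover" }
        exit    # inspect only the first DB row (the usage matrix)
    }'
}

# DB row of the 2nd machine, copied from the usage matrix above.
echo "DB  0 B  2.9 GiB  502 MiB  0 B  0 B  3.3 GiB  72" | db_spillover
```

On an OSD host, the output of the bluefs stats command above could be piped into db_spillover instead of the echoed sample row.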

Compacting helps a bit:

ceph tell osd.<osdid> compact
# or for all OSDs
ceph tell osd.\* compact

2nd machine after compaction:

{
  "db_total_bytes": 4294959104,
  "db_used_bytes": 2881486848
}
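
To put a number on "helps a bit", the db_used_bytes values before and after compaction can be compared. This is a small worked calculation, assuming the output above belongs to the 2nd machine (its db_total_bytes matches); the variable names are illustrative.

```shell
# db_used_bytes and db_total_bytes of the 2nd machine, copied from
# the outputs in this document.
before=3879731200
after=2881486848
total=4294959104

# Integer shell arithmetic: bytes freed by compaction, converted to MiB.
echo "freed: $(( (before - after) / 1024 / 1024 )) MiB"

# awk handles the fractional percentages.
awk -v b="$before" -v a="$after" -v t="$total" 'BEGIN {
    printf "used: %.1f%% -> %.1f%%\n", b * 100 / t, a * 100 / t
}'
```

Compaction freed 952 MiB and brought usage down from about 90% to about 67% of the DB device.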

Usage matrix:
DEV/LEV     WAL         DB          SLOW        *           *           REAL        FILES       
LOG         0 B         180 GiB     15 GiB      0 B         0 B         3.7 MiB     1           
WAL         0 B         90 MiB      24 MiB      0 B         0 B         109 MiB     9           
DB          0 B         2.6 GiB     319 MiB     0 B         0 B         2.9 GiB     55          
SLOW        0 B         0 B         0 B         0 B         0 B         0 B         0           
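TOTALS      0 B         183 GiB     15 GiB      0 B         0 B         0 B         65          

Check BlueStore consistency with fsck. The OSD must be stopped first; setting noout keeps the cluster from marking it out and rebalancing while it is down. Note that ceph-osd.target stops every OSD on the host:

ceph osd set noout
systemctl stop ceph-osd.target
ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-6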
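
The notes stop at fsck. A possible full sequence for a single OSD, including the restart and clearing the flag, is sketched below. Assumptions: a package-based (non-containerized) deployment where the unit is named ceph-osd@<id>; run and offline_fsck are hypothetical helper names. By default the sketch only prints the commands (DRY_RUN=1).

```shell
# Print the command when DRY_RUN=1 (the default here), otherwise run it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then echo "$*"; else "$@"; fi
}

# Offline fsck of one OSD; stops at the first failing step.
# Assumes the systemd unit name ceph-osd@<id>.
offline_fsck() {
    id="$1"
    run ceph osd set noout                                          || return 1
    run systemctl stop "ceph-osd@${id}"                             || return 1
    run ceph-bluestore-tool fsck --path "/var/lib/ceph/osd/ceph-${id}" || return 1
    run systemctl start "ceph-osd@${id}"                            || return 1
    run ceph osd unset noout
}

offline_fsck 6
```

If fsck reports errors, the function returns before restarting the OSD, so the noout flag stays set and has to be cleared by hand once the problem is resolved.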