====== DB usage ======
===== Check DB size and usage =====
ceph daemon osd.$(ls -1 /var/lib/ceph/osd/ceph-?/../ | cut -d "-" -f 2) perf dump | jq ".bluefs | {db_total_bytes, db_used_bytes}"
1st machine:
{
"db_total_bytes": 3221217280,
"db_used_bytes": 1162870784
}
2nd machine:
{
"db_total_bytes": 4294959104,
"db_used_bytes": 3879731200
}
It looks like the DB is almost full on the 2nd machine, which is a bit strange because both machines use the same configuration.
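The same check can be run for every OSD on a host in one go. A minimal sketch, assuming the usual /var/lib/ceph/osd/ceph-<id> data directories and jq being available:
# print BlueFS DB usage per OSD as used/total bytes plus a percentage
for dir in /var/lib/ceph/osd/ceph-*; do
  id="${dir##*-}"
  ceph daemon "osd.${id}" perf dump \
    | jq -r --arg id "$id" '.bluefs
        | "osd.\($id): \(.db_used_bytes) / \(.db_total_bytes) bytes (\(100 * .db_used_bytes / .db_total_bytes | floor)%)"'
done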
A deeper look into BlueFS:
ceph daemon osd.6 bluefs stats
ceph tell osd.\* bluefs stats
1st machine:
Usage matrix:
DEV/LEV   WAL     DB        SLOW      *      *      REAL      FILES
LOG       0 B     195 GiB   0 B       0 B    0 B    5.4 MiB   1
WAL       0 B     1013 MiB  0 B       0 B    0 B    1009 MiB  8
DB        0 B     731 MiB   0 B       0 B    0 B    717 MiB   24
SLOW      0 B     0 B       2.6 GiB   0 B    0 B    2.6 GiB   43
TOTALS    0 B     197 GiB   2.6 GiB   0 B    0 B    0 B       76
2nd machine:
Usage matrix:
DEV/LEV   WAL     DB        SLOW      *      *      REAL      FILES
LOG       0 B     180 GiB   15 GiB    0 B    0 B    13 MiB    1
WAL       0 B     733 MiB   0 B       0 B    0 B    731 MiB   5
DB        0 B     2.9 GiB   502 MiB   0 B    0 B    3.3 GiB   72
SLOW      0 B     0 B       0 B       0 B    0 B    0 B       0
TOTALS    0 B     183 GiB   15 GiB    0 B    0 B    0 B       78
Rows are the type of data to be placed, and the columns show on which device that data is actually stored.
On the 2nd machine, 2.9 GiB of DB data sits on the DB device and another 502 MiB of DB data has spilled over to the SLOW device!
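Spillover like this can also be spotted without reading the matrix by hand. A minimal sketch, assuming a release whose bluefs perf counters still expose slow_used_bytes (Nautilus/Octopus era):
# flag OSDs whose RocksDB data has spilled over onto the slow device
for dir in /var/lib/ceph/osd/ceph-*; do
  id="${dir##*-}"
  ceph daemon "osd.${id}" perf dump \
    | jq -r --arg id "$id" 'if (.bluefs.slow_used_bytes // 0) > 0
        then "osd.\($id): \(.bluefs.slow_used_bytes) bytes of DB data on SLOW"
        else "osd.\($id): no spillover"
        end'
done
Recent releases additionally raise a BLUEFS_SPILLOVER health warning, visible in ceph health detail.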
Compacting helps a bit:
ceph tell osd.6 compact
# or for all osds
ceph tell osd.\* compact
{
"db_total_bytes": 4294959104,
"db_used_bytes": 2881486848
}
Usage matrix:
DEV/LEV   WAL     DB        SLOW      *      *      REAL      FILES
LOG       0 B     180 GiB   15 GiB    0 B    0 B    3.7 MiB   1
WAL       0 B     90 MiB    24 MiB    0 B    0 B    109 MiB   9
DB        0 B     2.6 GiB   319 MiB   0 B    0 B    2.9 GiB   55
SLOW      0 B     0 B       0 B       0 B    0 B    0 B       0
TOTALS    0 B     183 GiB   15 GiB    0 B    0 B    0 B       65
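To see how much a compaction actually reclaims, measure before and after. A minimal sketch for a single OSD (osd.6 here, matching the example above; adjust the id):
# compare db_used_bytes before and after an online compaction
id=6
before=$(ceph daemon "osd.${id}" perf dump | jq '.bluefs.db_used_bytes')
ceph tell "osd.${id}" compact
after=$(ceph daemon "osd.${id}" perf dump | jq '.bluefs.db_used_bytes')
echo "osd.${id}: db_used_bytes ${before} -> ${after}"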
To run an offline consistency check on the suspect OSD, set noout, stop the OSD and fsck its BlueStore volume:
ceph osd set noout
systemctl stop ceph-osd.target
ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-6
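Once the fsck comes back clean, the reverse steps bring everything back. A sketch, assuming the OSDs were stopped via ceph-osd.target as above:
# start the OSDs again and clear the noout flag
systemctl start ceph-osd.target
ceph osd unset noout
# verify the cluster returns to HEALTH_OK
ceph -s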