====== Disaster recovery ======

===== Replace NVM device =====

Only one NVM slot is available, so the idea is to copy the NVM contents to the HDDs and then restore them onto the new NVM device.
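Before starting, it is worth checking what currently lives on the NVM device and how ''nvmpool'' looks (read-only commands, nothing is changed yet):
<code bash>
zpool status nvmpool
lsblk -o NAME,SIZE,PARTLABEL /dev/nvme0n1
</code>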

Stop CEPH:
<code bash>
systemctl stop ceph.target
systemctl stop ceph-osd.target
systemctl stop ceph-mgr.target
systemctl stop ceph-mon.target
systemctl stop ceph-mds.target
systemctl stop ceph-crash.service
</code>
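Optionally verify that no CEPH daemons are left running on this node before touching its devices:
<code bash>
pgrep -a ceph || echo "no ceph processes running"
</code>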

Back up the partition layout:
<code bash>
sgdisk -b nvm.sgdisk /dev/nvme0n1
sgdisk -p /dev/nvme0n1
</code>

Move the ZFS ''nvmpool'' to the HDDs:
<code bash>
zfs destroy hddpool/nvmtemp
zfs create -s -b 8192 -V 387.8G hddpool/nvmtemp  # note: block size forced to match the existing device

ls -l /dev/zvol/hddpool/nvmtemp
lrwxrwxrwx 1 root root 11 01-15 11:00 /dev/zvol/hddpool/nvmtemp -> ../../zd192

zpool attach nvmpool 7b375b69-3ef9-c94b-bab5-ef68f13df47c /dev/zd192
</code>
Resilvering of ''nvmpool'' will begin. Observe it with ''zpool status nvmpool 1''.

Remove NVM from ''nvmpool'':
<code bash>zpool detach nvmpool 7b375b69-3ef9-c94b-bab5-ef68f13df47c</code>

Remove all ZILs, L2ARCs and swap:
<code bash>
swapoff -a
vi /etc/fstab   # comment out the swap entry on the NVM device

zpool remove hddpool <ZIL DEVICE>
zpool remove hddpool <L2ARC DEVICE>
zpool remove rpool <L2ARC DEVICE>
</code>
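The exact ZIL (''logs'') and L2ARC (''cache'') device names can be read from the pool status, e.g.:
<code bash>
zpool status hddpool
zpool status rpool
</code>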

The CEPH OSD will be recreated from scratch, to force a rebuild of the OSD DB (which can be too big due to a metadata bug in a previous CEPH version).
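A possible way to remove the old OSD before pulling the NVM device (assuming a hypothetical OSD id ''0''; the rest of the cluster is still up, so data stays available on the other replicas):
<code bash>
ceph osd out 0                    # mark the OSD out so CEPH no longer counts on it
pveceph osd destroy 0 --cleanup   # remove the OSD and clean up its volumes/partitions
</code>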

Physically replace the NVM device.

Recreate the partitions manually (a sketch follows the list) or restore them from the backup: <code bash>sgdisk -l nvm.sgdisk /dev/nvme0n1</code>
  * swap
  * rpool_zil
  * hddpool_zil
  * hddpool_l2arc
  * ceph_db (for a 4 GB ceph OSD DB create 4096 MB + 4 MB)
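A minimal sketch of creating the partitions by hand; all sizes except the ''ceph_db'' one (4096 MB + 4 MB from the note above) are hypothetical and must be adjusted to the actual setup:
<code bash>
sgdisk /dev/nvme0n1 --new 1:0:+16G   -t 1:8200 -c 1:swap
sgdisk /dev/nvme0n1 --new 2:0:+8G    -t 2:BF01 -c 2:rpool_zil
sgdisk /dev/nvme0n1 --new 3:0:+8G    -t 3:BF01 -c 3:hddpool_zil
sgdisk /dev/nvme0n1 --new 4:0:+64G   -t 4:BF01 -c 4:hddpool_l2arc
sgdisk /dev/nvme0n1 --new 5:0:+4100M -t 5:8300 -c 5:ceph_db
</code>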

Add the ZILs and L2ARCs back, and re-enable swap.
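A sketch of re-adding them, assuming the partitions were given the GPT names listed above (the ''/dev/disk/by-partlabel/...'' paths only exist if those names were set):
<code bash>
zpool add rpool   log   /dev/disk/by-partlabel/rpool_zil
zpool add hddpool log   /dev/disk/by-partlabel/hddpool_zil
zpool add hddpool cache /dev/disk/by-partlabel/hddpool_l2arc

# re-enable swap: mkswap generates a new UUID, so update /etc/fstab accordingly
mkswap /dev/disk/by-partlabel/swap
swapon -a
</code>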

Start ''nvmpool'': <code bash>zpool import nvmpool</code>

Move ''nvmpool'' to the new NVM partition:
<code bash>
zpool attach nvmpool zd16 426718f1-1b1e-40c0-a6e2-1332fe5c3f2c
# wait for resilvering to finish, then drop the temporary zvol
zpool detach nvmpool zd16
</code>
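Finally the OSD can be recreated with its DB on the new ''ceph_db'' partition. A sketch, assuming a hypothetical data disk ''/dev/sdX'' and the partition name used above; adjust both to the actual setup:
<code bash>
systemctl start ceph.target
pveceph osd create /dev/sdX --db_dev /dev/disk/by-partlabel/ceph_db
</code>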

===== Replace rpool device =====

The Proxmox ''rpool'' ZFS is located on the 3rd partition (1st is the GRUB/BIOS boot partition, 2nd is EFI, 3rd is ZFS).
To replace a failed device, the partition layout has to be replicated first.

With a new device of greater or equal size, simply replicate the partitions:
<code bash>
# replicate layout from SDA to SDB
sgdisk /dev/sda -R /dev/sdb
# generate new UUIDs:
sgdisk -G /dev/sdb
</code>

To replicate the layout on a smaller device, the partitions have to be created manually:
<code bash>
sgdisk -p /dev/sda

Number  Start (sector)    End (sector)  Size       Code  Name
   1                34            2047   1007.0 KiB  EF02
   2              2048         1050623   512.0 MiB   EF00
   3           1050624       976773134   465.3 GiB   BF01

sgdisk --clear /dev/sdb
sgdisk /dev/sdb -a1 --new 1:34:2047      -t0:EF02
sgdisk /dev/sdb     --new 2:2048:1050623 -t0:EF00
sgdisk /dev/sdb     --new 3:1050624      -t0:BF01
</code>

Restore the bootloader:
<code bash>
proxmox-boot-tool format /dev/sdb2
proxmox-boot-tool init /dev/sdb2
proxmox-boot-tool clean
</code>
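''proxmox-boot-tool status'' can be used to confirm that the new ESP is registered and synced:
<code bash>
proxmox-boot-tool status
</code>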

Attach the new device to ''rpool'' and remove the old one:
<code bash>
zpool attach rpool ata-SPCC_Solid_State_Disk_XXXXXXXXXXXX-part3 /dev/disk/by-id/ata-SSDPR-CL100-120-G3_XXXXXXXX-part3
# wait for resilvering to complete before removing the old device
zpool offline rpool ata-SSDPR-CX400-128-G2_XXXXXXXXX-part3
zpool detach rpool ata-SSDPR-CX400-128-G2_XXXXXXXXX-part3
</code>
  
===== Migrate VM from dead node =====
<code bash>
zfs send rpool2/data/vm-366-disk-0 | ssh pve5 zfs recv -d rpool
zfs send rpool2/data/vm-366-disk-2 | ssh pve5 zfs recv -d rpool
</code>

===== Reinstall node =====

Remember to wipe any additional device partitions belonging to ''rpool'' (i.e. ZIL). Otherwise, during the first Proxmox startup, ZFS detects two ''rpool'' pools in the system and stops, requiring an import by the pool's numerical id.
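For example, an old ''rpool'' ZIL partition can be wiped like this (the partition path is hypothetical and has to be adjusted):
<code bash>
zpool labelclear -f /dev/disk/by-partlabel/rpool_zil
</code>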

Install a fresh Proxmox.
Create the common cluster-wide mountpoints for local storage.
Copy all ZFS datasets from the backup ZFS pool:
<code bash>
zfs send rpool2/data/vm-708-disk-0 | zfs recv -d rpool
...
</code>
For CT volumes it gets more complicated:
<code>
root@pve3:~# zfs send rpool2/data/subvol-806-disk-0 | zfs recv -d rpool
warning: cannot send 'rpool2/data/subvol-806-disk-0': target is busy; if a filesystem, it must not be mounted
cannot receive: failed to read from stream
</code>
The reason is that the source dataset is mounted. Solution:
<code bash>
zfs set canmount=off rpool2/data/subvol-806-disk-0
</code>

Try to join the cluster. From the new (reinstalled) node ''pve3'', join to the IP of any existing node.
The ''--force'' switch is needed, because the ''pve3'' node was previously in the cluster.

<code bash>
root@pve3:~# pvecm add 192.168.28.235 --force

Please enter superuser (root) password for '192.168.28.235': ************
Establishing API connection with host '192.168.28.235'
The authenticity of host '192.168.28.235' can't be established.
X509 SHA256 key fingerprint is D2:68:21:D7:43:6D:BA:4D:EB:C6:32:DD:2C:72:6E:5B:6D:1A:2D:DB:82:EC:E6:41:72:46:6B:E6:B1:BF:94:84.
Are you sure you want to continue connecting (yes/no)? yes
Login succeeded.
check cluster join API version
No cluster network links passed explicitly, fallback to local node IP '192.168.28.233'
Request addition of this node
Join request OK, finishing setup locally
stopping pve-cluster service
backup old database to '/var/lib/pve-cluster/backup/config-1621353318.sql.gz'
waiting for quorum...OK
(re)generate node files
generate new node certificate
merge authorized SSH keys and known hosts
generated new node certificate, restart pveproxy and pvedaemon services
successfully added node 'pve3' to cluster.
</code>
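Afterwards the cluster membership can be double-checked from any node:
<code bash>
pvecm status
pvecm nodes
</code>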