
Disaster recovery

Migrate VM from dead node

Simply move the config file (paths below are relative to /etc/pve, run from any surviving node):

mv ./nodes/pve3/qemu-server/366.conf ./nodes/pve5/qemu-server/
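If several VMs have to be moved, the same rename can be generated for each config file. A sketch that only prints the mv commands for review (the VMIDs are examples; in real use the list would come from listing /etc/pve/nodes/pve3/qemu-server/):

```shell
# Sketch: print one mv command per VM config on the dead node.
# VMIDs are hard-coded examples here; on a live system take them
# from:  ls /etc/pve/nodes/pve3/qemu-server/
for vmid in 366 708; do
  echo "mv /etc/pve/nodes/pve3/qemu-server/${vmid}.conf /etc/pve/nodes/pve5/qemu-server/"
done
```

Review the printed commands, then pipe them to sh to execute.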

If the move is not possible (no quorum in the cluster), temporarily reduce the expected votes to 1:

pvecm e 1

Transfer the needed storage. From the source node pve3, send the volumes to pve5:

zfs send rpool2/data/vm-366-disk-0 | ssh pve5 zfs recv -d rpool
zfs send rpool2/data/vm-366-disk-2 | ssh pve5 zfs recv -d rpool
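With more disks it is easier to generate a send/recv command for every volume of the VM and review them before executing. A sketch that only prints the commands (the volume list is hard-coded from the example above; on a live system it would come from zfs list):

```shell
# Sketch: build one send/recv command per volume of VM 366.
# Volume names are hard-coded examples here; on a live system use:
#   zfs list -H -o name -r rpool2/data | grep 'vm-366-disk-'
vols="rpool2/data/vm-366-disk-0
rpool2/data/vm-366-disk-2"

printf '%s\n' "$vols" | while read -r vol; do
  echo "zfs send $vol | ssh pve5 zfs recv -d rpool"
done
```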

Reinstall node

Install fresh Proxmox. Recreate the common cluster-wide mountpoints on local storage. Copy all ZFS datasets back from the backup ZFS pool:

zfs send rpool2/data/vm-708-disk-0 | zfs recv -d rpool
...

For CT volumes it gets more complicated:

root@pve3:~# zfs send rpool2/data/subvol-806-disk-0 | zfs recv -d rpool
warning: cannot send 'rpool2/data/subvol-806-disk-0': target is busy; if a filesystem, it must not be mounted
cannot receive: failed to read from stream

The reason for the problem is that the source dataset is mounted. Solution: disable mounting on the source, then repeat the send:

zfs set canmount=off rpool2/data/subvol-806-disk-0

Try to join the cluster again. From the new (reinstalled) node pve3, join using the IP of any existing node. The --force switch is needed because a node named pve3 was previously in the cluster.

root@pve3:~# pvecm add 192.168.28.235 --force
 
Please enter superuser (root) password for '192.168.28.235': ************
Establishing API connection with host '192.168.28.235'
The authenticity of host '192.168.28.235' can't be established.
X509 SHA256 key fingerprint is D2:68:21:D7:43:6D:BA:4D:EB:C6:32:DD:2C:72:6E:5B:6D:1A:2D:DB:82:EC:E6:41:72:46:6B:E6:B1:BF:94:84.
Are you sure you want to continue connecting (yes/no)? yes
Login succeeded.
check cluster join API version
No cluster network links passed explicitly, fallback to local node IP '192.168.28.233'
Request addition of this node
Join request OK, finishing setup locally
stopping pve-cluster service
backup old database to '/var/lib/pve-cluster/backup/config-1621353318.sql.gz'
waiting for quorum...OK
(re)generate node files
generate new node certificate
merge authorized SSH keys and known hosts
generated new node certificate, restart pveproxy and pvedaemon services
successfully added node 'pve3' to cluster.