Sometimes, when Proxmox is used in a non-optimal (cheap) configuration, the cluster can fence a healthy node and cause it to reboot. Possible causes:
pvecm status

Cluster information
-------------------
Name:             example
Config Version:   11
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Mon Jan 30 11:42:01 2023
Quorum provider:  corosync_votequorum
Nodes:            7
Node ID:          0x00000001
Ring ID:          1.1d66
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   7
Highest expected: 7
Total votes:      7
Quorum:           4
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 192.168.28.230 (local)
0x00000002          1 192.168.28.235
0x00000003          1 192.168.28.233
0x00000004          1 192.168.28.234
0x00000005          1 192.168.28.232
0x00000006          1 192.168.28.236
0x00000007          1 192.168.28.237
On the Proxmox forum there was advice to remove HA from every node, or set it to “ignore”. Then check which nodes have an active lrm, for example as shown below.
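A minimal sketch of doing this from the shell with ha-manager (the resource ID vm:100 is only a placeholder, use your own HA resource IDs):

# set an HA resource to "ignored" instead of removing it
ha-manager set vm:100 --state ignored

# or remove the resource from HA management entirely
ha-manager remove vm:100

# show the CRM master and the lrm state of every node
ha-manager status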
Restart the pve-ha-lrm service on the nodes where it is active (go to Node → System and restart the service). After the restarts, all lrms should be in the idle state.
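The same restart can be done from the shell instead of the GUI, assuming the standard Proxmox service name:

# restart the local resource manager on a node where it is still "active"
systemctl restart pve-ha-lrm

# verify the result; every lrm should now report "idle"
ha-manager status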
There will always be one active master; this is normal.
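For reference, ha-manager status on a quiet cluster looks roughly like this (node names and timestamps are illustrative, not taken from the cluster above):

quorum OK
master node1 (active, Mon Jan 30 11:45:00 2023)
lrm node1 (idle, Mon Jan 30 11:45:05 2023)
lrm node2 (idle, Mon Jan 30 11:45:07 2023)
lrm node3 (idle, Mon Jan 30 11:45:02 2023)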