Disable fencing

Sometimes, when Proxmox is run in a suboptimal (cheap) configuration, the cluster can fence a healthy node and cause it to reboot. Possible causes:

  1. only one network is used for everything - corosync requires low latency (see the checks below)
  2. overloaded CPU
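
Before changing anything, it is worth confirming that the corosync links really are the problem. A minimal sketch of such checks, assuming a standard Proxmox VE node (exact log messages vary between versions):

# show the local node ID and the status of each knet link
corosync-cfgtool -s

# look for link-down or retransmit messages in the corosync log
journalctl -u corosync --since "1 hour ago" | grep -Ei 'link|retransmit'

The overall cluster and quorum state can be checked as in the example below: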
pvecm status
 
Cluster information
-------------------
Name:             example
Config Version:   11
Transport:        knet
Secure auth:      on
 
Quorum information
------------------
Date:             Mon Jan 30 11:42:01 2023
Quorum provider:  corosync_votequorum
Nodes:            7
Node ID:          0x00000001
Ring ID:          1.1d66
Quorate:          Yes
 
Votequorum information
----------------------
Expected votes:   7
Highest expected: 7
Total votes:      7
Quorum:           4  
Flags:            Quorate 
 
Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 192.168.28.230 (local)
0x00000002          1 192.168.28.235
0x00000003          1 192.168.28.233
0x00000004          1 192.168.28.234
0x00000005          1 192.168.28.232
0x00000006          1 192.168.28.236
0x00000007          1 192.168.28.237

On the Proxmox forum there was advice to remove the HA resources, or to set their requested state to “ignore”. Then check which nodes have an active LRM. Restart the pve-ha-lrm service on the nodes where it is active (go to Node → System and restart the service). After the restarts, all LRMs should be in the idle state. There will always be one active CRM master; that is normal.
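
A rough command-line equivalent of those steps; the resource ID vm:100 is only an example, substitute your own:

# list HA resources, the CRM master and the state of each node's LRM
ha-manager status

# set a resource to "ignored" so the HA stack no longer manages it ...
ha-manager set vm:100 --state ignored

# ... or remove it from HA management entirely
ha-manager remove vm:100

# on each node whose LRM is still active, restart the local resource manager
systemctl restart pve-ha-lrm

With no active HA resources, the LRM stays idle and does not arm the watchdog, which is what prevents the self-fencing described above.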