Ensuring Cluster Health
An operational cluster continuously answers requests and synchronizes data. It also evaluates its own health to ensure both availability and data integrity. As described earlier, a node would rather stop working than risk a split-brain situation. A split-brain situation develops when two nodes each believe they are the lone survivor and continue to serve requests, resulting in two diverging data sets.
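To illustrate why a node may prefer to stop, here is a minimal sketch of a strict-majority rule, one common way to rule out split-brain. This is an assumption for illustration only: this section does not state which mechanism the Hardware Appliance uses, and the function name `has_majority` is hypothetical.

```python
def has_majority(reachable_nodes: int, cluster_size: int) -> bool:
    """Return True if a partition holds a strict majority of the cluster.

    reachable_nodes counts the node itself plus every peer it can still reach.
    Illustrative rule only, not the appliance's documented algorithm.
    """
    return reachable_nodes > cluster_size // 2


# A three-node cluster that splits into a group of two and a group of one:
print(has_majority(2, 3))  # the pair may keep serving requests
print(has_majority(1, 3))  # the isolated node must stop
```

Under such a rule, at most one side of any network split can keep serving requests, so two diverging data sets cannot arise.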
To prevent accidental degradation of the cluster health, some precautions need to be taken. For example, the cluster could mistake a planned network reconfiguration for an emergency.
Maintenance operations on the cluster, such as rebooting, updating, or network reconfiguration, should be restricted to one node at a time, with ample time for the node to reconnect and synchronize after the task is completed. Before you proceed to the next node, make sure that your cluster is back to full health.
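The one-node-at-a-time rule can be sketched as a small loop. This is a sketch under stated assumptions, not a tool shipped with the appliance: `maintain` and `cluster_healthy` are hypothetical callables that you would implement yourself (for example, by performing the task manually and checking the Cluster Status page), and the polling and timeout values are arbitrary.

```python
import time
from typing import Callable, Iterable


def rolling_maintenance(
    nodes: Iterable[str],
    maintain: Callable[[str], None],      # hypothetical: performs the task on one node
    cluster_healthy: Callable[[], bool],  # hypothetical: True when the cluster is at full health
    poll_seconds: float = 30.0,
    timeout_seconds: float = 1800.0,
) -> None:
    """Maintain nodes one at a time, waiting for full health in between."""
    for node in nodes:
        # Never start on a node while the cluster is already degraded.
        if not cluster_healthy():
            raise RuntimeError(f"cluster not healthy before touching {node}")
        maintain(node)
        # Give the node time to reconnect and synchronize.
        deadline = time.monotonic() + timeout_seconds
        while not cluster_healthy():
            if time.monotonic() >= deadline:
                raise TimeoutError(f"cluster did not recover after maintaining {node}")
            time.sleep(poll_seconds)
```

The loop refuses to start on a node while the cluster is degraded and gives up rather than moving on if health does not return, which mirrors the precaution described above.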
Changing the IP Address of the Application Interface of a node in a three-node cluster
In a Hardware Appliance cluster, internal cluster communication runs over the Application Interface. Hence, if you change the IP address of the Application Interface, cluster communication will fail at first, and you will have to perform some manual configuration steps to bring the node back into the cluster:
- Before starting any configuration changes on a cluster node, it is good practice to verify that the node has been running properly so far. Otherwise, if the procedure does not succeed as expected, you cannot tell whether the change itself caused the problem.
- You might also want to make a last manual backup of the Hardware Appliance.
- This procedure assumes that you have marked the cluster node as not operational (e.g. disabled it in a front-end load balancer) for the duration of the change.
- Now make the actual change: set the new Application Interface IP address on the cluster node in WebConf (see Network).
- Navigate your browser to the Cluster Configuration subtab of the WebConf on all of the other cluster nodes.
- Wait for the cluster node to appear offline/not connected in the cluster connections table. Its IP address should now be shown in an editable input field.
- On each of the other cluster nodes, correct the application IP address of the cluster node in the cluster table.
- Confirm the operation by clicking Apply.
- After the cluster reconfiguration has finished, all cluster nodes should be connected to all of the other cluster nodes.
- Once everything works as expected, remember to re-enable the node in the load balancer.
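The final connectivity check in the procedure above (every node connected to every other node) can be expressed as a small helper. This is a sketch that assumes you have transcribed, by hand, which peers each node lists as connected in its cluster connections table. The function name `missing_links` and the example addresses (taken from the 192.0.2.0/24 documentation range) are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def missing_links(connections: Dict[str, Set[str]]) -> List[Tuple[str, str]]:
    """Return (node, peer) pairs where node does not report peer as connected.

    connections maps each node's address to the set of peer addresses that
    node currently shows as connected. An empty result means a full mesh.
    """
    nodes = set(connections)
    missing = []
    for node in sorted(nodes):
        for peer in sorted(nodes - {node}):
            if peer not in connections[node]:
                missing.append((node, peer))
    return missing


# Hypothetical state: one node was not yet updated with the new address of 192.0.2.30.
observed = {
    "192.0.2.10": {"192.0.2.20", "192.0.2.30"},
    "192.0.2.20": {"192.0.2.10"},
    "192.0.2.30": {"192.0.2.10", "192.0.2.20"},
}
print(missing_links(observed))
```

Any pair in the result points at the node whose cluster table still needs correcting before the node is put back into service.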
Replacing a failed cluster node
To replace a failed cluster node, proceed as follows:
- Go to the Cluster Status page and make sure the other two nodes are Active.
- Shut down the node that you want to replace. To avoid later accidental reconnection with the cluster, you can reset it to factory defaults.
- After a few moments, you can download a Cluster Setup Bundle from one of the other nodes. The cluster configuration doesn't need any changes.
If the Application IP of the replacement node is different, change the cluster configuration on both remaining nodes: wait 1 minute between nodes, then download a new Cluster Setup Bundle.
- Connect the replacement node to the cluster with the Cluster Setup Bundle.
- On the other nodes, check the Cluster Status page to confirm that the replacement node has synced up and is Active.
Restoring the node from a backup will not work because the database content in the backup file will be outdated.