Backup, Restore and Update
In a High Availability/Clustering setup, backup, restore, and update must be handled differently than on a standalone Hardware Appliance, so as not to disrupt operation.
Backing up a cluster
Although you have set up a High Availability cluster to prevent outages, you should always take a full outage scenario (loss of all nodes) into consideration. In this case, and only in this case, you will have to recover your cluster from a backup. From an operational perspective, it might make sense to take backups only from node 3 (which is designed to be located at an off-site disaster recovery site) to reduce load and network traffic on the nodes at the main site.
We recommend setting up an automated backup schedule on all of your nodes, so that you can recover from any situation, even if a failure goes undiscovered for a long time.
Generally, a backup contains all information of a cluster node (configuration and database), including its node identity. A backup file taken from node 3, for example, will not restore to just any node of a cluster, but to exactly node 3.
Restoring a cluster from backup
A backup file of a cluster node should only be used as a last resort in a full outage scenario. If at least one node remains operational, the cluster should always be reestablished from the last good node.
To recover as much of your data as possible, start by analyzing the outage to identify the most recent good backup available from a previously Active node. For example, if the connection to a disaster recovery site went down long before a backup was made there, an older backup from the primary site may contain more current data.
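When comparing candidate backups, what matters is how current the contained data is, not merely which file is newest. As a minimal sketch, assuming a hypothetical naming scheme that embeds the backup time in the file name (`backup-<node>-YYYYMMDDhhmmss.tgz`; this naming is an assumption, not the appliance's actual format), the most recent backup among the candidates you have not already ruled out as stale can be picked like this:

```shell
# Print the candidate backup file with the latest embedded timestamp.
# Assumes the hypothetical naming scheme backup-<node>-YYYYMMDDhhmmss.tgz
# (node names without '-'); pass only backups whose data you have not
# already ruled out as stale, e.g. backups made behind a broken DR link.
best_backup() {
  # Sort on the third '-'-separated field (the timestamp) and keep the last.
  printf '%s\n' "$@" | sort -t- -k3 | tail -n 1
}
```

For example, calling `best_backup` with each node's latest backup file prints the file whose name carries the most recent timestamp; that file would then be restored to the matching node.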
Once you have identified the best possible backup from a previously Active node N, restore the backup to the Hardware Appliance designated to be node N and then reconnect the other nodes to this node.
For information on how to restore a backup to a Hardware Appliance, see Restore from Backup.
After reboot, the WebConf will be reachable and operational, but the database will refuse to start in this situation, so the applications will not yet be operational. Use the WebConf button Force into Active to force the cluster to continue operating from the restored data set.
Updating the software (firmware/applications) on a cluster
Updating the software of the Hardware Appliance will always require a reboot. A reboot of a Hardware Appliance in a cluster should always be scheduled with care, so as not to accidentally degrade cluster performance. It is a common mistake to relax operational caution because technical measures are in place to handle outages, thereby giving away all safety margins. In a cluster, software updates should be applied to a single node at a time. Only when the node you are currently working on has completely finished the update and is confirmed to be back up and running should you proceed to updating the next node.
As of version 2.2.0, the Hardware Appliance firmware is updated separately from the applications installed on the platform of the Hardware Appliance. Upgrade both the firmware and the applications, starting with the firmware. A Hardware Appliance on a version older than 2.2.0 cannot simply be upgraded by the customer due to major architectural changes; please contact PrimeKey Support or your local PrimeKey partner for support.
For instructions on how to update a cluster on Hardware Appliance version 2.3.0 to a newer version, refer to the documentation delivered with the new software version.
Use-Case: Software update on a three-node cluster from 3.3 to 3.4
To update a three-node cluster from Hardware Appliance version 3.3 to 3.4, do the following:
1. Before starting any configuration changes on a cluster node, confirm that the node has been running correctly up to now. This is the only way to know for sure whether the procedure actually broke anything if it does not succeed as expected.
2. You might also want to take a last manual backup of the Hardware Appliance.
3. Make sure this cluster node is declared not operational (e.g. by disabling it in the load balancing frontend), so that:
   - No other operator performs maintenance on any other node while redundancy on the cluster is deliberately reduced.
   - Nobody relies on the availability of this node during the maintenance downtime.
   - No alarm is raised if this node becomes unavailable.
4. Start the software update procedure on this node by updating the Hardware Appliance firmware first, then updating the COS applications. This is generally the same procedure as described in the Platform section: install firmware, reboot, install application.
5. After the cluster node has rebooted, check that the node is operating correctly.
6. Once you have asserted that this node is up and running, verify that the entire cluster is in good shape, i.e. that all cluster nodes confirm that the cluster is back up and running with redundancy.
7. Declare this cluster node operational again, undoing whatever you did in step 3.
8. Continue updating your cluster by applying the same steps to the next cluster node, restarting at step 1.
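The one-node-at-a-time discipline of this procedure can be sketched as a small shell loop. The four helper commands (`drain_node`, `update_node`, `node_healthy`, `enable_node`) are hypothetical placeholders for your own load-balancer and WebConf procedures; they are not commands shipped with the Hardware Appliance:

```shell
# Sketch of a rolling cluster update, one node at a time.
# drain_node, update_node, node_healthy and enable_node are hypothetical
# placeholders for your load-balancer and WebConf procedures.
update_cluster() {
  for node in "$@"; do
    drain_node "$node"     # declare the node not operational (e.g. in the LB)
    update_node "$node"    # firmware first, then the applications
    until node_healthy "$node"; do   # wait until the node is confirmed back
      sleep 30
    done
    enable_node "$node"    # declare the node operational again
  done
}
```

The key property is that `enable_node` for one node always runs before `drain_node` for the next, so redundancy is reduced on at most one node at any time.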