Question

How does High Availability work in an Operations Automation Cloud Infrastructure (OACI) environment?

Answer

  1. When a Virtuozzo node is registered in OACI with Cloud Storage support enabled, the OACI_hn_fail RPM is installed on the server. It stores the OACI IM address and credentials in the /usr/local/etc/PACI_hn_fail.conf file and adds two shaman scripts: /usr/share/shaman/50_node_crash and /usr/share/shaman/node_start.

  2. When the shamand service detects that a node has crashed (that is, the node has become unavailable to the shaman master server), the PACI_hn_fail script is called on the shaman master node. The script passes the IP address of the crashed node to OACI IM and triggers the failover procedure.

  3. The failover procedure:

    3.1. If the crashed node is in the "INACTIVE" state in OACI, or if the other nodes in the cluster are in the "INACTIVE" state in OACI and there is no candidate node to move the resources to, no virtual environments are relocated.

    3.2. If the crashed node is in the "ACTIVE" or "LOCKED" state, OACI IM chooses appropriate nodes in the same Cloud Storage cluster that currently host the fewest containers and relocates the resources from the failed node to those healthy nodes. During relocation, each VE gets a transient "FAILOVER_IN_PROGRESS" state in its history. After a successful failover, it gets a transient "FAILOVER_SUCCESS" state. All Load Balancers that were present on the failed node are recreated on healthy nodes.

    Note: All VEs present in OACI are relocated. Failover cannot be disabled for VEs created from OACI.

  4. A notification is sent to the affected customers that their servers have been relocated to other nodes.
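The decision logic in step 3 can be illustrated with a minimal Python sketch. This is not OACI IM code: the Node class, the plan_failover function, and the rule that every node not in the "INACTIVE" state counts as a candidate are assumptions made for illustration only. The sketch shows how resources from a crashed node could be spread over the nodes that currently host the fewest containers.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical model of a cluster node as seen by OACI IM."""
    ip: str
    state: str  # "ACTIVE", "LOCKED", or "INACTIVE"
    ves: list = field(default_factory=list)  # names of hosted VEs


def plan_failover(crashed_ip, cluster):
    """Return {ve_name: target_ip}; an empty dict means nothing is relocated."""
    crashed = next(n for n in cluster if n.ip == crashed_ip)

    # Step 3.1: an INACTIVE crashed node is never failed over.
    if crashed.state == "INACTIVE":
        return {}

    # Assumption: any other node that is not INACTIVE may receive VEs.
    candidates = [n for n in cluster
                  if n.ip != crashed_ip and n.state != "INACTIVE"]
    if not candidates:
        return {}

    # Step 3.2: always pick the candidate that hosts the fewest VEs,
    # counting the VEs already assigned in this plan.
    load = {n.ip: len(n.ves) for n in candidates}
    plan = {}
    for ve in crashed.ves:
        target = min(load, key=load.get)
        plan[ve] = target
        load[target] += 1
    return plan


# Usage example with made-up addresses and VE names.
cluster = [
    Node("10.0.0.1", "ACTIVE", ["ve1", "ve2", "ve3"]),
    Node("10.0.0.2", "ACTIVE", ["a"]),
    Node("10.0.0.3", "ACTIVE", ["b", "c"]),
]
print(plan_failover("10.0.0.1", cluster))
```

In the real product the selection is described only as choosing "appropriate nodes", so OACI IM may apply further criteria that this sketch does not model.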
