Symptoms
The NG Load Balancer experiences a complete outage: all routing rules stop serving requests.
The following sequence of messages appears in /var/log/messages:
Jan 19 22:54:58 loadbalancer2 nanny[8788]: [inactive] shutting down 176.28.103.210:80 due to connection failure
Jan 19 22:54:58 loadbalancer2 nanny[8788]: /sbin/ipvsadm command failed!
Jan 19 22:54:58 loadbalancer2 lvs[8779]: nanny died! shutting down lvs
Jan 19 22:54:58 loadbalancer2 lvs[8779]: shutting down virtual service 32
Jan 19 22:54:58 loadbalancer2 nanny[8786]: Terminating due to signal 15
This may happen after removing an NG node from the load balancer using the following command:
~# ipvsadm -d -f 100 -r xxx.xxx.xxx.xxx
Cause
Within the Odin Service Automation platform, the issue will be addressed in POA-100010: the LVS configuration will allow removing a single routing entry without affecting the others.
The issue originates from the following Red Hat behavior (Bug 739223):
Nanny crashes and shuts LVS down if a service is deleted using ipvsadm and then the corresponding real server goes down.
Resolution
If an outage occurs, start the LVS services:
# service pulse start
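To confirm that the routing table was restored after starting pulse, the virtual services and their real servers can be listed. This is a verification step not given in the original article; it assumes the standard ipvsadm tooling is installed:

```shell
# List the current LVS routing table (-n: numeric addresses, no DNS lookups).
# Each virtual service should again show its real servers with non-zero weights.
ipvsadm -L -n
```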
To avoid an outage during maintenance on any NG node, follow these instructions:
Stop the Apache service on the node. The Load Balancer will not redirect any requests to that node while the Apache service is not running.
# service httpd stop
Once the maintenance is done, start the Apache service.
# service httpd start
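The maintenance steps above can be sketched as a single script. This is a hedged sketch, not part of the original procedure: the drain pause and its duration are assumptions and may need tuning for your environment.

```shell
#!/bin/sh
# Sketch of a maintenance window for an NG node (assumed service name: httpd).
set -e

service httpd stop    # nanny marks the node inactive; no new requests are routed to it
sleep 10              # allow in-flight connections to drain (interval is an assumption)

# ... perform maintenance on the node here ...

service httpd start   # nanny re-enables the node once Apache responds again
```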