Symptoms

The NG Load Balancer experiences a complete outage: all routing rules stop serving requests.

The following sequence of messages appears in /var/log/messages:

Jan 19 22:54:58 loadbalancer2 nanny[8788]: [inactive] shutting down 176.28.103.210:80 due to connection failure
Jan 19 22:54:58 loadbalancer2 nanny[8788]: /sbin/ipvsadm command failed!
Jan 19 22:54:58 loadbalancer2 lvs[8779]: nanny died! shutting down lvs
Jan 19 22:54:58 loadbalancer2 lvs[8779]: shutting down virtual service 32
Jan 19 22:54:58 loadbalancer2 nanny[8786]: Terminating due to signal 15
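The log sequence above can be detected automatically. Below is a hedged sketch: the helper name `has_lvs_outage_signature` is hypothetical, and it assumes the "nanny died! shutting down lvs" line is logged exactly as shown above.

```shell
# Hypothetical helper: returns success if the crash signature from the
# "Symptoms" section is present in the given log file.
has_lvs_outage_signature() {
    grep -q 'nanny died! shutting down lvs' "$1"
}

# Example usage against the system log:
# has_lvs_outage_signature /var/log/messages && echo "LVS outage signature found"
```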

This may happen after removing an NG node from the load balancer using the following command:

~# ipvsadm -d -f 100 -r xxx.xxx.xxx.xxx

Cause

Within the Odin Service Automation platform, the issue will be addressed in POA-100010: the LVS configuration will be changed to allow removing a single routing entry without affecting all the others.

The issue originates from the following Red Hat bug:
739223 Nanny crashes and shuts LVS down if a service is deleted using ipvsadm and the corresponding real server then goes down.

Resolution

If an outage occurs, start the LVS services again:

# service pulse start

To avoid an outage during maintenance on any NG node, follow these instructions:

  1. Stop the Apache service on the node. The Load Balancer will not route requests to a node whose Apache service is not running.

    # service httpd stop
    
  2. Once the maintenance is done, start the Apache service.

    # service httpd start
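The two maintenance steps above can be sketched as a single script. This is a hedged sketch, not part of the original procedure: it assumes a SysV-style `service` binary, and the `DRY_RUN` flag and function name `node_maintenance` are assumptions added so the command sequence can be previewed without touching the node.

```shell
#!/bin/sh
# Hedged sketch of the maintenance flow above (assumes SysV "service").
# With DRY_RUN=1 the commands are only printed, not executed.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

node_maintenance() {
    run service httpd stop      # Load Balancer stops routing to this node
    # ... perform the maintenance work here ...
    run service httpd start     # node rejoins the rotation
}
```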
    

Internal content

Link to internal article