Symptoms
After a Virtuozzo node outage, a large number of VEs (more than 100) were relocated to a single node, and some of them failed to start. The following error appears in /var/log/pa/vps.log on the OACI IM node:
2017-08-31 14:27:14,343 (bb598b68-b195-4439-8ff4-e7ec92e81ad5) INFO GenericVm2VfTask [Shared executor thread #11 @1 @INTERACTIVE] - done_with_message(-2147482544, PRL_ERR_TRY_AGAIN)
2017-08-31 14:27:14,344 (bb598b68-b195-4439-8ff4-e7ec92e81ad5) WARN GenericVm2VfTask [Shared executor thread #11 @1 @INTERACTIVE] - VM2VF operation [START] (reqId=113193) finished with rc=-2147482544
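To find every affected start request in the log, a small script along the following lines may help. It is a minimal sketch: the regular expression is inferred only from the two sample lines above, the function and field names are hypothetical, and the article does not state what the UUID in parentheses identifies, so it is reported as an opaque context ID.

```python
import re

# Pattern inferred from the single WARN sample in this article (an assumption);
# other log layouts may need a different expression.
FAILED_START = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"\((?P<context_id>[0-9a-f-]{36})\) WARN\s+GenericVm2VfTask .*"
    r"VM2VF operation \[START\] \(reqId=(?P<req_id>\d+)\) "
    r"finished with rc=(?P<rc>-?\d+)"
)

# Return code logged together with PRL_ERR_TRY_AGAIN in the sample above.
PRL_ERR_TRY_AGAIN_RC = -2147482544


def find_failed_starts(lines):
    """Return one dict per START operation that finished with PRL_ERR_TRY_AGAIN."""
    hits = []
    for line in lines:
        match = FAILED_START.search(line)
        if match and int(match.group("rc")) == PRL_ERR_TRY_AGAIN_RC:
            hits.append(
                {
                    "timestamp": match.group("ts"),
                    "context_id": match.group("context_id"),
                    "req_id": match.group("req_id"),
                }
            )
    return hits


def scan_log(path="/var/log/pa/vps.log"):
    """Scan the log file named in this article and return the failed starts."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return find_failed_starts(handle)
```

Calling scan_log() on the OACI IM node returns the request IDs that need attention; find_failed_starts() can also be fed lines from a copied log excerpt.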
Cause
When many VEs are started simultaneously on a Virtuozzo node, they hit a limit on parallel operations. Some of the start operations fail with PRL_ERR_TRY_AGAIN and are not retried automatically. An improvement to this behavior is planned under internal request CCU-17188.
Resolution
To prevent this, distribute VEs more evenly across the cluster, so that during a failover the VEs from a failed node are spread over many nodes rather than relocated in bulk to a single node.
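As a rough planning aid, the distribution can be checked with a short calculation. The sketch below rests on assumptions that do not come from this article: it supposes that VEs from a failed node are spread evenly over the surviving nodes, and the threshold max_relocated_per_node is a value you choose yourself, since the actual limit on parallel operations is not stated here. All names are hypothetical.

```python
import math


def failover_load(ve_counts, max_relocated_per_node):
    """Estimate the relocation load per surviving node for each possible node failure.

    ve_counts maps a node name to the number of VEs it hosts.
    Returns {failed_node: (ves_per_survivor, exceeds_threshold)}.
    Assumes an even spread over survivors, which is a simplification.
    """
    if len(ve_counts) < 2:
        raise ValueError("at least two nodes are needed for a failover estimate")
    survivors = len(ve_counts) - 1
    report = {}
    for failed_node, count in ve_counts.items():
        per_survivor = math.ceil(count / survivors)
        report[failed_node] = (per_survivor, per_survivor > max_relocated_per_node)
    return report


if __name__ == "__main__":
    # Hypothetical cluster: node1 hosts far more VEs than the others.
    cluster = {"node1": 300, "node2": 40, "node3": 50}
    for node, (load, risky) in failover_load(cluster, 100).items():
        print(node, load, "rebalance" if risky else "ok")
```

In the worst case all VEs land on one node, as in the incident described above, so nodes flagged even under the even-spread assumption are the first candidates for rebalancing.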
VEs that failed to start in this way must be started manually.
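If many VEs need a manual start, starting them one at a time avoids hitting the same limit on parallel operations again. The following is a minimal sketch, assuming that starting VEs directly on the Virtuozzo node with prlctl start is acceptable in your environment; the function name, the pause length, and the list of VE identifiers are hypothetical and must be supplied by you.

```python
import subprocess
import time


def start_sequentially(ve_ids, pause_seconds=5.0, run=subprocess.run):
    """Start VEs one by one and return {ve_id: exit code of the start command}.

    pause_seconds is an arbitrary gap between starts (an assumption, not a
    documented value). The run parameter exists so the function can be
    exercised without prlctl being installed.
    """
    results = {}
    for index, ve_id in enumerate(ve_ids):
        if index > 0:
            time.sleep(pause_seconds)  # serialize starts instead of firing them together
        completed = run(["prlctl", "start", ve_id], capture_output=True, text=True)
        results[ve_id] = completed.returncode
    return results
```

VEs whose exit code is nonzero in the returned mapping still need attention and can be passed to the function again.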