Any OACI VE action results in error, new VEs remain in "Failed to delete" state : CloudBlue Technical Support

Symptoms

After an OACI VE is created, it ends up in the "Failed to delete" state.

Starting/stopping a VE results in a failure.

/var/log/pa/vps.log on the OACI IM node contains messages like the following:

2017-08-18 15:01:12,837 (aps_2d4a4257-f53f-4a16-a6bc-ff209670274a_102) WARN  Heartbeat [Shared executor thread #2 @1 @INTERACTIVE] - invoke stop_ve_cb to route 10.39.54.11 state: OPEN value: -93 ms

and:

2017-08-18 15:01:12,851 () ERROR VeOperationTask [Shared executor thread #2 @1 @INTERACTIVE] - VE operation failed
java.lang.reflect.UndeclaredThrowableException: null
        at com.sun.proxy.$Proxy96.stop_ve_cb(Unknown Source) ~[na:na]
        at com.parallels.c2u.vm2vf.VF.stopVe(VF.java:491) ~[im.jar:na]
        at com.parallels.c2u.im.interceptor.ReqIdHelper.invoke(ReqIdHelper.java:19) ~[im.jar:na]
...
Caused by: com.parallels.c2u.vm2vf.rpc.OpenCircuitException: null
        at com.parallels.c2u.vm2vf.rpc.CircuitBreaker$DefaultExceptionSupplier.get(CircuitBreaker.java:83) ~[im.jar:na]
        at com.parallels.c2u.vm2vf.rpc.CircuitBreaker$DefaultExceptionSupplier.get(CircuitBreaker.java:77) ~[im.jar:na]
        at com.parallels.c2u.vm2vf.rpc.HeartbeatAcceptorCircuitBreaker.invoke(HeartbeatAcceptorCircuitBreaker.java:106) ~[im.jar:na]
        at com.parallels.c2u.vm2vf.rpc.DefaultCircuitBreakerSelector.invoke(DefaultCircuitBreakerSelector.java:57) ~[im.jar:na]

The following error message may also indicate the issue:

Caused by: com.parallels.c2u.vm2vf.rpc.OpenCircuitException: node (10.39.54.11) did not receive heartbeat for 430308275 ms
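To check quickly whether a log shows these symptoms, the messages above can be searched for directly. The following is a minimal sketch, not part of the product: the function names are hypothetical, and the patterns are taken only from the log excerpts shown in this article, so other OACI versions may word the messages differently.

```shell
# Count log lines that mention the circuit-breaker exception.
# Usage: count_open_circuit /var/log/pa/vps.log
count_open_circuit() {
    # grep -c prints the number of matching lines; it exits non-zero when
    # there are none, so "|| true" keeps the function usable under "set -e".
    grep -c 'OpenCircuitException' "$1" || true
}

# List the distinct node addresses named in "did not receive heartbeat" errors.
# Usage: stale_heartbeat_nodes /var/log/pa/vps.log
stale_heartbeat_nodes() {
    sed -n 's/.*OpenCircuitException: node (\([^)]*\)) did not receive heartbeat.*/\1/p' "$1" | sort -u
}
```

The addresses printed by the second function are the nodes whose clocks are worth comparing with the IM node first.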

The issue may have a cumulative effect: if more than 10 similar requests arrive within a short period of time and each gets stuck for 4 minutes, processing of any new user operations may be delayed further:

VE start/stop/backup/migrate
web console
Plesk-related operations

Cause

Time is not synchronized between the OACI Instance Manager node and the Virtuozzo node(s) in question. The OACI module relies heavily on time synchronization; this requirement is described in the documentation: Installing OACI module

The issue may also cause a complete communication outage, because connectivity between the OACI IM and vm2vf services is not restored automatically after a failure, even if the time is synchronized later. This is acknowledged as CCU-16386: connectivity between Instance Manager and vm2vf service on Virtuozzo node is not restored automatically after failure.

Resolution

This strict dependency on time synchronization has been removed starting with OACI version 17.13, in scope of request CCU-17323. Updating the OACI module is strongly advised, as it makes the module more stable in this regard.

To reduce the likelihood of the issue, configure NTP or another time synchronization method in the OACI environment so that there is no significant time difference between the nodes.

NTP should be configured on all Virtuozzo hardware nodes and on the Instance Manager. If the OACI IM is located inside a Virtuozzo container, NTP should be running on the corresponding Virtuozzo hardware node.
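Whether the nodes have drifted apart can be estimated by comparing their epoch timestamps (`date +%s`), which are independent of time zone. The sketch below shows one way to do this. The function names, the host name `vz.example.com`, and the 1-second tolerance in the usage comment are illustrative assumptions: this article does not state what time difference OACI tolerates.

```shell
# Signed clock offset in seconds: positive means the VZ node is ahead of the IM.
# Arguments: <im_epoch> <vz_epoch>, both as printed by "date +%s".
clock_offset() {
    echo $(( $2 - $1 ))
}

# Succeeds when the absolute offset does not exceed <max_seconds>.
# Arguments: <im_epoch> <vz_epoch> <max_seconds>
clocks_in_sync() {
    local diff=$(( $2 - $1 ))
    # ${diff#-} strips a leading minus sign, giving the absolute value.
    [ "${diff#-}" -le "$3" ]
}

# Example, run on the IM node (hypothetical host name and tolerance):
#   im=$(date +%s)
#   vz=$(ssh root@vz.example.com date +%s)
#   clocks_in_sync "$im" "$vz" 1 || echo "offset: $(clock_offset "$im" "$vz") s"
```

The two timestamps are taken a moment apart and the SSH round trip adds delay, so the result is only an estimate. An offset of a second or two measured this way may not reflect real drift.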

As an immediate short-term workaround, make sure that the time on the Virtuozzo node is ahead of the time on the OACI IM node, then restart the service. For example:

current IM time:

[root@im ~]# date
Fri Apr 13 19:30:20 +05 2018

run the command below on the affected VZ node to set its time a couple of seconds ahead of the IM time:
```
[root@vz ~]# date +%T -s "19:30:25"
```

restart the service on VZ node:

[root@vz ~]# service PACI-vm2vf restart
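The example above sets a wall-clock time, which assumes both nodes use the same time zone. A possible variation, sketched below, works with epoch seconds instead. The helper name `vz_target_epoch` and the default lead of 5 seconds (mirroring the example) are assumptions. GNU `date` accepts an `@<epoch>` value for `-s`.

```shell
# Print the epoch value the VZ node should be set to: IM time plus a small lead.
# Arguments: <im_epoch> [lead_seconds]; the lead defaults to 5 seconds.
vz_target_epoch() {
    echo $(( $1 + ${2:-5} ))
}

# Example, run on the VZ node, where im_epoch holds the output of
# "date +%s" taken on the IM node just before:
#   date -s "@$(vz_target_epoch "$im_epoch")"
#   service PACI-vm2vf restart
```

As with the original workaround, this only moves the VZ clock ahead temporarily. Configuring time synchronization or updating OACI remains the lasting fix.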