Symptoms

There are failed tasks for VE backups in customer CP.

The tasks fail with timeout with no callback received from vm2vf daemon. In /var/log/IM/PACI-im.log the following error is found:

2015-11-23 15:21:19,589 () ERROR Vm2VfProviderBase [VM2VF callbacks watchdog thread] - No registered callback found: ActiveCallbackId [timestamp=2015-11-22 23:21:19.484, reqId=162737, corbaReqId=927068, info=backup ve [{246853d3-b2e6-4758-afb1-150bbcb3bc9c}] to the node [192.0.2.2], contextId=(95fd9507-9dd8-4ede-9502-631beafb9c85)]

In case of OA 7.0 the error is:

2016-11-21 16:11:18,054 (330f4970-6848-4b69-bed1-3d66704108c4) ERROR GenericVm2VfTask [Shared executor thread #7 @1 @INTERACTIVE] - timeout()
2016-11-21 16:11:18,054 (330f4970-6848-4b69-bed1-3d66704108c4) WARN  GenericVm2VfTask [Shared executor thread #7 @1 @INTERACTIVE] - VM2VF operation [BACKUP] (reqId=4070) finished with rc=-1 (0xFFFFFFFF)
2016-11-21 16:11:18,054 (330f4970-6848-4b69-bed1-3d66704108c4) WARN  GenericVm2VfTask [Shared executor thread #7 @1 @INTERACTIVE] - VM2VF operation [BACKUP] (reqId=4070) finished with rc=-1 (0xFFFFFFFF)

In /var/log/IM/PACI-vm2vf.log at the same time it is clear that the backup task is still in progress:

2015-11-23 15:21:40,678 (95fd9507-9dd8-4ede-9502-631beafb9c85) DEBUG NativeVm2VfCode [Thread-4272309] - [162737:927068] Callback invocation: in_progress(TIMEOUT: Dispatcher is still alive, 0)
2015-11-23 15:22:43,051 (95fd9507-9dd8-4ede-9502-631beafb9c85) INFO  NativeVm2VfCode [Thread-4272346] - [162737:927068] INF callback: job [__PrlSrv_CreateVmBackup] cbp 0x7f553ccdce08 index [162088] h type [PHT_EVENT/PET_DSP_EVT_BACKUP_PROGRESS_CHANGED] h/job [0x6f97d8/0x69fbac] (backup_action, backup.c, 492) @[common/generic_sdk_cb.c][366][generic_sdk_cb][51918])
2015-11-23 15:22:43,052 (95fd9507-9dd8-4ede-9502-631beafb9c85) DEBUG NativeVm2VfCode [Thread-4272347] - [162737:927068] Callback invocation: in_progress(PET_DSP_EVT_BACKUP_PROGRESS_CHANGED, 78)

In case of OA 7.2 the issue can be identified in /var/log/pa/vps.log on OACI Instance Manager:

2018-01-13 01:38:20,074 (dc143f87-796b-4f59-ade8-5b905a1c33cd) INFO  CustomLoggingFilter [Grizzly-worker(1)] - 93904 * Server in-bound request
93904 > POST http://10.128.40.10:4465/paci/v1.0/of/1009088/ve/ls/backup?desc=ls_13012018
..
2018-01-13 13:38:20,189 () INFO  CallbackProxy [pool-5-thread-39] - Invoke internal com.parallels.c2u.vm2vf.corba.AbortableOperations, name timeout
2018-01-13 13:38:20,189 () DEBUG CallbackProxy [pool-5-thread-39] - Invoking com.parallels.c2u.vm2vf.corba.AbortableOperations.timeout(...) for reqId = 28678, ctxId = (dc143f87-796b-4f59-ade8-5b905a1c33cd)
..
2018-01-13 13:38:20,220 (dc143f87-796b-4f59-ade8-5b905a1c33cd) ERROR GenericVm2VfTask [Shared executor thread #1 @1 @INTERACTIVE] - timeout()

Cause

The backup operation lasts longer than the task timeout.

Resolution

To solve the issue, find out if it is expected that the backup task lasts so long. Normal rate is 1 hour for a 100GB virtual machine.

If the VM is indeed huge and requires more than 4 hours to be completed, edit the task timeout in /usr/local/share/PACI-im/IM-config.xml:

change

<vm2vf callback-timeout-min="240" log="true" mode="remote">

to

<vm2vf callback-timeout-min="720 log="true" mode="remote">

Also, it is required to increase timeout for Cloud Infrastructure APS in order to avoid false-failed operations in APS.

Value is set up in milliseconds, and is recommended to be equal to the vm2vf timeout plus 30 minutes.

To do so, edit the following value in /usr/local/share/PACI-aps/PACI-config.xml on the OA Management node:

<callbackTimeout>14400000</callbackTimeout>

to

<callbackTimeout>45000000</callbackTimeout>

In order to apply the changes reload the IM configuration:

How to restart OACI Instance Manager services

And restart core services:

How to restart OA system services

Internal content

Link on internal Article