Symptoms
There are failed tasks for VE backups in customer CP.
The tasks fail with timeout with no callback received from vm2vf daemon. In /var/log/IM/PACI-im.log
the following error is found:
2015-11-23 15:21:19,589 () ERROR Vm2VfProviderBase [VM2VF callbacks watchdog thread] - No registered callback found: ActiveCallbackId [timestamp=2015-11-22 23:21:19.484, reqId=162737, corbaReqId=927068, info=backup ve [{246853d3-b2e6-4758-afb1-150bbcb3bc9c}] to the node [192.0.2.2], contextId=(95fd9507-9dd8-4ede-9502-631beafb9c85)]
In case of OA 7.0 the error is:
2016-11-21 16:11:18,054 (330f4970-6848-4b69-bed1-3d66704108c4) ERROR GenericVm2VfTask [Shared executor thread #7 @1 @INTERACTIVE] - timeout()
2016-11-21 16:11:18,054 (330f4970-6848-4b69-bed1-3d66704108c4) WARN GenericVm2VfTask [Shared executor thread #7 @1 @INTERACTIVE] - VM2VF operation [BACKUP] (reqId=4070) finished with rc=-1 (0xFFFFFFFF)
2016-11-21 16:11:18,054 (330f4970-6848-4b69-bed1-3d66704108c4) WARN GenericVm2VfTask [Shared executor thread #7 @1 @INTERACTIVE] - VM2VF operation [BACKUP] (reqId=4070) finished with rc=-1 (0xFFFFFFFF)
In /var/log/IM/PACI-vm2vf.log
at the same time it is clear that the backup task is still in progress:
2015-11-23 15:21:40,678 (95fd9507-9dd8-4ede-9502-631beafb9c85) DEBUG NativeVm2VfCode [Thread-4272309] - [162737:927068] Callback invocation: in_progress(TIMEOUT: Dispatcher is still alive, 0)
2015-11-23 15:22:43,051 (95fd9507-9dd8-4ede-9502-631beafb9c85) INFO NativeVm2VfCode [Thread-4272346] - [162737:927068] INF callback: job [__PrlSrv_CreateVmBackup] cbp 0x7f553ccdce08 index [162088] h type [PHT_EVENT/PET_DSP_EVT_BACKUP_PROGRESS_CHANGED] h/job [0x6f97d8/0x69fbac] (backup_action, backup.c, 492) @[common/generic_sdk_cb.c][366][generic_sdk_cb][51918])
2015-11-23 15:22:43,052 (95fd9507-9dd8-4ede-9502-631beafb9c85) DEBUG NativeVm2VfCode [Thread-4272347] - [162737:927068] Callback invocation: in_progress(PET_DSP_EVT_BACKUP_PROGRESS_CHANGED, 78)
In case of OA 7.2 the issue can be identified in /var/log/pa/vps.log
on OACI Instance Manager:
2018-01-13 01:38:20,074 (dc143f87-796b-4f59-ade8-5b905a1c33cd) INFO CustomLoggingFilter [Grizzly-worker(1)] - 93904 * Server in-bound request
93904 > POST http://10.128.40.10:4465/paci/v1.0/of/1009088/ve/ls/backup?desc=ls_13012018
..
2018-01-13 13:38:20,189 () INFO CallbackProxy [pool-5-thread-39] - Invoke internal com.parallels.c2u.vm2vf.corba.AbortableOperations, name timeout
2018-01-13 13:38:20,189 () DEBUG CallbackProxy [pool-5-thread-39] - Invoking com.parallels.c2u.vm2vf.corba.AbortableOperations.timeout(...) for reqId = 28678, ctxId = (dc143f87-796b-4f59-ade8-5b905a1c33cd)
..
2018-01-13 13:38:20,220 (dc143f87-796b-4f59-ade8-5b905a1c33cd) ERROR GenericVm2VfTask [Shared executor thread #1 @1 @INTERACTIVE] - timeout()
Cause
The backup operation lasts longer than the task timeout.
Resolution
To solve the issue, find out if it is expected that the backup task lasts so long. Normal rate is 1 hour for a 100GB virtual machine.
If the VM is indeed huge and requires more than 4 hours to be completed, edit the task timeout in /usr/local/share/PACI-im/IM-config.xml
:
change
<vm2vf callback-timeout-min="240" log="true" mode="remote">
to
<vm2vf callback-timeout-min="720 log="true" mode="remote">
Also, it is required to increase timeout for Cloud Infrastructure APS in order to avoid false-failed operations in APS.
Value is set up in milliseconds, and is recommended to be equal to the vm2vf timeout plus 30 minutes.
To do so, edit the following value in /usr/local/share/PACI-aps/PACI-config.xml
on the OA Management node:
<callbackTimeout>14400000</callbackTimeout>
to
<callbackTimeout>45000000</callbackTimeout>
In order to apply the changes reload the IM configuration:
How to restart OACI Instance Manager services
And restart core services:
How to restart OA system services