Symptoms

Although the services are actually running and OA is available, the service pem status command reports that pleskd is stopped:

[root@core ~]# service pem status
pleskd is stopped

Service controllers are running:

[root@core ~]# ps aux | grep -c SoLoader
45

However, the watchdog process is not running (only the grep command itself matches):

[root@core ~]# ps aux | grep 'pleskd' | grep -v 'SoLoader'
root     27551  0.0  0.0 103252   884 pts/0    S+   14:38   0:00 grep pleskd

There are a large number of cancelled tasks in the tm_tasks and tm_usual tables, and errors like the following appear in /var/log/poa.debug.log:

Jan  7 11:07:30 core : ERR [SYSTEM 0:3541:b46b7b00 TaskManager]: [txn:23 Tasks::impl::ROFacade::getQueue] Database inconsistency: usual task_id 796880 does not exist
Jan  7 11:07:30 core : ERR [SYSTEM 0:3541:b46b7b00 TaskManager]: [txn:23 Tasks::impl::ROFacade::getQueue] Database inconsistency: usual task_id 796881 does not exist
......

Cause

The pleskd-watchdog service reaches its try_count limit before the TaskManager SC has finished processing the large number of tasks on startup. The current limit is:

[root@core ~]# grep try_count /usr/local/pem/bin/pleskd-watchdog
    try_count=120

Resolution

Set try_count in /usr/local/pem/bin/pleskd-watchdog to a larger value (for example, 600, which corresponds to 10 minutes), then start the service again.
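
The change can be sketched as a small shell helper. This is only an illustration under stated assumptions: set_try_count is a hypothetical function name, GNU sed (with the -i option) is assumed to be available, and the assignment is assumed to look like the try_count=120 line shown in the grep output above. The helper keeps a backup copy of the script before editing it.

```shell
# Hypothetical helper (not part of the product): set try_count in a script file.
# Usage: set_try_count FILE VALUE
set_try_count() {
    file=$1
    value=$2
    # Keep a backup copy next to the original before editing in place.
    cp -p "$file" "$file.bak" || return 1
    # Replace only the numeric value of the try_count assignment,
    # preserving whatever indentation precedes it (GNU sed assumed).
    sed -i "s/^\([[:space:]]*try_count=\)[0-9][0-9]*/\1$value/" "$file"
}

# Example invocation on the core node (run manually after reviewing the file):
#   set_try_count /usr/local/pem/bin/pleskd-watchdog 600
#   grep try_count /usr/local/pem/bin/pleskd-watchdog
```

After the value has been changed, start the service again and re-check its state with service pem status.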
