Symptoms
Although the actual services are running and OA is available, the service pem status
command reports that pleskd is stopped:
[root@core ~]# service pem status
pleskd is stopped
Service controllers are running:
[root@core ~]# ps aux | grep -c SoLoader
45
However, the pleskd watchdog process is not running:
[root@core ~]# ps aux | grep 'pleskd' | grep -v 'SoLoader'
root 27551 0.0 0.0 103252 884 pts/0 S+ 14:38 0:00 grep pleskd
There are a large number of cancelled tasks in tm_tasks and tm_usual. /var/log/poa.debug.log contains errors like the following:
Jan 7 11:07:30 core : ERR [SYSTEM 0:3541:b46b7b00 TaskManager]: [txn:23 Tasks::impl::ROFacade::getQueue] Database inconsistency: usual task_id 796880 does not exist
Jan 7 11:07:30 core : ERR [SYSTEM 0:3541:b46b7b00 TaskManager]: [txn:23 Tasks::impl::ROFacade::getQueue] Database inconsistency: usual task_id 796881 does not exist
......
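To gauge how many such errors were logged, the matching lines can be counted. This is a minimal sketch, not an official OA tool: the function name count_task_inconsistencies is hypothetical, and the pattern is taken only from the two sample lines above, so it may miss differently worded messages.

```shell
# Hypothetical helper: count TaskManager "Database inconsistency" errors
# in a log file. The pattern is derived from the sample lines above.
count_task_inconsistencies() {
    # grep -c prints the number of matching lines (0 if there are none).
    grep -c 'TaskManager.*Database inconsistency: usual task_id' "$1"
}
```

For example: count_task_inconsistencies /var/log/poa.debug.log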
Cause
The try_count limit of the pleskd-watchdog service is reached because the TaskManager SC needs more time to process the large number of tasks on startup. The current limit is set in the watchdog script:
[root@core ~]# grep try_count /usr/local/pem/bin/pleskd-watchdog
try_count=120
Resolution
Set try_count in /usr/local/pem/bin/pleskd-watchdog to a larger value (for example, 600 for 10 minutes) and start the service again.
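The change can also be made from the shell. The following is a minimal sketch under stated assumptions: the helper name set_try_count is hypothetical, the script is assumed to contain a line that begins with try_count= (as the grep output in the Cause section suggests), and sed -i assumes GNU sed. A backup copy is kept next to the original file.

```shell
# Hypothetical helper: set try_count in a watchdog script, keeping a backup.
# Assumes the file has a line starting with "try_count=".
set_try_count() {
    file=$1
    value=$2
    # Accept only a non-empty string of digits as the new value.
    case "$value" in
        ''|*[!0-9]*) echo "value must be a positive integer" >&2; return 1 ;;
    esac
    # Refuse to edit if the expected assignment is missing.
    grep -q '^try_count=' "$file" || {
        echo "try_count not found in $file" >&2
        return 1
    }
    # Keep a backup with the original permissions and timestamps.
    cp -p "$file" "$file.bak" || return 1
    # Rewrite the assignment in place (GNU sed).
    sed -i "s/^try_count=.*/try_count=$value/" "$file"
}
```

For example, run set_try_count /usr/local/pem/bin/pleskd-watchdog 600 and then service pem start. The start action is assumed here to be available alongside the status action shown in the Symptoms section.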