Symptoms

COMM_FAILURE errors appear in /var/log/pa/core.log on the Management Node:

Dec 18 04:11:24.301 : DBG [openapi:5409682 1:58514:7f4aa07c0700 SAAS ]: [ APSC] Transaction locked: 52559048
Dec 18 04:11:24.328 : ERR [openapi:5409682 1:58514:7f4aa07c0700 SAAS ]: [ SDK::Platform::Tasks::BaseTask::schedule] TaskManager: system exception, ID 'IDL:omg.org/CORBA/COMM_FAILURE:1.0'
TAO exception, minor code = 6 (failed to recv request response; ENOENT), completed = MAYBE
Dec 18 04:11:24.328 : INF [openapi:5409682 1:58514:7f4aa07c0700 SAAS ]: [ SDK::Platform::Tasks::BaseTask::schedule] Failed to narrow object IOR:010000002700000049444c3a506c65736b2f5461736b732f5461736b4d616e61676572507269766174653a312e3000000100000000000000980000000101023a0c00000031302e32302e3130322e3100a62002004b00000014010f004e55500000002b0100000001000000526f6f74504f41004f626a6563744d616e6167657273005461736b4d616e616765720003000000010000005461736b4d616e616765722f37020200000000000000080000000000000054414f0001000000140000000000000005010001000000000001010900000000
Dec 18 04:11:24.328 : DBG [openapi:5409296 1:58514:7f4a9ffbf700 SAAS ]: [ APSC] Transaction locked: 52549898
Dec 18 04:11:24.328 : DBG [openapi:5409339 1:58514:7f4aa17c2700 SAAS ]: [ APSC] Transaction locked: 52548593
Dec 18 04:11:24.328 : ERR [openapi:5409296 1:58514:7f4a9ffbf700 SAAS ]: [ Naming::resolve] TaskManager: system exception, ID 'IDL:omg.org/CORBA/COMM_FAILURE:1.0'
TAO exception, minor code = 6 (failed to recv request response; ENOENT), completed = MAYBE
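To confirm the symptom and see how often it occurs, the COMM_FAILURE errors can be extracted from core.log with a short script. The following is a minimal Python sketch that assumes every entry starts with the layout shown above (timestamp, level, bracketed context, bracketed location, message). The script and function names are hypothetical and are not part of any product tooling.

```python
import re
import sys
from collections import Counter

# Header of a core.log entry, assumed to look like:
# "Dec 18 04:11:24.328 : ERR [openapi:... SAAS ]: [ Naming::resolve] TaskManager: ..."
ENTRY_RE = re.compile(
    r"^(?P<ts>\w{3} +\d+ \d{2}:\d{2}:\d{2}\.\d{3}) : (?P<level>[A-Z]{3}) "
    r"\[(?P<ctx>[^\]]*)\]: \[ ?(?P<where>[^\]]+)\] (?P<msg>.*)$"
)


def find_comm_failures(lines):
    """Yield (timestamp, location, target) for each ERR entry reporting COMM_FAILURE.

    Continuation lines (e.g. "TAO exception, minor code = 6 ...") have no
    timestamp, so they do not match ENTRY_RE and are skipped.
    """
    for line in lines:
        m = ENTRY_RE.match(line)
        if m and m.group("level") == "ERR" and "CORBA/COMM_FAILURE" in m.group("msg"):
            target = m.group("msg").split(":", 1)[0]  # e.g. "TaskManager"
            yield m.group("ts"), m.group("where").strip(), target


def main(path):
    with open(path, errors="replace") as fh:
        hits = list(find_comm_failures(fh))
    # Group by minute ("Dec 18 04:11"); dict order keeps the log's chronological order.
    per_minute = Counter(ts[:12] for ts, _, _ in hits)
    for minute, count in per_minute.items():
        print(minute, count)
    print("total COMM_FAILURE errors:", len(hits))


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

For example, `python3 comm_failures.py /var/log/pa/core.log` prints a per-minute count of COMM_FAILURE errors, which can help correlate them with the time stamps of TaskManager core dumps.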

TaskManager periodically crashes and writes core dumps with the following stack trace:

Thread 1 (Thread 0x7f0beaffd700 (LWP 60620)):
#0  0x00007f0c1d46b1f7 in raise () from /lib64/libc.so.6
#1  0x00007f0c1d46c8e8 in abort () from /lib64/libc.so.6
#2  0x00007f0c1d4aaf47 in __libc_message () from /lib64/libc.so.6
#3  0x00007f0c1d4b2619 in _int_free () from /lib64/libc.so.6
#4  0x00007f0c19053794 in std::_Deque_base<Plesk::Tasks::impl::Group::Info, std::allocator<Plesk::Tasks::impl::Group::Info> >::~_Deque_base() () from /usr/local/pem/libexec/TaskManager.so.7.4.0.7
#5  0x00007f0c1904ce91 in (anonymous namespace)::ScheduleKeeper::~ScheduleKeeper() () from /usr/local/pem/libexec/TaskManager.so.7.4.0.7
#6  0x00007f0c190ade77 in SDK::Platform::TransactionScope::Shelf::~Shelf() () from /usr/local/pem/libexec/TaskManager.so.7.4.0.7
#7  0x00007f0c19047c58 in (anonymous namespace)::getScheduleKeeper(Plesk::Tasks::impl::GlobalQueue*, SDK::Platform::TransactionScope::Keep&) () from /usr/local/pem/libexec/TaskManager.so.7.4.0.7
#8  0x00007f0c19048743 in Plesk::Tasks::impl::GlobalQueue::setTasksSubscription(int) () from /usr/local/pem/libexec/TaskManager.so.7.4.0.7

The SaaS module produces similar core dumps:

#0 0x00007f9cf49c04dc in free () from /lib64/libc.so.6
#1 0x00007f9cefc55d24 in Plesk::APS::APSC::APSCTransactionKeeper::~APSCTransactionKeeper() () from /usr/local/pem/libexec/SaaS.so.7.4.0.2002
#2 0x00007f9cefe61147 in SDK::Platform::TransactionScope::Shelf::~Shelf() () from /usr/local/pem/libexec/SaaS.so.7.4.0.2002
#3 0x00007f9cefc52096 in Plesk::APS::APSC::(anonymous namespace)::TransactMgr::getCurrentWeak() [clone .isra.96] () from /usr/local/pem/libexec/SaaS.so.7.4.0.2002
#4 0x00007f9cefc530ed in Plesk::APS::APSC::(anonymous namespace)::clnInterceptor::send_request(PortableInterceptor::ClientRequestInfo*) () from /usr/local/pem/libexec/SaaS.so.7.4.0.2002
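To check whether a new core dump has the same shape as the two traces above, its gdb backtrace can be tested for the pattern they share: a libc heap abort (free, _int_free or abort) unwinding through a keeper destructor (~ScheduleKeeper or ~APSCTransactionKeeper) that is itself called from SDK::Platform::TransactionScope::Shelf::~Shelf(). This is a heuristic drawn only from the frames quoted in this article, not an official OA-2851 signature, and the helper names below are hypothetical.

```python
import re
import sys

# One gdb frame line: "#5  0x00007f0c1904ce91 in <function> () from <library>"
FRAME_RE = re.compile(r"^#(\d+)\s+0x[0-9a-fA-F]+ in (.+)$")
# Assumption: these libc frames mark the heap-corruption abort seen in both traces.
HEAP_FRAMES = {"free", "_int_free", "abort"}
KEEPER_DTOR_RE = re.compile(r"~(ScheduleKeeper|APSCTransactionKeeper)\(\)$")
SHELF_DTOR = "SDK::Platform::TransactionScope::Shelf::~Shelf()"


def parse_frames(backtrace):
    """Return [(frame_no, function_name), ...] from gdb 'bt' text."""
    frames = []
    for line in backtrace.splitlines():
        m = FRAME_RE.match(line.strip())
        if m:
            # Drop the trailing " (args) from /lib..." part that gdb prints.
            func = m.group(2).split(" (", 1)[0]
            frames.append((int(m.group(1)), func))
    return frames


def looks_like_shelf_dtor_crash(backtrace):
    """Heuristic: heap abort frame < keeper destructor < Shelf::~Shelf() (innermost first)."""
    frames = parse_frames(backtrace)

    def first(pred):
        return next((n for n, f in frames if pred(f)), None)

    heap = first(lambda f: f in HEAP_FRAMES)
    keeper = first(lambda f: KEEPER_DTOR_RE.search(f) is not None)
    shelf = first(lambda f: f == SHELF_DTOR)
    return None not in (heap, keeper, shelf) and heap < keeper < shelf


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], errors="replace") as fh:
        print("matches" if looks_like_shelf_dtor_crash(fh.read()) else "no match")
```

To use it, save the output of gdb's `bt` command for the core dump to a text file and pass that file as the only argument.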

Cause

The issue is acknowledged and tracked as OA-2851.

Resolution

Increase the pau transaction timeout as described in https://kb.cloudblue.com/en/130501.
