Symptoms
Some control panel applications, such as "Cloud Infrastructure", or the whole Provider's Control Panel become unavailable. A login attempt to the management node's UI via http://mn_ip:8080 produces an error:
java.lang.NullPointerException: null
Various types of Operations Automation tasks fail with the following error:
WFLYEJB0442: Unexpected Error
A common symptom for all failed tasks is that, in core.log, each of them ends with the following error message:
[task:159017197:17829 p:-default-threadpool;-w:-Idle:490 pau]: c.p.p.tracer exit by exception: com.parallels.pa.service.host.ejb.HCLSenderBean.sendHCLjava.lang.OutOfMemoryError: unable to create new native thread
Note the method that could not create a thread: sendHCL.
In console.log, OutOfMemory errors occur frequently:
SEVERE [org.glassfish.jersey.server.ServerRuntime$Responder] (pa-rest task-192) An exception was not mapped due to exception mapper failure. The HTTP 500 response will be returned.: com.google.common.util.concurrent.ExecutionError: com.google.common.util.concurrent.ExecutionError: java.lang.OutOfMemoryError: unable to create new native thread
Nevertheless, memory statistics show no shortage of resources.
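To tell this condition apart from a genuine heap shortage, it can help to count which kind of OutOfMemoryError the logs actually contain. The following is a minimal sketch, not an OA tool: the function name classify_oom is hypothetical, and the only strings it relies on are the "unable to create new native thread" message quoted above and the standard JVM "Java heap space" message.

```python
from collections import Counter
from typing import Iterable

# Message quoted in the symptoms above: the JVM could not start a thread.
NATIVE_THREAD_OOM = "java.lang.OutOfMemoryError: unable to create new native thread"
# Standard JVM message for an exhausted heap, listed here for contrast.
HEAP_OOM = "java.lang.OutOfMemoryError: Java heap space"


def classify_oom(lines: Iterable[str]) -> Counter:
    """Count OutOfMemoryError lines by kind (hypothetical helper)."""
    counts: Counter = Counter()
    for line in lines:
        if NATIVE_THREAD_OOM in line:
            counts["native_thread"] += 1
        elif HEAP_OOM in line:
            counts["heap"] += 1
    return counts
```

It could be called as classify_oom(open(path_to_log, errors="replace")), where path_to_log is whatever location core.log or console.log has on the management node. A result dominated by "native_thread" with few or no "heap" entries matches the symptoms described here.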
Cause
A thread leak in the sendHCL method. Threads started in this method are never released, and at some point the pau process reaches the system limit on the allowed number of threads.
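The leak can be observed on Linux by reading the "Threads:" field of /proc/<pid>/status for the pau process at intervals. The sketch below assumes the operator has already found the PID; the helper names parse_thread_count and thread_count are hypothetical. Keep in mind that the nproc limit is counted per user, so threads of other processes running as the same user also count against it.

```python
import re
from pathlib import Path


def parse_thread_count(status_text: str) -> int:
    """Return the 'Threads:' value from the text of /proc/<pid>/status."""
    match = re.search(r"^Threads:\s+(\d+)\s*$", status_text, re.MULTILINE)
    if match is None:
        raise ValueError("no 'Threads:' field found")
    return int(match.group(1))


def thread_count(pid: int) -> int:
    """Read the current thread count of a live process (Linux only)."""
    return parse_thread_count(Path(f"/proc/{pid}/status").read_text())
```

A value that keeps growing between task runs and approaches the configured nproc limit is consistent with the leak described above.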
The tasks "Get traffic usage" and "Collect resources usage statistics from web clusters" contribute the most, since by default they run frequently and send a lot of requests.
This issue was passed for further investigation to the Engineering team as POA-111472: "Outage of several WildFly applications, sendHCL java.lang.OutOfMemoryError".
Resolution
The issue can be worked around by performing the following steps:
Increase the thread limit to 16192 for user jboss in the file /etc/security/limits.conf:
jboss soft nproc 16192
A restart of OA services is required to apply the changes.
Make the "Get traffic usage info" and "Collect resources usage statistics from web clusters" tasks run less frequently (once an hour).
Restart OA services per KB during the regular maintenance window to reset the thread count.
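After the restart, it is worth confirming that the running process actually picked up the new limit, because limits.conf is applied by pam_limits at session start and a soft limit cannot exceed the hard limit. A minimal sketch, assuming Linux and a known PID of the pau process; the helper names parse_max_processes and max_processes are hypothetical:

```python
from pathlib import Path
from typing import Tuple


def parse_max_processes(limits_text: str) -> Tuple[str, str]:
    """Return (soft, hard) from the 'Max processes' row of /proc/<pid>/limits.

    Values stay as strings because the kernel may report 'unlimited'.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max processes"):
            fields = line.split()
            # fields: ['Max', 'processes', <soft>, <hard>, 'processes']
            return fields[2], fields[3]
    raise ValueError("no 'Max processes' row found")


def max_processes(pid: int) -> Tuple[str, str]:
    """Read the effective nproc limits of a live process (Linux only)."""
    return parse_max_processes(Path(f"/proc/{pid}/limits").read_text())
```

If the soft value returned for the restarted process is not 16192, the change from the first step has not taken effect for that process.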
Please contact your technical manager to clarify the status of POA-111472.