Symptoms

Some control panel applications, such as "Cloud Infrastructure", or the whole Provider's Control Panel become unavailable. An attempt to log in to the management node's UI at http://mn_ip:8080 produces an error:

java.lang.NullPointerException: null

Different types of Operations Automation tasks fail with the following error:

WFLYEJB0442: Unexpected Error

A common symptom for all of these tasks is that, in core.log, each of them ends with the following error message:

[task:159017197:17829 p:-default-threadpool;-w:-Idle:490 pau]: c.p.p.tracer exit by exception: com.parallels.pa.service.host.ejb.HCLSenderBean.sendHCLjava.lang.OutOfMemoryError: unable to create new native thread

Note the method that could not create a thread: sendHCL.

In console.log OutOfMemory errors occur frequently:

SEVERE [org.glassfish.jersey.server.ServerRuntime$Responder] (pa-rest task-192) An exception was not mapped due to exception mapper failure. The HTTP 500 response will be returned.: com.google.common.util.concurrent.ExecutionError: com.google.common.util.concurrent.ExecutionError: java.lang.OutOfMemoryError: unable to create new native thread

Nevertheless, memory statistics show no shortage of resources.

Cause

The sendHCL method leaks threads. Threads started in this method are never closed, and at some point the pau process reaches the system limit on the number of threads.

The "Get traffic usage" and "Collect resources usage statistics from web clusters" tasks contribute the most, since by default they run frequently and send many requests.
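
To confirm that the process is approaching its thread limit rather than running out of memory, the thread count can be compared with the limit that is in effect for the process. The following is a minimal sketch, assuming a Linux management node where /proc/<pid>/status and /proc/<pid>/limits are readable. The function names are illustrative and are not part of Operations Automation. Note that the nproc limit is counted per user, so a single process's thread count is only an approximation if the jboss user runs other processes.

```python
from pathlib import Path


def parse_thread_count(status_text):
    """Return the 'Threads:' value from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("Threads:"):
            return int(line.split(":", 1)[1].strip())
    raise ValueError("no 'Threads:' field found")


def parse_soft_nproc(limits_text):
    """Return the soft 'Max processes' limit from the text of
    /proc/<pid>/limits, or None if the limit is 'unlimited'."""
    label = "Max processes"
    for line in limits_text.splitlines():
        if line.startswith(label):
            soft = line[len(label):].split()[0]
            return None if soft == "unlimited" else int(soft)
    raise ValueError("no 'Max processes' line found")


def report(pid):
    """Print the thread count and soft nproc limit for one process ID."""
    threads = parse_thread_count(Path(f"/proc/{pid}/status").read_text())
    soft = parse_soft_nproc(Path(f"/proc/{pid}/limits").read_text())
    if soft is None:
        print(f"pid {pid}: {threads} threads, soft nproc limit is unlimited")
    else:
        print(f"pid {pid}: {threads} threads, soft nproc limit {soft} "
              f"({threads / soft:.0%} used)")
```

Calling report() with the PID of the pau process at intervals should show whether the thread count keeps growing between restarts.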

This issue has been passed to the Engineering team for further investigation as POA-111472: "Outage of several WildFly applications, sendHCL java.lang.OutOfMemoryError".

Resolution

The issue can be worked around by performing the following steps:

  1. Increase the thread limit to 16192 for the user jboss in the file /etc/security/limits.conf:

    jboss soft nproc 16192
    

    A restart of OA services is required to apply the changes.

  2. Make the "Get traffic usage info" and "Collect resources usage statistics from web clusters" tasks run less frequently (once an hour).

  3. Restart OA services per KB during the regular maintenance window to reset the thread count.
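
To double-check the entry from step 1, the nproc lines for a given user can be extracted from the limits file. This is a minimal sketch that assumes the standard limits.conf layout of four whitespace-separated fields (domain, type, item, value) with # starting a comment. The helper name nproc_entries is hypothetical. It only inspects the text passed to it, so any files under /etc/security/limits.d/ would need to be checked the same way.

```python
def nproc_entries(conf_text, domain):
    """Collect nproc limits for one domain from limits.conf-style text.

    Returns a dict that maps the limit type ('soft', 'hard' or '-')
    to the configured value as a string.
    """
    found = {}
    for raw in conf_text.splitlines():
        # Drop trailing comments, then split into the four expected fields.
        fields = raw.split("#", 1)[0].split()
        if len(fields) == 4 and fields[0] == domain and fields[2] == "nproc":
            found[fields[1]] = fields[3]
    return found
```

For example, passing the contents of /etc/security/limits.conf and the domain "jboss" should return a dict containing the soft value 16192 once step 1 has been applied.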

Please contact your technical manager to check the status of POA-111472.
