Symptoms

  • You are trying to perform an operation in the OA Billing UI, but the UI becomes unavailable.
  • The OA Billing Application Server and the OA Billing Database Server are in different networks.
  • The following information is provided in log files:

    1. Log files show that a connection between the OA Billing Application Server and the OA Billing Database Server is established:

      billing._amt_.log (OA Billing Application Server):
      ...
      [16-09-14 11:50:30.439 RDBMS       RQ00076 TH12802 INF] New session 3665
      [16-09-14 11:50:30.439 RDBMS       RQ00076 TH12802 INF] Direct execute [0x7f3bf004a8e0_0]: SET SESSION statement_timeout = 60000
      [16-09-14 11:50:30.440 RDBMS       RQ00076 TH12802 INF] Service execute [0x7f3bf004a8e0_0]: COMMIT
      [16-09-14 11:50:30.440 billing._am RQ00076 TH12802 NTE] Associating application transaction with database process = 3665
      ...
      
      postgresql.log (OA Billing Database Server):
      ...
      [3665] [57d91d6a.e51] [2016-09-14 11:50:34.433 CEST] [0]: LOG:  statement: BEGIN
      [3665] [57d91d6a.e51] [2016-09-14 11:50:34.434 CEST] [0]: LOG:  execute <unnamed>: SELECT pg_backend_pid()
      [3665] [57d91d6a.e51] [2016-09-14 11:50:34.434 CEST] [0]: LOG:  statement: SET SESSION statement_timeout = 60000
      [3665] [57d91d6a.e51] [2016-09-14 11:50:34.435 CEST] [0]: LOG:  statement: COMMIT
      ...
      
    2. After a while, an attempt to reuse the established connection fails:

      billing._amt_.log (OA Billing Application Server):
      ...
      [16-09-14 13:08:01.065 billing._am RQ03979 TH12802 NTE] Associating application transaction with database process = 3665
      [16-09-14 13:08:01.065 AHRC_Object RQ03979 TH12802 NTE] Entering method AHRC.AuthorizeUserByLogin(user = -1, SID = 0, lang = en, request = 0, localObject = 0, transaction = 0 (HP))
      [16-09-14 13:08:01.065 RDBMS       RQ03979 TH12802 INF] Prepare [0x7f3bf0048df0]: SELECT "UsersID" FROM "IntUsers" WHERE ("Login" = $1) AND ( NOT ("Login" IS NULL))
      [16-09-14 13:08:01.065 RDBMS       RQ03979 TH12802 INF] Execute [0x7f3bf0048df0]: 'admin'
      [16-09-14 13:08:01.065 RDBMS       RQ03979 TH12802 INF] Service execute [0x7f3bf004a8e0_0]: BEGIN (implicit on previous command)
      [16-09-14 13:23:31.066 RDBMS       RQ03979 TH12802 NTE] Code: 1000. could not receive data from server: Connection timed out
      [16-09-14 13:23:31.066 RDBMS       RQ03979 TH12802 INF] Invalid session in non transaction state in ASCursorImp. Try to reinit & retry 1 time.
      ...
      
      postgresql.log (OA Billing Database Server):
      ...
      [3665] [57d91d6a.e51] [2016-09-14 14:01:56.419 CEST] [0]: LOG:  could not receive data from client: Connection timed out
      ...
      
    3. A new connection between the OA Billing Application Server and the OA Billing Database Server is established as soon as a database client process realizes that the old connection is broken:

      billing._amt_.log (OA Billing Application Server):
      ...
      [16-09-14 13:23:31.066 RDBMS       RQ03979 TH12802 INF] Invalid session in non transaction state in ASCursorImp. Try to reinit & retry 1 time.
      [16-09-14 13:23:31.071 RDBMS       RQ03979 TH12802 INF] PGRECONNECT
      [16-09-14 13:23:31.071 RDBMS       RQ03979 TH12802 INF] Prepare [0x7f3c053bdb00]: SELECT pg_backend_pid()
      [16-09-14 13:23:31.071 RDBMS       RQ03979 TH12802 INF] Execute [0x7f3c053bdb00]:
      [16-09-14 13:23:31.071 RDBMS       RQ03979 TH12802 INF] Service execute [0x7f3bf004a8e0_1]: BEGIN (implicit on previous command)
      [16-09-14 13:23:31.072 RDBMS       RQ03979 TH12802 INF] New session 9677
      ...
      

Cause

This problem might be caused by an intermediate firewall or a router which is placed between the OA Billing Application Server and the OA Billing Database Server.

The firewall and the servers may handle idle TCP connections differently. For example, both of the servers consider connections which are idle for 120 minutes as alive whereas the firewall considers connections which are idle for 20 minutes as dead. This means that a connection established between the servers stops working if the idle time of the connection is more than 20 minutes.

Resolution

Try one of the options provided below. Note that these options are mutually exclusive.

  • Try decreasing the value of tcp_keepalive_time + ( tcp_keepalive_probes * tcp_keepalive_intvl ) by changing the corresponding kernel parameters on the OA Billing Database Server (see http://tldp.org/HOWTO/TCP-Keepalive-HOWTO/usingkeepalive.html to learn more about the kernel parameters).

    Note that changing the kernel parameters affects all TCP connections of the OA Billing Database Server.

    Example

    net.ipv4.tcp_keepalive_time = 7200
    net.ipv4.tcp_keepalive_probes = 9
    net.ipv4.tcp_keepalive_intvl = 75
    ->
    net.ipv4.tcp_keepalive_time = 900
    net.ipv4.tcp_keepalive_probes = 5
    net.ipv4.tcp_keepalive_intvl = 300
    
  • Try setting up custom values of the tcp_keepalives_idle, tcp_keepalives_interval, tcp_keepalives_count PostgreSQL settings in the /var/lib/pgsql/data/postgresql.conf PostgreSQL configuration file (see https://www.postgresql.org/ to learn more about the settings). After modifying the file, execute the service postgresql reload command.

    Note that changing the PostgreSQL settings affects only TCP connections of the PostgreSQL database server.

    Example

    #tcp_keepalives_idle = 0                # TCP_KEEPIDLE, in seconds;
    #tcp_keepalives_interval = 0            # TCP_KEEPINTVL, in seconds;
    #tcp_keepalives_count = 0               # TCP_KEEPCNT;
    ->
    tcp_keepalives_idle = 900                # TCP_KEEPIDLE, in seconds;
    tcp_keepalives_interval = 300            # TCP_KEEPINTVL, in seconds;
    tcp_keepalives_count = 5                 # TCP_KEEPCNT;
    

Note: You can use the tcpdump utilty to investigate and analyze this problem.

See also: #127858 "server closed the connection unexpectedly" error in the BA tasks or control panel

Internal content