PostgreSQL crashes inside containers : CloudBlue Technical Support

Symptoms

The PostgreSQL server is configured with a shared_buffers value that is large relative to the container's amount of RAM:

[root@ct ~]# grep ^shared_buffers /var/lib/pgsql/data/postgresql.conf
shared_buffers = 2048MB

Further symptoms may differ, depending on the kernel version.

Kernels prior to 2.6.32-042stab085.17:

The PostgreSQL process crashes under heavy load with a SIGBUS error. The following messages are written to the log file (/var/lib/pgsql/data/pg_log/postgresql-XXX.log):

LOG:  server process (PID 859) was terminated by signal 7: Bus error
LOG:  terminating any other active server processes
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.

Kernel 2.6.32-042stab085.17 and later:

PostgreSQL fails to start inside the container. The following errors can be found in /var/lib/pgsql/pgstartup.log:

FATAL:  could not create shared memory segment: Cannot allocate memory
DETAIL:  Failed system call was shmget(key=5432001, size=2278809600, 03600).
HINT:  This error usually means that PostgreSQL's request for a shared memory segment exceeded available memory or swap space, or exceeded your kernel's SHMALL parameter.  You can either reduce the request size or reconfigure the kernel with larger SHMALL.  To reduce the request size (currently 2278809600 bytes), reduce PostgreSQL's shared_buffers parameter (currently 262144) and/or its max_connections parameter (currently 504).
        The PostgreSQL documentation contains more information about shared memory configuration.
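
The numbers in the HINT message can be cross-checked with a little arithmetic. The sketch below assumes PostgreSQL's default block size of 8 kB (the block size is not stated in the log). It shows that the 262144 buffers correspond to the configured 2048MB, and that the request passed to shmget() is larger than the buffer pool alone. The remainder is other shared memory, which the HINT ties to settings such as max_connections.

```python
# Values copied from the HINT message above.
REQUEST_BYTES = 2278809600       # size passed to shmget()
SHARED_BUFFERS_PAGES = 262144    # shared_buffers as reported by PostgreSQL
BLOCK_SIZE = 8192                # assumption: default 8 kB PostgreSQL block size

MIB = 1024 * 1024


def buffers_bytes(pages, block_size=BLOCK_SIZE):
    """Return the size of the buffer pool in bytes for a given page count."""
    return pages * block_size


pool = buffers_bytes(SHARED_BUFFERS_PAGES)
overhead = REQUEST_BYTES - pool  # shared memory requested beyond the buffer pool

print(f"buffer pool:         {pool // MIB} MiB")
print(f"other shared memory: {overhead / MIB:.1f} MiB")
print(f"total request:       {REQUEST_BYTES / MIB:.1f} MiB")
```

In other words, a shared_buffers value of 2048MB results in a request of roughly 2173 MiB, so the limit has to cover more than the shared_buffers value itself.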

Cause

There is not enough shared memory available in the container (as limited by its configuration) or on the hardware node (where the limit is 1/2 of the total RAM).

The handling of shared memory requests from applications changed in kernel 2.6.32-042stab085.17. Previously, an application could request any amount of shared memory, but crashed when it tried to use more than the container's limit (by default, 1/2 of the container's RAM). Now the allocation itself fails with -ENOSPC when an application requests more than the available limit.
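
The difference between the two behaviours can be modelled in a few lines. This is only an illustrative sketch, not kernel code: the function names are hypothetical, the limit is assumed to be the default of half the container's RAM, and the 4 GiB and 8 GiB container sizes in the usage lines are made-up examples.

```python
GIB = 1024 ** 3


def default_shm_limit(container_ram_bytes):
    """Default shared memory limit: half of the container's RAM."""
    return container_ram_bytes // 2


def allocation_outcome(request_bytes, container_ram_bytes, new_kernel=True):
    """Describe what happens to a shared memory request under each kernel behaviour."""
    limit = default_shm_limit(container_ram_bytes)
    if request_bytes <= limit:
        return "ok"
    if new_kernel:
        # 2.6.32-042stab085.17 and later: the request itself is refused.
        return "allocation refused"
    # Earlier kernels: the request succeeds, the crash comes later on use.
    return "allocation accepted, crash possible on use beyond the limit"


request = 2278809600  # size from the shmget() call in the startup log

print(allocation_outcome(request, 4 * GIB))                    # hypothetical 4 GiB container
print(allocation_outcome(request, 4 * GIB, new_kernel=False))
print(allocation_outcome(request, 8 * GIB))                    # hypothetical 8 GiB container
```

With a 4 GiB container the default limit is 2 GiB, which is below the 2278809600-byte request, so the newer kernel refuses it up front, while the older kernel would let the process start and fail later under load.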

Resolution

There are several possible solutions, depending on the server configuration.

Decrease the shared_buffers value in the PostgreSQL configuration.
Increase the amount of RAM assigned to the container (assuming enough shared memory is available on the node).
Increase the amount of RAM on the hardware node.

Note: When changing the container's configuration, take the total amount of RAM on the hardware node into account. The amount of shared memory defined for the container may exceed the amount of shared memory actually available on the node. In that case applications still fail, and the cause is harder to identify from inside the container.
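
To choose between these options, it can help to estimate how much room is available. The sketch below is a rough aid under stated assumptions: both the container limit and the node limit are taken as half of the respective RAM (the defaults described in this article), the helper names are hypothetical, and the overhead figure is derived from the HINT message assuming 8 kB buffers (2278809600 - 262144 * 8192 bytes). The container and node sizes are made-up examples.

```python
MIB = 1024 * 1024
GIB = 1024 ** 3


def effective_shm_limit(container_ram_bytes, node_ram_bytes):
    """Smaller of the container limit and the node limit (each half of RAM)."""
    return min(container_ram_bytes // 2, node_ram_bytes // 2)


def max_shared_buffers_mb(container_ram_bytes, node_ram_bytes, overhead_bytes):
    """Largest shared_buffers value (whole MB) that still leaves room for the overhead."""
    room = effective_shm_limit(container_ram_bytes, node_ram_bytes) - overhead_bytes
    return max(room // MIB, 0)


def min_container_ram_bytes(request_bytes):
    """Smallest container RAM whose default limit (half of RAM) covers the request."""
    return 2 * request_bytes


OVERHEAD = 2278809600 - 262144 * 8192  # non-buffer shared memory, estimated from the HINT

# Hypothetical 4 GiB container on a hypothetical 16 GiB node.
print(max_shared_buffers_mb(4 * GIB, 16 * GIB, OVERHEAD))  # upper bound for shared_buffers, in MB
print(min_container_ram_bytes(2278809600) / GIB)           # RAM needed to keep 2048MB, in GiB
```

The overhead changes with settings such as max_connections, so treat the result as an upper bound and leave some margin below it.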