Search Engine: Elastic

Article ID: 133847, created on Mar 14, 2019, last review on Apr 17, 2019

  • Applies to:
  • Operations Automation

Symptoms

Postgres fails to start on DB replica node. How to rebuild replication from scratch?

Examples of cases:

Incorrect WAL sequence stored on the SLAVE node.

/var/lib/pgsql/9.6/data/pg_log/:

[2019-03-11 02:39:18.583 -03] p=1658:5@1/0 c=@/: FATAL:  invalid memory alloc request size     1663074304
[2019-03-11 02:39:18.586 -03] p=1339:3@ c=@/: LOG:  startup process (PID 1658) exited with exit code 1
[2019-03-11 02:39:18.586 -03] p=1339:4@ c=@/: LOG:  terminating any other active server processes
[2019-03-11 02:39:18.596 -03] p=1339:5@ c=@/: LOG:  database system is shut down

OR

Postgresql hangs in recovery mode.

/var/lib/pgsql/9.6/data/pg_log/:

[2019-03-11 07:26:36.105 -03] p=20849:1@ c=@/: LOG:  database system was interrupted; last known up at 2019-03-11 07:22:09 -03
[2019-03-11 07:26:38.813 -03] p=20852:1@ c=[unknown]@172.20.234.64/[unknown]:[unknown] LOG:  connection received: host=172.20.234.64 port=37470
[2019-03-11 07:26:38.824 -03] p=20852:2@ c=oa_172_20_234_63@172.20.234.64/pba:[unknown] FATAL:  the database system is starting up

Resolution

Follow the steps below on the DB replica node to restore the replication:

  1. Stop the postgres service

    # service postgresql-9.6 stop
    
  2. Save values stored in /var/lib/pgsql/9.6/data/recovery.conf:

    # cat /var/lib/pgsql/9.6/data/recovery.conf
    standby_mode = 'on'
    primary_conninfo = 'user=<USER> password=<PASSWORD> host=<MASTER_IP> port=5432 sslmode=prefer sslcompression=1 krbsrvname=postgres'
    
    # cp -af /var/lib/pgsql/9.6/data/recovery.conf /tmp/recovery.conf.backup
    
  3. Save values stored in pg_hba.conf and postgresql.conf:

    # cp -af /var/lib/pgsql/9.6/data/pg_hba.conf /tmp/pg_hba.conf.backup
    # cp -af /var/lib/pgsql/9.6/data/postgresql.conf /tmp/postgresql.conf.backup
    
  4. Remove old database replica:

    # rm -rf `/var/lib/pgsql/9.1/data/*`
    
  5. Start replication process

    # su - postgres
    # pg_basebackup -x -P -c fast -h <MASTER_IP> -U <USER> -D /var/lib/pgsql/9.6/data/
    

    <USER> and <MASTER_IP> values are taken from step 2.

    Note: if pg_basebackup process ends with the following error:

    pg_basebackup: COPY stream ended before last file was finished
    

    It is required to increase the wal_keep_segments parameter in postgresql.conf to be set high enough for the total amount of time the backup takes.

  6. Restore the original configuration:

    # cp -af /tmp/pg_hba.conf.backup /var/lib/pgsql/9.6/data/pg_hba.conf 
    # cp -af /tmp/postgresql.conf.backup /var/lib/pgsql/9.6/data/postgresql.conf
    
  7. Start the postgres service:

    # service postgresql-9.6 start
    
  8. Check that postgres started succesfully:

    # service postgresql-9.6 status
    

5356b422f65bdad1c3e9edca5d74a1ae 8fc71f07abe5b233fea1ae0377cd5e3d

Email subscription for changes to this article
Save as PDF