How to Move from Amazon RDS to dedicated PostgreSQL server

A month ago, I had to transfer all our databases from Amazon RDS to a dedicated PostgreSQL server. In this article, I will share solutions to some of the difficulties that I encountered during the process. I will conclude the article with a list of different backup solutions, which one I picked and why.

PostgreSQL ships with a neat tool pg_dumpall. It will export all schemas from a database server. Unfortunately, I could not use that tool with RDS. The tool assumes that it is run with a user belonging to the superuser role. RDS admin user does not belong to the superuser role. I believe Amazon did it to prevent any poking around the server configuration which could cause malfunction of their service. I had a list of 9 database that I could not export with pg_dumpall. All of the databases had their own user that owned the schema. I wrote a very simple bash script that let me export all of the schema using pg_dump tool. Here is the snippet:

for database in table1 table2 tableN
    pg_dump -h $database > $database.sql

The snippet will prompt you for a password for every database.

Here is the snippet that I used for importing the schema back into the dedicated server:

for database in table1 table2 tableN
    psql $database < $database.sql

PostgreSQL configuration

The initial config that PostgreSQL v9.5 ships with on Ubuntu 14.04 is good enough. You will want to adjust the memory and cpu that the server uses depending on the instance type that you are running. You will also need to add the backup configuration which I will get into details later. Finally, ensure that SSL is turned on and that there is a public and private key that the server can use. I had to generate my own key pair.

Unicode support

To my surprise, PostgreSQL v9.5 on Ubuntu 14.04 does not default to unicode schema. I had to delete the default ascii template and create a new unicode template. Here is the snippet that I had to use:

psql -c "update pg_database set datallowconn = TRUE where datname = 'template0'"
psql -d template0 -c "update pg_database set datistemplate = FALSE where datname = 'template1'"
psql -c "drop database template1"
psql -c "create database template1 with template = template0 encoding = 'UTF-8' lc_ctype = 'en_US.UTF8' lc_collate = 'en_US.UTF8'"
psql -d template0 -c "update pg_database set datistemplate = TRUE where datname = 'template1'"
psql -d template1 -c "update pg_database set datallowconn = FALSE where datname = 'template0'"

After these changes, any database created will default to UTF-8 encoding.

Backup strategies

This is the list of strategies for backing up PostgreSQL that I explored:

Dumping the database to a file - An important advantage of pg_dump over the other backup methods described later is that pg_dump's output can generally be re-loaded into newer versions of PostgreSQL, whereas file-level backups and continuous archiving are both extremely server-version-specific. pg_dump is also the only method that will work when transferring a database to a different machine architecture, such as going from a 32-bit to a 64-bit server. The main disadvantage of this method is that it will not provide high reliability.

File System backup - This approach is impractical, or at least inferior to the method of dumping the database because the database server must be shut down in order to make the backup.

Continuous Archiving and Point-in-Time Recovery (PITR) - This approach is more complex to administer than the first method. Its main advantage is that if it is setup correctly, it is possible to restore the database to its state at any time. The main disadvantage of this method is that it involves archiving large amounts of files thus requiring a lot of storage space. It is the preferred method when high reliability is desired.

I've decided to further explore PITR. There are tools available to aid with PITR:

  • Barman: GPLv3, SSH for backup, written in Python

  • OmniPITR: PostgreSQL license, rsync and ssh, written in Perl

  • pghoard: Apache, AWS S3, GCS, Azure, uses LZMA(very fast) compression and encrypts the backup. Written in Python

My top choice from the above list is pghoard followed by Barman. pghoard wins me with its LZMA compression, encryption of the backup, and the support for cloud storage providers.

Initially, I picked pghoard and did the backup with it. Setting it up is straightforward when following the installation guidelines. After a week of running pghoard, the instance that was running PostgreSQL server went offline for 15 minutes because of unscheduled reboot of the instance. That prompted me into looking at master-slave replication that will ensure the availability of our database service.

Master-slave configuration

There are tools available that can make setting up master-slave PostgreSQL servers easy. I’ve picked repmgr. Setting it up was easy because its config file has only 5 variables. You will need to make changes to your postgresql.conf file and pg_hba.conf but it is all described in the instructions. The one step I had issue with was pg_hba.conf file - for backup or replication you will need to explicitly define database replication for repmgr/pghoard to work correctly. This will not work:

host    all             all                 md5

This will work:

host    replication             all                 md5

We’ve been using this master-slave replication for over a month now and there were no issues or outages at this time.