Backing up Postgres Databases with Docker

By Ivar Thorson

Docker is a pretty good abstraction, but it has some leaks. Today I was trying to deploy a test Postgres database with some data stored in it, and ran into problems shipping the data between my machines using Docker.

In particular,

  1. Why is there no built-in way to save and load docker volumes? This is a real head-scratcher, and would be so nice to have.

  2. Why don't docker commit, docker save, and docker load work with Postgres? My conclusion is that the container abstraction is leaky, and that one must use Postgres-specific backup techniques to store what is in a container.

Let’s go through the five stages of Docker Grieving to see how I ended up at this perspective, using the data ingest technique from an earlier post.

Stage 1: Denial (that docker does not have volume save/load)

After creating and mounting docker volumes that contain the postgres data directories:


$ APP="my-app-name"

$ docker volume save my-docker-postgres-volume | gzip -c > some-file.gz

Usage:	docker volume COMMAND

Manage volumes

Commands:
  create      Create a volume
  inspect     Display detailed information on one or more volumes
  ls          List volumes
  prune       Remove all unused local volumes
  rm          Remove one or more volumes

Run 'docker volume COMMAND --help' for more information on a command.

wat? There’s no docker volume save or docker volume load?

Stage 2: Anger (at writing my own backup script)

OK, let’s google this! Stack Overflow has some helpful suggestions, and the number of blog posts on the topic suggests this is a common problem. Everybody seems to roll some variant of this approach:

# Backup the volume, using a temporary "debian:jessie" instance to make the tarball, and temporarily putting the resulting file in /tmp.
docker run --rm -v /tmp:/backup-to -v ${APP}-postgres-volume:/backup-from debian:jessie tar -czf /backup-to/backup.tar.gz /backup-from
mv -f /tmp/backup.tar.gz "./${APP}-postgres-volume.tar.gz"

# Create a new volume
docker volume create --name ${APP}-postgres-volume2

# Unpack the tar.gz into the volume2
cp "./${APP}-postgres-volume.tar.gz" /tmp/backup.tar.gz
docker run --rm -v ${APP}-postgres-volume2:/recover -v /tmp:/backup debian:jessie bash -c "cd /recover && tar -xzf /backup/backup.tar.gz"

# Spawn a second ephemeral container to test the new volume
docker run  --name ${APP}-postgres-container2 --env POSTGRES_PASSWORD=mysecretpassword -p 5432:5432 --detach --rm -v ${APP}-postgres-volume2:/var/lib/postgresql/data postgres

While it looks like everything succeeded, when you try to actually connect, you’ll see an error message saying the database does not exist:

...
Exception in thread "main" org.postgresql.util.PSQLException: FATAL: database "csv2sql" does not exist
...
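Whether or not it explains this exact error, the tar invocation above has a path bug worth knowing about: it archives /backup-from by name, so (after GNU tar strips the leading slash) every entry is stored under a backup-from/ prefix, and the restore unpacks into /recover/backup-from/ instead of /recover/. The Postgres data files end up nested one level too deep in the new volume. Here is a minimal local sketch of the pitfall; the /tmp paths are illustrative, not from the original script:

```shell
# Demonstrate the tar path-nesting pitfall locally (paths are illustrative).
mkdir -p /tmp/demo-src /tmp/demo-bad /tmp/demo-good
echo "13" > /tmp/demo-src/PG_VERSION

# Problematic form (mirrors the script above): archive the directory itself.
tar -czf /tmp/demo.tar.gz -C /tmp demo-src
tar -xzf /tmp/demo.tar.gz -C /tmp/demo-bad
# /tmp/demo-bad now contains demo-src/PG_VERSION, not PG_VERSION.

# Fixed form: archive the directory's *contents* using -C.
tar -czf /tmp/demo-fixed.tar.gz -C /tmp/demo-src .
tar -xzf /tmp/demo-fixed.tar.gz -C /tmp/demo-good
# /tmp/demo-good now contains PG_VERSION at the top level.
```

Applied to the backup script, that would mean `tar -czf /backup-to/backup.tar.gz -C /backup-from .` on the way out; the restore step could then stay as written.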

Stage 3: Bargaining (and Trying to Back Up Volumeless Images)

OK, so maybe I can just forget about using Docker volumes and do the uglier thing: store all the data in the container itself, convert the container into an image, and then use docker save and docker load on the image.

# Make a container (w/o any volumes)
docker run  --name ${APP}-postgres-standalone --env POSTGRES_PASSWORD=mysecretpassword -p 5432:5432 --detach postgres

# Run a temporary linked container to jump in and create the database (will prompt for password)
docker run -it --rm --link ${APP}-postgres-standalone:postgres postgres psql -h postgres -U postgres --command='CREATE DATABASE csv2sql' 

# Do the data ingest
# See: https://github.com/ivarthorson/csv2sql
CSVDIR=/path/to/csvs/ java -jar /path/to/csv2sql-0.1.0-SNAPSHOT-standalone.jar

# Stop it
docker stop ${APP}-postgres-standalone

# Make it an image
docker commit ${APP}-postgres-standalone ${APP}-postgres-standalone:latest

# Export it
docker save ${APP}-postgres-standalone | gzip -c > ${APP}-postgres-standalone-image.tgz

# Import it
docker load < ${APP}-postgres-standalone-image.tgz

# Try to launch a new instance
docker run --name ${APP}-postgres-container2 --env POSTGRES_PASSWORD=mysecretpassword -p 5432:5432 --detach --rm ${APP}-postgres-standalone:latest

Same problem as before – the database doesn’t appear to exist to clients.

Stage 4: Depression

(goes off to cry into his beer)

Stage 5: Acceptance

I guess I was missing something important here about how Postgres and Docker interact (most likely that the official postgres image declares /var/lib/postgresql/data as a VOLUME, and docker commit never captures the contents of volumes), but I was sick of this stupid problem, and sick of stupid Docker.

I ended up building the uberjar and the postgres data dump like this:

#!/bin/bash

APP="my-app"

echo "Building an uberjar..."
lein do clean, uberjar

cp -f target/${APP}.jar standalone

echo "Building the ${APP}-postgres-container..."
docker container stop ${APP}-postgres-container
docker container rm ${APP}-postgres-container
docker image rm ${APP}-postgres
docker run  --rm --name ${APP}-postgres-container --env POSTGRES_PASSWORD=mysecretpassword -p 5432:5432 --detach postgres
sleep 5  # Give postgres time to start up

echo "Filling ${APP}-postgres-container with data..."
docker run -it --rm --link ${APP}-postgres-container:postgres --env PGPASSWORD=mysecretpassword postgres psql -h postgres -U postgres --command='CREATE DATABASE csv2sql;'
# See: https://github.com/ivarthorson/csv2sql
CSVDIR=/path/to/csvs/ java -jar /path/to/csv2sql-0.1.0-SNAPSHOT-standalone.jar

echo "Exporting the database..."
docker exec -t ${APP}-postgres-container pg_dumpall -c -U postgres | gzip -c > standalone/${APP}-postgres.sql.gz

docker stop ${APP}-postgres-container
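One wart in the script above is the `sleep 5`, which is a race: on a slow machine Postgres may not be accepting connections yet. A sketch of a more robust wait, using a generic retry helper (the `wait_for` name and the timeout value are my own inventions, not from the original script) around `pg_isready`, which ships inside the official postgres image:

```shell
# wait_for TIMEOUT CMD...: retry CMD once per second until it succeeds,
# giving up (returning 1) after roughly TIMEOUT attempts.
wait_for() {
  local timeout=$1; shift
  local elapsed=0
  until "$@" >/dev/null 2>&1; do
    elapsed=$((elapsed + 1))
    [ "$elapsed" -gt "$timeout" ] && return 1
    sleep 1
  done
  return 0
}

# Instead of "sleep 5":
# wait_for 30 docker exec ${APP}-postgres-container pg_isready -U postgres
```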

From there, running the standalone (deployable) stuff is easy:

#!/bin/bash

APP="my-app-name"

THISDIR=$(pwd)
OBJ_DIR="${THISDIR}/obj"  # Please set this to where the object files are saved/stored

echo "Using object directory: $OBJ_DIR"

echo "Stupidly stopping and deleting any running containers...(errors are OK here)"
docker container stop ${APP}-postgres-container
docker container rm ${APP}-postgres-container

echo "Creating postgres container..."
docker run --name ${APP}-postgres-container \
       --env POSTGRES_PASSWORD=mysecretpassword \
       -p 5432:5432 \
       --rm \
       --detach \
       postgres

echo "Waiting for postgres to start up..."
sleep 5;

echo "Dumping data into the postgres container..."
cat ${APP}-postgres.sql.gz | gzip -c -d | docker exec -i ${APP}-postgres-container psql -q -U postgres

echo "Launching the HTTP Server"
java -jar ${APP}.jar
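One defensive touch I would consider adding (the `check_dump` helper is my own sketch, not part of the original script): verify the dump file is intact before piping it into psql, since a missing or truncated gzip would otherwise produce a confusing partial restore.

```shell
# check_dump FILE: succeed only if FILE exists and is a valid gzip archive.
# "gzip -t" test-decompresses the whole file without writing anything out.
check_dump() {
  [ -f "$1" ] && gzip -t "$1" 2>/dev/null
}

# Before the restore pipe:
# check_dump "${APP}-postgres.sql.gz" || { echo "bad or missing dump" >&2; exit 1; }
```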

I’m thinking of coining a new rule for myself:

"Docker is for managing software builds, not for managing data."