Docker is a pretty good abstraction, but it has some leaks. Today I was trying to deploy a test Postgres database with some data already loaded into it, and ran into problems shipping that data around between my machines using Docker.
In particular:
- Why is there no way to save and load Docker volumes? This is a real head-scratcher, and it would be so nice to have built in.
- Why do docker commit, docker save, and docker load not work with Postgres? My conclusion is that the container abstraction is leaky, and that one must use Postgres-specific backup techniques to preserve what is in a container.
Let’s go through the five stages of Docker Grieving to see how I ended up at this perspective, using the data ingest technique from an earlier post.
Stage 1: Denial (that docker does not have volume save/load)
After creating and mounting docker volumes that contain the postgres data directories, I went looking for a way to dump those volumes to a file.
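The volume setup looked roughly like this (a sketch based on the earlier data-ingest post; the container name here is a placeholder):
# Hypothetical setup, paraphrasing the earlier post: a named volume mounted
# at the Postgres data directory of a stock postgres container.
docker volume create my-docker-postgres-volume
docker run --name my-postgres-container \
    --env POSTGRES_PASSWORD=mysecretpassword \
    -p 5432:5432 --detach \
    -v my-docker-postgres-volume:/var/lib/postgresql/data \
    postgres
My first instinct was to just save the volume directly: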
$ APP="my-app-name"
$ docker volume save my-docker-postgres-volume | gzip -c > some-file.gz
Usage: docker volume COMMAND
Manage volumes
Commands:
create Create a volume
inspect Display detailed information on one or more volumes
ls List volumes
prune Remove all unused local volumes
rm Remove one or more volumes
Run 'docker volume COMMAND --help' for more information on a command.
…wat? There’s no docker volume save or docker volume load?
Stage 2: Anger (at writing my own backup script)
OK, let’s google this! Stack Overflow has some helpful suggestions, and judging by the number of blog posts on the topic, this is a common problem. Everybody seems to roll some variant of this approach:
# Backup the volume, using a temporary "debian:jessie" instance to make the tarball, and temporarily putting the resulting file in /tmp.
docker run --rm -v /tmp:/backup-to -v ${APP}-postgres-volume:/backup-from debian:jessie tar -czf /backup-to/backup.tar.gz /backup-from
mv -f /tmp/backup.tar.gz "./${APP}-postgres-volume.tar.gz"
# Create a new volume
docker volume create --name ${APP}-postgres-volume2
# Unpack the tar.gz into the volume2
cp "./${APP}-postgres-volume.tar.gz" /tmp/backup.tar.gz
docker run --rm -v ${APP}-postgres-volume2:/recover -v /tmp:/backup debian:jessie bash -c "cd /recover && tar -xzf /backup/backup.tar.gz"
# Spawn a second ephemeral container to test the new volume
docker run --name ${APP}-postgres-container2 --env POSTGRES_PASSWORD=mysecretpassword -p 5432:5432 --detach --rm -v ${APP}-postgres-volume2:/var/lib/postgresql/data postgres
While it looks like everything succeeded, when you try to actually connect, you’ll see an error message saying the database does not exist:
...
Exception in thread "main" org.postgresql.util.PSQLException: FATAL: database "csv2sql" does not exist
...
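A couple of quick checks make the failure less mysterious (a debugging sketch, not part of the Stack Overflow recipe). In particular, because the tarball was built from the absolute path /backup-from, tar stores everything under a backup-from/ prefix, so the restore step most likely leaves the data in /recover/backup-from/ rather than at the volume root where Postgres expects it:
# Peek at what actually landed in the new volume (note the backup-from/ prefix)
docker run --rm -v ${APP}-postgres-volume2:/recover debian:jessie ls -la /recover
# Ask the running test container which databases it actually knows about
docker exec -it ${APP}-postgres-container2 psql -U postgres -c '\l'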
Stage 3: Bargaining (and Trying to Back Up Volumeless Images)
OK, so maybe I can just forget about using Docker volumes, do the uglier thing and store all the data in the container itself, convert the container into an image, and then use docker save and docker load on the image.
# Make a container (w/o any volumes)
docker run --name ${APP}-postgres-standalone --env POSTGRES_PASSWORD=mysecretpassword -p 5432:5432 --detach postgres
# Link a temporary image to jump in and create the database: (will prompt for password)
docker run -it --rm --link ${APP}-postgres-standalone:postgres postgres psql -h postgres -U postgres --command='CREATE DATABASE csv2sql'
# Do the data ingest
# See: https://github.com/ivarthorson/csv2sql
CSVDIR=/path/to/csvs/ java -jar /path/to/csv2sql-0.1.0-SNAPSHOT-standalone.jar
# Stop it
docker stop ${APP}-postgres-standalone
# Make it an image
docker commit ${APP}-postgres-standalone ${APP}-postgres-standalone:latest
# Export it
docker save ${APP}-postgres-standalone | gzip -c > ${APP}-postgres-standalone-image.tgz
# Import it
docker load < ${APP}-postgres-standalone-image.tgz
# Try to launch a new instance
docker run --name ${APP}-postgres-container2 --env POSTGRES_PASSWORD=mysecretpassword -p 5432:5432 --detach --rm ${APP}-postgres-standalone:latest
Same problem as before – the database doesn’t appear to exist to clients.
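A quick way to convince yourself that the data never made it into the committed image at all (an illustrative check, not something I ran at the time):
# List the Postgres data directory inside the committed image. Since the
# command isn't "postgres", the entrypoint should just exec it without doing
# any database setup, and the directory comes up empty instead of containing
# base/, global/, and friends.
docker run --rm ${APP}-postgres-standalone:latest ls -la /var/lib/postgresql/data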
Stage 4: Depression
(goes off to cry into his beer)
Stage 5: Acceptance
I guess I’m missing something important here about how Postgres and Docker interact, but I was sick of this stupid problem, and sick of stupid Docker.
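In hindsight, the likely explanation is pretty mundane: the official postgres image declares /var/lib/postgresql/data as a VOLUME, and docker commit and docker save do not capture data stored in volumes, so the database files were never part of the image in the first place. You can see the declaration with an inspect (the output shown is roughly what I'd expect from the stock image):
# The stock postgres image marks its data directory as a volume, which is
# exactly the part that commit/save leave behind.
docker image inspect postgres --format '{{json .Config.Volumes}}'
# => {"/var/lib/postgresql/data":{}}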
I ended up building the uberjar and the postgres data dump like this:
#!/bin/bash
APP="my-app"
echo "Building an uberjar..."
lein do clean, uberjar
cp -f target/${APP}.jar standalone
echo "Building the ${APP}-postgres-container..."
docker container stop ${APP}-postgres-container
docker container rm ${APP}-postgres-container
docker image rm ${APP}-postgres
docker run --rm --name ${APP}-postgres-container --env POSTGRES_PASSWORD=mysecretpassword -p 5432:5432 --detach postgres
sleep 5 # Give postgres time to start up
echo "Filling ${APP}-postgres-container with data..."
docker run -it --rm --link ${APP}-postgres-container:postgres --env PGPASSWORD=mysecretpassword postgres psql -h postgres -U postgres --command='CREATE DATABASE csv2sql;'
# See: https://github.com/ivarthorson/csv2sql
CSVDIR=/path/to/csvs/ java -jar /path/to/csv2sql-0.1.0-SNAPSHOT-standalone.jar
echo "Exporting the database..."
docker exec -t ${APP}-postgres-container pg_dumpall -c -U postgres | gzip -c > standalone/${APP}-postgres.sql.gz
docker stop ${APP}-postgres-container
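One sanity check that's worth doing before shipping the dump around (an extra step, not part of the script above): make sure the dump actually mentions the database.
# pg_dumpall -c emits DROP/CREATE DATABASE statements, so the database name
# should show up near the top of the dump.
zcat standalone/${APP}-postgres.sql.gz | grep -m1 'CREATE DATABASE csv2sql'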
From there, running the standalone (deployable) stuff is easy:
#!/bin/bash
APP="my-app-name"
THISDIR=`pwd`
OBJ_DIR=${THISDIR}'/obj' # Please set this to where the object files are saved/stored
echo "Using object directory: $OBJ_DIR"
echo "Stupidly stopping and deleting any running containers...(errors are OK here)"
docker container stop ${APP}-postgres-container
docker container rm ${APP}-postgres-container
echo "Creating postgres container..."
docker run --name ${APP}-postgres-container \
--env POSTGRES_PASSWORD=mysecretpassword \
-p 5432:5432 \
--rm \
--detach \
postgres
echo "Waiting for postgres to start up..."
sleep 5;
echo "Dumping data into the postgres container..."
cat ${APP}-postgres.sql.gz | gzip -c -d | docker exec -i ${APP}-postgres-container psql -q -U postgres
echo "Launching the HTTP Server"
java -jar ${APP}.jar
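One refinement I'd consider: the sleep 5 in both scripts is a race against Postgres's startup, and on a slow machine the restore can fire before the server is ready. A sketch of a more patient wait, using pg_isready inside the container:
# Poll until Postgres accepts connections instead of hoping 5 seconds is enough.
until docker exec ${APP}-postgres-container pg_isready -U postgres > /dev/null 2>&1; do
    echo "Waiting for postgres to accept connections..."
    sleep 1
done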
I’m thinking of coining a new rule for myself:
"Docker is for managing software builds, not for managing data."