Setting a custom data directory for a new PostreSQL installation on Linux

When you install Postgres on a Debian-based Linux distro (in my case, Linux Mint 18), the installer will automatically create a database cluster with a data directory in /var/lib/postgresql/9.5/main. But it’s fairly common to run a server with a small root partition, and have the database cluster, which can grow very large, on a separate data partition.

I didn’t have any trouble using initdb to initialize the cluster in the desired location. But at system boot, the server automatically started using the default data cluster, not the one I had created. The PostgreSQL documentation on how to customize server start was pretty obtuse.

Fortunately, Debian-based distros ship with some utilities that allow you to create multiple data clusters, and integrates them all into a multicluster management architecture:

pg_lsclusters is used to list clusters available (whether running or not) on the server
pg_dropcluster is used (surprise!) to drop an existing cluster
pg_createcluster is used to create a cluster

Some important points about pg_createcluster:

An existing cluster will be added to the list of clusters, rather than reinitialized. This was important, because I had already run a load script overnight to restore a database dumped from a prior version of Postgres/PostGIS.
PostgreSQL uses the default port number 5432. New clusters will be automatically initialized to use a new port one number higher than the highest existing Postgres port (5433 for the second cluster, 5434 for the third cluster, etc.).

I did not need the default cluster. I also intend to run only one cluster on this machine, and wanted it to use the standard port (5432). pg_dropcluster and pg_createcluster both take the Postgres version number (9.5 in my case) and cluster name as arguments. The default cluster is named “main”. My computer is named tycho, so I decided to name my cluster “tycho_main” so that it would be clear this was not the default cluster. So I ran the following commands:

sudo pg_dropcluster 9.5 main
sudo pg_createcluster -d /data/pg_data --start 9.5 tycho_main

pg_dropcluster does take --stop as an option to stop a running server, but I already had both servers stopped. The -d option to pg_createcluster is where you set the data directory for the new cluster, and the --start option start the cluster immediately after adding it to the list of available clusters.

After running those commands, I confirmed I could connect to the running server. Then I rebooted to make sure the server started at boot, which it did.

Done and done.

NOTE: Tablespaces can also be used within the database to determine where the database stores the data. The primary use case for this, however, is if the DBA wants to store data that is accessed with varying frequency on media of different speed: commonly accessed tables on fast hard drives or solid state drives, infrequently accessed tables on slow hard drives. It adds complexity to the installation, and is not really intended for the case where the DBA wants the entire database cluster on one partition, but in a nonstandard location.

Open Geospatial Technologies

Tools, Tips & Trends in GIS, Spatial Databases & Geovisualization

Setting a custom data directory for a new PostreSQL installation on Linux

Need help with the Commons?