Setting a custom data directory for a new PostreSQL installation on Linux

When you install Postgres on a Debian-based Linux distro (in my case, Linux Mint 18), the installer will automatically create a database cluster with a data directory in /var/lib/postgresql/9.5/main. But it’s fairly common to run a server with a small root partition, and have the database cluster, which can grow very large, on a separate data partition.

I didn’t have any trouble using initdb to initialize the cluster in the desired location. But at system boot, the server automatically started using the default data cluster, not the one I had created. The PostgreSQL documentation on how to customize server start was pretty obtuse.

Fortunately, Debian-based distros ship with some utilities that allow you to create multiple data clusters, and integrates them all into a multicluster management architecture:

Some important points about pg_createcluster:

  1. An existing cluster will be added to the list of clusters, rather than reinitialized. This was important, because I had already run a load script overnight to restore a database dumped from a prior version of Postgres/PostGIS.
  2. PostgreSQL uses the default port number 5432. New clusters will be automatically initialized to use a new port one number higher than the highest existing Postgres port (5433 for the second cluster, 5434 for the third cluster, etc.).

I did not need the default cluster. I also intend to run only one cluster on this machine, and wanted it to use the standard port (5432). pg_dropcluster and pg_createcluster both take the Postgres version number (9.5 in my case) and cluster name as arguments. The default cluster is named “main”. My computer is named tycho, so I decided to name my cluster “tycho_main” so that it would be clear this was not the default cluster. So I ran the following commands:

sudo pg_dropcluster 9.5 main
sudo pg_createcluster -d /data/pg_data --start 9.5 tycho_main

pg_dropcluster does take --stop as an option to stop a running server, but I already had both servers stopped. The -d option to pg_createcluster is where you set the data directory for the new cluster, and the --start option start the cluster immediately after adding it to the list of available clusters.

After running those commands, I confirmed I could connect to the running server. Then I rebooted to make sure the server started at boot, which it did.

Done and done.

NOTE: Tablespaces can also be used within the database to determine where the database stores the data. The primary use case for this, however, is if the DBA wants to store data that is accessed with varying frequency on media of different speed: commonly accessed tables on fast hard drives or solid state drives, infrequently accessed tables on slow hard drives. It adds complexity to the installation, and is not really intended for the case where the DBA wants the entire database cluster on one partition, but in a nonstandard location.

Installing PostgreSQL/PostGIS on Linux Mint 17

Linux Mint 17 (based on Ubuntu 14.04) was released before the Summer, but I finally got around to upgrading from Linux Mint 13. I was previously running PostgreSQL 9.0 / PostGIS 2.0, and it was time to upgrade those as well. If you are upgrading from previous version of PostgreSQL/PostGIS and have data you want to bring over to your new cluster, you have to begin with a dump of your old data:

pg_dump -Fc my_db > my_db.dump

The extension *.dump is arbitrary. Some people prefer *.backup. I should have timed the process for this post, but expect something in the order of hours. For smallish DBs (tens to hundreds of gigabytes), I have seen compression ratios of about 10-20%. Not sure what to expect for much larger DBs. Continue reading Installing PostgreSQL/PostGIS on Linux Mint 17

Load PostGIS geometries in R without rgdal

As I said in my last post, rgdal lacks some of the features of GDAL, including the ability to subset columns and rows the source layer, and I demonstrated a workaround. The workaround relied upon the RPostgreSQL package, and this raises a question: Is it possible to transfer geographic data from PostGIS to R just using RPostgreSQL?

It is, and in fact, this may be necessary if you are working on Windows. Continue reading Load PostGIS geometries in R without rgdal

Subsetting in readOGR

The function readOGR in the rgdal package is used to bring vector spatial data sources into R. readOGR() relies upon OGR (part of the GDAL/OGR library) for format conversion. Unfortunately, while OGR supports the ability to subset columns (with the -select switch) or rows (with the -where switch), or even to request a layer using a full SQL statement, none of that functionality is available in readOGR. Every so often this gets discussed in the mailing lists, but the functionality has not yet been implemented.

If your data is already in a SQL database though, you’re in luck. You can accomplish the same thing within R by creating a spatial view in-database, loading the spatial view with readOGR, then dropping the view. I’ve created a function that does just that for PostGIS data sources. Continue reading Subsetting in readOGR


Hexbinning is a method for visualizing point data when many similar values mean there is a lot of overplotting. Although it originated in the data visualization field as a an enhancement to the traditional XY scatterplot, within the last few years hexbinning has been used more and more in cartography. (See this great blog post by Zachary Forest Johnson which traces this history and explains how to create hexbin maps using D3.js.)

I wanted to map police stops under the NYPD Stop and Frisk program (official source, easier to use version from NYCLU), but at city or borough scale, this just looks like a continuous carpet of dots. Visualizing that kind of dense point data is exactly what hexbinning is for.
Continue reading Hexbinning

Finding Islands (or the Converse)


I have a PostGIS polygon layer with a lot of small “islands” (actually, lakes and ponds), polygons which are not touching any other polygons. I wanted to delete them from the layer, or rather, since I wanted to keep a version with and without the islands, I wanted to copy only the polygons with neighbors to a new table. I’ll show the solution first, and then how I got there. Continue reading Finding Islands (or the Converse)

Welcome to the Open Geospatial Technologies Blog

Welcome to my new blog. I have previously blogged (somewhat sporadically) at Free City, where I have occasionally posted some technical stuff about installing and using open source GIS software. I have decided to break out the GIS and other geospatial technology material into a separate blog. My goal with this new site is to collect information about using various geospatial tools effectively. Often, that means I will be documenting methods for things that I am currently working on. It will probably grow a little haphazardly, but hopefully will be of interest to other geonerds.

Feel free to comment or contact me. Enjoy!