Installing PostgreSQL/PostGIS on Linux Mint 17

Linux Mint 17 (based on Ubuntu 14.04) was released before the Summer, but I finally got around to upgrading from Linux Mint 13. I was previously running PostgreSQL 9.0 / PostGIS 2.0, and it was time to upgrade those as well. If you are upgrading from previous version of PostgreSQL/PostGIS and have data you want to bring over to your new cluster, you have to begin with a dump of your old data:

pg_dump -Fc my_db > my_db.dump

The extension *.dump is arbitrary. Some people prefer *.backup. I should have timed the process for this post, but expect something in the order of hours. For smallish DBs (tens to hundreds of gigabytes), I have seen compression ratios of about 10-20%. Not sure what to expect for much larger DBs.

After that the next step for me was installing Linux Mint 17. If you’re not installing/upgrading an OS, your next step might involve going straight to installing/upgrading PostgreSQL 9.3/PostGIS 2.1. If you are not yet using the ubuntugis-unstable PPA, go ahead and set that up. This will give you access to the latest PostGIS, GDAL/OGR drivers, SpatiaLite, QGIS, etc. In spite of the name, the repository is very stable and AFAICT is what all geolinux geeks are using.

sudo add-apt-repository ppa:ubuntugis/ubuntugis-unstable
sudo add-apt update

Then install PostgreSQL and PostGIS:

sudo apt-get install postgresql-9.3-postgis-2.1 pgadmin3

Since the postgresql-9.3-postgis-2.1 package depends on postgresql-9.3 and recommends postgis and postgresql-9.3-contrib, you should get everything you need. If you use pgAdmin III, that has to be added separately, as I have done here.

Before you can restore your database you have to initialize your cluster. Apt will automatically create a postgres user and initialize the cluster in /var/lib/postgresql/9.3/main.  Perhaps, like me, you keep a separate drive partition for your OS, home, and data. In that case, it won’t do you much good to have your database cluster in your OS partition. Create the directory that will hold the cluster (in the example below, I use /data/pg_data) and set the owner to postgres.

You need to initialize the cluster as the postgres user. It is my longstanding habit, learned from the developer who first introduced me to postgres, to never give a password to the postgres user. Instead, I create a superuser account who does all the potentially destructive in-database stuff, and when I have to do something as postgres, like initialize a cluster, I get to postgres through root. So instead of:

su - postgres

…become root first, then become postgres. Generally I’m not showing the command prompt, but as we step through various accounts the command prompt will change, so I will show the prompts in this listing:

user@host ~ $ su -
host ~ # su - postgres
postgres@host ~ $ /usr/lib/postgresql/9.3/bin/initdb -A trust -D /data/pg_data
postgres@host ~ $ psql
postgres=# CREATE USER superusername WITH SUPERUSER CREATEDB CREATEROLE PASSWORD 'password';
postgres=# \q
postgres@host ~ $ exit
host ~ # exit

Since I am installing on my laptop and no one else is really using this laptop, I have used -A trust to indicate that all local users are trusted. This may not apply to your situation, so use this option judiciously. The final parameter indicates where the cluster will be initialized. I have used /data/pg_data.

Now you need to edit configuration files. In Ubuntu/LM, these files will be located in /etc/postgresql/9.3/main, and will need to be edited as root. You could take superuser and use vim, or you could gksu gedit if you prefer graphical text editors. Personally, I launch a root file manager and then navigate to the desired files and double-click to edit.

Once in the configuration directory, first, edit environment (no extension, the file is just named “environment”) and add the following two lines:

POSTGIS_ENABLE_OUTDB_RASTERS=1
POSTGIS_GDAL_ENABLED_DRIVERS=ENABLE_ALL

Now edit pg_hba.conf to point to the new data cluster. The file will have the line:

data_directory = '/var/lib/postgresql/9.3/main'        # use data in another directory

…which you will need to change to:

data_directory = '/data/pg_data'        # use data in another directory

…or whatever path is appropriate on your system.

Now restart the postgres service, or just reboot, so that postgres restarts using the new cluster.

With a typical PostgreSQL dump/restore, the restore will create the database and load any functions, as well as schemas and data. With PostGIS, the new version will have different spatial functions, so we want to create the database first, create the new spatial functions and necessary structures (such as the spatial_ref_sys table) by loading the PostGIS extension, and only then do we load the data, which we do using a special script rather than the standard pg_restore command.

I do the following steps in pgAdmin. First, create a connection to the server:

pgAdminNewServerRegistration

In pgAdmin it is easy to create a database graphically (which I will not show) by right-clicking on the Databases “can” and choosing New Database, or just open the SQL Editor window to follow the commands below. If you prefer to use psql, now that you have added your system user as a database superuser, you can

psql postgres

Either way, you end up at a SQL editor. You are in the postgres database, which should be used only for system data, so create a new database to store your “real” data.

CREATE DATABASE my_db WITH OWNER superusername;

Now create the necessary PostGIS extensions. The only one that is really necessary is PostGIS, but you might want to install topology support and the TIGER geocoder (which is made more useful by the fuzzystrmatch extension) as well. You need to issue the SQL commands in the correct DB, so in pgAdmin, close the SQL Editor, connect to the new database, and reopen the SQL Editor. In psql just do:

\connect my_db

Now create the extensions:

CREATE EXTENSION postgis;
CREATE EXTENSION postgis_topology;
CREATE EXTENSION fuzzystrmatch;
CREATE EXTENSION postgis_tiger_geocoder;

If you are restoring a previous database, now you are ready to reload your data. PostGIS provides a Perl data loader, postgis_restore.pl, that  will not attempt to recreate the spatial functions, since they are already there (although deprecated functions will be missing and will require special handling if you have database applications which make use of them). Any aspatial data will be loaded in the usual manner, so it is harmless to use this instead of pg_restore. The data loader will not, by default, be in your PATH, so either navigate to the PostGIS contrib directory, as I show below, or be prepared to type in the entire path when you call the command. Then pass in the name of the dump file and pipe the result to psql:

cd /usr/share/postgresql/9.3/contrib/postgis-2.1/
perl postgis_restore.pl ~/my_db.dump | psql -U superusername my_db 2> ~/postgis_restore_errors.txt

Go get a cup of coffee. In fact, go out to dinner. Maybe see a movie.

The PostGIS installer also installs the graphical shapeloader (front end to shp2pgsql-gui). As a final tweak, you might want to configure this as a pgAdmin plugin, so that you have easy access to it when working with pgAdmin (and it will launch with the database connection details taken from the pgAdmin environment). To do so, open (as root) the file /usr/share/pgadmin3/plugins.d/plugins.ini. Then add the following section:

;
; pgShapeLoader (Linux):
; Added manually by lee on 10/21/12
Title=PostGIS Shapefile and DBF loader
Command=$$PGBINDIR/shp2pgsql-gui -U $$USERNAME -d $$DATABASE -p $$PORT -h $$HOSTNAME
Description=Open a PostGIS ESRI Shapefile or Plain dbf loader console to the current database.
KeyFile=$$PGBINDIR/shp2pgsql-gui
Platform=unix
ServerType=postgresql
Database=Yes
SetPassword=No

This will create a new entry in the pgAdmin Plugins menu.

That’s it! You should now have a working PostGIS installation. The first thing I did was fire up QGIS, connect to the database, and load some of my old projects to make sure everything was working. Happy geoprocessing!

References:

 

 

 

Issue with QGIS DB Manager SQL Window

UPDATE: I just upgraded to QGIS 2.4 along with upgrading my OS to Linux Mint 13, and the problem described below no longer seems to an issue.

QGIS DB Manager has a SQL Window which is useful for extracting, joining, or transforming data before loading it in the map canvas. Unfortunately I found that the syntax highlighting causes keypresses to buffer while the highlighting module checks the syntax, and slows it down to the point of making it completely unusable. I reported this to the QGIS Developer mailing list, but the developer was not able to reproduce the problem and only a handful of the other listers were able to confirm it. I’m not sure why, as I have now experienced it on Linux and Windows (including an entire GIS lab of Windows machines), and a large group of my Geovisualization students who use Macs are also affected by it.

Until it can be fixed, I’ve found it necessary to disable syntax highlighting in the plugin. The affected file is dlg_sql_window.py. It is found in the following locations:

  • Linux: /usr/share/qgis/python/plugins/db_manager/dlg_sql_window.py
  • Mac: /Applications/QGIS.app/Contents/Resources/python/plugins/db_manager/dlg_sql_window.py
  • Windows OSGeo4W: C:\OSGeo4W\apps\qgis\python\plugins\db_manager\dlg_sql_window.py

The solution is to comment out lines 34 and 56. On Linux and Mac, editing this file requires administrative privileges, but this is not required in OSGeo4W. The beginning of the file is reproduced here so that you can see which lines need to be commented out.

# -*- coding: utf-8 -*-

"""
/***************************************************************************
Name                 : DB Manager
Description          : Database manager plugin for QGIS
Date                 : May 23, 2011
copyright            : (C) 2011 by Giuseppe Sucameli
email                : brush.tyler@gmail.com

The content of this file is based on
- PG_Manager by Martin Dobias (GPLv2 license)
 ***************************************************************************/

/***************************************************************************
 *                                                                         *
 *   This program is free software; you can redistribute it and/or modify  *
 *   it under the terms of the GNU General Public License as published by  *
 *   the Free Software Foundation; either version 2 of the License, or     *
 *   (at your option) any later version.                                   *
 *                                                                         *
 ***************************************************************************/
"""

from PyQt4.QtCore import *
from PyQt4.QtGui import *
from qgis.core import *

from .db_plugins.plugin import BaseError
from .dlg_db_error import DlgDbError

from .ui.ui_DlgSqlWindow import Ui_DbManagerDlgSqlWindow as Ui_Dialog

# COMMENT OUT THIS LINE -> from .highlighter import SqlHighlighter
from .completer import SqlCompleter

import re

class DlgSqlWindow(QDialog, Ui_Dialog):

  def __init__(self, iface, db, parent=None):
    QDialog.__init__(self, parent)
    self.iface = iface
    self.db = db
    self.setupUi(self)
    self.setWindowTitle( u"%s - %s [%s]" % (self.windowTitle(), db.connection().connectionName(), db.connection().typeNameString()) )

    self.defaultLayerName = 'QueryLayer'

    settings = QSettings()
    self.restoreGeometry(settings.value("/DB_Manager/sqlWindow/geometry", QByteArray(), type=QByteArray))

    self.editSql.setAcceptRichText(False)
    self.editSql.setFocus()
    SqlCompleter(self.editSql, self.db)
    # COMMENT OUT THIS LINE -> SqlHighlighter(self.editSql, self.db)

    # allow to copy results
    copyAction = QAction("copy", self)
    self.viewResult.addAction( copyAction )
    copyAction.setShortcuts(QKeySequence.Copy)
    QObject.connect(copyAction, SIGNAL("triggered()"), self.copySelectedResults)

# [subsequent code omitted]

 

Load PostGIS geometries in R without rgdal

As I said in my last post, rgdal lacks some of the features of GDAL, including the ability to subset columns and rows the source layer, and I demonstrated a workaround. The workaround relied upon the RPostgreSQL package, and this raises a question: Is it possible to transfer geographic data from PostGIS to R just using RPostgreSQL?

It is, and in fact, this may be necessary if you are working on Windows. The rgdal package comes with its own statically linked GDAL, which unfortunately has a very limited set of format drivers—the ubiquitous ESRI Shapefile, but not the PostgreSQL/PostGIS driver. If you try to use dbGetQuery to return a recordset with a geometry column, R will choke. But you can use the ST_AsText function in SQL to convert the PostGIS geometry to Well-Known Text, read it into R, and then use the readWKT function in the rgeos package to convert it into a SpatialPolygons, SpatialLines, or SpatialPoints object.

The readWKT function is not vectorized, so I build the set of SpatialPolygons and rbind them inside a for loop. I’m sure there’s a better way to do this using the *apply family of functions, but I haven’t been able to figure that one out.

library(RPostgreSQL)
library(rgeos)
library(sp)
 
# Load data from the PostGIS server
conn = dbConnect(
  dbDriver("PostgreSQL"), dbname=dbname, host=host, port=5432, 
  user=user, password=password
  )
 
strSQL = "
  SELECT gid, ST_AsText(geom) AS wkt_geometry, attr1, attr2[, ...]
  FROM geo_layer"
dfTemp = dbGetQuery(conn, strSQL)
row.names(dfTemp) = dfTemp$gid
 
# Create spatial polygons
# To set the PROJ4 string, enter the EPSG SRID and uncomment the 
# following two lines:
# EPSG = make_EPSG()
# p4s = EPSG[which(EPSG$code == SRID), "prj4"]
for (i in seq(nrow(dfTemp))) {
  if (i == 1) {
    spTemp = readWKT(dfTemp$wkt_geometry[i], dfTemp$gid[i])
    # If the PROJ4 string has been set, use the following instead
    # spTemp = readWKT(dfTemp$wkt_geometry[i], dfTemp$gid[i], p4s)
  }
  else {
    spTemp = rbind(
      spTemp, readWKT(dfTemp$wkt_geometry[i], dfTemp$gid[i])
      # If the PROJ4 string has been set, use the following instead
      # spTemp, readWKT(dfTemp$wkt_geometry[i], dfTemp$gid[i], p4s)
    )
  }
}
 
# Create SpatialPolygonsDataFrame, drop WKT field from attributes
spdfFinal = SpatialPolygonsDataFrame(spTemp, dfTemp[-2])

Created by Pretty R at inside-R.org

Subsetting in readOGR

The function readOGR in the rgdal package is used to bring vector spatial data sources into R. readOGR() relies upon OGR (part of the GDAL/OGR library) for format conversion. Unfortunately, while OGR supports the ability to subset columns (with the -select switch) or rows (with the -where switch), or even to request a layer using a full SQL statement, none of that functionality is available in readOGR. Every so often this gets discussed in the mailing lists, but the functionality has not yet been implemented.

If your data is already in a SQL database though, you’re in luck. You can accomplish the same thing within R by creating a spatial view in-database, loading the spatial view with readOGR, then dropping the view. I’ve created a function that does just that for PostGIS data sources.

readOgrSql = function (dsn, sql, ...) {

  require(rgdal)
  require(RPostgreSQL)
  require(stringr)

  # check dsn starts "PG:" and strip
  if (str_sub(dsn, 1, 3) != "PG:") {
    stop("readOgrSql only works with PostgreSQL DSNs")
  }
  dsnParamList = str_trim(str_split(dsn, ":")[[1]][2])

  # Build dbConnect expression, quote DSN parameter values 
  # if not already quoted
  if (str_count(dsnParamList, "=") 
      == str_count(dsnParamList, "='[[:alnum:]]+'")) {
    strExpression = str_c(
      "dbConnect(dbDriver('PostgreSQL'), ", 
      str_replace_all(dsnParamList, " ", ", "), 
      ")"
      )
  }
  else {
    dsnArgs = word(str_split(dsnParamList, " ")[[1]], 1, sep="=")
    dsnVals = sapply(
      word(str_split(dsnParamList, " ")[[1]], 2, sep="="), 
      function(x) str_c("'", str_replace_all(x, "'", ""), "'")
      )
    strExpression = str_c(
      "dbConnect(dbDriver('PostgreSQL'), ", 
      str_c(dsnArgs, "=", dsnVals, collapse=", "), 
      ")"
      )
  }

  # Connect, create spatial view, read spatial view, drop spatial view
  conn = eval(parse(text=strExpression))
  strCreateView = paste("CREATE VIEW vw_tmp_read_ogr AS", sql)
  dbSendQuery(conn, strCreateView)
  temp = readOGR(dsn = dsn, layer = "vw_tmp_read_ogr", ...)
  dbSendQuery(conn, "DROP VIEW vw_tmp_read_ogr;")
  dbDisconnect(conn)
  return(temp)
}

Since readOGR can’t send arbitrary SQL to the data source, the function also requires the RPostgreSQL package. Because I wanted the readOgrSql function to conform to readOGR as closely as possible, the function takes the DSN as a character string, then chops it up to pass the DSN parameters to dbConnect. readOGR expects the DSN in this format:

"PG:dbname='db' host='host' port='5432' user='user' password='***'"

The user requires privileges to create objects in the public schema.

Then, readOgrSql can be called as follows:

strSQL = "
  SELECT gid, name,
    ST_Transform(geom, 4326)::geometry(MultiPolygon, 4326) AS geom
  FROM geo_states"
spdfStates = readOgrSql(dsn, strSQL, stringsAsFactors=FALSE)
row.names(spdfStates) = spdfStates$name

(The last line is not strictly necessary, but is a little trick I use to make it easier to refer to specific features by name.)

At the moment this only works for PostGIS 2.0+. It relies on the fact that in recent versions of PostGIS, spatial views are automatically registered because the geometry_columns “table” is actually a view which finds all geometry columns in all tables. But for the view to work as a full spatial layer, the geometry column in the SELECT statement must be explicitly cast using a typmodded geometry. In the example, this is accomplished with ::geometry(MultiPolygon, 2263), but in general ::geometry(<geometry type>, <srid>). If there are no transformations on the geometry, the explicit cast can be avoided as long as the underlying table is defined using the PostGIS 2.0+ typmodded geometry columns:

SELECT gid, geom FROM my_spatial_table;

will work, but

SELECT gid, ST_Transform(geom, 4326) AS geom
FROM my_spatial_table;

will only work partially, because the transformed geometry is generic. See Manually Registering Geometry Columns in geometry_columns for details. Note that rgdal is smart enough to load the layer into an appropriate Spatial*DataFrame (e.g., SpatialPolygonsDataFrame for an OGC Polygon or MultiPolygon), but will lose the spatial reference information (i.e., the projection or coordinate system). It will plot OK, but unless you fix it in R, you won’t be able to combine it with layers in different coordinate reference systems.

Extending this to work for PostGIS pre-2.0 or for SpatiaLite shouldn’t be too difficult, but will require manually registering the spatial view in geometry_columns.

Hexbinning

Hexbinning is a method for visualizing point data when many similar values mean there is a lot of overplotting. Although it originated in the data visualization field as a an enhancement to the traditional XY scatterplot, within the last few years hexbinning has been used more and more in cartography. (See this great blog post by Zachary Forest Johnson which traces this history and explains how to create hexbin maps using D3.js.)

I wanted to map police stops under the NYPD Stop and Frisk program (official source, easier to use version from NYCLU), but at city or borough scale, this just looks like a continuous carpet of dots. Visualizing that kind of dense point data is exactly what hexbinning is for.
Continue reading

Welcome to the Open Geospatial Technologies Blog

Welcome to my new blog. I have previously blogged (somewhat sporadically) at Free City, where I have occasionally posted some technical stuff about installing and using open source GIS software. I have decided to break out the GIS and other geospatial technology material into a separate blog. My goal with this new site is to collect information about using various geospatial tools effectively. Often, that means I will be documenting methods for things that I am currently working on. It will probably grow a little haphazardly, but hopefully will be of interest to other geonerds.

Feel free to comment or contact me. Enjoy!