The function readOGR
in the rgdal
package is used to bring vector spatial data sources into R. readOGR()
relies upon OGR (part of the GDAL/OGR library) for format conversion. Unfortunately, while OGR supports the ability to subset columns (with the -select
switch) or rows (with the -where
switch), or even to request a layer using a full SQL statement, none of that functionality is available in readOGR
. Every so often this gets discussed in the mailing lists, but the functionality has not yet been implemented.
If your data is already in a SQL database though, you’re in luck. You can accomplish the same thing within R by creating a spatial view in-database, loading the spatial view with readOGR
, then dropping the view. I’ve created a function that does just that for PostGIS data sources.
readOgrSql = function (dsn, sql, ...) { require(rgdal) require(RPostgreSQL) require(stringr) # check dsn starts "PG:" and strip if (str_sub(dsn, 1, 3) != "PG:") { stop("readOgrSql only works with PostgreSQL DSNs") } dsnParamList = str_trim(str_split(dsn, ":")[[1]][2]) # Build dbConnect expression, quote DSN parameter values # if not already quoted if (str_count(dsnParamList, "=") == str_count(dsnParamList, "='[[:alnum:]]+'")) { strExpression = str_c( "dbConnect(dbDriver('PostgreSQL'), ", str_replace_all(dsnParamList, " ", ", "), ")" ) } else { dsnArgs = word(str_split(dsnParamList, " ")[[1]], 1, sep="=") dsnVals = sapply( word(str_split(dsnParamList, " ")[[1]], 2, sep="="), function(x) str_c("'", str_replace_all(x, "'", ""), "'") ) strExpression = str_c( "dbConnect(dbDriver('PostgreSQL'), ", str_c(dsnArgs, "=", dsnVals, collapse=", "), ")" ) } # Connect, create spatial view, read spatial view, drop spatial view conn = eval(parse(text=strExpression)) strCreateView = paste("CREATE VIEW vw_tmp_read_ogr AS", sql) dbSendQuery(conn, strCreateView) temp = readOGR(dsn = dsn, layer = "vw_tmp_read_ogr", ...) dbSendQuery(conn, "DROP VIEW vw_tmp_read_ogr;") dbDisconnect(conn) return(temp) }
Since readOGR can’t send arbitrary SQL to the data source, the function also requires the RPostgreSQL
package. Because I wanted the readOgrSql
function to conform to readOGR
as closely as possible, the function takes the DSN as a character string, then chops it up to pass the DSN parameters to dbConnect
. readOGR
expects the DSN in this format:
"PG:dbname='db' host='host' port='5432' user='user' password='***'"
The user requires privileges to create objects in the public
schema.
Then, readOgrSql
can be called as follows:
strSQL = " SELECT gid, name, ST_Transform(geom, 4326)::geometry(MultiPolygon, 4326) AS geom FROM geo_states" spdfStates = readOgrSql(dsn, strSQL, stringsAsFactors=FALSE) row.names(spdfStates) = spdfStates$name
(The last line is not strictly necessary, but is a little trick I use to make it easier to refer to specific features by name.)
At the moment this only works for PostGIS 2.0+. It relies on the fact that in recent versions of PostGIS, spatial views are automatically registered because the geometry_columns
“table” is actually a view which finds all geometry columns in all tables. But for the view to work as a full spatial layer, the geometry column in the SELECT
statement must be explicitly cast using a typmodded geometry. In the example, this is accomplished with ::geometry(MultiPolygon, 2263)
, but in general ::geometry(<geometry type>, <srid>)
. If there are no transformations on the geometry, the explicit cast can be avoided as long as the underlying table is defined using the PostGIS 2.0+ typmodded geometry columns:
SELECT gid, geom FROM my_spatial_table;
will work, but
SELECT gid, ST_Transform(geom, 4326) AS geom FROM my_spatial_table;
will only work partially, because the transformed geometry is generic. See Manually Registering Geometry Columns in geometry_columns for details. Note that rgdal
is smart enough to load the layer into an appropriate Spatial*DataFrame (e.g., SpatialPolygonsDataFrame for an OGC Polygon or MultiPolygon), but will lose the spatial reference information (i.e., the projection or coordinate system). It will plot OK, but unless you fix it in R, you won’t be able to combine it with layers in different coordinate reference systems.
Extending this to work for PostGIS pre-2.0 or for SpatiaLite shouldn’t be too difficult, but will require manually registering the spatial view in geometry_columns
.