Filling in Holes with PostGIS

Often you will want to simplify polygons by removing holes. In my case, I had used PostGIS to clip Census tracts to shoreline, but the hydrology layer I was using also had small lakes, ponds, and streams interior to the tracts. I ended up with Census tracts that looked like Swiss cheese.

I found the basic approach to filling in the holes in this postgis-users post by Regina Obe, author of PostGIS in Action:

SELECT gid, ST_Collect(ST_MakePolygon(geom)) As geom
    SELECT gid, ST_ExteriorRing((ST_Dump(geom)).geom) As geom
    FROM my_spatial_table
    ) s

Even this short code snippet is a little dense, so I’m going to annotate it before explaining how I changed it. The crucial function, the function that essentially does what we want, is ST_ExteriorRing(). But it only works on POLYGONs (my data were MULTIPOLYGONs), and it returns a LINESTRING of the boundary which we then need to convert back to (MULTI)POLYGON.

We deal with the first issue by first calling ST_Dump(), which explodes the MULTIPOLYGON into its constituent “islands”. ST_Dump is a set-returning function, which is why we see the unusual construction (ST_Dump(geom)).geom, which accesses the geom column of the function’s resultset. It also produces one output row for each polygon in the MULTIPOLYGON.

The second issue is dealt with by the ST_MakePolygon() function, which takes the LINESTRING and produces a POLYGON. ST_Collect() is an aggregate function which basically reverses the effect of ST_Dump(): it collects the geometries from all rows with the same gid and turns them into one MULTIPOLYGON.

So far we’ve been working with a SELECT statement, but rather than create a view or rewrite the table, I wanted to update the geometry column in the existing table. While the update would operate over all the rows harmlessly, it would probably be faster if it only updated the geometries that needed to change, i.e. the ones with holes. So how can we find the polygons with holes? One obvious choice, ST_NumInteriorRings(), doesn’t actually work because (a) for MULTIPOLYGONs it only returns the number of holes in one of the constituent polygons, and (b) in my case it didn’t return anything (i.e., returned NULL) for the MULTIPOLYGONs.

I was able to make do with ST_NRings(), a function which does work on MULTIPOLYGONs and returns the total count of rings, exterior and interior, in the geometry. The end result was:

UPDATE my_spatial_table t
SET geom = a.geom
    SELECT gid, ST_Collect(ST_MakePolygon(geom)) AS geom
    FROM (
        SELECT gid, ST_NRings(geom) AS nrings, 
            ST_ExteriorRing((ST_Dump(geom)).geom) AS geom
        FROM my_spatial_table
        WHERE ST_NRings(geom) > 1
        ) s
    GROUP BY gid, nrings
    HAVING nrings > COUNT(gid)
    ) a
WHERE t.gid = a.gid;

In the innermost subquery, WHERE ST_NRings(geom) > 1 filters out any geometries comprised of a single polygon without a hole. That alone knocked out ~2500 of the 2765 records in the table I was working with. The result of ST_NRings(geom) is also passed up a level (as nrings) where another filter is applied: HAVING nrings > COUNT(gid). Since ST_Dump() creates one row for each constituent polygon in the MULTIPOLYGON, COUNT(gid) tells us the number of “building block” polygons in each geometry. If this is equal to the total number of rings in the geometry—for example, 8 polygons and 8 total rings—then none of the “bulding block” polygons have holes. If, on the other hand, there are 9 total rings for 8 polygons, one of the polygons has a hole.

The table I was working on had 2765 MULTIPOLYGONs built out of 3204 constituent polygons, with 483 holes to be filled. After both filters are applied, the query only has to change 196 geometries. The update ran in 143 ms. I also tested a version which rewrote all 2765 geometries in the table, whether or not they had holes in them, and that took 406 ms. Still pretty quick, but I expect the time savings would become more substantial on a larger dataset.