[postgis-users] Finding Islands

Wed Nov 20 01:46:37 PST 2013

Maybe you can try the stupidest way to go,
anyway you have to do a inner product because you have to consider each
pair of polygons.

CREATE TABLE result AS
SELECT DISTINCT ON (a.gid) a.*
FROM table AS a , table AS b
WHERE ST_Intersects(a.geom, b.geom) = TRUE
AND a.gid != b.gid

Now you can improve, because in the previous query you will compute
(geom1,geom2) and (geom2,geom1), which is duplicate
so you can try creating an index on gid , and
CREATE TABLE result2 AS
WITH direct_comp AS (SELECT a.gid AS agid, a.geom AS ageom , b.gid AS bgid,
b.geom AS bgeom
FROM table AS a , table AS b
WHERE ST_Intersects(a.geom, b.geom) = TRUE
AND a.gid < b.gid)
SELECT agid AS gid, ageom AS geom
FROM direct_comp
UNION
SELECT bgid AS gid, bgeom AS geom
FROM direct_comp

Now if you want to go really big, this will be difficult.
Assuming your polygons bboxes have a homogenous size and/or are not too big
reguarding the total area .
The idea is to group polygons into same cells, so when you try to find
which polygons intersects, instead of comparing a polygon to each other
polygon, you compare a polygon to each other polygon in the same cell.

create a grid of cells (big enough or you will have lot's of cells)
index the table

for each cell, get polygons intersecting it. This defines groups of
polygons in the same cell, defined by a new id "group_id"
index "group_id"

Now you will have several options depending on your hardware/data
1) process a cell at a time if limited hardware, adding "WHERE group_id=XX"

2) do a massiv inner join on the group_id and compute all

In fact you could avoid to create explicitly the cells, but it will be more
complicated.

Cheers,
Rémi-C

2013/11/20 Brent Wood <pcreso at pcreso.com>

> I figure you have spatially indexed the polygons already?
>
> Any way of pre-categorising your polygons - binning them in some way that
> allows a non spatial test in the where clause to replace the spatial test...
>
> eg: a boolean attribute set to indicate if a feature has any part <= a
> particular x value or not.
>
> run this against the 2m features once to populate it, and assuming you get
> a 50/50 split of T/F features, your 2m^2 query can instead include a "where
> a.bool = b.bool", as we already know that if the boolean flag is different,
> they cannot touch, so your query should involve a spatial test only over
> 1m^2 instead... if you do this on both x & y, the boolean filter will
> replace even more spatial calculations & drop them to 500,000^2 tests...
>
> If you can pre-classify features so that non-spatial tests can reduce the
> spatial ones (esp for features with lots of vertices) in a where clause,
> such queries do run much faster, but you have the trade-off of the time
> taken to carry out the classification... which is only n*no_classes/2,
> generally much faster than n^n
>
> Hope this makes sense...
>
> Brent Wood
>   ------------------------------
>  *From:* Lee Hachadoorian <Lee.Hachadoorian+L at gmail.com>
> *To:* PostGIS Users <postgis-users at postgis.refractions.net>
> *Sent:* Wednesday, November 20, 2013 8:52 PM
> *Subject:* [postgis-users] Finding Islands
>
> I am trying to find "islands", polygons in a (multi)polygon layer which
> are not connected to any other polygons in the same layer. What I came
> up with runs in a couple of seconds on a layer with ~1000 geometries,
> and a couple of minutes on a layer with ~23,000 geometries, but I want
> to run it on a layer of 2 million+ and it's taking a L-O-O-O-NG time,
> presumably because it's making (2 million)^2 comparisons.
>
> What I came up with is:
>
> SELECT a.*
> FROM table a LEFT JOIN table b
>     ON (ST_Touches(a.geom, b.geom))
> WHERE b.gid is null
>
> or
>
> SELECT *
> FROM table
> WHERE gid NOT IN (
>     SELECT DISTINCT a.gid
>     FROM table a JOIN table b
>         ON (ST_Intersects(a.geom, b.geom) AND a.gid != b.gid)
>     )
>
> The first variant raises "NOTICE:  geometry_gist_joinsel called with
> incorrect join type". So I thought I could improve performance with an
> INNER JOIN instead of an OUTER JOIN, and came up with the second
> variant, and it does seem to perform somewhat better. Any suggestions
> for how to speed up either approach?
>
> Best,
> --Lee
>
> --
> Lee Hachadoorian
> Assistant Professor in Geography, Dartmouth College
> http://freecity.commons.gc.cuny.edu
>
>
> _______________________________________________
> postgis-users mailing list
> postgis-users at lists.osgeo.org
> http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users
>
>
>
> _______________________________________________
> postgis-users mailing list
> postgis-users at lists.osgeo.org
> http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20131120/7a1fc903/attachment.html>