[postgis-users] Finding Islands
Lee Hachadoorian
Lee.Hachadoorian+L at gmail.com
Wed Nov 20 10:10:53 PST 2013
Thank you for these suggestions. I haven't replied yet because I am testing
them, and the queries take an hour or more to run on the 2 million plus
table. I will report back with some results.
On Wed, Nov 20, 2013 at 4:46 AM, Rémi Cura <remi.cura at gmail.com> wrote:
>
> Maybe you can try the stupidest way to go,
> anyway you have to do a inner product because you have to consider each
> pair of polygons.
>
You don't have to do inner product. `a LEFT JOIN b ... WHERE b.gid IS NULL`
is a typical unmatched query. By doing self join ON ST_Touches(), the
polygons with no neighbors are returned because the LEFT JOIN returns all
records from a, while ST_Touches() produces no match (because a polygon
doesn't touch itself) so b.gid IS NULL.
> CREATE TABLE result AS
> SELECT DISTINCT ON (a.gid) a.*
> FROM table AS a , table AS b
> WHERE ST_Intersects(a.geom, b.geom) = TRUE
> AND a.gid != b.gid
>
Just to be clear, this produces the polygons *with* neighbors, which is why
I use this as a subselect with gid NOT IN (...)
Still testing,
--Lee
> Now you can improve, because in the previous query you will compute
> (geom1,geom2) and (geom2,geom1), which is duplicate
> so you can try creating an index on gid , and
> CREATE TABLE result2 AS
> WITH direct_comp AS (SELECT a.gid AS agid, a.geom AS ageom , b.gid AS
> bgid, b.geom AS bgeom
> FROM table AS a , table AS b
> WHERE ST_Intersects(a.geom, b.geom) = TRUE
> AND a.gid < b.gid)
> SELECT agid AS gid, ageom AS geom
> FROM direct_comp
> UNION
> SELECT bgid AS gid, bgeom AS geom
> FROM direct_comp
>
>
>
>
> Now if you want to go really big, this will be difficult.
> Assuming your polygons bboxes have a homogenous size and/or are not too
> big reguarding the total area .
> The idea is to group polygons into same cells, so when you try to find
> which polygons intersects, instead of comparing a polygon to each other
> polygon, you compare a polygon to each other polygon in the same cell.
>
>
> create a grid of cells (big enough or you will have lot's of cells)
> index the table
>
> for each cell, get polygons intersecting it. This defines groups of
> polygons in the same cell, defined by a new id "group_id"
> index "group_id"
>
> Now you will have several options depending on your hardware/data
> 1) process a cell at a time if limited hardware, adding "WHERE group_id=XX"
>
> 2) do a massiv inner join on the group_id and compute all
>
> In fact you could avoid to create explicitly the cells, but it will be
> more complicated.
>
> Cheers,
> Rémi-C
>
>
> 2013/11/20 Brent Wood <pcreso at pcreso.com>
>
>> I figure you have spatially indexed the polygons already?
>>
>> Any way of pre-categorising your polygons - binning them in some way that
>> allows a non spatial test in the where clause to replace the spatial test...
>>
>> eg: a boolean attribute set to indicate if a feature has any part <= a
>> particular x value or not.
>>
>> run this against the 2m features once to populate it, and assuming you
>> get a 50/50 split of T/F features, your 2m^2 query can instead include a
>> "where a.bool = b.bool", as we already know that if the boolean flag is
>> different, they cannot touch, so your query should involve a spatial test
>> only over 1m^2 instead... if you do this on both x & y, the boolean filter
>> will replace even more spatial calculations & drop them to 500,000^2
>> tests...
>>
>> If you can pre-classify features so that non-spatial tests can reduce the
>> spatial ones (esp for features with lots of vertices) in a where clause,
>> such queries do run much faster, but you have the trade-off of the time
>> taken to carry out the classification... which is only n*no_classes/2,
>> generally much faster than n^n
>>
>> Hope this makes sense...
>>
>> Brent Wood
>> ------------------------------
>> *From:* Lee Hachadoorian <Lee.Hachadoorian+L at gmail.com>
>> *To:* PostGIS Users <postgis-users at postgis.refractions.net>
>> *Sent:* Wednesday, November 20, 2013 8:52 PM
>> *Subject:* [postgis-users] Finding Islands
>>
>> I am trying to find "islands", polygons in a (multi)polygon layer which
>> are not connected to any other polygons in the same layer. What I came
>> up with runs in a couple of seconds on a layer with ~1000 geometries,
>> and a couple of minutes on a layer with ~23,000 geometries, but I want
>> to run it on a layer of 2 million+ and it's taking a L-O-O-O-NG time,
>> presumably because it's making (2 million)^2 comparisons.
>>
>> What I came up with is:
>>
>> SELECT a.*
>> FROM table a LEFT JOIN table b
>> ON (ST_Touches(a.geom, b.geom))
>> WHERE b.gid is null
>>
>> or
>>
>> SELECT *
>> FROM table
>> WHERE gid NOT IN (
>> SELECT DISTINCT a.gid
>> FROM table a JOIN table b
>> ON (ST_Intersects(a.geom, b.geom) AND a.gid != b.gid)
>> )
>>
>> The first variant raises "NOTICE: geometry_gist_joinsel called with
>> incorrect join type". So I thought I could improve performance with an
>> INNER JOIN instead of an OUTER JOIN, and came up with the second
>> variant, and it does seem to perform somewhat better. Any suggestions
>> for how to speed up either approach?
>>
>> Best,
>> --Lee
>>
>> --
>> Lee Hachadoorian
>> Assistant Professor in Geography, Dartmouth College
>> http://freecity.commons.gc.cuny.edu
>>
>>
>> _______________________________________________
>> postgis-users mailing list
>> postgis-users at lists.osgeo.org
>> http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users
>>
>>
>>
>> _______________________________________________
>> postgis-users mailing list
>> postgis-users at lists.osgeo.org
>> http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users
>>
>
>
> _______________________________________________
> postgis-users mailing list
> postgis-users at lists.osgeo.org
> http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users
>
--
Lee Hachadoorian
Asst Professor of Geography, Dartmouth College
http://freecity.commons.gc.cuny.edu/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20131120/5daf55b5/attachment.html>
More information about the postgis-users
mailing list