[postgis-users] Finding Islands

Fri Nov 22 05:28:58 PST 2013

Your welcom,
I hope it suits you !

(It still could be improved )

Cheers,
Rémi-C

2013/11/21 Lee Hachadoorian <Lee.Hachadoorian+L at gmail.com>

> And Brent & Rémi, thanks for your suggestions. --Lee
>
>
> On Thu, Nov 21, 2013 at 4:40 PM, Lee Hachadoorian <
> Lee.Hachadoorian+L at gmail.com> wrote:
>
>> This is what I came up with:
>>
>> SELECT * FROM table WHERE gid IN (
>>     SELECT DISTINCT unnest(array[a.gid, b.gid]) AS gid
>>     FROM table a JOIN table b
>>         ON (ST_Intersects(a.geom, b.geom) AND a.gid < b.gid)
>>     )
>>
>> I've posted my original attempts, your suggestions, and the final version
>> to my blog (with copious commentary) at
>> http://geospatial.commons.gc.cuny.edu/2013/11/21/finding-islands-or-the-converse/
>>
>> Best,
>> --Lee
>>
>>
>>
>> On Thu, Nov 21, 2013 at 7:28 AM, Rémi Cura <remi.cura at gmail.com> wrote:
>>
>>> Hey ,
>>> Oops,
>>> vocaulary error
>>> "because it's making (2 million)^2 comparisons." -> it's cartesian
>>> product.
>>>
>>> basically you want to compare every polygon to every other, this is a
>>> cartesian product, and this is what takes time.
>>> Choosing wel the function used (touches or intersects or ...)  won't
>>> change much computing time as long as it uses index.
>>> In the same way, the logic you do after is not what takes time.
>>>
>>> Maybe you should not use the NOT IN syntaxe, but instead a where, or
>>> better, an "except", this should work much better
>>> http://www.postgresql.org/docs/9.3/static/queries-union.html
>>>
>>>
>>> Cheers,
>>>
>>> Rémi-C
>>>
>>>
>>>
>>> 2013/11/20 Lee Hachadoorian <Lee.Hachadoorian+L at gmail.com>
>>>
>>>> Thank you for these suggestions. I haven't replied yet because I am
>>>> testing them, and the queries take an hour or more to run on the 2 million
>>>> plus table. I will report back with some results.
>>>>
>>>>
>>>> On Wed, Nov 20, 2013 at 4:46 AM, Rémi Cura <remi.cura at gmail.com> wrote:
>>>>
>>>>>
>>>>> Maybe you can try the stupidest way to go,
>>>>> anyway you have to do a inner product because you have to consider
>>>>> each pair of polygons.
>>>>>
>>>>
>>>>  You don't have to do inner product. `a LEFT JOIN b ... WHERE b.gid IS
>>>> NULL` is a typical unmatched query. By doing self join ON ST_Touches(), the
>>>> polygons with no neighbors are returned because the LEFT JOIN returns all
>>>> records from a, while ST_Touches() produces no match (because a polygon
>>>> doesn't touch itself) so b.gid IS NULL.
>>>>
>>>>
>>>>> CREATE TABLE result AS
>>>>> SELECT DISTINCT ON (a.gid) a.*
>>>>> FROM table AS a , table AS b
>>>>> WHERE ST_Intersects(a.geom, b.geom) = TRUE
>>>>> AND a.gid != b.gid
>>>>>
>>>>
>>>> Just to be clear, this produces the polygons *with* neighbors, which is
>>>> why I use this as a subselect with gid NOT IN (...)
>>>>
>>>> Still testing,
>>>> --Lee
>>>>
>>>>
>>>>
>>>>
>>>>> Now you can improve, because in the previous query you will compute
>>>>> (geom1,geom2) and (geom2,geom1), which is duplicate
>>>>> so you can try creating an index on gid , and
>>>>> CREATE TABLE result2 AS
>>>>> WITH direct_comp AS (SELECT a.gid AS agid, a.geom AS ageom , b.gid AS
>>>>> bgid, b.geom AS bgeom
>>>>> FROM table AS a , table AS b
>>>>> WHERE ST_Intersects(a.geom, b.geom) = TRUE
>>>>> AND a.gid < b.gid)
>>>>> SELECT agid AS gid, ageom AS geom
>>>>> FROM direct_comp
>>>>> UNION
>>>>> SELECT bgid AS gid, bgeom AS geom
>>>>> FROM direct_comp
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Now if you want to go really big, this will be difficult.
>>>>> Assuming your polygons bboxes have a homogenous size and/or are not
>>>>> too big reguarding the total area .
>>>>> The idea is to group polygons into same cells, so when you try to find
>>>>> which polygons intersects, instead of comparing a polygon to each other
>>>>> polygon, you compare a polygon to each other polygon in the same cell.
>>>>>
>>>>>
>>>>> create a grid of cells (big enough or you will have lot's of cells)
>>>>> index the table
>>>>>
>>>>> for each cell, get polygons intersecting it. This defines groups of
>>>>> polygons in the same cell, defined by a new id "group_id"
>>>>> index "group_id"
>>>>>
>>>>> Now you will have several options depending on your hardware/data
>>>>> 1) process a cell at a time if limited hardware, adding "WHERE
>>>>> group_id=XX"
>>>>>
>>>>> 2) do a massiv inner join on the group_id and compute all
>>>>>
>>>>> In fact you could avoid to create explicitly the cells, but it will be
>>>>> more complicated.
>>>>>
>>>>> Cheers,
>>>>> Rémi-C
>>>>>
>>>>>
>>>>> 2013/11/20 Brent Wood <pcreso at pcreso.com>
>>>>>
>>>>>> I figure you have spatially indexed the polygons already?
>>>>>>
>>>>>> Any way of pre-categorising your polygons - binning them in some way
>>>>>> that allows a non spatial test in the where clause to replace the spatial
>>>>>> test...
>>>>>>
>>>>>> eg: a boolean attribute set to indicate if a feature has any part <=
>>>>>> a particular x value or not.
>>>>>>
>>>>>> run this against the 2m features once to populate it, and assuming
>>>>>> you get a 50/50 split of T/F features, your 2m^2 query can instead include
>>>>>> a "where a.bool = b.bool", as we already know that if the boolean flag is
>>>>>> different, they cannot touch, so your query should involve a spatial test
>>>>>> only over 1m^2 instead... if you do this on both x & y, the boolean filter
>>>>>> will replace even more spatial calculations & drop them to 500,000^2
>>>>>> tests...
>>>>>>
>>>>>> If you can pre-classify features so that non-spatial tests can reduce
>>>>>> the spatial ones (esp for features with lots of vertices) in a where
>>>>>> clause, such queries do run much faster, but you have the trade-off of the
>>>>>> time taken to carry out the classification... which is only n*no_classes/2,
>>>>>> generally much faster than n^n
>>>>>>
>>>>>> Hope this makes sense...
>>>>>>
>>>>>> Brent Wood
>>>>>>   ------------------------------
>>>>>>  *From:* Lee Hachadoorian <Lee.Hachadoorian+L at gmail.com>
>>>>>> *To:* PostGIS Users <postgis-users at postgis.refractions.net>
>>>>>> *Sent:* Wednesday, November 20, 2013 8:52 PM
>>>>>> *Subject:* [postgis-users] Finding Islands
>>>>>>
>>>>>> I am trying to find "islands", polygons in a (multi)polygon layer
>>>>>> which
>>>>>> are not connected to any other polygons in the same layer. What I
>>>>>> came
>>>>>> up with runs in a couple of seconds on a layer with ~1000 geometries,
>>>>>> and a couple of minutes on a layer with ~23,000 geometries, but I
>>>>>> want
>>>>>> to run it on a layer of 2 million+ and it's taking a L-O-O-O-NG time,
>>>>>> presumably because it's making (2 million)^2 comparisons.
>>>>>>
>>>>>> What I came up with is:
>>>>>>
>>>>>> SELECT a.*
>>>>>> FROM table a LEFT JOIN table b
>>>>>>     ON (ST_Touches(a.geom, b.geom))
>>>>>> WHERE b.gid is null
>>>>>>
>>>>>> or
>>>>>>
>>>>>> SELECT *
>>>>>> FROM table
>>>>>> WHERE gid NOT IN (
>>>>>>     SELECT DISTINCT a.gid
>>>>>>     FROM table a JOIN table b
>>>>>>         ON (ST_Intersects(a.geom, b.geom) AND a.gid != b.gid)
>>>>>>     )
>>>>>>
>>>>>> The first variant raises "NOTICE:  geometry_gist_joinsel called with
>>>>>> incorrect join type". So I thought I could improve performance with
>>>>>> an
>>>>>> INNER JOIN instead of an OUTER JOIN, and came up with the second
>>>>>> variant, and it does seem to perform somewhat better. Any suggestions
>>>>>> for how to speed up either approach?
>>>>>>
>>>>>> Best,
>>>>>> --Lee
>>>>>>
>>>>>> --
>>>>>> Lee Hachadoorian
>>>>>> Assistant Professor in Geography, Dartmouth College
>>>>>> http://freecity.commons.gc.cuny.edu
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> postgis-users mailing list
>>>>>> postgis-users at lists.osgeo.org
>>>>>> http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users
>>>>>>
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> postgis-users mailing list
>>>>>> postgis-users at lists.osgeo.org
>>>>>> http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users
>>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> postgis-users mailing list
>>>>> postgis-users at lists.osgeo.org
>>>>> http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Lee Hachadoorian
>>>> Asst Professor of Geography, Dartmouth College
>>>> http://freecity.commons.gc.cuny.edu/
>>>>
>>>> _______________________________________________
>>>> postgis-users mailing list
>>>> postgis-users at lists.osgeo.org
>>>> http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users
>>>>
>>>
>>>
>>> _______________________________________________
>>> postgis-users mailing list
>>> postgis-users at lists.osgeo.org
>>> http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users
>>>
>>
>>
>>
>> --
>> Lee Hachadoorian
>> Asst Professor of Geography, Dartmouth College
>> http://freecity.commons.gc.cuny.edu/
>>
>
>
>
> --
> Lee Hachadoorian
> Asst Professor of Geography, Dartmouth College
> http://freecity.commons.gc.cuny.edu/
>
> _______________________________________________
> postgis-users mailing list
> postgis-users at lists.osgeo.org
> http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20131122/f8f6418b/attachment.html>