[postgis-users] Finding Islands

Lee Hachadoorian Lee.Hachadoorian+L at gmail.com
Thu Nov 21 13:41:37 PST 2013


And Brent & Rémi, thanks for your suggestions. --Lee


On Thu, Nov 21, 2013 at 4:40 PM, Lee Hachadoorian <
Lee.Hachadoorian+L at gmail.com> wrote:

> This is what I came up with:
>
> SELECT * FROM table WHERE gid IN (
>     SELECT DISTINCT unnest(array[a.gid, b.gid]) AS gid
>     FROM table a JOIN table b
>         ON (ST_Intersects(a.geom, b.geom) AND a.gid < b.gid)
>     )
>
> I've posted my original attempts, your suggestions, and the final version
> to my blog (with copious commentary) at
> http://geospatial.commons.gc.cuny.edu/2013/11/21/finding-islands-or-the-converse/
>
> Best,
> --Lee
>
>
>
> On Thu, Nov 21, 2013 at 7:28 AM, Rémi Cura <remi.cura at gmail.com> wrote:
>
>> Hey ,
>> Oops,
>> vocaulary error
>> "because it's making (2 million)^2 comparisons." -> it's cartesian
>> product.
>>
>> basically you want to compare every polygon to every other, this is a
>> cartesian product, and this is what takes time.
>> Choosing wel the function used (touches or intersects or ...)  won't
>> change much computing time as long as it uses index.
>> In the same way, the logic you do after is not what takes time.
>>
>> Maybe you should not use the NOT IN syntaxe, but instead a where, or
>> better, an "except", this should work much better
>> http://www.postgresql.org/docs/9.3/static/queries-union.html
>>
>>
>> Cheers,
>>
>> Rémi-C
>>
>>
>>
>> 2013/11/20 Lee Hachadoorian <Lee.Hachadoorian+L at gmail.com>
>>
>>> Thank you for these suggestions. I haven't replied yet because I am
>>> testing them, and the queries take an hour or more to run on the 2 million
>>> plus table. I will report back with some results.
>>>
>>>
>>> On Wed, Nov 20, 2013 at 4:46 AM, Rémi Cura <remi.cura at gmail.com> wrote:
>>>
>>>>
>>>> Maybe you can try the stupidest way to go,
>>>> anyway you have to do a inner product because you have to consider each
>>>> pair of polygons.
>>>>
>>>
>>>  You don't have to do inner product. `a LEFT JOIN b ... WHERE b.gid IS
>>> NULL` is a typical unmatched query. By doing self join ON ST_Touches(), the
>>> polygons with no neighbors are returned because the LEFT JOIN returns all
>>> records from a, while ST_Touches() produces no match (because a polygon
>>> doesn't touch itself) so b.gid IS NULL.
>>>
>>>
>>>> CREATE TABLE result AS
>>>> SELECT DISTINCT ON (a.gid) a.*
>>>> FROM table AS a , table AS b
>>>> WHERE ST_Intersects(a.geom, b.geom) = TRUE
>>>> AND a.gid != b.gid
>>>>
>>>
>>> Just to be clear, this produces the polygons *with* neighbors, which is
>>> why I use this as a subselect with gid NOT IN (...)
>>>
>>> Still testing,
>>> --Lee
>>>
>>>
>>>
>>>
>>>> Now you can improve, because in the previous query you will compute
>>>> (geom1,geom2) and (geom2,geom1), which is duplicate
>>>> so you can try creating an index on gid , and
>>>> CREATE TABLE result2 AS
>>>> WITH direct_comp AS (SELECT a.gid AS agid, a.geom AS ageom , b.gid AS
>>>> bgid, b.geom AS bgeom
>>>> FROM table AS a , table AS b
>>>> WHERE ST_Intersects(a.geom, b.geom) = TRUE
>>>> AND a.gid < b.gid)
>>>> SELECT agid AS gid, ageom AS geom
>>>> FROM direct_comp
>>>> UNION
>>>> SELECT bgid AS gid, bgeom AS geom
>>>> FROM direct_comp
>>>>
>>>>
>>>>
>>>>
>>>> Now if you want to go really big, this will be difficult.
>>>> Assuming your polygons bboxes have a homogenous size and/or are not too
>>>> big reguarding the total area .
>>>> The idea is to group polygons into same cells, so when you try to find
>>>> which polygons intersects, instead of comparing a polygon to each other
>>>> polygon, you compare a polygon to each other polygon in the same cell.
>>>>
>>>>
>>>> create a grid of cells (big enough or you will have lot's of cells)
>>>> index the table
>>>>
>>>> for each cell, get polygons intersecting it. This defines groups of
>>>> polygons in the same cell, defined by a new id "group_id"
>>>> index "group_id"
>>>>
>>>> Now you will have several options depending on your hardware/data
>>>> 1) process a cell at a time if limited hardware, adding "WHERE
>>>> group_id=XX"
>>>>
>>>> 2) do a massiv inner join on the group_id and compute all
>>>>
>>>> In fact you could avoid to create explicitly the cells, but it will be
>>>> more complicated.
>>>>
>>>> Cheers,
>>>> Rémi-C
>>>>
>>>>
>>>> 2013/11/20 Brent Wood <pcreso at pcreso.com>
>>>>
>>>>> I figure you have spatially indexed the polygons already?
>>>>>
>>>>> Any way of pre-categorising your polygons - binning them in some way
>>>>> that allows a non spatial test in the where clause to replace the spatial
>>>>> test...
>>>>>
>>>>> eg: a boolean attribute set to indicate if a feature has any part <= a
>>>>> particular x value or not.
>>>>>
>>>>> run this against the 2m features once to populate it, and assuming you
>>>>> get a 50/50 split of T/F features, your 2m^2 query can instead include a
>>>>> "where a.bool = b.bool", as we already know that if the boolean flag is
>>>>> different, they cannot touch, so your query should involve a spatial test
>>>>> only over 1m^2 instead... if you do this on both x & y, the boolean filter
>>>>> will replace even more spatial calculations & drop them to 500,000^2
>>>>> tests...
>>>>>
>>>>> If you can pre-classify features so that non-spatial tests can reduce
>>>>> the spatial ones (esp for features with lots of vertices) in a where
>>>>> clause, such queries do run much faster, but you have the trade-off of the
>>>>> time taken to carry out the classification... which is only n*no_classes/2,
>>>>> generally much faster than n^n
>>>>>
>>>>> Hope this makes sense...
>>>>>
>>>>> Brent Wood
>>>>>   ------------------------------
>>>>>  *From:* Lee Hachadoorian <Lee.Hachadoorian+L at gmail.com>
>>>>> *To:* PostGIS Users <postgis-users at postgis.refractions.net>
>>>>> *Sent:* Wednesday, November 20, 2013 8:52 PM
>>>>> *Subject:* [postgis-users] Finding Islands
>>>>>
>>>>> I am trying to find "islands", polygons in a (multi)polygon layer
>>>>> which
>>>>> are not connected to any other polygons in the same layer. What I came
>>>>> up with runs in a couple of seconds on a layer with ~1000 geometries,
>>>>> and a couple of minutes on a layer with ~23,000 geometries, but I want
>>>>> to run it on a layer of 2 million+ and it's taking a L-O-O-O-NG time,
>>>>> presumably because it's making (2 million)^2 comparisons.
>>>>>
>>>>> What I came up with is:
>>>>>
>>>>> SELECT a.*
>>>>> FROM table a LEFT JOIN table b
>>>>>     ON (ST_Touches(a.geom, b.geom))
>>>>> WHERE b.gid is null
>>>>>
>>>>> or
>>>>>
>>>>> SELECT *
>>>>> FROM table
>>>>> WHERE gid NOT IN (
>>>>>     SELECT DISTINCT a.gid
>>>>>     FROM table a JOIN table b
>>>>>         ON (ST_Intersects(a.geom, b.geom) AND a.gid != b.gid)
>>>>>     )
>>>>>
>>>>> The first variant raises "NOTICE:  geometry_gist_joinsel called with
>>>>> incorrect join type". So I thought I could improve performance with an
>>>>> INNER JOIN instead of an OUTER JOIN, and came up with the second
>>>>> variant, and it does seem to perform somewhat better. Any suggestions
>>>>> for how to speed up either approach?
>>>>>
>>>>> Best,
>>>>> --Lee
>>>>>
>>>>> --
>>>>> Lee Hachadoorian
>>>>> Assistant Professor in Geography, Dartmouth College
>>>>> http://freecity.commons.gc.cuny.edu
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> postgis-users mailing list
>>>>> postgis-users at lists.osgeo.org
>>>>> http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> postgis-users mailing list
>>>>> postgis-users at lists.osgeo.org
>>>>> http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users
>>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> postgis-users mailing list
>>>> postgis-users at lists.osgeo.org
>>>> http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users
>>>>
>>>
>>>
>>>
>>> --
>>> Lee Hachadoorian
>>> Asst Professor of Geography, Dartmouth College
>>> http://freecity.commons.gc.cuny.edu/
>>>
>>> _______________________________________________
>>> postgis-users mailing list
>>> postgis-users at lists.osgeo.org
>>> http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users
>>>
>>
>>
>> _______________________________________________
>> postgis-users mailing list
>> postgis-users at lists.osgeo.org
>> http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users
>>
>
>
>
> --
> Lee Hachadoorian
> Asst Professor of Geography, Dartmouth College
> http://freecity.commons.gc.cuny.edu/
>



-- 
Lee Hachadoorian
Asst Professor of Geography, Dartmouth College
http://freecity.commons.gc.cuny.edu/
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20131121/c38409b8/attachment.html>


More information about the postgis-users mailing list