<div dir="ltr">Your welcom,<div>I hope it suits you !</div><div><br><div>(It still could be improved )</div></div><div><br></div><div>Cheers,</div><div>Rémi-C</div></div><div class="gmail_extra"><br><br><div class="gmail_quote">
2013/11/21 Lee Hachadoorian <span dir="ltr"><<a href="mailto:Lee.Hachadoorian+L@gmail.com" target="_blank">Lee.Hachadoorian+L@gmail.com</a>></span><br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="ltr">And Brent & Rémi, thanks for your suggestions. --Lee</div><div class="HOEnZb"><div class="h5"><div class="gmail_extra"><br><br><div class="gmail_quote">On Thu, Nov 21, 2013 at 4:40 PM, Lee Hachadoorian <span dir="ltr"><<a href="mailto:Lee.Hachadoorian+L@gmail.com" target="_blank">Lee.Hachadoorian+L@gmail.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">This is what I came up with:<div><br></div><div><pre>SELECT * FROM table WHERE gid IN (
SELECT DISTINCT unnest(array[a.gid, b.gid]) AS gid
FROM table a JOIN table b
ON (ST_Intersects(a.geom, b.geom) AND a.gid < b.gid)
)</pre></div><div>I've posted my original attempts, your suggestions, and the final version to my blog (with copious commentary) at <a href="http://geospatial.commons.gc.cuny.edu/2013/11/21/finding-islands-or-the-converse/" target="_blank">http://geospatial.commons.gc.cuny.edu/2013/11/21/finding-islands-or-the-converse/</a></div>
<div><br></div><div>Best,</div><div>--Lee<br></div><div><br></div></div><div><div><div class="gmail_extra"><br><br><div class="gmail_quote">On Thu, Nov 21, 2013 at 7:28 AM, Rémi Cura <span dir="ltr"><<a href="mailto:remi.cura@gmail.com" target="_blank">remi.cura@gmail.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Hey ,<div>Oops,</div><div>vocaulary error<br><div>"<span style="font-family:arial,sans-serif;font-size:12.800000190734863px">because it's making (2 million)^2 comparisons." -> it's cartesian product.</span></div>
<div><span style="font-family:arial,sans-serif;font-size:12.800000190734863px"><br></span></div><div><span style="font-family:arial,sans-serif;font-size:12.800000190734863px">basically you want to compare every polygon to every other, this is a cartesian product, and this is what takes time.</span></div>
</div><div><span style="font-family:arial,sans-serif;font-size:12.800000190734863px">Choosing wel the function used (touches or intersects or ...) won't change much computing time as long as it uses index.</span></div>
<div><span style="font-family:arial,sans-serif;font-size:12.800000190734863px">In the same way, the logic you do after is not what takes time.</span></div><div><span style="font-family:arial,sans-serif;font-size:12.800000190734863px"><br>
</span></div><div><span style="font-family:arial,sans-serif;font-size:12.800000190734863px">Maybe you should not use the NOT IN syntaxe, but instead a where, or better, an "except", this should work much better </span></div>
<div><a href="http://www.postgresql.org/docs/9.3/static/queries-union.html" target="_blank">http://www.postgresql.org/docs/9.3/static/queries-union.html</a><span style="font-family:arial,sans-serif;font-size:12.800000190734863px"><br>
</span></div>
<div><span style="font-family:arial,sans-serif;font-size:12.800000190734863px"><br></span></div><div><span style="font-family:arial,sans-serif;font-size:12.800000190734863px"><br></span></div><div><span style="font-family:arial,sans-serif;font-size:12.800000190734863px">Cheers,</span></div>
<div><span style="font-family:arial,sans-serif;font-size:12.800000190734863px"><br>Rémi-C</span></div><div><span style="font-family:arial,sans-serif;font-size:12.800000190734863px"><br></span></div></div><div>
<div><div class="gmail_extra">
<br><br><div class="gmail_quote">2013/11/20 Lee Hachadoorian <span dir="ltr"><<a href="mailto:Lee.Hachadoorian+L@gmail.com" target="_blank">Lee.Hachadoorian+L@gmail.com</a>></span><br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="ltr">Thank you for these suggestions. I haven't replied yet because I am testing them, and the queries take an hour or more to run on the 2 million plus table. I will report back with some results.<div class="gmail_extra">
<br><br><div class="gmail_quote"><div>On Wed, Nov 20, 2013 at 4:46 AM, Rémi Cura <span dir="ltr"><<a href="mailto:remi.cura@gmail.com" target="_blank">remi.cura@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="ltr"><div><br></div><div>Maybe you can try the stupidest way to go, </div><div>anyway you have to do a inner product because you have to consider each pair of polygons.</div></div></blockquote><div><br></div></div>
<div>
You don't have to do inner product. `a LEFT JOIN b ... WHERE b.gid IS NULL` is a typical unmatched query. By doing self join ON ST_Touches(), the polygons with no neighbors are returned because the LEFT JOIN returns all records from a, while ST_Touches() produces no match (because a polygon doesn't touch itself) so b.gid IS NULL.</div>
<div>
<div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><br></div><div>CREATE TABLE result AS </div>
<div>SELECT DISTINCT ON (a.gid) a.*</div><div>FROM table AS a , table AS b<br><div><div>WHERE <span style="font-family:arial,sans-serif;font-size:12.800000190734863px">ST_Intersects(a.geom, b.geom) = TRUE </span></div><div>
<span style="font-family:arial,sans-serif;font-size:12.800000190734863px">AND a.gid != b.gid</span></div></div></div></div></blockquote><div><br></div></div><div>Just to be clear, this produces the polygons *with* neighbors, which is why I use this as a subselect with gid NOT IN (...)</div>
<div><br></div><div>Still testing,</div><div>--Lee</div><div><div><div><br></div><div><br></div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="ltr"><div>
<div><div><span style="font-family:arial,sans-serif;font-size:12.800000190734863px"><br></span></div><div><span style="font-family:arial,sans-serif;font-size:12.800000190734863px">Now you can improve, because in the previous query you will compute (geom1,geom2) and (geom2,geom1), which is duplicate</span></div>
<div><span style="font-family:arial,sans-serif;font-size:12.800000190734863px">so you can try creating an index on gid , and</span></div><div><div>CREATE TABLE result2 AS </div><div>WITH direct_comp AS (SELECT a.gid AS agid, a.geom AS ageom , b.gid AS bgid, b.geom AS bgeom</div>
<div>FROM table AS a , table AS b<br><div><div>WHERE <span style="font-family:arial,sans-serif;font-size:12.800000190734863px">ST_Intersects(a.geom, b.geom) = TRUE </span></div><div><span style="font-family:arial,sans-serif;font-size:12.800000190734863px">AND a.gid < b.gid)</span></div>
</div></div></div><div><span style="font-family:arial,sans-serif;font-size:12.800000190734863px">SELECT agid AS gid, ageom AS geom</span></div><div><span style="font-family:arial,sans-serif;font-size:12.800000190734863px">FROM </span>direct_comp </div>
<div>UNION</div><div><div><span style="font-family:arial,sans-serif;font-size:12.800000190734863px">SELECT bgid AS gid, bgeom AS geom</span></div><div><span style="font-family:arial,sans-serif;font-size:12.800000190734863px">FROM </span>direct_comp </div>
</div><div><span style="font-family:arial,sans-serif;font-size:12.800000190734863px"><br></span></div><div><br></div><div><span style="font-family:arial,sans-serif;font-size:12.800000190734863px"><br></span></div><div><span style="font-family:arial,sans-serif;font-size:12.800000190734863px"><br>
</span></div><div><span style="font-family:arial,sans-serif;font-size:12.800000190734863px">Now if you want to go really big, this will be difficult. </span></div><div><span style="font-family:arial,sans-serif;font-size:12.800000190734863px">Assuming your polygons bboxes have a homogenous size and/or are not too big reguarding the total area . </span></div>
<div><span style="font-family:arial,sans-serif;font-size:12.800000190734863px">The idea is to group polygons into same cells, so when you try to find which polygons intersects, instead of comparing a polygon to each other polygon, you compare a polygon to each other polygon in the same cell. </span></div>
<div><span style="font-family:arial,sans-serif;font-size:12.800000190734863px"><br></span></div><div><span style="font-family:arial,sans-serif;font-size:12.800000190734863px"><br></span></div><div><span style="font-family:arial,sans-serif;font-size:12.800000190734863px">create a grid of cells (big enough or you will have lot's of cells)</span></div>
<div><span style="font-family:arial,sans-serif;font-size:12.800000190734863px">index the table</span></div><div><span style="font-family:arial,sans-serif;font-size:12.800000190734863px"><br></span></div><div><span style="font-family:arial,sans-serif;font-size:12.800000190734863px">for each cell, get polygons intersecting it. This defines groups of polygons in the same cell, defined by a new id "group_id"</span></div>
<div><span style="font-family:arial,sans-serif;font-size:12.800000190734863px">index "group_id"</span></div><div><span style="font-family:arial,sans-serif;font-size:12.800000190734863px"><br></span></div><div><span style="font-family:arial,sans-serif;font-size:12.800000190734863px">Now you will have several options depending on your hardware/data</span></div>
<div><span style="font-family:arial,sans-serif;font-size:12.800000190734863px">1) process a cell at a time if limited hardware, adding "WHERE group_id=XX"</span></div><div><span style="font-family:arial,sans-serif;font-size:12.800000190734863px"><br>
</span></div><div><span style="font-family:arial,sans-serif;font-size:12.800000190734863px">2) do a massiv inner join on the group_id and compute all </span></div><div><span style="font-family:arial,sans-serif;font-size:12.800000190734863px"><br>
</span></div><div>In fact you could avoid to create explicitly the cells, but it will be more complicated.</div><div><br></div><div>Cheers,</div></div></div><div>Rémi-C</div></div><div><div><div class="gmail_extra">
<br><br><div class="gmail_quote">
2013/11/20 Brent Wood <span dir="ltr"><<a href="mailto:pcreso@pcreso.com" target="_blank">pcreso@pcreso.com</a>></span><br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div><div style="font-size:12pt;font-family:times new roman,new york,times,serif"><div><span>I figure you have spatially indexed the polygons already?<br></span></div><div style="font-style:normal;font-size:16px;background-color:transparent;font-family:times new roman,new york,times,serif">
<span><br></span></div><div style="font-style:normal;font-size:16px;background-color:transparent;font-family:times new roman,new york,times,serif"><span>Any way of pre-categorising your polygons - binning them in some way that allows a non spatial test in the where clause to replace the spatial test...</span></div>
<div style="font-style:normal;font-size:16px;background-color:transparent;font-family:times new roman,new york,times,serif"><br><span></span></div><div style="font-style:normal;font-size:16px;background-color:transparent;font-family:times new roman,new york,times,serif">
<span>eg: a boolean attribute set to indicate if a feature has any part <= a particular x value or not.</span></div><div style="font-style:normal;font-size:16px;background-color:transparent;font-family:times new roman,new york,times,serif">
<br><span></span></div><div style="font-style:normal;font-size:16px;background-color:transparent;font-family:times new roman,new york,times,serif"><span>run this against the 2m features once to populate it, and assuming you get a 50/50 split of T/F features, your 2m^2 query can instead include a "where a.bool = b.bool", as we already know that if the boolean flag is different, they cannot touch, so your query should involve a spatial test only over 1m^2 instead... if you do this on both x & y, the boolean filter will replace even more spatial calculations & drop them to 500,000^2
tests...</span></div><div style="font-style:normal;font-size:16px;background-color:transparent;font-family:times new roman,new york,times,serif"><br><span></span></div><div style="font-style:normal;font-size:16px;background-color:transparent;font-family:times new roman,new york,times,serif">
<span>If you can pre-classify features so that non-spatial tests can reduce the spatial ones (esp for features with lots of vertices) in a where clause, such queries do run much faster, but you have the trade-off of the time taken to carry out the classification... which is only n*no_classes/2, generally much faster than n^n<br>
</span></div><div><span><br></span></div><div style="font-style:normal;font-size:16px;background-color:transparent;font-family:times new roman,new york,times,serif"><span>Hope this makes sense...</span></div><span><font color="#888888"><div style="font-style:normal;font-size:16px;background-color:transparent;font-family:times new roman,new york,times,serif">
<br><span></span></div><div style="font-style:normal;font-size:16px;background-color:transparent;font-family:times new roman,new york,times,serif"><span>Brent Wood<br></span></div> </font></span><div style="font-family:times new roman,new york,times,serif;font-size:12pt">
<span><font color="#888888"> </font></span><div style="font-family:times new roman,new york,times,serif;font-size:12pt"><span><font color="#888888"> <div dir="ltr"> <hr size="1"> <font face="Arial"> <b><span style="font-weight:bold">From:</span></b> Lee Hachadoorian <<a href="mailto:Lee.Hachadoorian%2BL@gmail.com" target="_blank">Lee.Hachadoorian+L@gmail.com</a>><br>
<b><span style="font-weight:bold">To:</span></b> PostGIS Users <<a href="mailto:postgis-users@postgis.refractions.net" target="_blank">postgis-users@postgis.refractions.net</a>> <br> <b><span style="font-weight:bold">Sent:</span></b> Wednesday, November 20, 2013 8:52 PM<br>
<b><span style="font-weight:bold">Subject:</span></b> [postgis-users] Finding Islands<br> </font>
</div></font></span><div><div> <div><br>I am trying to find "islands", polygons in a (multi)polygon layer which <br>are not connected to any other polygons in the same layer. What I came <br>up with runs in a couple of seconds on a layer with ~1000 geometries, <br>
and a couple of minutes on a layer with ~23,000 geometries, but I want <br>to run it on a layer of 2 million+ and it's taking a L-O-O-O-NG time, <br>presumably because it's making (2 million)^2 comparisons.<br><br>
What I came up with is:<br><br>SELECT a.*<br>FROM table a LEFT JOIN table b<br> ON (ST_Touches(a.geom, b.geom))<br>WHERE b.gid is null<br><br>or<br><br>SELECT *<br>FROM table<br>WHERE gid NOT IN (<br> SELECT DISTINCT a.gid<br>
FROM table a JOIN table b<br> ON (ST_Intersects(a.geom, b.geom) AND a.gid != b.gid)<br> )<br><br>The first variant raises "NOTICE: geometry_gist_joinsel called with
<br>incorrect join type". So I thought I could improve performance with an <br>INNER JOIN instead of an OUTER JOIN, and came up with the second <br>variant, and it does seem to perform somewhat better. Any suggestions <br>
for how to speed up either approach?<br><br>Best,<br>--Lee<br><br>-- <br>Lee Hachadoorian<br>Assistant Professor in Geography, Dartmouth College<br><a href="http://freecity.commons.gc.cuny.edu/" target="_blank">http://freecity.commons.gc.cuny.edu</a><br>
<br><br>_______________________________________________<br>postgis-users mailing list<br><a href="mailto:postgis-users@lists.osgeo.org" target="_blank">postgis-users@lists.osgeo.org</a><br><a href="http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users" target="_blank">http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users</a><br>
<br><br></div> </div></div></div> </div> </div></div><br>_______________________________________________<br>
postgis-users mailing list<br>
<a href="mailto:postgis-users@lists.osgeo.org" target="_blank">postgis-users@lists.osgeo.org</a><br>
<a href="http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users" target="_blank">http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users</a><br></blockquote></div><br></div>
</div></div><br>_______________________________________________<br>
postgis-users mailing list<br>
<a href="mailto:postgis-users@lists.osgeo.org" target="_blank">postgis-users@lists.osgeo.org</a><br>
<a href="http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users" target="_blank">http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users</a><br></blockquote></div></div></div><span><font color="#888888"><br>
<br clear="all"><div><br></div>-- <br>Lee Hachadoorian<br>
Asst Professor of Geography, Dartmouth College<br><a href="http://freecity.commons.gc.cuny.edu/" target="_blank">http://freecity.commons.gc.cuny.edu/</a>
</font></span></div></div>
<br>_______________________________________________<br>
postgis-users mailing list<br>
<a href="mailto:postgis-users@lists.osgeo.org" target="_blank">postgis-users@lists.osgeo.org</a><br>
<a href="http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users" target="_blank">http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users</a><br></blockquote></div><br></div>
</div></div><br>_______________________________________________<br>
postgis-users mailing list<br>
<a href="mailto:postgis-users@lists.osgeo.org" target="_blank">postgis-users@lists.osgeo.org</a><br>
<a href="http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users" target="_blank">http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users</a><br></blockquote></div><br><br clear="all"><div><br></div>-- <br>Lee Hachadoorian<br>
Asst Professor of Geography, Dartmouth College<br><a href="http://freecity.commons.gc.cuny.edu/" target="_blank">http://freecity.commons.gc.cuny.edu/</a>
</div>
</div></div></blockquote></div><br><br clear="all"><div><br></div>-- <br>Lee Hachadoorian<br>Asst Professor of Geography, Dartmouth College<br><a href="http://freecity.commons.gc.cuny.edu/" target="_blank">http://freecity.commons.gc.cuny.edu/</a>
</div>
</div></div><br>_______________________________________________<br>
postgis-users mailing list<br>
<a href="mailto:postgis-users@lists.osgeo.org">postgis-users@lists.osgeo.org</a><br>
<a href="http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users" target="_blank">http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users</a><br></blockquote></div><br></div>