<div dir="ltr">On 9 May 2013 15:43, Stephen Woodbridge <span dir="ltr"><<a href="mailto:woodbri@swoodbridge.com" target="_blank">woodbri@swoodbridge.com</a>></span> wrote:<br><div class="gmail_extra"><div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div><div>On 5/9/2013 7:20 AM, Nicolas Ribot wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
(I spammed this thread a bit with image attachments...)<br>
<br>
Hi Steve,<br>
<br>
With the given dataset, my approach is indeed slow compared to the st_union<br>
approach (though the precision for the st_dwithin clause must be adapted to<br>
the current dataset; I used a precision of 0.000001).<br>
<br>
The st_union method generates 18322 segments in 7318 ms, though the<br>
final association between original lines and new segments is not done here.<br>
<br>
With the query I gave, the st_dwithin part takes 11.7 sec on a recent<br>
laptop machine (1.8 GHz Intel Core i7, 1024 MB of RAM for shared_buffers,<br>
512 MB for work_mem)...<br>
<br>
The complete query returns 17292 segments in 17956 ms.<br>
<br>
As the lines are almost already noded, it generates a lot of<br>
intersection points coincident with line ends.<br>
<br>
As you noted, intermediate temp tables may help here:<br>
<br>
I decomposed the query into intermediate steps and the performance is<br>
about the same as with st_union :<br>
<br>
-- First creates temp table with intersection points<br>
drop table if exists intergeom;<br>
create temp table intergeom as<br>
select l1.id as l1id, l2.id as l2id,<br>
st_intersection(l1.geom, l2.geom) as geom<br>
from bdaways l1 join bdaways l2 on (st_dwithin(l1.geom, l2.geom, 0.000001))<br>
where l1.id <> l2.id;<br>
<br>
-- keeps only true intersection points<br>
-- must handle the case where lines intersect along a linestring...<br>
</blockquote>
<br></div></div>
Would it make sense to take all the rows where geometryType(geom) <> 'LINESTRING' and just add their end points to the intergeom table? I think this would add break points at the ends of overlapping segments, ensuring that they get divided at those points. It would also add extraneous points at the other end, but these would get filtered out later. So add these two statements here?<br>
</blockquote><div><br></div><div>You meant geometryType(geom) = 'LINESTRING'?</div><div>Yes, it would make sense. I will try.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<br>
insert into intergeom (l1id, l2id, geom)<br>
select l1id, l2id, st_startpoint(geom) from intergeom<br>
where geometryType(geom) <> 'LINESTRING';<br>
<br>
insert into intergeom (l1id, l2id, geom)<br>
select l1id, l2id, st_endpoint(geom) from intergeom<br>
where geometryType(geom) <> 'LINESTRING';<div><br>
<br>
<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
delete from intergeom where geometryType(geom) <> 'POINT';<br>
</blockquote>
<br></div>
I think some OSM data from a small Central American country might make a good test case. I have not tried any of these:<br>
<br>
<a href="http://downloads.cloudmade.com/americas/central_america" target="_blank">http://downloads.cloudmade.com/americas/central_america</a><br>
<br>
but I notice that if you click into a country like Belize or Costa Rica, there is a shapefile for the country near the bottom of the list or you can pick just a city for a smaller data set.<br>
<br>
I'll start packaging this into a stored procedure. This is an awesome job.<br></blockquote><div><br></div><div>Thanks.</div><div>I tested the Costa Rica highways dataset and found the following results:</div>
<div><br></div><div>The query takes about 28 seconds to process 29187 ways that are not connected to each other at all.</div><div>It generates 48415 segments.</div><div><br></div><div>As an extra step, all original ways that did not need to be cut at intersection points have to be added to the final result: these lines</div>
<div> are already clean but were dismissed during the cutting process:</div><div><br></div><div><div>insert into res (gid, sub_id, geom, type)</div><div><span style="white-space:pre-wrap"> </span>select ways.gid, 1 as sub_id, ways.geom, geometryType(ways.geom)</div>
<div><span style="white-space:pre-wrap"> </span>from ways</div><div><span style="white-space:pre-wrap"> </span>where not exists (</div><div><span style="white-space:pre-wrap"> </span>select res.gid from res where res.gid = ways.gid</div>
<div><span style="white-space:pre-wrap"> </span>);</div></div><div><br></div><div>The result looks good, as all ways seem to be segmented (cf. <a href="http://sd-38177.dedibox.fr/img1.png">http://sd-38177.dedibox.fr/img1.png</a> and <a href="http://sd-38177.dedibox.fr/img2.png">http://sd-38177.dedibox.fr/img2.png</a>; black: initial lines, blue: new segments. Line ends are shown as circles).</div>
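<div><br></div><div>A quick sanity check on that last insert could look like the sketch below. It only uses the ways and res tables and the gid column from the query above; the limit of 10 is an arbitrary choice for illustration. Once the insert has run, the first query should return 0, since every way then has at least one row in res.</div>
```sql
-- Ways that still have no segment in res (should be 0 after the insert).
select count(*) as missing_ways
from ways
where not exists (
	select 1 from res where res.gid = ways.gid
);

-- Ways that were cut into the most segments, for a visual spot check.
select gid, count(*) as n_segments
from res
group by gid
order by n_segments desc
limit 10;
```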
<div><br></div><div>A problem remains with non-simple or dirty lines as they self-intersect or have several ends and so are not noded properly.</div>
<div>It should be simple to process them once the problem is identified, though. Here is an example of such a line: <a href="http://sd-38177.dedibox.fr/img3.png">http://sd-38177.dedibox.fr/img3.png</a></div><div><br></div><div>
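One possible way to spot such lines, as a rough sketch only: ST_IsSimple flags self-intersecting geometries, and geometryType separates multi-part lines from plain linestrings. It assumes the ways table with its gid and geom columns from the query above, and it is not part of the procedure discussed in this thread.<br>
```sql
-- Candidate "dirty" ways: self-intersecting, or not a single linestring.
select gid,
	geometryType(geom) as type,
	st_issimple(geom) as is_simple
from ways
where not st_issimple(geom)
	or geometryType(geom) <> 'LINESTRING';
```
<br>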
I'll go look at the procedure ;)</div><div>
<br></div><div> Nicolas</div></div></div></div>