<div dir="ltr">Hey,<div>I'm glad you are pushing in that direction, because large scale is a serious issue.</div><div><br></div><div>There is no theoretical limitation (see GRASS GIS: topological, and fast enough).</div><div>
<br></div><div>Maybe poke in the pgRouting direction (they have a topology, although a different one, and timing matters to them), and maybe ask people using the TIGER dataset (very large?).</div><div><br></div><div>When you ask the database "what are the closest n polygons to this point", it is O(log n) with indexes (or better with caching), so that makes the whole process O(n log n).</div>
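As an illustration of that index-backed nearest-neighbour lookup, here is a sketch using the PostGIS KNN distance operator (the table and column names, polygons and geom, are hypothetical, not anything from the topology schema):

```sql
-- Hypothetical table: polygons(id, geom geometry(Polygon, 4326))
CREATE INDEX IF NOT EXISTS polygons_geom_idx
  ON polygons USING GIST (geom);

-- "closest n polygons to this point": the <-> operator lets the planner
-- walk the GiST index, so each lookup is roughly O(log n) rather than a
-- full scan of all n polygons.
SELECT id
FROM polygons
ORDER BY geom <-> ST_SetSRID(ST_MakePoint(10.0, 59.0), 4326)
LIMIT 15;
```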
<div><br></div><div>But I would guess the slow part is not this but the self-join on the "edge" table used to "walk" through the topology. Yet that is also indexed.</div><div><br></div><div>Most likely, as Sandro said, there are just one or two operations being slow because they are not optimally written or could be better indexed.</div>
<div><br></div><div>I don't know of a way to analyse execution time inside a function in Postgres; I'm going to ask on the Postgres list.</div><div><br></div><div>Cheers,</div><div>Rémi-C</div><div><br></div><div><br></div>
</div><div class="gmail_extra"><br><br><div class="gmail_quote">2014/1/10 Graeme B. Bell <span dir="ltr"><<a href="mailto:grb@skogoglandskap.no" target="_blank">grb@skogoglandskap.no</a>></span><br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Hello Remi and Sandro,<br>
<br>
Thanks for your replies. I'll answer your posts together, if you don't mind.<br>
<br>
> To my knowledge construction is the most expensive operation.<br>
> I'm sure it can be optimized but finding the bottleneck is not an easy task.<br>
> I remember someone posted a nice graph with timings which didn't look<br>
> exponential. But I can't find the link to it right now.<br>
<div class="im"><br>
> Hey,<br>
> whatever you do,<br>
> if you call a plpgsql function 9 million times (with multiple subsequent<br>
> other function calls, and lots of exception handling), I'm afraid it will be<br>
> "slow"!<br>
><br>
> I don't know precisely the operations done when inserting a polygon into the<br>
> topology, but theoretically we need at least to check that it doesn't<br>
> intersect the other topology (meaning for each insertion you have to check all<br>
> other already existing topology; theoretically O(log n) if using an index)<br>
<br>
</div>The problems here are:<br>
<br>
0. It is not necessary to check all other already existing topology when you add a polygon. It's only necessary to check the nearest neighbours to the centerpoint that overlap the bounding box. Checking against all neighbours that overlap the bounding box would be slower but OK too. We're talking about checking a constant 3-15 polygons, not 'n'.<br>
<br>
1. Not being exponential isn't the issue here. N-squared growth plus a slow starting point is already too high for postgis topology to be practical for anything other than trivially small datasets.<br>
<br>
2. In principle, there is absolutely no reason for this to be n-squared. We know where the polygon is going. We know, from the fact that it is a topology, that it can only affect or be affected by the first few polygons bordering the centerpoint - whether the map is a hundred polygons or a million, you'd expect each polygon to affect or be affected by only about 3-15 neighbours. It would be a strange topology indeed if most polygons had even 30 neighbours.<br>
<br>
Given that spatial indices allow fast lookups of what those neighbouring polygons are, adding a single polygon to the topology should therefore be taking place in a small constant or near-constant time.<br>
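That index-backed neighbour lookup can be sketched as follows (again, polygons and geom are hypothetical names, not the topology's actual tables):

```sql
-- Pre-filter the candidates for a new polygon before any exact work.
WITH newpoly AS (
  SELECT ST_GeomFromText(
    'POLYGON((0 0, 0 1, 1 1, 1 0, 0 0))', 4326) AS geom
)
SELECT p.id
FROM polygons p, newpoly n
WHERE p.geom && n.geom                 -- bounding-box overlap, uses the GiST index
  AND ST_Intersects(p.geom, n.geom);  -- exact test only on the few survivors
```

The && bounding-box pre-filter is what keeps the per-insertion cost down to the handful of local candidates instead of all n polygons.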
<br>
Q. Do postgis topology functions (including constructors) make use of a spatial index? Is there any way to put a spatial index on topology items?<br>
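For what it's worth, assuming the standard topology layout where edges are stored in a per-topology schema in an edge_data table with a geom column, a spatial index on it would be created like this (my_topo is a hypothetical topology name; whether the constructors then actually use such an index is exactly the question above):

```sql
-- my_topo is a placeholder for the topology's schema name.
CREATE INDEX edge_data_geom_idx
  ON my_topo.edge_data
  USING GIST (geom);
```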
<br>
> 3. "if you call a plpgsql function 9 million times (with multiple subsequent<br>
<div class="im">> other function calls, and lots of exception handling), I'm afraid it will be<br>
> "slow"!"<br>
<br>
</div>"Slow" would be hours or days, and would be totally acceptable. "Decades" means "broken". For a situation where it is only necessary to clip against or adjust a few closest intersecting neighbours, 5-20 years is not an OK run time for processing a few million polygons on a modern machine, if a thousand can be done in seconds.<br>
<br>
Why should adding the 5-millionth small polygon take days to compute, when it is only touching against 5 other polygons? There is definitely a fundamental problem here.<br>
<br>
Does anyone on the list have a large map working in postgis topology? If so, how did you do it?<br>
<div class="HOEnZb"><div class="h5"><br>
Graeme.<br>
_______________________________________________<br>
postgis-devel mailing list<br>
<a href="mailto:postgis-devel@lists.osgeo.org">postgis-devel@lists.osgeo.org</a><br>
<a href="http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-devel" target="_blank">http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-devel</a><br>
</div></div></blockquote></div><br></div>