[postgis-devel] postgis-devel Digest, Vol 130, Issue 5

Rémi Cura remi.cura at gmail.com
Fri Jan 10 01:00:19 PST 2014


Hey,
I'm glad you are pushing in that direction because large scale is a serious
issue.

There is no theoretical limitation (see Grass GIS, topological, fast enough)

Maybe poke in the pg_routing direction (they have a topology, although
different, and time is the matter), and maybe with people using the TIGER
dataset (very large?).

When you ask the question "what are the closest n polygons to this point"
to the database, it is ln(n)  with indexes (or better wit cache), so that
makes the whole process  n *ln(n).

But I would guess the slower is not this but self join on the "edge" table
to "walk" trough topology. Yet it has also been indexed.

Most likely like Sandro said, there is just one or two operation being slow
because they are not optimally written or could be better indexed.

I don't know a way to analyse execution time inside a function in postgres,
I'm gonna ask on postgres list.

Cheers,
Rémi-C




2014/1/10 Graeme B. Bell <grb at skogoglandskap.no>

> Hello Remi and Sandro,
>
> Thanks for your replies. I'll answer your posts together, if you don't
> mind.
>
> > To my knowledge construction is the most expensive operation.
> > I'm sure it can be optimized but finding the bottleneck is not an easy
> task.
> > I remember someone posted a nice graph with timings which didn't look
> > exponential. But I can't find the link to it right now.
>
> > Hey,
> > whatever you do,
> > if you call 9 millions times a plpgsql function (with subsequent multiple
> > other function calls, and lots of exception handling) I'm afraid it will
> be
> > "slow" !
> >
> > I don"t know precisely the operation done when inserting a polygon into
> the
> > topology, but theoretically, we need at least to look if it doesn't
> > intersect other topology (meaning for each insertion, you have to check
> all
> > other already existing topology, theoretically ln(n)) if using an index))
>
> The problems here are:
>
> 0. It is not necessary to check all other already existing topology when
> you add a polygon. It's only necessary to check the nearest neighbours to
> the centerpoint that overlap the bounding box. Checking against all
> neighbours that overlap the bounding box would be slower but OK too. We're
> talking about checking a constant 3-15 polygons, not 'n'.
>
> 1. Not being exponential isn't the issue here. N-squared growth plus a
> slow starting point is already too high for postgis topology to be
> practical for anything other than trivially small datasets.
>
> 2. In principal, there is absolutely no reason for this to be n-squared.
> We know where the polygon is going. We know, from the fact it is a
> topology, that it can only affect or be affected by the first few polygons
> that border the centerpoint - whether the map is a million or a hundred
> polygons, you'd expect each polygon to affect or be affected by only about
> 3-15 neighbours. It would be a strange topology indeed if most polygons had
> even 30 neighbours.
>
> Given that spatial indices allow fast lookups of what those neighbouring
> polygons are, adding a single polygon to the topology should therefore be
> taking place in a small constant or near-constant time.
>
> Q. Do postgis topology functions (including constructors) make use of a
> spatial index? Is there any way to put a spatial index on topology items?
>
> > 3. "if you call 9 millions times a plpgsql function (with subsequent
> multiple
> >> other function calls, and lots of exception handling) I'm afraid it
> will be
> >> "slow" !"
>
> "Slow" would be hours or days, and would be totally acceptable. "Decades"
> means "broken". For a situation where it is only necessary to clip against
> or adjust a few closest intersecting neighbours, 5-20 years is not an OK
> run time for processing a few million polygons on a modern machine, if a
> thousand can be done in seconds.
>
> Why should adding the 5-millionth small polygon take days to compute, when
> it is only touching against 5 other polygons? There is definitely a
> fundamental problem here.
>
> Does anyone on the list have a large map working in postgis topology? If
> so, how did you do it?
>
> Graeme.
> _______________________________________________
> postgis-devel mailing list
> postgis-devel at lists.osgeo.org
> http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-devel
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-devel/attachments/20140110/62215957/attachment.html>


More information about the postgis-devel mailing list