[postgis-devel] postgis-devel Digest, Vol 130, Issue 5

Fri Jan 10 00:36:08 PST 2014

Hello Remi and Sandro,

Thanks for your replies. I'll answer your posts together, if you don't mind.

> To my knowledge construction is the most expensive operation.
> I'm sure it can be optimized but finding the bottleneck is not an easy task.
> I remember someone posted a nice graph with timings which didn't look
> exponential. But I can't find the link to it right now.

> Hey,
> whatever you do,
> if you call 9 millions times a plpgsql function (with subsequent multiple
> other function calls, and lots of exception handling) I'm afraid it will be
> "slow" !
> 
> I don"t know precisely the operation done when inserting a polygon into the
> topology, but theoretically, we need at least to look if it doesn't
> intersect other topology (meaning for each insertion, you have to check all
> other already existing topology, theoretically ln(n)) if using an index))

The problems here are:

0. It is not necessary to check all other already existing topology when you add a polygon. It's only necessary to check the nearest neighbours to the centerpoint that overlap the bounding box. Checking against all neighbours that overlap the bounding box would be slower but OK too. We're talking about checking a constant 3-15 polygons, not 'n'.

1. Not being exponential isn't the issue here. N-squared growth plus a slow starting point is already too high for postgis topology to be practical for anything other than trivially small datasets.

2. In principal, there is absolutely no reason for this to be n-squared. We know where the polygon is going. We know, from the fact it is a topology, that it can only affect or be affected by the first few polygons that border the centerpoint - whether the map is a million or a hundred polygons, you'd expect each polygon to affect or be affected by only about 3-15 neighbours. It would be a strange topology indeed if most polygons had even 30 neighbours.

Given that spatial indices allow fast lookups of what those neighbouring polygons are, adding a single polygon to the topology should therefore be taking place in a small constant or near-constant time.

Q. Do postgis topology functions (including constructors) make use of a spatial index? Is there any way to put a spatial index on topology items? 

> 3. "if you call 9 millions times a plpgsql function (with subsequent multiple
>> other function calls, and lots of exception handling) I'm afraid it will be
>> "slow" !"

"Slow" would be hours or days, and would be totally acceptable. "Decades" means "broken". For a situation where it is only necessary to clip against or adjust a few closest intersecting neighbours, 5-20 years is not an OK run time for processing a few million polygons on a modern machine, if a thousand can be done in seconds. 

Why should adding the 5-millionth small polygon take days to compute, when it is only touching against 5 other polygons? There is definitely a fundamental problem here. 

Does anyone on the list have a large map working in postgis topology? If so, how did you do it?

Graeme.