[postgis-devel] Topology building performance

Sandro Santilli strk at keybit.net
Fri Jan 10 02:58:40 PST 2014


On Fri, Jan 10, 2014 at 08:36:08AM +0000, Graeme B. Bell wrote:

> The problems here are:
> 
> 0. It is not necessary to check all other already existing topology when you add a polygon. It's only necessary to check the nearest neighbours to the centerpoint that overlap the bounding box. Checking against all neighbours that overlap the bounding box would be slower but OK too. We're talking about checking a constant 3-15 polygons, not 'n'.

You mean already existing topology _elements_, right ?
The code only checks elements whose bounding box overlap
the bounding box of the newly added geometry. The smaller
your geometries bounding box, the easiest. I guess ring
insertion could likely be optimized by cutting them in 4 or
more parts, but there's currently no ready function to do
that efficiently and without introducing drifts (ie: split
on existing vertex).

You can uncomment the POSTGIS_TOPOLOGY_DEBUG define in
topology/sql/populate.sql.in (then run "make -C topology"
and psql -f topology/topology_upgrade.sql) to get DEBUG
messages that'd tell you how many candidates are being
considered.

> 1. Not being exponential isn't the issue here. N-squared growth plus a slow starting point is already too high for postgis topology to be practical for anything other than trivially small datasets.

I hope you can help making it better.

> 2. In principal, there is absolutely no reason for this to be n-squared. We know where the polygon is going. We know, from the fact it is a topology, that it can only affect or be affected by the first few polygons that border the centerpoint - whether the map is a million or a hundred polygons, you'd expect each polygon to affect or be affected by only about 3-15 neighbours. It would be a strange topology indeed if most polygons had even 30 neighbours.

This is a duplicate of 0.

> Given that spatial indices allow fast lookups of what those neighbouring polygons are, adding a single polygon to the topology should therefore be taking place in a small constant or near-constant time.

It should be verified that the index is well balanced and
analisys is up to date as the index grows. Doing everything
in a single transaction doesn't help. The diagrams I was
talking about included runs with batch loads, and it helped
a lot. Can you try ?

> Q. Do postgis topology functions (including constructors) make use of a spatial index? Is there any way to put a spatial index on topology items? 

They do, supposedly. Adding elements to a topology should both
use the index _and_ modify it. There are indices on many tables,
which could be part of the slowdown too.

The really hard part is profiling all of this...

> "Slow" would be hours or days, and would be totally acceptable. "Decades" means "broken". For a situation where it is only necessary to clip against or adjust a few closest intersecting neighbours, 5-20 years is not an OK run time for processing a few million polygons on a modern machine, if a thousand can be done in seconds. 
> 
> Why should adding the 5-millionth small polygon take days to compute, when it is only touching against 5 other polygons? There is definitely a fundamental problem here. 

What about producing a syntetic dataset to serve as base for analisys
and see how it performs on differently configured systems ?

--strk;

 ()  ASCII ribbon campaign        - against html e-mail
 /\  http://www.asciiribbon.org   - against proprietary attachments



More information about the postgis-devel mailing list