[postgis-devel] Postgis topology creation - O(n-squared)? - creates problems with large datasets.

Graeme B. Bell grb at skogoglandskap.no
Tue Jan 21 01:20:36 PST 2014


Hi Sandro,

>> I'll wrap up the rbuild topology script neatly and put it on github, and post to the list when it's ready. 
>> That way, people can generate topologies of any complexity they like, in any zone they like. 
> 
> That'd be perfect for a performance test. It could be only enabled
> when "rbuild" is found and feed a version-controlled configuration
> to it!

Two issues:

1. The script currently generates a random topology. We could control the seed, but it's probably better to have some fixed files that get downloaded if you want to do performance tests. 30MB is not too big nowadays: somewhere between a few seconds and a minute to download for almost everyone.

2. It takes *lots* of time to build the topology from raster. I hide this by making the geometry fit neatly with the tiling system, letting rbuild parallelise like mad, and using a reasonably fast system for builds. It wouldn't be fun for most devs. Better to download standard test data, I'd say.
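
For what "controlling the seed" in point 1 would mean in practice, here is a minimal Python sketch. It is not rbuild code: the function name and the grid-of-class-labels layout are made up purely for illustration. It only shows that a private, seeded generator gives the same "random" layout on every run.

```python
import random

def random_cells(seed, n_cols=4, n_rows=3):
    """Hypothetical helper: give every grid cell a random class label 0-9."""
    # A private generator keeps the result independent of global random state.
    rng = random.Random(seed)
    return [[rng.randint(0, 9) for _ in range(n_cols)] for _ in range(n_rows)]

# Same seed, same layout: two independent calls agree cell by cell.
first = random_cells(seed=42)
second = random_cells(seed=42)
same_layout = first == second
```

The catch is that the same seed only reproduces the same data as long as the generator code itself does not change, which is one more argument for shipping fixed test files.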

> The asymptote on that curve is looking a lot more healthy :-)
> 
> You can see what I meant about "ST_CreateTopogeo" not being
> optimized at all, it behaves the same as the incremental builder,
> and often it even takes more time.

Indeed.

Let's think ... 

- Is there anything clever that can be done if you know all the polygons you want to add at once? For example, in terms of the order they are added (left to right? big through small? randomised?), or in terms of how they're grouped for processing?

- Currently they are passed as a collection. Would it help if they were passed or accessed in some other way?

- Could the user hint that the geometry is already closely topology-like? Is this useful, or does it encourage risk-taking?

- What parameters / internal constants are used, and how do they affect the resulting topology and the cost of building it?

- Profiling: currently, on large datasets, which 3-5 functions eat the most time in createTopo or the incremental approach, and why?

- If createTopo is slower than the incremental builder, can we make them the same code?
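
To make the ordering question concrete, the three candidate orders could be tried with something like the Python sketch below. It is only an assumption-laden illustration: each polygon is stood in for by its bounding box (xmin, ymin, xmax, ymax), the sample boxes and function names are invented, and nothing here says which order (if any) would actually help the topology builder.

```python
import random

# Hypothetical input: each polygon represented only by its bounding box
# (xmin, ymin, xmax, ymax).
boxes = [(5, 0, 6, 1), (0, 0, 4, 4), (2, 1, 3, 2)]

def bbox_area(b):
    """Area of a bounding box, used as a cheap proxy for polygon size."""
    xmin, ymin, xmax, ymax = b
    return (xmax - xmin) * (ymax - ymin)

def left_to_right(bs):
    """Order by western edge, so neighbours tend to be added consecutively."""
    return sorted(bs, key=lambda b: b[0])

def big_through_small(bs):
    """Order by decreasing bounding-box area."""
    return sorted(bs, key=bbox_area, reverse=True)

def randomised(bs, seed=0):
    """Shuffle a copy, with a fixed seed so timing runs are repeatable."""
    out = list(bs)
    random.Random(seed).shuffle(out)
    return out
```

Each function returns a new list, so the same input can be fed to the builder in each order and the build times compared.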

Graeme.

