[postgis-devel] Postgis topology creation - O(n-squared)? - creates problems with large datasets.

Fri Jan 10 12:40:32 PST 2014

>> 
>> 1 - I am not using a transaction.
> 
> > You are. If you're not explicitly starting one you'll get
> > one for every query you run. 

OK, I'm assuming that since this is a database list the word transaction carries a certain formal weight. 
 e.g.

"In PostgreSQL, a transaction is set up by surrounding the SQL commands of the transaction with BEGIN and COMMIT commands."
http://www.postgresql.org/docs/8.3/static/tutorial-transactions.html

I agree that normal postgres commands carry much of the function of a transaction.

>> 
>> 2 - I am using the standard API constructor call to build a topology from a collection of geometry. If that is the wrong way to build a topology, then I don't know what the correct way is.
> 
> You mean the ST_CreateTopoGeo function ?

It might be helpful to re-read the first post in the list where I specify I am using this function, and that I also tested incrementally adding polygons , which was 50-100% slower. I'm not trying to insult you by saying that, I just mean literally it will be easier to talk about this if we're both on the same page about the conversation so far. 

> That function was the first implementation of a constructor
> and I hadn't given it much love as I've always thought it would
> have been inappropriate anyway, due to the constraints of doing
> the building in a single transaction and while keeping the whole
> input in memory during operations.

In the rest of the postgis/postgres community, bulk loading operations are generally provided as faster alternatives to manual/incremental operations, so this is not intuitive behaviour
(e.g. pg_dump, shp2pgsql). 

If it is expected to be slower than manual incremental adding (which it isn't, in my measurements), that would be useful to have in the documentation somewhere.

I'm not attacking your work here, I do want to help improve it (and the documentation). It's great that you wrote it!

A good constructor would be very,very useful; part of the reason people use postgis is because shp2pgsql and ogr2ogr help you get the data in so easily. If API constructors aren't quick and effective, people will never try the rest of the API. Not a criticism of you, just a general comment on getting people to engage with a project. 

> The function is pretty small, you can easily see how many
> database-accessing function calls are being made, surely
> more than needed. I suggest you not only build with POSTGIS_TOPOLOGY_DEBUG
> defined, but also add more RAISE DEBUG lines to eye-spot which
> step costs more. 

If this is the case then it would be great to add it to the documentation as a warning flag to users and a way of gathering more feedback on it. Your description of the function here is a world apart from the impression I picked up looking at the docs.

> 
> You can install auto_explain module and configure it to show you
> the plans for queries run from the plpgsql functions.

Thanks for the suggestion, this will be very useful for a few other things too. Is there any design spec or info describing the intention/approach taken with the implementation?

> Also make sure you're running latest GEOS version.

One minor version behind, I think. (x.x.-1)

Thanks again and have a nice weekend

Graeme.