[postgis-devel] postgis-devel Digest, Vol 130, Issue 5

Graeme B. Bell grb at skogoglandskap.no
Tue Jan 7 07:42:57 PST 2014


Hi Remi

Thanks very much for your reply.

> hey,
> if you have already topological data,
> there is no point in using the normal function :

Consider the polygonised raster. It is naturally topological (pixels of a raster don't overlap), but it has never been a topology in the PostGIS or Oracle sense. In producing a simple feature dataset there's a possibility that the polygonisation function has introduced some very slight overlaps or gaps. More generally, once data has passed through simple feature geometry there is always the possibility of slight overlaps arising from inaccuracies in number representation or subtleties of algorithm behaviour.

The mention of topological properties was simply to highlight that this is not a case of perversely difficult data tripping up the PostGIS topology constructor; rather, it is one of the easiest situations the constructor can encounter, with few or zero overlaps between the polygons being imported.

> These functions perform a lot of operations to ensure that when you insert
> a polygon, you won't intersect with another one, etc.

I think that is always a concern when adding simple feature geometry, due to uncertainties surrounding algorithm behaviour and number representation, especially with a dataset this large, which cannot be manually inspected.
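
For illustration, this is the kind of pre-flight check I have in mind (a minimal sketch; the table name "land_cover" and the columns "gid"/"geom" are placeholders for our polygonised layer):

-- Pairs of distinct polygons that overlap; for a layer that is "naturally
-- topological" this should return zero rows, and any hits are the slight
-- overlaps introduced by polygonisation or floating-point effects.
SELECT a.gid, b.gid
FROM   land_cover a
JOIN   land_cover b
  ON   a.gid < b.gid            -- consider each pair only once
 AND   a.geom && b.geom         -- bounding-box prefilter, uses the spatial index
WHERE  ST_Overlaps(a.geom, b.geom);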

> To gain speed, you could either cut your data into several different
> topology schemas (partitioning),

It's a good idea, and one we have thought about too. Unfortunately we need a single topology here, and "gaining speed" against this algorithm would mean slicing the map into hundreds of pieces, not merely several. Splitting the topology into lots of little fragments also adds complexity to every application that uses it; a significant reason for using a topology in the first place is to have the map in one tidy, aggregated piece.

> or put the topology data in the batch way (not one by one), if possible ignoring
> all the framework used for normal data but useless for already-topological
> data (possibly performing the necessary operations the set-based way).

I agree that it might be possible to make assumptions about the data and write our own constructors with the safety checks removed. However, we do not want to ignore the framework: we would want to use this on a range of maps, and so we assume 'normal data' will be used.

Also, removing safety checks may not help. I am also not certain whether the slowness comes from the geometry set constructor or from the topology data type itself; the slowness when polygons are added incrementally makes me wonder.
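
For what it's worth, a crude way to probe that question might be something like the sketch below (the table "land_cover" with columns "gid"/"geom", the SRID and the layer id are all assumptions, and I'm assuming PostGIS 2.1 topology functions). It pushes the same sample once through TopoGeo_AddPolygon, which only maintains the primitive node/edge/face tables, and once through toTopoGeom, which additionally creates a TopoGeometry per feature, so the difference between the two timings hints at where the cost sits:

-- Two throwaway topologies, one per code path
SELECT topology.CreateTopology('t_prim', 25833);   -- SRID is an assumption
SELECT topology.CreateTopology('t_full', 25833);

-- toTopoGeom needs a registered layer; AddTopoGeometryColumn adds a scratch
-- column and returns the layer id (assumed to be 1 below). Clean up afterwards
-- with topology.DropTopoGeometryColumn and topology.DropTopology.
SELECT topology.AddTopoGeometryColumn('t_full', 'public', 'land_cover',
                                      'topo_probe', 'POLYGON');

\timing on            -- psql: report execution time per statement

-- Primitives only
SELECT count(*)
FROM  (SELECT topology.TopoGeo_AddPolygon('t_prim', geom)
       FROM  (SELECT geom FROM land_cover ORDER BY gid LIMIT 10000) sample) s;

-- Primitives plus TopoGeometry bookkeeping
SELECT count(*)
FROM  (SELECT topology.toTopoGeom(geom, 't_full', 1)
       FROM  (SELECT geom FROM land_cover ORDER BY gid LIMIT 10000) sample) s;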

> A simple thing you could try is not adding polygons but edges to populate
> your topological model;
> then, when you have imported all edges,
> create a topogeometry per polygon and link it to your polygon data.
> This would be done by breaking all your polygons into lines then edges (with
> some creative use of split), removing the duplicates,
> then populating the postgis topology
> tables <http://trac.osgeo.org/postgis/wiki/PostgisTopology_Data_Model> manually.
>
> The key would be to never use a one-object-per-one-object function, but
> instead SQL queries on the whole table.

That's a good idea, thanks; a rough sketch of what that might look like is below. However... see my comment at the end.
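
For the record, here is roughly what I imagine the set-based route looking like. It leans on the documented bulk functions rather than hand-populating the node/edge_data/face tables as you suggest, so it is only an approximation of your idea; "land_cover", "gid", "geom" and the SRID are placeholders, and it assumes each input polygon ends up as exactly one face:

-- 1. Create an empty topology and seed it with the boundary linework of all
--    polygons in a single call, instead of adding features one by one.
SELECT topology.CreateTopology('land_topo', 25833);           -- SRID assumed

SELECT topology.ST_CreateTopoGeo(
         'land_topo',
         (SELECT ST_Collect(ST_Boundary(geom)) FROM land_cover));

-- 2. Register an areal TopoGeometry layer on the feature table.
--    AddTopoGeometryColumn returns the layer id (assumed to be 1 below).
SELECT topology.AddTopoGeometryColumn('land_topo', 'public', 'land_cover',
                                      'topo_geom', 'POLYGON');

-- 3. Link each feature to the face built in step 1, found via a point that
--    lies inside the polygon.
UPDATE land_cover f
SET    topo_geom = topology.CreateTopoGeom(
         'land_topo',
         3,                                                   -- 3 = areal type
         1,                                                   -- layer id from step 2
         ARRAY[[topology.GetFaceByPoint('land_topo',
                                        ST_PointOnSurface(f.geom), 0),
                3]]::topology.topoelementarray);              -- {face_id, type=face}

-- 4. Sanity-check the result.
SELECT * FROM topology.ValidateTopology('land_topo');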

> Similarly it would be way faster and easier to directly import the Oracle
> topology (I mean by pure database manipulation, like table import/export).
> From what I see here
> <http://docs.oracle.com/cd/B19306_01/appdev.102/b14256/sdo_topo_concepts.htm>,
> the Oracle topology and PostGIS topology data models seem very similar,
> the conversion Oracle -> PostGIS looking like just a copy of the proper tables
> and columns, the conversion PostGIS -> Oracle looking a bit more difficult.

Yes, it's probably possible to do that, but it seems there is no out-of-the-box solution.

Regardless, we still need a topology constructor solution that works generally across all the datasets, not only the ones currently in Oracle topology, and not just on trivial cases.

A concern at the back of my mind: if the standard topology constructor (or data type) is using O(n^2) algorithms, are we likely to encounter similar problems in other functions (perhaps due to similarly written functions elsewhere, or to problems with the underlying data type)? In other words, is this constructor performance problem the tip of an iceberg?

Does anyone know if there have been any tests or measurements of PostGIS topology function performance as datasets get larger?
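
In case it's useful, the kind of measurement I have in mind would be something like this sketch (again with "land_cover"/"gid"/"geom" as placeholder names and a made-up SRID). It adds polygons in fixed-size batches and logs the time per batch; per-batch times that keep climbing as the topology grows would point at superlinear behaviour:

SELECT topology.CreateTopology('t_scale', 25833);   -- SRID assumed

DO $$
DECLARE
  batch_size int := 1000;
  n          int := 0;
  t0         timestamptz := clock_timestamp();
  r          record;
BEGIN
  FOR r IN SELECT geom FROM land_cover ORDER BY gid LOOP
    -- add one polygon to the growing topology
    PERFORM topology.TopoGeo_AddPolygon('t_scale', r.geom);
    n := n + 1;
    IF n % batch_size = 0 THEN
      RAISE NOTICE 'polygons loaded: %, time for last batch: %',
                   n, clock_timestamp() - t0;
      t0 := clock_timestamp();
    END IF;
  END LOOP;
END;
$$;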

Graeme. 





