[postgis-devel] Postgis topology creation - O(n-squared)? - creates problems with large datasets.

Rémi Cura remi.cura at gmail.com
Tue Jan 7 07:09:15 PST 2014


Hey,
if your data is already topological,
there is no point in using the normal functions:
they perform a lot of operations to ensure that when you insert
a polygon, it does not intersect another one, and so on.

To gain speed, you could either split your data across several different
topology schemas (partitioning),
or load the topology data in batches (not one by one), if possible bypassing
all the checks that normal data needs but that are useless for already
topological data (possibly performing the necessary operations in a
set-based way).

A simple thing you could try is to populate your topological model with
edges rather than polygons,
then, once you have imported all the edges,
create a topogeometry per polygon and link it to your polygon data.
This would be done by breaking all your polygons into lines and then edges
(with some creative use of split), removing the duplicates,
then populating the PostGIS topology tables
<http://trac.osgeo.org/postgis/wiki/PostgisTopology_Data_Model> manually.

The key is never to use a one-object-at-a-time function, but
instead SQL queries on whole tables.
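
To make the "break polygons into edges, then remove the duplicates" step
concrete, here is a toy sketch in plain Python rather than SQL. It assumes
the data is already topological, i.e. neighbouring polygons share exactly
the same vertices along their common boundary. The function names and the
two-square sample are hypothetical, and in the database the same idea would
be expressed as a set-based query over the whole table.

```python
def ring_segments(ring):
    """Yield each segment of a closed ring as an order-independent key."""
    for a, b in zip(ring, ring[1:]):
        # Sort the two endpoints so A->B and B->A give the same key.
        yield (a, b) if a <= b else (b, a)


def unique_edges(polygons):
    """Map each distinct segment to the ids of the polygons that use it."""
    edges = {}
    for poly_id, ring in polygons.items():
        for seg in ring_segments(ring):
            edges.setdefault(seg, set()).add(poly_id)
    return edges


# Hypothetical sample: two unit squares sharing the side x = 1.
polygons = {
    "A": [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)],
    "B": [(1, 0), (2, 0), (2, 1), (1, 1), (1, 0)],
}

edges = unique_edges(polygons)
# Segments used by two polygons are the shared boundaries.
shared = {seg for seg, owners in edges.items() if len(owners) == 2}
```

Each shared boundary is stored once, and the set of owners per segment is
what you would later use to link every polygon to its edges.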


Similarly, it would be much faster and easier to import the Oracle
topology directly (I mean by pure database manipulation, like table
import/export).
From what I see here
<http://docs.oracle.com/cd/B19306_01/appdev.102/b14256/sdo_topo_concepts.htm>,
the Oracle topology and PostGIS topology data models seem very similar:
the Oracle -> PostGIS conversion looks like just a copy of the proper tables
and columns, while the PostGIS -> Oracle conversion looks a bit more difficult.

Cheers,

Rémi-C





2014/1/7 Graeme B. Bell <grb at skogoglandskap.no>

> Hi everyone.
>
> I tested postgis topology (2.1.0 r11822) by creating a topology from some
> national geometry datasets with respectively 1.6 million and 7.8 million
> polygons.
>
> One source geometry dataset was made by a transformation from an oracle
> topology into postgis geometry, the other is a polygonised raster (a
> natural topology).
>
> I selected out a fraction of the polygons randomly and created a topology
> in two ways, first using createtopogeo, and then manually using
> topogeo_addpolygon.
>
> The data has spatial indices but I think these possibly aren't being used
> because of the need for a geometry collection in createtopogeo.
>
> The results looked like this:
>
> 1.7 million polygon dataset :
>
> 1/512th of the data: 24 seconds
> 1/256th of the data: 76 seconds
> 1/128th of the data: 214 seconds
> 1/64th of the data: 707 seconds
> 1/32nd of the data: 2430 seconds
>
>
> 7.8 million polygon dataset:
>
> 1/512th of the data: 509 seconds
> 1/256th of the data: 1905 seconds
> 1/128th of the data:  6944 seconds
>
>
> Manually using topology's addpolygon produced CPU costs 50-100% higher
> than the createtopogeo function and growing at a similar rate. I did not
> complete testing with it.
>
> In both cases, the cost of creating the topology grows by 3-4x as the size
> of the source geometry set doubles. As the data becomes less sparse (e.g.
> 1/512th of a national dataset is pretty sparse) the trend seems to be
> towards 4x more CPU time for 2x extra data.
>
> We would like to use postgis topology but judging from the growth in
> costs, creating topologies would take e.g. years on the larger dataset and
> a month on the small dataset. These are not our largest geometry datasets.
>
> Does anyone have any ideas or suggestions about how we could proceed from
> here? Unfortunately I cannot share the datasets for testing purposes.
>
> Graeme.
>
>
>
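
As a rough check on the timings quoted above, the growth per doubling can
be turned into an apparent exponent k in cost ~ n^k, and the measured
samples can be projected to the full datasets. This is only a sketch: the
timings are copied from the quoted message, the helper names are
hypothetical, and the projection assumes the cost ratio per doubling stays
constant all the way up to the full dataset.

```python
import math

# Timings in seconds from the quoted message, keyed by the denominator
# of the sampled fraction (512 means 1/512th of the data).
small = {512: 24, 256: 76, 128: 214, 64: 707, 32: 2430}
large = {512: 509, 256: 1905, 128: 6944}


def doubling_ratios(timings):
    """Cost ratio between each sample and the next one twice its size."""
    denoms = sorted(timings, reverse=True)  # 512, 256, ...: growing data
    return [timings[b] / timings[a] for a, b in zip(denoms, denoms[1:])]


def exponents(timings):
    """Apparent exponent k in cost ~ n**k for each doubling step."""
    return [math.log2(r) for r in doubling_ratios(timings)]


def extrapolate(timings, ratio_per_doubling):
    """Project the full-dataset cost from the largest measured sample."""
    largest_sample = min(timings)          # smallest denominator
    doublings = math.log2(largest_sample)  # doublings left to reach 1/1
    return timings[largest_sample] * ratio_per_doubling ** doublings


small_days = extrapolate(small, 4) / 86400
large_years = extrapolate(large, 4) / 86400 / 365
```

All the measured steps give an exponent between about 1.5 and 1.9. With 4x
per doubling, the projection is about 29 days for the smaller dataset and
about 3.6 years for the larger one, which is in line with the "a month"
and "years" estimates in the quoted message.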