[postgis-devel] postgis-devel Digest, Vol 130, Issue 5

Tue Jan 7 08:39:56 PST 2014

Hey,
whatever you do, if you call a plpgsql function 9 million times (each call
triggering several other function calls and a lot of exception handling),
I'm afraid it will be "slow"!

I don't know precisely which operations are performed when a polygon is
inserted into the topology, but in theory we need at least to check that it
doesn't intersect the existing topology. That means that for each insertion
you have to search everything already inserted, which is theoretically
ln(n) per insertion when an index is used.

That would be n*ln(n) at best.
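
To give a rough sense of scale, here is a small Python sketch comparing the
two growth rates for the 9 million polygons. It only counts abstract "index
probes": the function names are made up for illustration, and it says nothing
about the real constants inside the topology code.

```python
import math

def nlogn_cost(n: int) -> float:
    """Abstract cost if each insertion needs about ln(n) index probes."""
    return n * math.log(n)

def quadratic_cost(n: int) -> float:
    """Abstract cost if each insertion is compared against all n others."""
    return float(n) * n

n = 9_000_000  # number of polygons mentioned in this thread
ratio = quadratic_cost(n) / nlogn_cost(n)
print(f"n*ln(n) ~ {nlogn_cost(n):.3g}")
print(f"n^2     ~ {quadratic_cost(n):.3g}")
print(f"ratio   ~ {ratio:.3g}")
```

So the indexed case is on the order of 1.4e8 probes against 8.1e13 for the
quadratic case, a factor of several hundred thousand.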

Now maybe there is a problem with how the indexes are used? I'm no expert on
indexes, but maybe you could try inserting your polygons in batches of K,
refreshing the statistics between batches
(e.g. insert 10k polygons, ANALYZE, insert 10k polygons, ...).
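
A minimal sketch of that batching idea, in Python, which only builds the SQL
statements as strings. The table my_polygons, its columns id, geom and topo,
the topology name my_topo, layer id 1 and tolerance 0 are all assumptions made
up for the example; adapt them to your schema. I have not run this against
your data.

```python
from typing import Iterator, List

def batches(ids: List[int], size: int) -> Iterator[List[int]]:
    """Yield consecutive slices of at most `size` ids."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]

def batch_script(ids: List[int], size: int) -> List[str]:
    """Build one UPDATE per batch, each followed by ANALYZE statements."""
    statements = []
    for chunk in batches(ids, size):
        id_list = ", ".join(str(i) for i in chunk)
        # Hypothetical names: my_polygons, topo, geom, my_topo; layer 1 and
        # tolerance 0 are placeholders.
        statements.append(
            "UPDATE my_polygons "
            "SET topo = topology.toTopoGeom(geom, 'my_topo', 1, 0) "
            f"WHERE id IN ({id_list});"
        )
        # Refresh planner statistics on the topology tables between batches.
        for table in ("edge_data", "node", "face"):
            statements.append(f"ANALYZE my_topo.{table};")
    return statements

script = batch_script(list(range(1, 25_001)), 10_000)
print(len(script))   # 3 batches, 4 statements each
print(script[1])     # first ANALYZE statement
```

The statements can then be sent with whatever client you like. Timing each
batch separately should show whether the cost per batch grows as the topology
fills up.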

Another easy test would be to insert not at the topogeom level but at the
topology level, maybe with the AddFace function?
This would help to find out where the slowness is coming from.

Cheers,
Rémi-C

2014/1/7 Graeme B. Bell <grb at skogoglandskap.no>

>
> Hi Remi
>
> Thanks very much for your reply.
>
> > hey,
> > if you have already topological data,
> > there is no point in using the normal function :
>
> Consider the polygonised raster. It is naturally topological (pixels of a
> raster don't overlap), but it has never been a topology in the postgis or
> oracle sense. In producing a simple feature dataset there's a possibility
> that the polygonisation function has produced some very slight overlaps or
> gaps. Generally when the data has passed through simple feature geometry
> there is always the possibility that we have now got slight overlaps due to
> inaccuracies in number representation or subtleties of algorithm behaviour.
>
> The mention of topological properties was simply to highlight that this
> was not a case of perversely difficult data screwing up the postgis
> topology constructor, rather it's one of the easiest situations that the
> constructor can encounter, with few or zero overlaps between the polygons
> being imported.
>
> > These functions perform a lot of operations to ensure that when you insert
> > a polygon, you won't intersect with another one, etc etc.
>
> I think that is always a concern when adding simple feature geometry, due
> to uncertainties surrounding algorithm behaviour and number representation,
> especially when using such a large dataset which can't be manually
> inspected.
>
> > To gain speed, you could either cut your data into several different
> > topology schemas (partitioning),
>
> It's a good idea which we have thought about too. Unfortunately we need
> to have a single topology here. "Gaining speed" against this algorithm
> would mean slicing the map into hundreds of pieces, not merely several.
> Also, splitting the topology up into lots of little fragments adds
> complexity to every application that uses the topology. A significant
> reason for using a topology is to have the map in one tidy, aggregated
> piece.
>
> > or put topology data the batch way (not one by one), if possible ignoring
> > all the framework used for normal data but useless for already topological
> > data (possibly performing the necessary operation the set-way).
>
> I agree that it might be possible to make assumptions about the data and
> write our own constructors that remove safety checks. However we do not
> want to ignore the framework. We would want to use this on a range of maps
> and so we assume 'normal data' will be used.
>
> Also, removing safety checks may not help. I am also not certain that the
> slowness is from the geometry set constructor rather than the topology data
> type itself. The slowness when polygons are added incrementally makes me
> wonder.
>
> > A simple thing you could try is not adding polygon but edges to populate
> > your topological model,
> > then when you have imported all edges,
> > create a topogeometry per polygon and link it to your polygon data.
> > This would be done by breaking all your polygons into lines then edges (with
> > some creative use of split), removing the duplicates,
> > then populating the postgis topology tables
> > <http://trac.osgeo.org/postgis/wiki/PostgisTopology_Data_Model> manually.
> >
> > The key would be to never use a one-object-at-a-time function, but
> > instead SQL queries on whole tables.
>
> That's a good idea. Thanks. However... see my comment at the end.
>
> > Similarly it would be way faster and easier to directly import the oracle
> > topology (I mean by pure database manipulation, like table import/export).
> > From what I see here
> > <http://docs.oracle.com/cd/B19306_01/appdev.102/b14256/sdo_topo_concepts.htm>,
> > the oracle topology and postgis topology data models seem very similar,
> > the conversion oracle -> postgis looking like just a copy of the proper tables
> > and columns, the conversion postgis -> oracle looking a bit more difficult.
>
> Yes. It's probably possible to do that but it seems there is no out of the
> box solution.
>
> Regardless, we still need a topology constructor solution that works
> generally with all the datasets, not only the ones currently in oracle
> topology, and not just trivial cases.
>
> A concern at the back of my mind is that if the standard topology
> constructor (or datatype) is using O(n^2) algorithms, are we likely to
> encounter further similar problems in other functions (perhaps due to
> similarly written functions elsewhere or due to problems with the
> underlying datatype)? i.e. is this constructor performance problem the tip
> of an iceberg?
>
> Does anyone know if there have been any tests/measurements of postgis
> topology function performance as datasets get larger?
>
> Graeme.