[postgis-users] Topology building performance (was: Topology: cannot delete slivers)

Thu Nov 20 04:48:39 PST 2014

Sadly, I think it can't be done in pure plpgsql,
because every function call runs inside a transaction no matter what.
The only workaround is to connect back to the same database from within
the function, using the "dblink" extension.
But I find it difficult to understand how transactions and subtransactions
affect performance.
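
For reference, the chunked UPDATE quoted below can also be driven from the
client side, where each chunk really is its own transaction. A minimal
sketch in Python that only generates the statements (it does not connect to
anything). The table name, the "gid", "geom" and "tg" columns, the topology
name, the layer id and the tolerance are all placeholders, not something
taken from a real setup:

```python
def chunk_ranges(min_gid, max_gid, chunksize):
    """Yield half-open [lo, hi) gid ranges covering min_gid..max_gid."""
    lo = min_gid
    while lo <= max_gid:
        yield lo, lo + chunksize
        lo += chunksize


def chunk_script(table, topo, layer_id, min_gid, max_gid, chunksize,
                 tolerance=0.0):
    """Build one transaction per chunk, each followed by ANALYZE on the
    topology primitive tables so the planner sees fresh statistics."""
    statements = []
    for lo, hi in chunk_ranges(min_gid, max_gid, chunksize):
        statements.append(
            "BEGIN; "
            f"UPDATE {table} SET tg = toTopoGeom(geom, '{topo}', "
            f"{layer_id}, {tolerance}) "
            f"WHERE gid >= {lo} AND gid < {hi}; "
            "COMMIT;"
        )
        # Primitive tables live in the schema named after the topology.
        for prim in ("node", "edge_data", "face"):
            statements.append(f"ANALYZE {topo}.{prim};")
    return statements
```

Feeding the output of chunk_script("roads", "topo", 1, 1, 10, 4) to psql
would run three chunks, with statistics refreshed after each one. Whether
ANALYZE after every chunk is worth its cost is something to measure, not
something this sketch settles.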

Moreover, the transaction thing is not the only problem. It is more of a
design problem.
Even with CGAL, building a topology one feature at a time instead of in
batch mode radically changes the build time (n versus n^2, at least).

That's why I truly think performance is going to come from a batch mode.
Tweaking the current process is just damage control, in my opinion.
A batch mode is not so hard to do if we rely a bit on GEOS.

1. cut the input geoms into a space partition (ST_Node for lines,
ST_Polygonize for polygons)
2. populate the node table, and create a temp table with the list of lines
for each node
3. populate edge_data
4. fill next / left for edge_data
5. compute areas (Polygonize, GEOS?)
6. map the input geoms to the generated topology (to be able to use
attributes)

I have already tested steps 1, 2, 3 and 6.
It can be fast (not as fast as building the full topology in GEOS and
converting it to postgis_topology, I'm afraid), and it will scale very well.
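
As an illustration of step 2 only (this is not the code that was tested, and
it is Python rather than SQL), here is a minimal sketch of collecting nodes
and the list of lines for each node in a single pass. It assumes the lines
are already noded, as ST_Node output would be, so they only meet at their
endpoints, and it matches endpoints on exact coordinates with no tolerance.
The function name and the data layout are made up for the example:

```python
from collections import defaultdict


def build_nodes(lines):
    """lines: dict line_id -> list of (x, y) tuples, assumed already noded.

    Returns (nodes, lines_at_node):
      nodes          maps node_id -> (x, y)
      lines_at_node  maps node_id -> sorted list of incident line ids
    """
    node_id_of = {}                    # endpoint coordinate -> node id
    lines_at_node = defaultdict(set)
    for line_id in sorted(lines):
        coords = lines[line_id]
        # Only the two endpoints of a noded line become nodes.
        for pt in (coords[0], coords[-1]):
            nid = node_id_of.setdefault(pt, len(node_id_of) + 1)
            lines_at_node[nid].add(line_id)
    nodes = {nid: pt for pt, nid in node_id_of.items()}
    return nodes, {nid: sorted(ids) for nid, ids in lines_at_node.items()}
```

The point of the sketch is that every line is visited once and each endpoint
lookup is a hash lookup, so the work grows linearly with the number of
lines, which is the kind of behaviour a batch mode is after.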

Cheers,
Rémi-C

2014-11-20 12:24 GMT+01:00 Sandro Santilli <strk at keybit.net>:

> On Wed, Nov 19, 2014 at 04:47:48PM +0100, Sandro Santilli wrote:
> > On Wed, Nov 19, 2014 at 12:50:09PM +0100, Rémi Cura wrote:
> > >
> > > Adding one feature is actually quite fast, even on an already big
> > > topology.
> > >
> > > It's when you want to add a lot that it becomes increasingly slow
> > > (maybe because indexes are not updated, or because we are in one
> > > transaction?)
> > >
> > > The slowdown seems to be very nonlinear, probably following n^2,
> > > where n is the number of features already constructed in the
> > > transaction.
> >
> > An issue with index use was recently fixed.
> > There might be another one hiding somewhere.
>
> On a closer look, I'm thinking the single transaction is what commonly
> hurts during topology building (UPDATE .. SET tg = toTopoGeom ..)
>
> Starting from an empty topology and running a single statement
> invoking toTopoGeom for each of many inputs results in no stats ever
> being visible to the planner within the transaction. In turn the
> planner is likely to opt for sequential scans (an empty table is
> quicker to scan sequentially).
>
> This would explain why populating in chunks works better, using
> a transaction for each chunk
> (UPDATE .. SET .. WHERE gid >= N AND gid < N+chunksize)
>
> It could be interesting to try a wrapper function taking care of
> running ANALYZE on the primitive tables every N calls to toTopoGeom
> (or N primitives being created, regardless of number of simple inputs).
>
> --strk;
>
>  ()  ASCII ribbon campaign  --  Keep it simple !
>  /\  http://strk.keybit.net/rants/ascii_mails.txt