[postgis-users] Topology creation performance

Sun Nov 22 23:39:39 PST 2020

>
>
>From: postgis-users <postgis-users-bounces at lists.osgeo.org> on behalf of Alexandre Silva <amsilva at infoportugal.impresa.pt>
>Sent: Friday, November 20, 2020 5:55 PM
>To: postgis-users at lists.osgeo.org <postgis-users at lists.osgeo.org>
>Subject: [postgis-users] Topology creation performance
>
>Hello,
>
>I'm creating a topology from a large number of lines (around 165k), and as more lines are added, the number of lines processed per second drops considerably. The lines are added one at a time in a transaction, and the topology has a tolerance of 0.
>When adding the lines with the toTopogeom method in geohash order (st_geohash(st_transform(st_pointn(st_exteriorring(st_envelope(geom)), 1), 4326))), it takes around 4h to complete.
>Our use case for the topology depends on a specific order of the added lines (some fixing logic will be added at a later stage). With that ordering, I stopped the process at 78% after waiting 50h (by that time each line was taking about 15s).
>The slower ordering adds the whole area to the topology layer by layer (rivers, roads, rural areas, etc.). After the first layer there are already some faces with a large area, and performance starts dropping rapidly. My suspicion is that these faces are the cause of the slowdown.
>In a first attempt to fix it, I tried deleting the faces after each line was added. It improved things a little at the start, but by the second half it made little difference.
>In another attempt, I used the AddEdge method, and it processed all the lines in about 15 minutes. Even though this requires running the polygonize method afterwards, from what I could discover every edge is processed only once instead of multiple times. (An older post (https://postgis-users.postgis.refractions.narkive.com/Xg3wV8V2/postgis-topology-performance) suggests this approach is the way to go.) The major disadvantage of this method is that every line needs to be split beforehand so that AddEdge doesn't throw an error, but with the other existing methods (toTopogeom and TopoGeo_AddLineString) there doesn't seem to be a way to get the performance that I get with AddEdge.
>
>Are my assumptions correct? And is AddEdge the way to go, or is there another way?
>
>Thanks,
>Alexandre Silva
>

Hi

At NIBIO we have got PostGIS Topology to perform quite well with more than 25 million edges that represent land, water, roads, field types and more. We use topology.TopoGeo_addLinestring.

To get this to work we had to use content based grids (https://github.com/larsop/content_balanced_grid) and work inside each cell until every single cell is done, and then merge the cells together at the end. Merging cells is more time consuming per edge, but the number of edges involved is limited because we only have to work with edges that cross cell borders.
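
To make the grid idea concrete, here is a minimal Python sketch, assuming a plain quadtree split: a cell is divided into four until it holds at most max_items points (for example one representative point per input line). This is only an illustration and not the code from the content_balanced_grid repository; the names balanced_grid, in_box, max_items and max_depth are made up for this example.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def in_box(p: Point, box: Box) -> bool:
    """Half-open test, so a point on a shared border falls in exactly one cell."""
    xmin, ymin, xmax, ymax = box
    return xmin <= p[0] < xmax and ymin <= p[1] < ymax


def balanced_grid(points: List[Point], box: Box, max_items: int,
                  max_depth: int = 16) -> List[Box]:
    """Split `box` recursively until each cell holds at most `max_items` points.

    `max_depth` is a safety stop for many identical points (hypothetical default).
    """
    inside = [p for p in points if in_box(p, box)]
    if len(inside) <= max_items or max_depth == 0:
        return [box]
    xmin, ymin, xmax, ymax = box
    xmid, ymid = (xmin + xmax) / 2, (ymin + ymax) / 2
    quadrants = [
        (xmin, ymin, xmid, ymid), (xmid, ymin, xmax, ymid),
        (xmin, ymid, xmid, ymax), (xmid, ymid, xmax, ymax),
    ]
    cells: List[Box] = []
    for quadrant in quadrants:
        # Dense areas keep splitting; sparse areas stay as large cells.
        cells.extend(balanced_grid(inside, quadrant, max_items, max_depth - 1))
    return cells
```

Dense areas end up with many small cells and sparse areas with a few large ones, so the amount of work per cell stays roughly bounded.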

Using content based grids has advantages such as:

  *   You can safely work in parallel.

  *   It performs well when building up new large topology datasets.

In the case below we have more than 25 million edges that we split up into around 7000 cells, and we see that the number of cells handled per hour, listed below, is not decreasing when running 20 threads in parallel. (The number of edges is not equal per cell, but each cell is limited to a maximum number of polygons, so the idea is to vary the cell size to make the workload per cell more equal.)

852, 1113, 840, 563, 461, 541, 583, 704, 705
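
The two-phase scheme described above (cells handled in parallel, border-crossing lines handled afterwards) can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the code from the repositories mentioned in this mail: lines are represented only by their bounding boxes, process_cell is a hypothetical placeholder for the real work of adding lines to the topology, and in a real setup each worker would presumably use its own database connection.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def strictly_inside(line_box: Box, cell: Box) -> bool:
    """True if the line's bounding box does not touch the cell border."""
    return (cell[0] < line_box[0] and cell[1] < line_box[1]
            and line_box[2] < cell[2] and line_box[3] < cell[3])


def assign_lines(line_boxes: Dict[int, Box], cells: List[Box]):
    """Give each line to the cell that fully contains it, else to the border list."""
    per_cell: Dict[int, List[int]] = {i: [] for i in range(len(cells))}
    border: List[int] = []
    for line_id, box in line_boxes.items():
        owner = next((i for i, c in enumerate(cells)
                      if strictly_inside(box, c)), None)
        if owner is None:
            border.append(line_id)  # crosses or touches a cell border
        else:
            per_cell[owner].append(line_id)
    return per_cell, border


def process_cell(cell_id: int, line_ids: List[int]) -> int:
    # Hypothetical placeholder: real code would add these lines to the topology.
    return len(line_ids)


def run(line_boxes: Dict[int, Box], cells: List[Box], workers: int = 4):
    per_cell, border = assign_lines(line_boxes, cells)
    # Phase 1: cells are independent of each other, so they can run in parallel.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        done = list(pool.map(process_cell, per_cell.keys(), per_cell.values()))
    # Phase 2 (not shown): handle the remaining border lines afterwards.
    return sum(done), border
```

For example, with two adjacent cells and three lines, one of which spans the shared border, run handles two lines in the parallel phase and returns the third for the later border pass.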

Before we started using content based grids we had the same problems you describe here: performance decreased when we started working with big datasets.

You can find the code I used at https://github.com/larsop/resolve-overlap-and-gap if you want more info.

Lars
