[postgis-users] topology.TopoGeo_AddLineString performance depends very on order of input lines in some cases.

Lars Aksel Opsahl Lars.Opsahl at nibio.no
Tue Mar 3 23:04:57 PST 2020


Hi

When testing adding millions of linestrings by splitting in around 4000 content based cells, one cell was using hours for adding a few thousands lines , but by changing the order and adding the non-closed line strings first I reduced the time to around two minutes from many  hours for this cell. When running I started from an empty Topology for each cell.


select topology.CreateTopology ('test_topo_02', 4258,0.000001);

select topology.TopoGeo_AddLineString('test_topo_02',geom,0.000001) from test_topology_05

order by is_closed, num_points desc;

Time: 31234035.504 ms (08:40:34.036)

(The server was doing other work while part of this test was running, but it was enough CPU left for test)


select topology.CreateTopology ('test_topo_01', 4258,0.000001);

select topology.TopoGeo_AddLineString('test_topo_01',geom,0.000001) from test_topology_05

order by num_points desc;

Time: 164403.033 ms (02:44.403)


select topology.CreateTopology ('test_topo_03', 4258,0.000001);

select topology.TopoGeo_AddLineString('test_topo_03',geom,0.000001) from test_topology_05

order by is_closed desc, num_points desc;

Time: 131125.236 ms (02:11.125)

The test_topology_05 is located her https://github.com/larsop/resolve-overlap-and-gap/blob/add_postgis_topology_TopoGeo_addLinestring_thred_grid/src/test/sql/regress/test_topology_05.dump.gz f .

There was also some other cells that used 20 minutes and by changing the order it was reduced to less than a minute. But another strange thing was that time could vary a lot for the same dataset and order. In this test I did not care about closed or not but only line length.


select topology.DropTopology ('test_topo_02');select topology.DropTopology ('test_topo_01');select topology.DropTopology ('test_topo_03');


select topology.CreateTopology ('test_topo_01', 4258,0.000001);

select topology.TopoGeo_AddLineString('test_topo_01',geom,0.000001) from test_topology_02 order by num_points desc;

Time: 57915.860 ms (00:57.916)


select topology.CreateTopology ('test_topo_02', 4258,0.000001);

select topology.TopoGeo_AddLineString('test_topo_02',geom,0.000001) from test_topology_02 order by num_points asc;

Time: 42424.087 ms (00:42.424)


select topology.CreateTopology ('test_topo_03', 4258,0.000001);

select topology.TopoGeo_AddLineString('test_topo_03',geom,0.000001) from test_topology_02 order by id asc;

Time: 738242.464 ms (12:18.242)

Time: 1197808.563 ms (19:57.809)

Time: 1003501.683 ms (16:43.502)

The file is here https://github.com/larsop/resolve-overlap-and-gap/blob/add_postgis_topology_TopoGeo_addLinestring_thred_grid/src/test/sql/regress/test_topology_02.dump.gz .


Running on this setup.


POSTGIS="2.5.3 r17699" [EXTENSION] PGSQL="120" GEOS="3.8.0-CAPI-1.13.1 " PROJ="Rel. 6.2.1, November 1st, 2019" GDAL="GDAL 3.0.4, released 2020/01/28" LIBXML="2.9.1" LIBJSON="0.11" LIBPROTOBUF="1.0.2" TOPOLOGY RASTER


Thanks.

Lars
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20200304/4f000f19/attachment.html>


More information about the postgis-users mailing list