[postgis-users] topology.TopoGeo_AddLineString performance depends very on order of input lines in some cases.
Lars Aksel Opsahl
Lars.Opsahl at nibio.no
Wed Mar 4 07:50:28 PST 2020
________________________________
>On Wed, Mar 04, 2020 at 07:04:57AM +0000, Lars Aksel Opsahl wrote:
>
>> When testing adding millions of linestrings by splitting in around 4000 content based cells, one cell was using hours for adding a few thousands lines , but by changing the order and adding the non-closed line strings first I reduced the time to around two minutes from many hours for this cell. When running I started from an empty Topology for each cell.
>
>Reading your numbers I see:
>
> - Non-closed first, big first: 8h40m
> - Big first (no matter closed/non-closed): 0h02m
> - Closed first, big first: 0h02m
>
>Correct me if I'm wrong. I'll include your tests below.
>Note that "false" comes before "true" with order by on
>boolean (ascending, by default).
>
>> select topology.TopoGeo_AddLineString('test_topo_02',geom,0.000001) from test_topology_05
>> order by is_closed, num_points desc;
>> Time: 31234035.504 ms (08:40:34.036)
>
>The above adds non-closed first, and each set (non-closed, then
>closed) it will add the lines with the most points first.
>
>> select topology.TopoGeo_AddLineString('test_topo_01',geom,0.000001) from test_topology_05
>> order by num_points desc;
>> Time: 164403.033 ms (02:44.403)
>
>The above adds with most dense lines first, no matter closed/unclosed.
>
>> select topology.TopoGeo_AddLineString('test_topo_03',geom,0.000001) from test_topology_05
>> order by is_closed desc, num_points desc;
>> Time: 131125.236 ms (02:11.125)
>
>The above adds with closed first, and each set (closed, then
>non-closed) it will add the lines with most points first.
>
>So, I'd derive from the above that "big first" makes a difference, no
>matter closed/non-closed (although I'd expect the contrary, to be
>honest).
>
>> But another strange thing was that time could vary a lot for the same
>> dataset and order. In this test I did not care about closed or not but
>> only line length.
>
>Yes, I'd expect this.
>
>> select topology.TopoGeo_AddLineString('test_topo_01',geom,0.000001) from test_topology_02 order by num_points desc;
>> Time: 57915.860 ms (00:57.916)
>>
>> select topology.TopoGeo_AddLineString('test_topo_02',geom,0.000001) from test_topology_02 order by num_points asc;
>> Time: 42424.087 ms (00:42.424)
>>
>> select topology.TopoGeo_AddLineString('test_topo_03',geom,0.000001) from test_topology_02 order by id asc;
>> Time: 738242.464 ms (12:18.242)
>> Time: 1197808.563 ms (19:57.809)
>> Time: 1003501.683 ms (16:43.502)
>>
>> The file is here https://github.com/larsop/resolve-overlap-and-gap/blob/add_postgis_topology_TopoGeo_addLinestring_thred_grid/src/test/sql/regress/test_topology_02.dump.gz .
>
>The cost of each addition is the cost of:
>
> 1. Finding all edges and nodes intersecting the new line
> 2. Noding the new line with existing nodes/edges, split accordingly
> 3. For each resulting split-segment:
> 3.1. Finding all edges and nodes intersecting the segment
> (to bail out if found an intersecting one)
> 3.2. Finding edge rings on both sides of the edge
> (to determine if a new face is created)
>
>Roughly, the above is the algorithm. Can probably be improved in
>some steps, but you can see how the order in which things are
>added matters. In general, reducing (by splitting) the length
>of lines and (by adding connecting edges) the area of
>faces should speed things up. Assuming indexes are correctly used
>(is autovacuum on?).
>
Hi again
- Yes we have autovacum on.
- I made some very simple code where I tried group edges together but that did not seem to help to much on performance, but that may be related to that I am not making clean directed graph.
DO $$
DECLARE
tables_non_closed CURSOR FOR
select id,geom,num_points from test_topology_02 where ST_Isclosed(geom) = false order by num_points desc;
working_geo geometry;
BEGIN
drop table if exists test_tt;
create table test_tt as (select geom,num_points from test_topology_02 where ST_Isclosed(geom) = false order by num_points desc);
drop table if exists test_tt4;
create table test_tt4 as select * from test_tt limit 0;
alter table test_tt4 add column id serial;
INSERT INTO test_tt4(geom) select geom from test_topology_02 where ST_Isclosed(geom) = true order by num_points desc;
working_geo = null;
FOR table_record IN tables_non_closed LOOP
working_geo = table_record.geom;
LOOP
Select ST_Union(geom) from test_tt where ST_Intersects(geom,working_geo) into working_geo;
EXIT WHEN working_geo is null;
INSERT into test_tt4 select * from test_tt where ST_Intersects(geom,working_geo) order by num_points;
DELETE from test_tt where ST_Intersects(geom,working_geo);
END LOOP ;
END LOOP;
END$$;
select topology.CreateTopology ('test_topo_04', 4258,0.000001);
select topology.TopoGeo_AddLineString('test_topo_04',geom,0.000001) from test_tt4 order by id;
Time: 138741.013 ms (02:18.741)
- The randomness for the three same calls with same order I did not understand. The time vary from almost 20 minutes to 12 minutes.
>> select topology.TopoGeo_AddLineString('test_topo_03',geom,0.000001) from test_topology_02 order by id asc;
>> Time: 738242.464 ms (12:18.242)
>> Time: 1197808.563 ms (19:57.809)
>> Time: 1003501.683 ms (16:43.502)
Thanks a lot.
Lars
>--strk;
>_______________________________________________
>postgis-users mailing list
>postgis-users at lists.osgeo.org
>https://lists.osgeo.org/mailman/listinfo/postgis-users
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20200304/0de9c539/attachment.html>
More information about the postgis-users
mailing list