[postgis-users] Postgis Topology and slow topology.TopoGeo_addLinestring on big datasets

Lars Aksel Opsahl Lars.Opsahl at nibio.no
Wed Feb 26 01:59:52 PST 2020


Hi

I have question about Postgis Topology and what seems like scaling problem when converting simple feature datasets to Postgis Topology. The code I am testing on is here.

https://github.com/larsop/resolve-overlap-and-gap/tree/add_postgis_topology_TopoGeo_addLinestring_thred_grid

When working when testing a small dataset with 619230 surface (25909671 total line points) it takes around 20 minutes,  but if go to a dataset that is ten times bigger I get a scaling problem.

Basically what I do is that break input datasets into smaller cells by using  a content based grid with around 4000 cells. Then I work inside each cell where I make sure that no lines are close to the cell border. This seem to scale very good now after resolved the https://www.postgresql-archive.org/SubtransControlLock-and-performance-problems-td6123349.html for each single cell.

The scaling problem starts when we start to connect each cell with it's neighbour by adding the missing border lines. I can connect with around one cell pr. second in the start, but after 10 minutes it drops to maybe 10 cell pr. minute and later to less than a cell pr minute or less. The CPU load are running on more than 90% there are now no locks or no IO wait on the server. I have tested with analyse on all the involved tables.

The around 1 meter long border lines below takes from milli seconds and maybe up to minutes  to add.

I have the attached a picture where the line I try to add is the red one and where we we also show existing faces, edge_data and nodes close by.

  *   I have a layer test_topo_ar5 with the following settings.

id | name | srid | precision | hasz

------+---------------+------+-----------+------

8813 | test_topo_ar5 | 4258 | 1e-06 | f


  *   It’s a quite big layer and it contains the following numbers

SELECT count(*) from test_topo_ar5.node; 15922351

SELECT count(*) from test_topo_ar5.edge_data; 21322652

SELECT count(*) from test_topo_ar5.face; 7709431

SELECT count(*) from test_topo_ar5.relation ; 0



  *   In this I add line 47 from a total of 60 lines.

SELECT ST_asText(geo), ST_length(geo,true) from test_topo_ar5.t1 where id = 47;

st_astext | st_length

-------------------------------------------------------------------------------+-------------------

LINESTRING(10.1764246034643 64.1461284736044,10.17644275856 64.1461364736044) | 1.255520004022319

(1 row)

Time: 2.900 ms



  *   By doing this call

SELECT topology.TopoGeo_addLinestring('test_topo_ar5',geo,0.000001) from test_topo_ar5.t1 where id = 47;



  *   When running auto explain I see this in the log.


< postgres, 2020-02-24 11:42:19 CET, resolve_og, 2020-02-24 11:42:19.018 CET >LOG: duration: 13972.041 ms plan:

Query Text: SELECT topology.TopoGeo_addLinestring('test_topo_ar5',geo,0.000001) from test_topo_ar5.t1 where id = 47;

ProjectSet (cost=0.00..7.76 rows=1000 width=4) (actual time=13972.014..13972.027 rows=3 loops=1)

-> Seq Scan on t1 (cost=0.00..2.75 rows=1 width=53) (actual time=0.013..0.020 rows=1 loops=1)

Filter: (id = 47)

Rows Removed by Filter: 59

< postgres, 2020-02-24 11:42:19 CET, resolve_og, 2020-02-24 11:42:19.018 CET >LOG: duration: 14188.079 ms statement: SELECT topology.TopoGeo_addLinestring('test_topo_ar5',geo,0.000001) from test_topo_ar5.t1 where id = 47;

  *   The output from pg_stat_statements is below, The only strange thing I see here is that the query “SELECT edge_id,geom FROM "test_topo_ar5".edge_data WHERE edge_id IN ($1,$2,$3,$4,$5,$6,$7,$8,$9,$10,” has length of 599090


SELECT

(total_time / 1000 / 60) as total,

(total_time/calls) as avg,

substring(query,1,100), length(query), calls

FROM pg_stat_statements

ORDER BY 2 DESC

LIMIT 100;

total | avg | substring | length | calls

------------------------+----------------------+------------------------------------------------------------------------------------------------------+--------+-------

0.2253887273 | 13523.323638000002 | SELECT topology.TopoGeo_addLinestring($1,geo,$2) from test_topo_ar5.t1 where id = $3 | 84 | 1

0.00865815 | 519.489 | SELECT edge_id,geom FROM "test_topo_ar5".edge_data WHERE edge_id IN ($1,$2,$3,$4,$5,$6,$7,$8,$9,$10, | 599090 | 1

0.008164354616666668 | 244.93063850000001 | WITH RECURSIVE edgering AS ( SELECT $1 as signed_edge_id, edge_id, next_left_edge, next_right_edge F | 469 | 2

0.00257109955 | 77.1329865 | SELECT +| 133 | 2

| | (total_time / $1 / $2) as total, +| |

| | (total_time/calls) as avg, +| |

| | query +| |

| | FROM pg_stat_stat | |

7.898863333333334e-05 | 4.739318 | SELECT +| 172 | 1

| | (total_time / $1 / $2) as total, +| |

| | (total_time/calls) as avg, +| |

| | substring(query,$3,$4), | |

7.60238e-05 | 4.561428 | SELECT +| 165 | 1

| | (total_time / $1 / $2) as total, +| |

| | (total_time/calls) as avg, +| |

| | substring(query,$3,$4), | |

0.00010785645000000001 | 3.2356935 | SELECT +| 150 | 2

| | (total_time / $1 / $2) as total, +| |

| | (total_time/calls) as avg, +| |

| | substring(query,$3,$4) +| |

| | | |

3.708345e-05 | 2.225007 | SELECT node_id,geom FROM "test_topo_ar5".node WHERE containing_face IN ($1) AND geom && $2::geometry | 100 | 1

3.0895216666666665e-05 | 1.8537130000000002 | select pg_stat_statements_reset() | 33 | 1

1.37987e-05 | 0.827922 | WITH newedges(edge_id,left_face) AS ( VALUES ($1,$2),($3,$4),($5,$6),($7,$8),($9,$10),($11,$12),($13 | 243 | 1

1.0209283333333334e-05 | 0.612557 | WITH newedges(edge_id,right_face) AS ( VALUES ($1,$2),($3,$4),($5,$6),($7,$8),($9,$10),($11,$12),($1 | 236 | 1

3.15745e-06 | 0.189447 | SELECT edge_id,left_face,right_face,geom FROM "test_topo_ar5".edge_data WHERE ( left_face = ANY($1) | 142 | 1

7.9419e-06 | 0.158838 | SELECT node_id,containing_face,geom FROM "test_topo_ar5".node WHERE geom && $1::geometry | 88 | 3

4.929566666666667e-06 | 0.147887 | INSERT INTO "test_topo_ar5".edge_data (edge_id,start_node,end_node,left_face,right_face,next_left_ed | 215 | 2

2.0770833333333333e-06 | 0.12462500000000001 | SELECT EXISTS ( SELECT $1 FROM "test_topo_ar5".node WHERE ST_Equals(geom, $2::geometry)) | 88 | 1

1.9927333333333334e-06 | 0.11956399999999999 | INSERT INTO "test_topo_ar5".node (node_id,containing_face,geom) VALUES (DEFAULT,$1,$2::geometry) RET | 114 | 1

1.9733666666666666e-06 | 0.118402 | SELECT EXISTS ( SELECT $1 FROM "test_topo_ar5".edge_data WHERE ST_Within($2::geometry, geom)) | 93 | 1

4.6954833333333335e-06 | 0.09390966666666667 | SELECT edge_id,start_node,end_node,left_face,right_face,next_left_edge,next_right_edge,geom FROM "te | 144 | 3

7.43605e-06 | 0.0892326 | SELECT edge_id,geom FROM "test_topo_ar5".edge WHERE geom && $1::geometry | 72 | 5

1.3167833333333333e-06 | 0.079007 | WITH faces AS ( SELECT face_id FROM "test_topo_ar5".face WHERE mbr && $1 ORDER BY ST_Area(mbr) ASC ) | 200 | 1

6.3652666666666664e-06 | 0.06365266666666666 | SELECT node_id,geom FROM "test_topo_ar5".node WHERE ST_DWithin(geom, $1::geometry, $2) | 86 | 6

8.6195e-07 | 0.051717 | SELECT edge_id,geom FROM "test_topo_ar5".edge_data WHERE edge_id IN ($1,$2,$3,$4,$5,$6,$7,$8,$9,$10, | 124 | 1

8.021666666666667e-07 | 0.04813 | UPDATE "test_topo_ar5".node SET containing_face =$1::int WHERE node_id = $2 | 75 | 1

7.810333333333333e-07 | 0.046862 | SELECT edge_id,geom FROM "test_topo_ar5".edge_data WHERE ST_DWithin($1::geometry, geom, $2) | 91 | 1

7.368666666666667e-07 | 0.044212 | INSERT INTO "test_topo_ar5".face (face_id,mbr) VALUES (DEFAULT,ST_Envelope($1::geometry)) RETURNING | 107 | 1

1.1007166666666668e-06 | 0.0330215 | SELECT node_id,containing_face,geom FROM "test_topo_ar5".node WHERE node_id IN ($1,$2) | 86 | 2

5.234333333333332e-07 | 0.031405999999999996 | SELECT id,srid,precision,null::geometry FROM topology.topology WHERE name = $1::varchar | 87 | 1

1.844933333333333e-06 | 0.027673999999999997 | SELECT edge_id,start_node,end_node,left_face,right_face,next_left_edge,next_right_edge,geom FROM "te | 167 | 4

7.955833333333334e-07 | 0.0238675 | UPDATE "test_topo_ar5".edge_data SET next_left_edge= $1, abs_next_left_edge= $2 WHERE edge_id = $3 | 98 | 2

3.5251666666666666e-07 | 0.021151 | UPDATE "test_topo_ar5".edge_data SET next_right_edge= $1, abs_next_right_edge= $2 WHERE edge_id = $3 | 100 | 1

2.2806666666666666e-07 | 0.006842 | SELECT nextval($1) | 18 | 2

1.7872166666666665e-06 | 0.005957388888888889 | SELECT 1 FROM ONLY "test_topo_ar5"."node" x WHERE "node_id" OPERATOR(pg_catalog.=) $1 FOR KEY SHARE | 104 | 18

2.7536166666666677e-06 | 0.005329580645161292 | SELECT 1 FROM ONLY "test_topo_ar5"."face" x WHERE "face_id" OPERATOR(pg_catalog.=) $1 FOR KEY SHARE | 104 | 31

3.3149999999999996e-08 | 0.001989 | rollback | 8 | 1

2.9683333333333334e-08 | 0.001781 | begin | 5 | 1

(35 rows)


  *   I am now running on POSTGIS="2.5.3 r17699" [EXTENSION] PGSQL="120" GEOS="3.8.0-CAPI-1.13.1 " PROJ="Rel. 6.2.1, November 1st, 2019" GDAL="GDAL 3.0.4, released 2020/01/28" LIBXML="2.9.1" LIBJSON="0.11" LIBPROTOBUF="1.0.2" TOPOLOGY RASTER,
but I have tested PostGIS 3.0 also and the problems seems to be he same.

There is no problem switching back to postgis 3.0, I am using postgis 2.5 on the test server now since we use 2.5 in prod.

I use TopoGeo_AddLineString and not TopoGeo_AddPolygon to avoid to add border lines two times and to make it simple to break up lines  into smaller pieces  when I get Topology errors.

I can switch to use TopoGeo_AddPolygon in some parts if that could solve the problem.


Any idea about what do here ?

Thanks .


Lars



-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20200226/acae5414/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ar5_line_47.png
Type: image/png
Size: 29299 bytes
Desc: ar5_line_47.png
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20200226/acae5414/attachment.png>


More information about the postgis-users mailing list