[postgis-users] Postgis Topology and slow topology.TopoGeo_addLinestring on big datasets
Lars Aksel Opsahl
Lars.Opsahl at nibio.no
Wed Feb 26 01:59:52 PST 2020
Hi
I have question about Postgis Topology and what seems like scaling problem when converting simple feature datasets to Postgis Topology. The code I am testing on is here.
https://github.com/larsop/resolve-overlap-and-gap/tree/add_postgis_topology_TopoGeo_addLinestring_thred_grid
When working when testing a small dataset with 619230 surface (25909671 total line points) it takes around 20 minutes, but if go to a dataset that is ten times bigger I get a scaling problem.
Basically what I do is that break input datasets into smaller cells by using a content based grid with around 4000 cells. Then I work inside each cell where I make sure that no lines are close to the cell border. This seem to scale very good now after resolved the https://www.postgresql-archive.org/SubtransControlLock-and-performance-problems-td6123349.html for each single cell.
The scaling problem starts when we start to connect each cell with it's neighbour by adding the missing border lines. I can connect with around one cell pr. second in the start, but after 10 minutes it drops to maybe 10 cell pr. minute and later to less than a cell pr minute or less. The CPU load are running on more than 90% there are now no locks or no IO wait on the server. I have tested with analyse on all the involved tables.
The around 1 meter long border lines below takes from milli seconds and maybe up to minutes to add.
I have the attached a picture where the line I try to add is the red one and where we we also show existing faces, edge_data and nodes close by.
* I have a layer test_topo_ar5 with the following settings.
id | name | srid | precision | hasz
------+---------------+------+-----------+------
8813 | test_topo_ar5 | 4258 | 1e-06 | f
* It’s a quite big layer and it contains the following numbers
SELECT count(*) from test_topo_ar5.node; 15922351
SELECT count(*) from test_topo_ar5.edge_data; 21322652
SELECT count(*) from test_topo_ar5.face; 7709431
SELECT count(*) from test_topo_ar5.relation ; 0
* In this I add line 47 from a total of 60 lines.
SELECT ST_asText(geo), ST_length(geo,true) from test_topo_ar5.t1 where id = 47;
st_astext | st_length
-------------------------------------------------------------------------------+-------------------
LINESTRING(10.1764246034643 64.1461284736044,10.17644275856 64.1461364736044) | 1.255520004022319
(1 row)
Time: 2.900 ms
* By doing this call
SELECT topology.TopoGeo_addLinestring('test_topo_ar5',geo,0.000001) from test_topo_ar5.t1 where id = 47;
* When running auto explain I see this in the log.
< postgres, 2020-02-24 11:42:19 CET, resolve_og, 2020-02-24 11:42:19.018 CET >LOG: duration: 13972.041 ms plan:
Query Text: SELECT topology.TopoGeo_addLinestring('test_topo_ar5',geo,0.000001) from test_topo_ar5.t1 where id = 47;
ProjectSet (cost=0.00..7.76 rows=1000 width=4) (actual time=13972.014..13972.027 rows=3 loops=1)
-> Seq Scan on t1 (cost=0.00..2.75 rows=1 width=53) (actual time=0.013..0.020 rows=1 loops=1)
Filter: (id = 47)
Rows Removed by Filter: 59
< postgres, 2020-02-24 11:42:19 CET, resolve_og, 2020-02-24 11:42:19.018 CET >LOG: duration: 14188.079 ms statement: SELECT topology.TopoGeo_addLinestring('test_topo_ar5',geo,0.000001) from test_topo_ar5.t1 where id = 47;
* The output from pg_stat_statements is below, The only strange thing I see here is that the query “SELECT edge_id,geom FROM "test_topo_ar5".edge_data WHERE edge_id IN ($1,$2,$3,$4,$5,$6,$7,$8,$9,$10,” has length of 599090
SELECT
(total_time / 1000 / 60) as total,
(total_time/calls) as avg,
substring(query,1,100), length(query), calls
FROM pg_stat_statements
ORDER BY 2 DESC
LIMIT 100;
total | avg | substring | length | calls
------------------------+----------------------+------------------------------------------------------------------------------------------------------+--------+-------
0.2253887273 | 13523.323638000002 | SELECT topology.TopoGeo_addLinestring($1,geo,$2) from test_topo_ar5.t1 where id = $3 | 84 | 1
0.00865815 | 519.489 | SELECT edge_id,geom FROM "test_topo_ar5".edge_data WHERE edge_id IN ($1,$2,$3,$4,$5,$6,$7,$8,$9,$10, | 599090 | 1
0.008164354616666668 | 244.93063850000001 | WITH RECURSIVE edgering AS ( SELECT $1 as signed_edge_id, edge_id, next_left_edge, next_right_edge F | 469 | 2
0.00257109955 | 77.1329865 | SELECT +| 133 | 2
| | (total_time / $1 / $2) as total, +| |
| | (total_time/calls) as avg, +| |
| | query +| |
| | FROM pg_stat_stat | |
7.898863333333334e-05 | 4.739318 | SELECT +| 172 | 1
| | (total_time / $1 / $2) as total, +| |
| | (total_time/calls) as avg, +| |
| | substring(query,$3,$4), | |
7.60238e-05 | 4.561428 | SELECT +| 165 | 1
| | (total_time / $1 / $2) as total, +| |
| | (total_time/calls) as avg, +| |
| | substring(query,$3,$4), | |
0.00010785645000000001 | 3.2356935 | SELECT +| 150 | 2
| | (total_time / $1 / $2) as total, +| |
| | (total_time/calls) as avg, +| |
| | substring(query,$3,$4) +| |
| | | |
3.708345e-05 | 2.225007 | SELECT node_id,geom FROM "test_topo_ar5".node WHERE containing_face IN ($1) AND geom && $2::geometry | 100 | 1
3.0895216666666665e-05 | 1.8537130000000002 | select pg_stat_statements_reset() | 33 | 1
1.37987e-05 | 0.827922 | WITH newedges(edge_id,left_face) AS ( VALUES ($1,$2),($3,$4),($5,$6),($7,$8),($9,$10),($11,$12),($13 | 243 | 1
1.0209283333333334e-05 | 0.612557 | WITH newedges(edge_id,right_face) AS ( VALUES ($1,$2),($3,$4),($5,$6),($7,$8),($9,$10),($11,$12),($1 | 236 | 1
3.15745e-06 | 0.189447 | SELECT edge_id,left_face,right_face,geom FROM "test_topo_ar5".edge_data WHERE ( left_face = ANY($1) | 142 | 1
7.9419e-06 | 0.158838 | SELECT node_id,containing_face,geom FROM "test_topo_ar5".node WHERE geom && $1::geometry | 88 | 3
4.929566666666667e-06 | 0.147887 | INSERT INTO "test_topo_ar5".edge_data (edge_id,start_node,end_node,left_face,right_face,next_left_ed | 215 | 2
2.0770833333333333e-06 | 0.12462500000000001 | SELECT EXISTS ( SELECT $1 FROM "test_topo_ar5".node WHERE ST_Equals(geom, $2::geometry)) | 88 | 1
1.9927333333333334e-06 | 0.11956399999999999 | INSERT INTO "test_topo_ar5".node (node_id,containing_face,geom) VALUES (DEFAULT,$1,$2::geometry) RET | 114 | 1
1.9733666666666666e-06 | 0.118402 | SELECT EXISTS ( SELECT $1 FROM "test_topo_ar5".edge_data WHERE ST_Within($2::geometry, geom)) | 93 | 1
4.6954833333333335e-06 | 0.09390966666666667 | SELECT edge_id,start_node,end_node,left_face,right_face,next_left_edge,next_right_edge,geom FROM "te | 144 | 3
7.43605e-06 | 0.0892326 | SELECT edge_id,geom FROM "test_topo_ar5".edge WHERE geom && $1::geometry | 72 | 5
1.3167833333333333e-06 | 0.079007 | WITH faces AS ( SELECT face_id FROM "test_topo_ar5".face WHERE mbr && $1 ORDER BY ST_Area(mbr) ASC ) | 200 | 1
6.3652666666666664e-06 | 0.06365266666666666 | SELECT node_id,geom FROM "test_topo_ar5".node WHERE ST_DWithin(geom, $1::geometry, $2) | 86 | 6
8.6195e-07 | 0.051717 | SELECT edge_id,geom FROM "test_topo_ar5".edge_data WHERE edge_id IN ($1,$2,$3,$4,$5,$6,$7,$8,$9,$10, | 124 | 1
8.021666666666667e-07 | 0.04813 | UPDATE "test_topo_ar5".node SET containing_face =$1::int WHERE node_id = $2 | 75 | 1
7.810333333333333e-07 | 0.046862 | SELECT edge_id,geom FROM "test_topo_ar5".edge_data WHERE ST_DWithin($1::geometry, geom, $2) | 91 | 1
7.368666666666667e-07 | 0.044212 | INSERT INTO "test_topo_ar5".face (face_id,mbr) VALUES (DEFAULT,ST_Envelope($1::geometry)) RETURNING | 107 | 1
1.1007166666666668e-06 | 0.0330215 | SELECT node_id,containing_face,geom FROM "test_topo_ar5".node WHERE node_id IN ($1,$2) | 86 | 2
5.234333333333332e-07 | 0.031405999999999996 | SELECT id,srid,precision,null::geometry FROM topology.topology WHERE name = $1::varchar | 87 | 1
1.844933333333333e-06 | 0.027673999999999997 | SELECT edge_id,start_node,end_node,left_face,right_face,next_left_edge,next_right_edge,geom FROM "te | 167 | 4
7.955833333333334e-07 | 0.0238675 | UPDATE "test_topo_ar5".edge_data SET next_left_edge= $1, abs_next_left_edge= $2 WHERE edge_id = $3 | 98 | 2
3.5251666666666666e-07 | 0.021151 | UPDATE "test_topo_ar5".edge_data SET next_right_edge= $1, abs_next_right_edge= $2 WHERE edge_id = $3 | 100 | 1
2.2806666666666666e-07 | 0.006842 | SELECT nextval($1) | 18 | 2
1.7872166666666665e-06 | 0.005957388888888889 | SELECT 1 FROM ONLY "test_topo_ar5"."node" x WHERE "node_id" OPERATOR(pg_catalog.=) $1 FOR KEY SHARE | 104 | 18
2.7536166666666677e-06 | 0.005329580645161292 | SELECT 1 FROM ONLY "test_topo_ar5"."face" x WHERE "face_id" OPERATOR(pg_catalog.=) $1 FOR KEY SHARE | 104 | 31
3.3149999999999996e-08 | 0.001989 | rollback | 8 | 1
2.9683333333333334e-08 | 0.001781 | begin | 5 | 1
(35 rows)
* I am now running on POSTGIS="2.5.3 r17699" [EXTENSION] PGSQL="120" GEOS="3.8.0-CAPI-1.13.1 " PROJ="Rel. 6.2.1, November 1st, 2019" GDAL="GDAL 3.0.4, released 2020/01/28" LIBXML="2.9.1" LIBJSON="0.11" LIBPROTOBUF="1.0.2" TOPOLOGY RASTER,
but I have tested PostGIS 3.0 also and the problems seems to be he same.
There is no problem switching back to postgis 3.0, I am using postgis 2.5 on the test server now since we use 2.5 in prod.
I use TopoGeo_AddLineString and not TopoGeo_AddPolygon to avoid to add border lines two times and to make it simple to break up lines into smaller pieces when I get Topology errors.
I can switch to use TopoGeo_AddPolygon in some parts if that could solve the problem.
Any idea about what do here ?
Thanks .
Lars
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20200226/acae5414/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ar5_line_47.png
Type: image/png
Size: 29299 bytes
Desc: ar5_line_47.png
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20200226/acae5414/attachment.png>
More information about the postgis-users
mailing list