[PostGIS] #5953: Slow st_intersection with big input polygons
PostGIS
trac at osgeo.org
Thu Jul 24 22:53:59 PDT 2025
#5953: Slow st_intersection with big input polygons
-------------------------------+---------------------------
Reporter: Lars Aksel Opsahl | Owner: pramsey
Type: enhancement | Status: new
Priority: medium | Milestone: PostGIS 3.6.0
Component: postgis | Version: master
Keywords: |
-------------------------------+---------------------------
We have case where users can upload polygons and we use those polygons to
run a intersection against another big table. We do not have any control
of what type polygons the user send in.
We are now forced to use st_subdivide to get any OK performance.
Yes we do analyze of input and adjust how to run quary based what input
looks like for instance by use st_subdivide, coveredBy and so on, but I
was that kind of analyzes was done by postgis.
The table klima.plantegning_3035_test contains the single input from the
user. And both table have gist indexes. Just be sure I tested with && but
as expected that does not make any difference.
Here is samples from
{{{
PostgreSQL 16.3 (Ubuntu 16.3-1.pgdg22.04+1) on x86_64-pc-linux-gnu,
compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit
POSTGIS="3.6.0dev 3.5.0-314-g637410f77" [EXTENSION] PGSQL="160"
GEOS="3.13.0beta2-CAPI-1.19.0" PROJ="9.3.0 NETWORK_ENABLED=OFF
URL_ENDPOINT=https://cdn.proj.org USER_WRITABLE_DIRECTORY=/tmp/proj
DATABASE_PATH=/usr/local/share/proj/proj.db" (compiled against PROJ 9.3.0)
LIBXML="2.9.13" LIBJSON="0.15" LIBPROTOBUF="1.3.3" WAGYU="0.5.0
(Internal)" TOPOLOGY
}}}
With subdivide takes less than half a minute
{{{
explain analyze select sum(ST_area(st_intersection(gk.geo, pt.geo)))
FROM
(select st_subdivide(geo,300) as geo from klima.plantegning_3035_test) pt,
klima.grunnkart_utslippsfaktor_generalisert AS gk
WHERE ST_Intersects(gk.geo, pt.geo)
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=721271258.09..721271258.10 rows=1 width=8) (actual
time=16973.986..16973.988 rows=1 loops=1)
-> Nested Loop (cost=0.42..68093212.52 rows=51726632 width=778)
(actual time=990.824..2697.340 rows=461431 loops=1)
-> ProjectSet (cost=0.00..18.02
rows=1000 width=32) (actual time=990.647..993.890 rows=1208 loops=1)
-> Seq Scan on
plantegning_3035_test (cost=0.00..0.51 rows=1 width=2379616) (actual
time=0.009..0.012 rows=1 loops=1)
-> Index Scan using
geoidx_klima_grunnkart_utslippsfaktor_generalisert on
grunnkart_utslippsfaktor_generalisert gk (cost=0.42..68041.46
rows=5173 width=746) (actual time=0.061..1.376 rows=382 loops=1208)
Index Cond: (geo
&& (st_subdivide(plantegning_3035_test.geo, 300, '-1'::double precision)))
Filter:
st_intersects(geo, (st_subdivide(plantegning_3035_test.geo, 300,
'-1'::double precision)))
Rows Removed by
Filter: 59
Planning Time: 0.836 ms
JIT:
Functions: 13
Options: Inlining true, Optimization true, Expressions true,
Deforming true
Timing: Generation 1.142 ms, Inlining 16.787 ms, Optimization
81.470 ms, Emission 49.597 ms, Total 148.995 ms
Execution Time: 16975.206 ms
(14 rows)
}}}
With out subdivide it takes hours
{{{
explain analyze select sum(ST_area(st_intersection(gk.geo, pt.geo)))
FROM
klima.plantegning_3035_test AS pt,
klima.grunnkart_utslippsfaktor_generalisert AS gk
WHERE ST_Intersects(gk.geo, pt.geo);
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=552890.66..552890.67 rows=1 width=8) (actual
time=5251380.919..5251380.921 rows=1 loops=1)
-> Nested Loop (cost=0.42..70027.69 rows=38239 width=2380362)
(actual time=177.699..6158.529 rows=442738 loops=1)
-> Seq Scan on plantegning_3035_test pt
(cost=0.00..0.51 rows=1 width=2379616) (actual time=0.010..0.012 rows=1
loops=1)
-> Index Scan using
geoidx_klima_grunnkart_utslippsfaktor_generalisert on
grunnkart_utslippsfaktor_generalisert gk (cost=0.42..69975.45
rows=5173 width=746) (actual time=29.570..5759.045 rows=442738 loops=1)
Index Cond: (geo
&& pt.geo)
Filter:
st_intersects(geo, pt.geo)
Rows Removed by
Filter: 171189
Planning Time: 1.665 ms
JIT:
Functions: 10
Options: Inlining true, Optimization true, Expressions true,
Deforming true
Timing: Generation 1.303 ms, Inlining 18.157 ms, Optimization
78.832 ms, Emission 48.786 ms, Total 147.079 ms
Execution Time: 5251384.010 ms
(13 rows)
}}}
There is quit a bit holes in the input polygon and if I remove those it
takes more than a day.
{{{
explain analyze select sum(ST_area(st_intersection(gk.geo, pt.geo)))
FROM
( select ST_BuildArea(ST_Boundary(geo)) as geo from
klima.plantegning_3035_test) pt,
klima.grunnkart_utslippsfaktor_generalisert AS gk
WHERE gk.geo && pt.geo AND ST_Intersects(gk.geo, pt.geo)
sl-# ;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=1393.97..1393.98 rows=1 width=8) (actual
time=109750375.276..109750375.573 rows=1 loops=1)
-> Nested Loop (cost=26.67..54.84 rows=52 width=2380362) (actual
time=65777.772..60678842.877 rows=442738 loops=1)
-> Seq Scan on plantegning_3035_test
(cost=0.00..0.51 rows=1 width=2379616) (actual time=0.004..0.310 rows=1
loops=1)
-> Index Scan using
geoidx_klima_grunnkart_utslippsfaktor_generalisert on
grunnkart_utslippsfaktor_generalisert gk (cost=26.67..54.32 rows=1
width=746) (actual time=65583.985..60677581.594 rows=442738 loops=1)
Index Cond: ((geo
&& st_buildarea(st_boundary(plantegning_3035_test.geo))) AND (geo &&
st_buildarea(st_boundary(plantegning_3035_test.geo))))
Filter:
st_intersects(geo, st_buildarea(st_boundary(plantegning_3035_test.geo)))
Rows Removed by
Filter: 171189
Planning Time: 0.175 ms
Execution Time: 109750390.145 ms
(9 rows)
Time: 109750395.411 ms (1 d 06:29:10.395)
}}}
If I just do a count thats fast.
{{{
select sum(ST_area(gk.geo)) area, count(geo) num_geo,
sum(ST_Npoints(gk.geo)) points, ST_Envelope(ST_Collect(gk.geo)) FROM
(
SELECT distinct gk.geo
FROM
(select st_subdivide(geo,300) as geo from
klima.plantegning_3035_test) pt,
klima.grunnkart_utslippsfaktor_generalisert AS gk
WHERE ST_Intersects(gk.geo, pt.geo)
) gk;
area | num_geo | points |
st_envelope
--------------------+---------+----------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1015720043.1571457 | 442738 | 13524258 |
0103000020DB0B0000010000000500000083BE84064172504139DA0264B91D514183BE840641725041E48311773E445141FEB28C0701A55041E48311773E445141FEB28C0701A5504139DA0264B91D514183BE84064172504139DA0264B91D5141
(1 row)
Time: 6156.295 ms (00:06.156)
}}}
But faster with out subdivide
{{{
select sum(ST_area(gk.geo)) area, count(geo) num_geo,
sum(ST_Npoints(gk.geo)) points, ST_Envelope(ST_Collect(gk.geo)) FROM
(
SELECT distinct gk.geo
FROM
klima.plantegning_3035_test AS pt,
klima.grunnkart_utslippsfaktor_generalisert AS gk
WHERE ST_Intersects(gk.geo, pt.geo)
) gk
;
area | num_geo | points |
st_envelope
--------------------+---------+----------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1015720043.1571265 | 442738 | 13524258 |
0103000020DB0B0000010000000500000083BE84064172504139DA0264B91D514183BE840641725041E48311773E445141FEB28C0701A55041E48311773E445141FEB28C0701A5504139DA0264B91D514183BE84064172504139DA0264B91D5141
(1 row)
Time: 3443.553 ms (00:03.444)
}}}
I have started to test the same case on and we seems ti have the same
problems.
{{{
PostgreSQL 16.4 (Ubuntu 16.4-1.pgdg22.04+1) on x86_64-pc-linux-gnu,
compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit
POSTGIS="3.6.0beta1 3.6.0beta1-4-g1458a794d" [EXTENSION] PGSQL="160"
GEOS="3.13.1-CAPI-1.19.2" PROJ="9.3.0 NETWORK_ENABLED=OFF
URL_ENDPOINT=https://cdn.proj.org USER_WRITABLE_DIRECTORY=/tmp/proj
DATABASE_PATH=/usr/local/share/proj/proj.db" (compiled against PROJ 9.3.0)
LIBXML="2.9.13" LIBJSON="0.15" LIBPROTOBUF="1.3.3" WAGYU="0.5.0
(Internal)" TOPOLOGY
}}}
I am now testing on a much smaller dataset from
klima.grunnkart_utslippsfaktor_generalisert and we seems to have the same
problems there on latest master.
I will have those tests and that smaller dataset ready over the weekend
running latest master and provide those for upload some place.
--
Ticket URL: <https://trac.osgeo.org/postgis/ticket/5953>
PostGIS <http://trac.osgeo.org/postgis/>
The PostGIS Trac is used for bug, enhancement & task tracking, a user and developer wiki, and a view into the subversion code repository of PostGIS project.
More information about the postgis-tickets
mailing list