[PostGIS] #5953: Slow st_intersection with big input polygons

PostGIS trac at osgeo.org
Thu Jul 24 22:53:59 PDT 2025


#5953: Slow st_intersection with big input polygons
-------------------------------+---------------------------
 Reporter:  Lars Aksel Opsahl  |      Owner:  pramsey
     Type:  enhancement        |     Status:  new
 Priority:  medium             |  Milestone:  PostGIS 3.6.0
Component:  postgis            |    Version:  master
 Keywords:                     |
-------------------------------+---------------------------
 We have case where users can upload polygons and we use those polygons to
 run a intersection against another big table. We do not have any control
 of what type polygons the user send in.

 We are now forced to use st_subdivide to get any OK performance.

 Yes we do analyze of input and adjust how to run quary based what input
 looks like for instance by use st_subdivide, coveredBy and so on, but I
 was that kind of analyzes was done by postgis.

 The table klima.plantegning_3035_test contains the single input from the
 user. And both table have gist indexes. Just be sure I tested with && but
 as expected that does not make any difference.

 Here is samples from

 {{{
 PostgreSQL 16.3 (Ubuntu 16.3-1.pgdg22.04+1) on x86_64-pc-linux-gnu,
 compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit

 POSTGIS="3.6.0dev 3.5.0-314-g637410f77" [EXTENSION] PGSQL="160"
 GEOS="3.13.0beta2-CAPI-1.19.0" PROJ="9.3.0 NETWORK_ENABLED=OFF
 URL_ENDPOINT=https://cdn.proj.org USER_WRITABLE_DIRECTORY=/tmp/proj
 DATABASE_PATH=/usr/local/share/proj/proj.db" (compiled against PROJ 9.3.0)
 LIBXML="2.9.13" LIBJSON="0.15" LIBPROTOBUF="1.3.3" WAGYU="0.5.0
 (Internal)" TOPOLOGY
 }}}

 With subdivide takes less than half a minute


 {{{
 explain analyze select sum(ST_area(st_intersection(gk.geo, pt.geo)))
 FROM
 (select st_subdivide(geo,300) as geo from klima.plantegning_3035_test) pt,
 klima.grunnkart_utslippsfaktor_generalisert AS gk
 WHERE ST_Intersects(gk.geo, pt.geo)

 QUERY PLAN

 -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=721271258.09..721271258.10 rows=1 width=8) (actual
 time=16973.986..16973.988 rows=1 loops=1)
         ->  Nested Loop  (cost=0.42..68093212.52 rows=51726632 width=778)
 (actual time=990.824..2697.340 rows=461431 loops=1)
                                 ->  ProjectSet  (cost=0.00..18.02
 rows=1000 width=32) (actual time=990.647..993.890 rows=1208 loops=1)
                                                         ->  Seq Scan on
 plantegning_3035_test  (cost=0.00..0.51 rows=1 width=2379616) (actual
 time=0.009..0.012 rows=1 loops=1)
                                 ->  Index Scan using
 geoidx_klima_grunnkart_utslippsfaktor_generalisert on
 grunnkart_utslippsfaktor_generalisert gk  (cost=0.42..68041.46
 rows=5173 width=746) (actual time=0.061..1.376 rows=382 loops=1208)
                                                         Index Cond: (geo
 && (st_subdivide(plantegning_3035_test.geo, 300, '-1'::double precision)))
                                                         Filter:
 st_intersects(geo, (st_subdivide(plantegning_3035_test.geo, 300,
 '-1'::double precision)))
                                                         Rows Removed by
 Filter: 59
 Planning Time: 0.836 ms
 JIT:
         Functions: 13
         Options: Inlining true, Optimization true, Expressions true,
 Deforming true
         Timing: Generation 1.142 ms, Inlining 16.787 ms, Optimization
 81.470 ms, Emission 49.597 ms, Total 148.995 ms
 Execution Time: 16975.206 ms
 (14 rows)

 }}}

 With out subdivide it takes hours

 {{{
 explain analyze select sum(ST_area(st_intersection(gk.geo, pt.geo)))
 FROM
 klima.plantegning_3035_test AS pt,
 klima.grunnkart_utslippsfaktor_generalisert AS gk
 WHERE ST_Intersects(gk.geo, pt.geo);

 QUERY PLAN

 ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=552890.66..552890.67 rows=1 width=8) (actual
 time=5251380.919..5251380.921 rows=1 loops=1)
         ->  Nested Loop  (cost=0.42..70027.69 rows=38239 width=2380362)
 (actual time=177.699..6158.529 rows=442738 loops=1)
                                 ->  Seq Scan on plantegning_3035_test pt
 (cost=0.00..0.51 rows=1 width=2379616) (actual time=0.010..0.012 rows=1
 loops=1)
                                 ->  Index Scan using
 geoidx_klima_grunnkart_utslippsfaktor_generalisert on
 grunnkart_utslippsfaktor_generalisert gk  (cost=0.42..69975.45
 rows=5173 width=746) (actual time=29.570..5759.045 rows=442738 loops=1)
                                                         Index Cond: (geo
 && pt.geo)
                                                         Filter:
 st_intersects(geo, pt.geo)
                                                         Rows Removed by
 Filter: 171189
 Planning Time: 1.665 ms
 JIT:
         Functions: 10
         Options: Inlining true, Optimization true, Expressions true,
 Deforming true
         Timing: Generation 1.303 ms, Inlining 18.157 ms, Optimization
 78.832 ms, Emission 48.786 ms, Total 147.079 ms
 Execution Time: 5251384.010 ms
 (13 rows)

 }}}

 There is quit a bit holes in the input polygon and if I remove those it
 takes more than a day.


 {{{
 explain analyze select sum(ST_area(st_intersection(gk.geo, pt.geo)))
 FROM
 ( select ST_BuildArea(ST_Boundary(geo)) as geo from
 klima.plantegning_3035_test) pt,
 klima.grunnkart_utslippsfaktor_generalisert AS gk
 WHERE gk.geo && pt.geo AND ST_Intersects(gk.geo, pt.geo)
 sl-# ;

                                                         QUERY PLAN
 -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1393.97..1393.98 rows=1 width=8) (actual
 time=109750375.276..109750375.573 rows=1 loops=1)
         ->  Nested Loop  (cost=26.67..54.84 rows=52 width=2380362) (actual
 time=65777.772..60678842.877 rows=442738 loops=1)
                                 ->  Seq Scan on plantegning_3035_test
 (cost=0.00..0.51 rows=1 width=2379616) (actual time=0.004..0.310 rows=1
 loops=1)
                                 ->  Index Scan using
 geoidx_klima_grunnkart_utslippsfaktor_generalisert on
 grunnkart_utslippsfaktor_generalisert gk  (cost=26.67..54.32 rows=1
 width=746) (actual time=65583.985..60677581.594 rows=442738 loops=1)
                                                         Index Cond: ((geo
 && st_buildarea(st_boundary(plantegning_3035_test.geo))) AND (geo &&
 st_buildarea(st_boundary(plantegning_3035_test.geo))))
                                                         Filter:
 st_intersects(geo, st_buildarea(st_boundary(plantegning_3035_test.geo)))
                                                         Rows Removed by
 Filter: 171189
 Planning Time: 0.175 ms
 Execution Time: 109750390.145 ms
 (9 rows)

 Time: 109750395.411 ms (1 d 06:29:10.395)


 }}}

 If I just do a count thats fast.


 {{{
 select sum(ST_area(gk.geo)) area, count(geo) num_geo,
 sum(ST_Npoints(gk.geo)) points, ST_Envelope(ST_Collect(gk.geo)) FROM
 (
         SELECT distinct gk.geo
         FROM
         (select st_subdivide(geo,300) as geo from
 klima.plantegning_3035_test) pt,
         klima.grunnkart_utslippsfaktor_generalisert AS gk
         WHERE ST_Intersects(gk.geo, pt.geo)
 ) gk;
                                 area        | num_geo |  points  |
 st_envelope
 --------------------+---------+----------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 1015720043.1571457 |  442738 | 13524258 |
 0103000020DB0B0000010000000500000083BE84064172504139DA0264B91D514183BE840641725041E48311773E445141FEB28C0701A55041E48311773E445141FEB28C0701A5504139DA0264B91D514183BE84064172504139DA0264B91D5141
 (1 row)

 Time: 6156.295 ms (00:06.156)

 }}}


 But faster with out subdivide

 {{{

 select sum(ST_area(gk.geo)) area, count(geo) num_geo,
 sum(ST_Npoints(gk.geo)) points, ST_Envelope(ST_Collect(gk.geo)) FROM
 (
         SELECT distinct gk.geo
         FROM
         klima.plantegning_3035_test AS pt,
         klima.grunnkart_utslippsfaktor_generalisert AS gk
         WHERE ST_Intersects(gk.geo, pt.geo)
 ) gk
 ;
                                 area        | num_geo |  points  |
 st_envelope
 --------------------+---------+----------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 1015720043.1571265 |  442738 | 13524258 |
 0103000020DB0B0000010000000500000083BE84064172504139DA0264B91D514183BE840641725041E48311773E445141FEB28C0701A55041E48311773E445141FEB28C0701A5504139DA0264B91D514183BE84064172504139DA0264B91D5141
 (1 row)

 Time: 3443.553 ms (00:03.444)

 }}}



 I have started to test the same case on and we seems ti have the same
 problems.

 {{{
 PostgreSQL 16.4 (Ubuntu 16.4-1.pgdg22.04+1) on x86_64-pc-linux-gnu,
 compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit

 POSTGIS="3.6.0beta1 3.6.0beta1-4-g1458a794d" [EXTENSION] PGSQL="160"
 GEOS="3.13.1-CAPI-1.19.2" PROJ="9.3.0 NETWORK_ENABLED=OFF
 URL_ENDPOINT=https://cdn.proj.org USER_WRITABLE_DIRECTORY=/tmp/proj
 DATABASE_PATH=/usr/local/share/proj/proj.db" (compiled against PROJ 9.3.0)
 LIBXML="2.9.13" LIBJSON="0.15" LIBPROTOBUF="1.3.3" WAGYU="0.5.0
 (Internal)" TOPOLOGY

 }}}


 I am now testing on a much smaller dataset from
 klima.grunnkart_utslippsfaktor_generalisert and we seems to have the same
 problems there on latest master.

 I will have those tests and that smaller dataset ready over the weekend
 running latest master and provide those for upload some place.
-- 
Ticket URL: <https://trac.osgeo.org/postgis/ticket/5953>
PostGIS <http://trac.osgeo.org/postgis/>
The PostGIS Trac is used for bug, enhancement & task tracking, a user and developer wiki, and a view into the subversion code repository of PostGIS project.


More information about the postgis-tickets mailing list