<div dir="ltr">I have performance related question as I noticed strange PostGIS behaviour I can not explain, that degrades query performance.<br><br>I do have a table with North America continent split into 1mln irregular sectors based on population density. Some of the sectors are 100mx100m (cities) and some are 100kmx100km (north Canada) in size.<br><br>Table definition:<br><br>db=> \d+ table<br>                                                   Table "table”<br>     Column      |            Type             | Collation | Nullable | Default | Storage  | Compression | Stats target | Description<br>-----------------+-----------------------------+-----------+----------+---------+----------+-------------+--------------+-------------<br> id              | character(36)               |           | not null |         | extended |             |              |<br> polygon         | geometry(MultiPolygon,4326) |           |          |         | main     |             | 10000        |<br>…<br>Indexes:<br>    "table_pkey" PRIMARY KEY, btree (id)<br>    "table_gist_idx" gist (polygon)<br>     …<br>Access method: heap<br><br>db=> select version();<br>                                                   version<br>--------------------------------------------------------------------------------------------------------------<br> PostgreSQL 14.7 on aarch64-unknown-linux-gnu, compiled by gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-6), 64-bit<br>(1 row)<br><br>db=> select postgis_full_version();<br>                                                                                                                                                                       postgis_full_version<br>------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------<br>-------------------------------------------------------------------------------------------------------------------------------------------------------------------------<br> POSTGIS="3.1.7 aafe1ff" [EXTENSION] PGSQL="120" (procs need upgrade for use with PostgreSQL "140") GEOS="3.9.1-CAPI-1.14.2" PROJ="8.0.1" GDAL="GDAL 3.4.3, released 2022/04/22" LIBXML="2<br>.9.1" LIBJSON="0.15" LIBPROTOBUF="1.3.2" WAGYU="0.5.0 (Internal)" (core procs from "3.1.5 c60e4e3" need upgrade) RASTER (raster procs from "3.1.5 c60e4e3" need upgrade)<br>(1 row)<br><br>db=> select count(*) from table;<br>  count<br>---------<br> 1083887<br>(1 row)<br><br>db=> select oid::regclass,  pg_size_pretty(pg_relation_size(oid)) from pg_class WHERE oid::regclass::text like ‘table%’;<br>                     oid                     | pg_size_pretty<br>---------------------------------------------+----------------<br> table_pkey                          | 61 MB<br> table_gist_idx                      | 63 MB<br> table                               | 774 MB<br>(6 rows)<br><br>And now the problem - the first symptoms are visible with ST_Contains function - the index is used correctly but the row count estimate (1075) is way too high. I think it’s causing parallelism to kick off breaking query performance:<br><br>db=> EXPLAIN ANALYZE SELECT id FROM table WHERE ST_Contains(polygon, ST_PointFromText('POINT(-84.68998 39.1622)', 4326));<br>                                                               QUERY PLAN<br>-----------------------------------------------------------------------------------------------------------------------------------------<br> Gather  (cost=1044.35..20805.92 rows=1 width=36) (actual time=0.479..52.232 rows=1 loops=1)<br>   Workers Planned: 1<br>   Workers Launched: 1<br>   ->  Parallel Bitmap Heap Scan on table  (cost=44.35..19805.82 rows=1 width=36) (actual time=0.154..0.154 rows=0 loops=2)<br>         Filter: st_contains(polygon, '0101000020E610000039B9DFA1282C55C0A2B437F8C2944340'::geometry)<br>         Rows Removed by Filter: 0<br>         Heap Blocks: exact=1<br>         ->  Bitmap Index Scan on table_gist_idx  (cost=0.00..44.35 rows=1075 width=0) (actual time=0.066..0.066 rows=2 loops=1)<br>               Index Cond: (polygon ~ '0101000020E610000039B9DFA1282C55C0A2B437F8C2944340'::geometry)<br> Planning Time: 0.287 ms<br> Execution Time: 52.282 ms<br><br><br>The root cause of the problem with ST_Contains row estimation sources from bounding box ~ operator estimate (1075 as before). The parallelism doesn’t kick in here however as the BB operation is much cheaper then ST_Contains function:<br><br>db=> EXPLAIN ANALYZE SELECT id FROM table WHERE polygon ~ ST_PointFromText('POINT(-84.68998 39.1622)', 4326);<br>                                                            QUERY PLAN<br>-----------------------------------------------------------------------------------------------------------------------------------<br> Bitmap Heap Scan on table  (cost=44.62..4004.37 rows=1075 width=36) (actual time=0.068..0.069 rows=2 loops=1)<br>   Recheck Cond: (polygon ~ '0101000020E610000039B9DFA1282C55C0A2B437F8C2944340'::geometry)<br>   Heap Blocks: exact=1<br>   ->  Bitmap Index Scan on table_gist_idx  (cost=0.00..44.35 rows=1075 width=0) (actual time=0.063..0.063 rows=2 loops=1)<br>         Index Cond: (polygon ~ '0101000020E610000039B9DFA1282C55C0A2B437F8C2944340'::geometry)<br> Planning Time: 0.210 ms<br> Execution Time: 0.106 ms<br>(7 rows)<br><br>And here comes the surprise - theoretically less restrictive && operator does it kind a correct predicting single row:<br><br>db=> EXPLAIN ANALYZE SELECT id FROM table WHERE polygon && ST_PointFromText('POINT(-84.68998 39.1622)', 4326);<br>                                                               QUERY PLAN<br>-----------------------------------------------------------------------------------------------------------------------------------------<br> Index Scan using table_gist_idx on table  (cost=0.29..8.30 rows=1 width=36) (actual time=0.068..0.069 rows=2 loops=1)<br>   Index Cond: (polygon && '0101000020E610000039B9DFA1282C55C0A2B437F8C2944340'::geometry)<br> Planning Time: 0.371 ms<br> Execution Time: 0.117 ms<br>(4 rows)<br><br>As a workaround combining && with ST_Contains does the job for me but imaybe there is room for improvement?<br><br>db=> EXPLAIN ANALYZE SELECT id FROM table WHERE polygon && ST_PointFromText('POINT(-84.68998 39.1622)', 4326)<br>                                            AND ST_Contains(polygon, ST_PointFromText('POINT(-84.68998 39.1622)', 4326));<br>                                                                                 QUERY PLAN<br>----------------------------------------------------------------------------------------------------------------------------------------------------------------------------<br> Index Scan using table_gist_idx on table  (cost=0.29..33.30 rows=1 width=36) (actual time=0.077..0.078 rows=1 loops=1)<br>   Index Cond: ((polygon && '0101000020E610000039B9DFA1282C55C0A2B437F8C2944340'::geometry) AND (polygon ~ '0101000020E610000039B9DFA1282C55C0A2B437F8C2944340'::geometry))<br>   Filter: st_contains(polygon, '0101000020E610000039B9DFA1282C55C0A2B437F8C2944340'::geometry)<br>   Rows Removed by Filter: 1<br> Planning Time: 0.451 ms<br> Execution Time: 0.141 ms<br>(6 rows)<br><br>So the question is - why less restrictive intersect && operator gives lower row count prediction than more restrictive contains ~ operator? Shouldn’t the row count estimate for ~ operator be the same or lower as for &&?  Is it a bug in PostGIS (or PostgreSQL core)?<br><br>Best regards<br>   Wojtek Strzałka<br></div>