[postgis-users] Row estimations

Thu Dec 1 01:36:31 PST 2022

Hello everyone,

I have a question regarding the row estimations/gist indexes. Here's a
minimal reproduction of it:

create table test(p geometry(point));
insert into test(p) values (st_makepoint(1,1));
insert into test(p) values (st_makepoint(1,2));
insert into test(p) values (st_makepoint(2,1));
insert into test(p) values (st_makepoint(2,2));
insert into test(p) values (st_makepoint(51,1));
analyze test;
explain analyze select * from test where
ST_Contains(ST_GeomFromText('POLYGON((0 0,3 -0,3 3,0 3,0 0))'), p);
explain analyze select * from test where
ST_Contains(ST_GeomFromText('POLYGON((50 0,52 0,52 2,50 2,50 0))'), p);

The two queries get the same cost/row estimation, of 1 row. This is the
EXPLAIN ANALYZE of the first query:

Seq Scan on test  (cost=0.00..126.05 rows=1 width=32) (actual
time=0.015..0.022 rows=4 loops=1)
   Filter:
st_contains('01030000000100000005000000000000000000F0BF000000000000F0BF0000000000000040000000000000F0BF00000000000000400000000000000040000000000000F0BF0000000000000040000000000000F0BF000000000000F0BF'::geometry,
p)
   Rows Removed by Filter: 1
 Planning Time: 0.072 ms
 Execution Time: 0.035 ms
(5 rows)

What I was expecting is the first query to estimate 4 rows and the second
to estimate 1, like what I get If I try the same thing using integers.

create table test(x integer, y integer);
insert into test(x, y) values (0, 0);
insert into test(x, y) values (0, 1);
insert into test(x, y) values (1, 0);
insert into test(x, y) values (1, 1);
insert into test(x, y) values (50, 0);
analyze test;
explain analyze select * from test where x between 0 and 1 and y between 0
and 1;
explain analyze select * from test where x between 50 and 51 and y between
0 and 1;

My question is: is this expected behaviour? I actually have a much larger
table with a gist index where I found this occurring, and this causes the
planner to make bad decisions: every query that I do will have the same
estimation, and whenever this estimation is very wrong, the planner does
not take the optimal decision when it has to compare with other indexes
costs in a more complicated query.

I'm using the official docker image, PostgreSQL 15.1 (Debian
15.1-1.pgdg110+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6)
10.2.1 20210110, 64-bit POSTGIS="3.3.1 3786b21" [EXTENSION] PGSQL="150"
GEOS="3.9.0-CAPI-1.16.2" PROJ="7.2.1" LIBXML="2.9.10" LIBJSON="0.15"
LIBPROTOBUF="1.3.3" WAGYU="0.5.0 (Internal)" TOPOLOGY

Best regards,
Igor
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20221201/e7b02673/attachment.htm>