[postgis-users] Row estimations

Paul Ramsey pramsey at cleverelephant.ca
Mon Dec 12 14:56:56 PST 2022


>From what I've been able to find out today, this bug may have existed
for a very long time. Firstly, the selectivity calculation is correct
(using your test setup).

=# select _postgis_selectivity('test'::regclass, 'p', 'POLYGON((0
0,0.8 0,0.8 0.8,0 0.8,0 0))');
 _postgis_selectivity
----------------------
   0.7886187126157185

That seems to be the selectivity applied in the ST_Intersects plan,
but not in the ST_Contains plan.

Now, both ST_Intersects and ST_Contains are "index assisted"
functions. Under the covers, we add index operators to them before
they are planned. Before PgSQL 12, this was done with a SQL wrapper,
and since then with a "support function". In both cases though, the
effect is to add an operator to the query.

For ST_Intersects, && is added. For ST_Contains, @ is added. And here
we get into the problem... if you look at the definitions of && and @,
you'll see this.

CREATE OPERATOR && (
    LEFTARG = geometry, RIGHTARG = geometry, PROCEDURE = geometry_overlaps,
    COMMUTATOR = '&&',
    RESTRICT = gserialized_gist_sel_2d,
    JOIN = gserialized_gist_joinsel_2d
);

CREATE OPERATOR @ (
    LEFTARG = geometry, RIGHTARG = geometry, PROCEDURE = geometry_within,
    COMMUTATOR = '~',
    RESTRICT = contsel, JOIN = contjoinsel
);

The && operator is using the PostGIS selectivity calculator to
estimate rows returned. The @ operator is using an in-built function
that returns a constant (small) selectivity. Effectively pushing the
planner to use the index almost all the time.

For many (most) spatial queries, a selectivity that pushes the plan
into using the index is a Good and Correct Thing (tm). But as queries
get more complex and involve more tables, the odds that this is the
Wrong Thing, go up.

I'm not sure how long this situation has existed, but I'm inclined to
thing a Very Long Time. So I'm surprised we haven't had reports about
it before. I'm also a little surprised we didn't apply selectivity
calculations to contains/within a long time ago. A simple and
effective fix is to replace contsel with gserialized_gist_sel_2d for
the ~ and @ operators. Unfortunately it's a hard fix to apply in
in-place upgrades and in batch patches.

Anyone who feels like researching the history of selectivity in @/~ by
looking at old releases, I'd be interested in knowing how long this
situation has existed. (Forever?)

P.


On Fri, Dec 9, 2022 at 10:14 AM Paul Ramsey <pramsey at cleverelephant.ca> wrote:
>
>
>
> > On Dec 9, 2022, at 10:11 AM, Igor ALBUQUERQUE SILVA <i.albuquerque-silva at kayrros.com> wrote:
> >
> > In this example ST_Contains estimates 785 rows, while ST_Intersect estimates 615469 which is closer to the real 621571 rows. Using the bounding box operation I also obtain good estimations, and I also obtain good estimations if I don't use the gist index.
> >
>
> This still seems Really Bad (tm) in that the contains estimate is still tragically wrong. However, we have at least some clue as to where to hunt for this bug. Thanks for continuing to push the testing forward.
>
> P.


More information about the postgis-users mailing list