[postgis-users] Row estimations
Paul Ramsey
pramsey at cleverelephant.ca
Tue Dec 13 09:04:15 PST 2022
On Mon, Dec 12, 2022 at 7:17 PM Regina Obe <lr at pcorp.us> wrote:
>
> Paul,
>
> Is it the same story with ST_CoveredBy and ST_Covers. I care more about
> those than ST_Contains and ST_Within.
Yes, it's the same story. The only place we use a real selectivity
calculation is in the && operator.
> Maybe part of the reason it hasn't been brought up if it is a really old
> issue, is most people just use ST_Intersects or ST_DWithin.
I assume that's a lot of it. Though I know that there's a certain
gravitation towards contains/within for point-in-polygon work (people
"feel" the containment more for points in polygons... ironically it's
a case with no performance advantage to using containment, while
poly/poly tests actually get a small boost on walking the index for
pure containment.
>
> This ticket looks like it might be a possible complaint of the issue.
> https://trac.osgeo.org/postgis/ticket/4754
> That suggests it's an issue of PostGIS 3+
I don't think so..., just eyeballing 2.5 we use constant selectivity
on all those operators too, which would seem to indicate that this has
been the case pretty much forever. And the wrapper for 2.5 ST_Contains
does use the containment (~) operator, not the overlaps (&&) operator.
CREATE OR REPLACE FUNCTION ST_Contains(geom1 geometry, geom2 geometry)
RETURNS boolean
AS 'SELECT $1 OPERATOR(@extschema at .~) $2 AND
@extschema at ._ST_Contains($1,$2)'
LANGUAGE 'sql' IMMUTABLE _PARALLEL;
It is possible that at some earlier stage we used && in all the
wrappers? Then we'd get correct selectivity on those plans, at the
expense of slightly less efficient index searching.
> I thought we had others too about ST_covers but can't find any open ones.
Hard to believe, but this might just be a very very very old bug.
P.
>
>
> > -----Original Message-----
> > From: postgis-users [mailto:postgis-users-bounces at lists.osgeo.org] On
> Behalf
> > Of Paul Ramsey
> > Sent: Monday, December 12, 2022 5:57 PM
> > To: PostGIS Users Discussion <postgis-users at lists.osgeo.org>
> > Subject: Re: [postgis-users] Row estimations
> >
> > From what I've been able to find out today, this bug may have existed for
> a
> > very long time. Firstly, the selectivity calculation is correct (using
> your test
> > setup).
> >
> > =# select _postgis_selectivity('test'::regclass, 'p', 'POLYGON((0
> > 0,0.8 0,0.8 0.8,0 0.8,0 0))');
> > _postgis_selectivity
> > ----------------------
> > 0.7886187126157185
> >
> > That seems to be the selectivity applied in the ST_Intersects plan, but
> not in
> > the ST_Contains plan.
> >
> > Now, both ST_Intersects and ST_Contains are "index assisted"
> > functions. Under the covers, we add index operators to them before they
> are
> > planned. Before PgSQL 12, this was done with a SQL wrapper, and since then
> > with a "support function". In both cases though, the effect is to add an
> > operator to the query.
> >
> > For ST_Intersects, && is added. For ST_Contains, @ is added. And here we
> get
> > into the problem... if you look at the definitions of && and @, you'll see
> this.
> >
> > CREATE OPERATOR && (
> > LEFTARG = geometry, RIGHTARG = geometry, PROCEDURE =
> > geometry_overlaps,
> > COMMUTATOR = '&&',
> > RESTRICT = gserialized_gist_sel_2d,
> > JOIN = gserialized_gist_joinsel_2d
> > );
> >
> > CREATE OPERATOR @ (
> > LEFTARG = geometry, RIGHTARG = geometry, PROCEDURE =
> > geometry_within,
> > COMMUTATOR = '~',
> > RESTRICT = contsel, JOIN = contjoinsel );
> >
> > The && operator is using the PostGIS selectivity calculator to estimate
> rows
> > returned. The @ operator is using an in-built function that returns a
> constant
> > (small) selectivity. Effectively pushing the planner to use the index
> almost all
> > the time.
> >
> > For many (most) spatial queries, a selectivity that pushes the plan into
> using
> > the index is a Good and Correct Thing (tm). But as queries get more
> complex
> > and involve more tables, the odds that this is the Wrong Thing, go up.
> >
> > I'm not sure how long this situation has existed, but I'm inclined to
> thing a Very
> > Long Time. So I'm surprised we haven't had reports about it before. I'm
> also a
> > little surprised we didn't apply selectivity calculations to
> contains/within a long
> > time ago. A simple and effective fix is to replace contsel with
> > gserialized_gist_sel_2d for the ~ and @ operators. Unfortunately it's a
> hard fix
> > to apply in in-place upgrades and in batch patches.
> >
> > Anyone who feels like researching the history of selectivity in @/~ by
> looking
> > at old releases, I'd be interested in knowing how long this situation has
> existed.
> > (Forever?)
> >
> > P.
> >
> >
> > On Fri, Dec 9, 2022 at 10:14 AM Paul Ramsey <pramsey at cleverelephant.ca>
> > wrote:
> > >
> > >
> > >
> > > > On Dec 9, 2022, at 10:11 AM, Igor ALBUQUERQUE SILVA <i.albuquerque-
> > silva at kayrros.com> wrote:
> > > >
> > > > In this example ST_Contains estimates 785 rows, while ST_Intersect
> > estimates 615469 which is closer to the real 621571 rows. Using the
> bounding
> > box operation I also obtain good estimations, and I also obtain good
> > estimations if I don't use the gist index.
> > > >
> > >
> > > This still seems Really Bad (tm) in that the contains estimate is still
> tragically
> > wrong. However, we have at least some clue as to where to hunt for this
> bug.
> > Thanks for continuing to push the testing forward.
> > >
> > > P.
> > _______________________________________________
> > postgis-users mailing list
> > postgis-users at lists.osgeo.org
> > https://lists.osgeo.org/mailman/listinfo/postgis-users
>
> _______________________________________________
> postgis-users mailing list
> postgis-users at lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/postgis-users
More information about the postgis-users
mailing list