[postgis-users] Row estimations

Paul Ramsey pramsey at cleverelephant.ca
Tue Dec 13 11:31:05 PST 2022


Just to put a nail in this one: the last time our selectivity estimates for Contains/Within were good was version 2.1, which used the && operator instead of hte ~/@ operators in the SQL wrapper functions.

> On Dec 13, 2022, at 9:04 AM, Paul Ramsey <pramsey at cleverelephant.ca> wrote:
> 
> On Mon, Dec 12, 2022 at 7:17 PM Regina Obe <lr at pcorp.us> wrote:
>> 
>> Paul,
>> 
>> Is it the same story with ST_CoveredBy and ST_Covers.  I care more about
>> those than ST_Contains and ST_Within.
> 
> Yes, it's the same story. The only place we use a real selectivity
> calculation is in the && operator.
> 
>> Maybe part of the reason it hasn't been brought up if it is a really old
>> issue, is most people just use ST_Intersects or ST_DWithin.
> 
> I assume that's a lot of it. Though I know that there's a certain
> gravitation towards contains/within for point-in-polygon work (people
> "feel" the containment more for points in polygons... ironically it's
> a case with no performance advantage to using containment, while
> poly/poly tests actually get a small boost on walking the index for
> pure containment.
> 
>> 
>> This ticket looks like it might be a possible complaint of the issue.
>> https://trac.osgeo.org/postgis/ticket/4754
>> That suggests it's an issue of PostGIS 3+
> 
> I don't think so..., just eyeballing 2.5 we use constant selectivity
> on all those operators too, which would seem to indicate that this has
> been the case pretty much forever. And the wrapper for 2.5 ST_Contains
> does use the containment (~) operator, not the overlaps (&&) operator.
> 
> CREATE OR REPLACE FUNCTION ST_Contains(geom1 geometry, geom2 geometry)
>    RETURNS boolean
>    AS 'SELECT $1 OPERATOR(@extschema at .~) $2 AND
> @extschema at ._ST_Contains($1,$2)'
>    LANGUAGE 'sql' IMMUTABLE _PARALLEL;
> 
> It is possible that at some earlier stage we used && in all the
> wrappers? Then we'd get correct selectivity on those plans, at the
> expense of slightly less efficient index searching.
> 
>> I thought we had others too about ST_covers but can't find any open ones.
> 
> Hard to believe, but this might just be a very very very old bug.
> 
> P.
> 
> 
>> 
>> 
>>> -----Original Message-----
>>> From: postgis-users [mailto:postgis-users-bounces at lists.osgeo.org] On
>> Behalf
>>> Of Paul Ramsey
>>> Sent: Monday, December 12, 2022 5:57 PM
>>> To: PostGIS Users Discussion <postgis-users at lists.osgeo.org>
>>> Subject: Re: [postgis-users] Row estimations
>>> 
>>> From what I've been able to find out today, this bug may have existed for
>> a
>>> very long time. Firstly, the selectivity calculation is correct (using
>> your test
>>> setup).
>>> 
>>> =# select _postgis_selectivity('test'::regclass, 'p', 'POLYGON((0
>>> 0,0.8 0,0.8 0.8,0 0.8,0 0))');
>>> _postgis_selectivity
>>> ----------------------
>>>   0.7886187126157185
>>> 
>>> That seems to be the selectivity applied in the ST_Intersects plan, but
>> not in
>>> the ST_Contains plan.
>>> 
>>> Now, both ST_Intersects and ST_Contains are "index assisted"
>>> functions. Under the covers, we add index operators to them before they
>> are
>>> planned. Before PgSQL 12, this was done with a SQL wrapper, and since then
>>> with a "support function". In both cases though, the effect is to add an
>>> operator to the query.
>>> 
>>> For ST_Intersects, && is added. For ST_Contains, @ is added. And here we
>> get
>>> into the problem... if you look at the definitions of && and @, you'll see
>> this.
>>> 
>>> CREATE OPERATOR && (
>>>    LEFTARG = geometry, RIGHTARG = geometry, PROCEDURE =
>>> geometry_overlaps,
>>>    COMMUTATOR = '&&',
>>>    RESTRICT = gserialized_gist_sel_2d,
>>>    JOIN = gserialized_gist_joinsel_2d
>>> );
>>> 
>>> CREATE OPERATOR @ (
>>>    LEFTARG = geometry, RIGHTARG = geometry, PROCEDURE =
>>> geometry_within,
>>>    COMMUTATOR = '~',
>>>    RESTRICT = contsel, JOIN = contjoinsel );
>>> 
>>> The && operator is using the PostGIS selectivity calculator to estimate
>> rows
>>> returned. The @ operator is using an in-built function that returns a
>> constant
>>> (small) selectivity. Effectively pushing the planner to use the index
>> almost all
>>> the time.
>>> 
>>> For many (most) spatial queries, a selectivity that pushes the plan into
>> using
>>> the index is a Good and Correct Thing (tm). But as queries get more
>> complex
>>> and involve more tables, the odds that this is the Wrong Thing, go up.
>>> 
>>> I'm not sure how long this situation has existed, but I'm inclined to
>> thing a Very
>>> Long Time. So I'm surprised we haven't had reports about it before. I'm
>> also a
>>> little surprised we didn't apply selectivity calculations to
>> contains/within a long
>>> time ago. A simple and effective fix is to replace contsel with
>>> gserialized_gist_sel_2d for the ~ and @ operators. Unfortunately it's a
>> hard fix
>>> to apply in in-place upgrades and in batch patches.
>>> 
>>> Anyone who feels like researching the history of selectivity in @/~ by
>> looking
>>> at old releases, I'd be interested in knowing how long this situation has
>> existed.
>>> (Forever?)
>>> 
>>> P.
>>> 
>>> 
>>> On Fri, Dec 9, 2022 at 10:14 AM Paul Ramsey <pramsey at cleverelephant.ca>
>>> wrote:
>>>> 
>>>> 
>>>> 
>>>>> On Dec 9, 2022, at 10:11 AM, Igor ALBUQUERQUE SILVA <i.albuquerque-
>>> silva at kayrros.com> wrote:
>>>>> 
>>>>> In this example ST_Contains estimates 785 rows, while ST_Intersect
>>> estimates 615469 which is closer to the real 621571 rows. Using the
>> bounding
>>> box operation I also obtain good estimations, and I also obtain good
>>> estimations if I don't use the gist index.
>>>>> 
>>>> 
>>>> This still seems Really Bad (tm) in that the contains estimate is still
>> tragically
>>> wrong. However, we have at least some clue as to where to hunt for this
>> bug.
>>> Thanks for continuing to push the testing forward.
>>>> 
>>>> P.
>>> _______________________________________________
>>> postgis-users mailing list
>>> postgis-users at lists.osgeo.org
>>> https://lists.osgeo.org/mailman/listinfo/postgis-users
>> 
>> _______________________________________________
>> postgis-users mailing list
>> postgis-users at lists.osgeo.org
>> https://lists.osgeo.org/mailman/listinfo/postgis-users



More information about the postgis-users mailing list