[postgis-devel] cost of some functions (e.g. st_within) might be too high after 3ddc11483b

Paul Ramsey pramsey at cleverelephant.ca
Tue Oct 4 15:49:31 PDT 2022


On Mon, Oct 3, 2022 at 9:59 AM Tomas Vondra
<tomas.vondra at 2ndquadrant.com> wrote:

>
> On 9/29/22 23:11, Paul Ramsey wrote:
> > So, historically, we have had these functions like
> > ST_Contains() that have an implicit index operation within them, so
> > you could do
> >
> > SELECT * FROM foo WHERE ST_Contains(geom, ...)
> >
> > and you'd get an efficient plan, even without adding the magical
> > index operator (&&). This was being friendly to our users and
> > following the standard for spatial SQL to the letter.
> >
> > This was accomplished with a SQL function ST_Contains that wrapped up
> > the C function _ST_Contains() and the magic operator. This trick
> > worked for a long time indeed (and was, in fact, the only solution
> > available), but it forced us to avoid "correctly" costing our
> > spatial functions, which are really expensive compared to typical
> > database functions. When we pushed the function costs up, the
> > planner began to get angry.
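> >
> > For reference, the old wrapper looked roughly like this (a sketch
> > from memory, not the exact historical definition):
> >
> > -- SQL wrapper the planner would inline, so the && index operator
> > -- got into the plan without the user ever typing it
> > CREATE OR REPLACE FUNCTION ST_Contains(geom1 geometry, geom2 geometry)
> > RETURNS boolean
> > AS 'SELECT $1 && $2 AND _ST_Contains($1, $2)'
> > LANGUAGE sql IMMUTABLE;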
>
> > /*
> >  * High costs can only be used for PostGIS 3/PgSQL 12
> >  * where the support functions have been used in
> >  * place of index SQL inlining.
> >  * See https://trac.osgeo.org/postgis/ticket/3675
> >  * for side effects of costing inlined SQL.
> >  */
> > #if POSTGIS_PGSQL_VERSION >= 120
> > #define _COST_DEFAULT COST 1
> > #define _COST_LOW COST 50
> > #define _COST_MEDIUM COST 500
> > #define _COST_HIGH COST 10000
> > #else
> > #define _COST_DEFAULT COST 1
> > #define _COST_LOW COST 1
> > #define _COST_MEDIUM COST 10
> > #define _COST_HIGH COST 10
> > #endif
> >
> > With PgSQL 12, we were no longer inlining, and we could bump up
> > those costs without any apparent downside, with the upside that we
> > even got more parallel plans for things like joins and so on. So we
> > did that.
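> >
> > For the record, the PostGIS 3 definitions hang the index logic off
> > a planner support function instead, roughly like this (a sketch;
> > the real definitions are generated from postgis.sql.in):
> >
> > CREATE OR REPLACE FUNCTION ST_Contains(geom1 geometry, geom2 geometry)
> > RETURNS boolean
> > AS 'MODULE_PATHNAME', 'contains'
> > SUPPORT postgis_index_supportfn
> > LANGUAGE 'c' IMMUTABLE STRICT PARALLEL SAFE
> > _COST_HIGH;
> >
> > so the cost macro applies to the bare C function, and the planner
> > injects the && clause itself.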
> >
> > Notwithstanding some attempts to measure the cost of a few of our
> > functions, we never generated real metrics behind these numbers,
> > beyond getting the order of magnitude roughly right. So there's no
> > moral reason not to jimmy with them. What we lack is any way of
> > cross-comparing the upsides and downsides of the various arbitrary
> > values. Maybe we should cut _COST_HIGH in half? Why not? I don't
> > know. Why? Because in this case it seems to fit better.
> >
>
> Maybe. I agree our costing model is not really an exact science, so
> tweaking it a bit may be the right thing to do. OTOH once the costs get
> into production it's generally harder to change them, because a change
> may alter plan behavior (even if the negative cases are rare).
>
> > Since the actual cost of these high-cost functions is in fact
> > incredibly variable (depending on the size of the inputs), there's
> > no single right answer. Doing some extra work on function-level
> > costing could make things less variable, but we'd still have an
> > opaque function cost multiplier in the equation.
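> >
> > For anyone who wants to experiment: the planner charges roughly
> > cpu_operator_cost * procost per call, the current multiplier is
> > visible in pg_proc, and it can be overridden without rebuilding:
> >
> > -- inspect the cost currently attached to the function
> > SELECT procost FROM pg_proc WHERE proname = 'st_contains';
> > -- try a different multiplier without recompiling PostGIS
> > ALTER FUNCTION ST_Contains(geometry, geometry) COST 5000;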
> >
>
> Yeah.
>
> > I'm tempted to just bump the value down, and see what happens,
> > navigating by sonar, as it were, since we lack any other basis for
> > these numbers at the moment.

Any objections to cutting the HIGH value in half? Honestly, we could
cut all the values in half. In main? In 3.3? In 3.2? Comments?
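
Concretely, that would just be flipping the PG12+ branch of the
macros to:

#define _COST_HIGH COST 5000  /* was 10000 */

(and the same halving for _COST_MEDIUM and _COST_LOW, if we go that
far).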

P

>
> Fine with me. I don't have any better idea (or data) at the moment.
>
>
> regards
>
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company

