[postgis-devel] cost of some functions (e.g. st_within) might be too high after 3ddc11483b
Regina Obe
lr at pcorp.us
Tue Oct 4 19:26:21 PDT 2022
> On Mon, Oct 3, 2022 at 9:59 AM Tomas Vondra
> <tomas.vondra at 2ndquadrant.com> wrote:
>
> >
> > On 9/29/22 23:11, Paul Ramsey wrote:
> > > So, historically, we have had these functions like
> > > ST_Contains() that have an implicit index operation within them, so
> > > you could do
> > >
> > > SELECT * FROM foo WHERE ST_Contains(geom, ...)
> > >
> > > and you'd get an efficient plan, even without adding the magical
> > > index operator (&&). This was being friendly to our users and
> > > following the standard for spatial SQL to the letter.
> > >
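> > > (For comparison, the explicit form of that query would be roughly:
> > >
> > > SELECT * FROM foo WHERE geom && ... AND _ST_Contains(geom, ...);
> > >
> > > with && doing the index-assisted bounding-box test and
> > > _ST_Contains() the exact test.)
> > >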
> > > This was accomplished with a SQL function ST_Contains that wrapped
> > > up the C function _ST_Contains() and the magic operator. This trick
> > > worked for a long time indeed (and was, in fact, the only solution)
> > > but it did force us to not "correctly" cost our (really expensive,
> > > compared to other database functions) spatial functions. When we
> > > pushed up costs for functions, the planner began to get angry.
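> > >
> > > Roughly, the old inlined definition looked like this (a sketch of
> > > the shape, not the exact shipped SQL):
> > >
> > > CREATE OR REPLACE FUNCTION ST_Contains(geom1 geometry, geom2 geometry)
> > >     RETURNS boolean
> > >     AS 'SELECT $1 && $2 AND _ST_Contains($1, $2)'
> > >     LANGUAGE sql IMMUTABLE;
> > >
> > > Because the body is SQL, the planner inlined it, making the && qual
> > > visible to the index machinery; the side effects referenced below
> > > are what happened when such inlined SQL carried a high COST.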
> >
> > > /*
> > > * High costs can only be used for PostGIS 3/PgSQL 12
> > > * where the support functions have been used in
> > > * place of index SQL inlining.
> > > * See https://trac.osgeo.org/postgis/ticket/3675
> > > * for side effects of costing inlined SQL.
> > > */
> > > #if POSTGIS_PGSQL_VERSION >= 120
> > > #define _COST_DEFAULT COST 1
> > > #define _COST_LOW COST 50
> > > #define _COST_MEDIUM COST 500
> > > #define _COST_HIGH COST 10000
> > > #else
> > > #define _COST_DEFAULT COST 1
> > > #define _COST_LOW COST 1
> > > #define _COST_MEDIUM COST 10
> > > #define _COST_HIGH COST 10
> > > #endif
> > >
> > > With PgSQL 12, we were no longer in-lining, and we could bump up
> > > those costs without any apparent downside, with the upside that we
> > > even got more parallel plans for things like joins and so on. So we
> > > did that.
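> > >
> > > For reference, a PG12-era definition has this shape (a sketch, not
> > > the exact shipped SQL):
> > >
> > > CREATE OR REPLACE FUNCTION ST_Contains(geom1 geometry, geom2 geometry)
> > >     RETURNS boolean
> > >     AS 'MODULE_PATHNAME', 'contains'
> > >     SUPPORT postgis_index_supportfn
> > >     LANGUAGE c IMMUTABLE STRICT PARALLEL SAFE
> > >     _COST_HIGH;
> > >
> > > The support function injects the && index qual at plan time, so the
> > > declared COST no longer rides along inside inlined SQL.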
> > >
> > > Notwithstanding some attempts to get metrics on the cost of some of
> > > our functions, we never really generated true metrics behind these
> > > numbers, except in an order-of-magnitude sense. So there's no moral
> > > reason not to jimmy with them. What we lack is any way of
> > > cross-comparing the upsides and downsides of various arbitrary
> > > numbers. Maybe we should cut HIGH_COST in half? Why not? I don't
> > > know. Why? Because in this case it seems to fit better.
> > >
> >
> > Maybe. I agree our costing model is not really an exact science, so
> > maybe tweaking it a bit is the right thing to do. OTOH once the costs
> > get into production it's generally harder to change them, because
> > doing so may change behavior (even if the negative cases are rare).
> >
> > > Since the actual cost of these high cost functions is in fact
> > > incredibly variable (depending on size of inputs) there's no actual
> > > right answer. Doing some extra work on function level costing could
> > > potentially make things less variable, but we'd still have an opaque
> > > function cost multiplier in the equation.
> > >
> >
> > Yeah.
> >
> > > I'm tempted to just bump the value down, and see what happens,
> > > navigating by sonar, as it were, since we lack any other basis for
> > > these numbers at the moment.
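> > >
> > > (Trying different numbers locally doesn't need a rebuild; something
> > > like
> > >
> > > ALTER FUNCTION ST_Contains(geometry, geometry) COST 5000;
> > >
> > > followed by re-running EXPLAIN on the affected queries would do it.)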
>
> Any objections to cutting the HIGH value in half? Honestly we could cut all
> values in half. In main? In 3.3? 3.2? Comments?
>
> P
>
I'd feel more comfortable if we did it only for 3.4 and perhaps did a
backport later. I'd like to create some queries in my performance suite to
try to exercise some of these, and it would be easier if I can compare 3.4
and 3.3, which are largely still the same.
I would also halve the MEDIUM cost.
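A quick way to compare what a given install actually ships is to read the
costs out of the catalog, e.g. something like:

SELECT proname, procost
FROM pg_proc
WHERE proname IN ('st_contains', 'st_within', '_st_contains');

to see the procost values each version assigns before running the suite.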