[postgis-devel] cost of some functions (e.g. st_within) might be too high after 3ddc11483b

Tue Oct 18 19:34:57 PDT 2022

I am willing to set up a test environment on a arm64 machine to run tests
for this.  I did something like this to test the advantage of a distributed
database vs stand alone.  If you could set up a matrix on the requirements
(function, PG/PostGIS versions, input data, etc.) I could play with it.  It
would be a raspberry pi 4, 64 bit, and 4 or 8GB memory.  Doing it via
Jenkins would be also possible.  Advantage on the pi is it is slow enough
that improvements should be obvious.

On Tue, Oct 18, 2022, 3:49 PM Paul Ramsey <pramsey at cleverelephant.ca> wrote:

> Going to track this here https://trac.osgeo.org/postgis/ticket/5259
>
> On Tue, Oct 4, 2022 at 7:26 PM Regina Obe <lr at pcorp.us> wrote:
> >
> > > On Mon, Oct 3, 2022 at 9:59 AM Tomas Vondra
> > > <tomas.vondra at 2ndquadrant.com> wrote:
> > >
> > > >
> > > > On 9/29/22 23:11, Paul Ramsey wrote:
> > > > > So, historically, we have had these these functions like
> > > > > ST_Contains() that have an implicit index operation within them, so
> > > > > you could do
> > > > >
> > > > > SELECT * FROM foo WHERE ST_Contains(geom, ...)
> > > > >
> > > > > and you'd get an efficient plan, even without adding the magical
> > > > > index operator (&&). This was being friendly to our users and
> > > > > following the standard for spatial SQL to the letter.
> > > > >
> > > > > This was accomplished with a SQL function ST_Contains that wrapped
> > > > > up the C function _ST_Contains() and the magic operator. This trick
> > > > > worked for a long time indeed (and was, in fact, the only solution)
> > > > > but it did force us to not "correctly" cost our (really expensive,
> > > > > compared to other database functions) spatial functions. When we
> > > > > pushed up costs for functions, the planner began to get angry.
> > > >
> > > > > /*
> > > > >  * High costs can only be used for PostGIS 3/PgSQL 12
> > > > >  * where the support functions have been used in
> > > > >  * place of index SQL inlining.
> > > > >  * See https://trac.osgeo.org/postgis/ticket/3675
> > > > >  * for sideffects of costing inlined SQL.
> > > > >  */
> > > > > #if POSTGIS_PGSQL_VERSION >= 120
> > > > > #define _COST_DEFAULT COST 1
> > > > > #define _COST_LOW COST 50
> > > > > #define _COST_MEDIUM COST 500
> > > > > #define _COST_HIGH COST 10000
> > > > > #else
> > > > > #define _COST_DEFAULT COST 1
> > > > > #define _COST_LOW COST 1
> > > > > #define _COST_MEDIUM COST 10
> > > > > #define _COST_HIGH COST 10
> > > > > #endif
> > > > >
> > > > > With PgSQL 12, we were no longer in-lining, and we could bump up
> > > > > those costs without any apparently downside, with the upside that
> we
> > > > > even got more parallel plans for things like joins and so on. So we
> > > > > did that.
> > > > >
> > > > > Notwithstanding that there were some attempts to get metrics on the
> > > > > cost of some of our functions, we didn't really do anything
> > > > > generating true metrics behind these numbers, except in an
> > > > > order-of-magnitude sense, sort of. So there's no moral reason not
> to
> > > > > jimmy with them. What we lack is any sort of way of cross-comparing
> > > > > the upsides and downsides of various arbitrary numbers. Maybe we
> > > > > should but HIGH_COST in half? Why not? I don't know? Why? Because
> in
> > > > > this case it seems to fit better.
> > > > >
> > > >
> > > > Maybe. I agree our costing model is not really exact science, so
> maybe
> > > > tweaking it a bit is the right thing to do. OTOH once the costs get
> > > > into production it's generally harder because it may change behavior
> > > > (even if the negative cases are rare).
> > > >
> > > > > Since the actual cost of these high cost functions is in fact
> > > > > incredibly variable (depending on size of inputs) there's no actual
> > > > > right answer. Doing some extra work on function level costing could
> > > > > potentially make things less variable, but we'd still have an
> opaque
> > > > > function cost multiplier in the equation.
> > > > >
> > > >
> > > > Yeah.
> > > >
> > > > > I'm tempted to just bump the value down, and see what happens,
> > > > > navigating by sonar, as it were, since we lack any other basis for
> > > > > these numbers at the moment.
> > >
> > > Any objections to cutting the HIGH value in half? Honestly we could cut
> > all
> > > values in half. In main? In 3.3? 3.2? comments?
> > >
> > > P
> > >
> > I'd feel more comfortable if we did it only for 3.4 and perhaps do a
> > backport later.  I'd like to create some queries in my performance suite
> to
> > try to exercise some of these and would be easier if I can compare 3.4
> and
> > 3.3, which are largely still the same.
> >
> > I would also half the MEDIUM cost one.
> >
> >
> > _______________________________________________
> > postgis-devel mailing list
> > postgis-devel at lists.osgeo.org
> > https://lists.osgeo.org/mailman/listinfo/postgis-devel
> _______________________________________________
> postgis-devel mailing list
> postgis-devel at lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/postgis-devel
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-devel/attachments/20221018/bb51269e/attachment.htm>