[postgis-devel] cost of some functions (e.g. st_within) might be too high after 3ddc11483b

Paul Ramsey pramsey at cleverelephant.ca
Thu Oct 20 14:32:54 PDT 2022


Appreciated. I'm not sure what tests to suggest, however; it's quite hard to pin down specific workloads.
Like, "we'd like to get parallel plans for CPU-bound things", but... we don't have a fixed test set for that, and even if we did it would just be one workload.
P

> On Oct 18, 2022, at 7:34 PM, Bruce Rindahl <bruce.rindahl at gmail.com> wrote:
> 
> I am willing to set up a test environment on an arm64 machine to run tests for this.  I did something like this to test the advantage of a distributed database vs a standalone one.  If you could set up a matrix of the requirements (function, PG/PostGIS versions, input data, etc.) I could play with it.  It would be a Raspberry Pi 4, 64-bit, with 4 or 8 GB of memory.  Doing it via Jenkins would also be possible.  The advantage of the Pi is that it is slow enough that improvements should be obvious.
> 
> On Tue, Oct 18, 2022, 3:49 PM Paul Ramsey <pramsey at cleverelephant.ca> wrote:
> Going to track this here https://trac.osgeo.org/postgis/ticket/5259
> 
> On Tue, Oct 4, 2022 at 7:26 PM Regina Obe <lr at pcorp.us> wrote:
> >
> > > On Mon, Oct 3, 2022 at 9:59 AM Tomas Vondra
> > > <tomas.vondra at 2ndquadrant.com> wrote:
> > >
> > > >
> > > > On 9/29/22 23:11, Paul Ramsey wrote:
> > > > > So, historically, we have had these functions like
> > > > > ST_Contains() that have an implicit index operation within them, so
> > > > > you could do
> > > > >
> > > > > SELECT * FROM foo WHERE ST_Contains(geom, ...)
> > > > >
> > > > > and you'd get an efficient plan, even without adding the magical
> > > > > index operator (&&). This was being friendly to our users and
> > > > > following the standard for spatial SQL to the letter.
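> > > > >
> > > > > That is, users never had to spell out the index operator
> > > > > themselves, as in this sketch of the equivalent explicit query:
> > > > >
> > > > >   SELECT * FROM foo WHERE foo.geom && ... AND ST_Contains(foo.geom, ...)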
> > > > >
> > > > > This was accomplished with a SQL function ST_Contains that wrapped
> > > > > up the C function _ST_Contains() and the magic operator. This trick
> > > > > worked for a long time indeed (and was, in fact, the only solution)
> > > > > but it did force us to not "correctly" cost our (really expensive,
> > > > > compared to other database functions) spatial functions. When we
> > > > > pushed up costs for functions, the planner began to get angry.
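> > > > >
> > > > > The wrapper looked roughly like this (a sketch of the pre-12
> > > > > pattern, not the exact catalog definition):
> > > > >
> > > > >   -- Plain-SQL wrapper: the planner inlines the body, sees the
> > > > >   -- && qual, and can choose the index path.
> > > > >   CREATE OR REPLACE FUNCTION ST_Contains(geom1 geometry, geom2 geometry)
> > > > >       RETURNS boolean
> > > > >       AS 'SELECT $1 && $2 AND _ST_Contains($1, $2)'
> > > > >       LANGUAGE sql IMMUTABLE;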
> > > > >
> > > > > /*
> > > > >  * High costs can only be used for PostGIS 3/PgSQL 12
> > > > >  * where the support functions have been used in
> > > > >  * place of index SQL inlining.
> > > > >  * See https://trac.osgeo.org/postgis/ticket/3675
> > > > >  * for side effects of costing inlined SQL.
> > > > >  */
> > > > > #if POSTGIS_PGSQL_VERSION >= 120
> > > > > #define _COST_DEFAULT COST 1
> > > > > #define _COST_LOW COST 50
> > > > > #define _COST_MEDIUM COST 500
> > > > > #define _COST_HIGH COST 10000
> > > > > #else
> > > > > #define _COST_DEFAULT COST 1
> > > > > #define _COST_LOW COST 1
> > > > > #define _COST_MEDIUM COST 10
> > > > > #define _COST_HIGH COST 10
> > > > > #endif
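> > > > >
> > > > > Those macros then get substituted into the function definitions,
> > > > > e.g. (a sketch, assuming the usual postgis.sql.in pattern):
> > > > >
> > > > >   CREATE OR REPLACE FUNCTION ST_Within(geom1 geometry, geom2 geometry)
> > > > >       RETURNS boolean
> > > > >       AS 'MODULE_PATHNAME', 'within'
> > > > >       LANGUAGE 'c' IMMUTABLE STRICT PARALLEL SAFE
> > > > >       _COST_HIGH;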
> > > > >
> > > > > With PgSQL 12, we were no longer in-lining, and we could bump up
> > > > > those costs without any apparent downside, with the upside that we
> > > > > even got more parallel plans for things like joins and so on. So we
> > > > > did that.
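> > > > >
> > > > > With support functions the SQL wrapper is gone; the C function
> > > > > carries a SUPPORT clause instead, roughly (again a sketch):
> > > > >
> > > > >   CREATE OR REPLACE FUNCTION ST_Contains(geom1 geometry, geom2 geometry)
> > > > >       RETURNS boolean
> > > > >       AS 'MODULE_PATHNAME', 'contains'
> > > > >       SUPPORT postgis_index_supportfn
> > > > >       LANGUAGE 'c' IMMUTABLE STRICT PARALLEL SAFE
> > > > >       _COST_HIGH;
> > > > >
> > > > > The support function injects the && index qual at plan time, so a
> > > > > high COST no longer has to fight the inlining machinery.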
> > > > >
> > > > > Notwithstanding some attempts to get metrics on the cost of our
> > > > > functions, we never really generated true metrics behind these
> > > > > numbers, except in an order-of-magnitude sense. So there's no
> > > > > moral reason not to jimmy with them. What we lack is any way of
> > > > > cross-comparing the upsides and downsides of the various
> > > > > arbitrary numbers. Maybe we should cut HIGH_COST in half? Why
> > > > > not? Why this number? Because in this case it seems to fit better.
> > > > >
> > > >
> > > > Maybe. I agree our costing model is not really exact science, so maybe
> > > > tweaking it a bit is the right thing to do. OTOH once the costs get
> > > > into production it's generally harder to change them, because doing
> > > > so may change behavior (even if the negative cases are rare).
> > > >
> > > > > Since the actual cost of these high cost functions is in fact
> > > > > incredibly variable (depending on size of inputs) there's no actual
> > > > > right answer. Doing some extra work on function-level costing could
> > > > > potentially make things less variable, but we'd still have an opaque
> > > > > function cost multiplier in the equation.
> > > > >
> > > >
> > > > Yeah.
> > > >
> > > > > I'm tempted to just bump the value down, and see what happens,
> > > > > navigating by sonar, as it were, since we lack any other basis for
> > > > > these numbers at the moment.
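> > > > >
> > > > > (Anyone wanting to experiment can tweak a cost in place without a
> > > > > rebuild, e.g.:
> > > > >
> > > > >   ALTER FUNCTION st_within(geometry, geometry) COST 5000;
> > > > >
> > > > > and then compare EXPLAIN ANALYZE plans before and after.)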
> > >
> > > Any objections to cutting the HIGH value in half? Honestly we could cut
> > > all values in half. In main? In 3.3? 3.2? Comments?
> > >
> > > P
> > >
> > I'd feel more comfortable if we did it only for 3.4 and perhaps did a
> > backport later.  I'd like to create some queries in my performance suite to
> > try to exercise some of these, and it would be easier if I could compare
> > 3.4 and 3.3, which are largely still the same.
> >
> > I would also halve the MEDIUM cost.
