[postgis-devel] cost of some functions (e.g. st_within) might be too high after 3ddc11483b
Tomas Vondra
tomas.vondra at 2ndquadrant.com
Mon Oct 3 09:58:58 PDT 2022
On 9/29/22 23:11, Paul Ramsey wrote:
> So, historically, we have had these functions like
> ST_Contains() that have an implicit index operation within them, so
> you could do
>
> SELECT * FROM foo WHERE ST_Contains(geom, ...)
>
> and you'd get an efficient plan, even without adding the magical
> index operator (&&). This was being friendly to our users and
> following the standard for spatial SQL to the letter.
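Spelling out the implicit form: the query above behaves as if you had
written something like the following, where the polygon literal is just
an illustrative placeholder and _ST_Contains() is the index-free
variant of the function:

    SELECT *
      FROM foo
     -- bounding-box test, can use a GiST index on geom
     WHERE geom && 'POLYGON((0 0,10 0,10 10,0 10,0 0))'::geometry
     -- exact containment test, evaluated only for bbox matches
       AND _ST_Contains(geom, 'POLYGON((0 0,10 0,10 10,0 10,0 0))'::geometry);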
>
> This was accomplished with a SQL function ST_Contains that wrapped up
> the C function _ST_Contains() and the magic operator. This trick
> worked for a long time indeed (and was, in fact, the only solution),
> but it forced us to avoid costing our spatial functions "correctly"
> (they are really expensive compared to other database functions).
> When we pushed up costs for functions, the planner began to get
> angry.
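The old wrapper was essentially this one-liner (a sketch from memory,
not the verbatim definition from the scripts):

    -- Inlined by the planner, which exposes the && clause
    -- to the index machinery.
    CREATE OR REPLACE FUNCTION ST_Contains(geom1 geometry, geom2 geometry)
      RETURNS boolean
      AS 'SELECT $1 && $2 AND _ST_Contains($1, $2)'
      LANGUAGE sql IMMUTABLE;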
> /*
> * High costs can only be used for PostGIS 3/PgSQL 12
> * where the support functions have been used in
> * place of index SQL inlining.
> * See https://trac.osgeo.org/postgis/ticket/3675
> * for side effects of costing inlined SQL.
> */
> #if POSTGIS_PGSQL_VERSION >= 120
> #define _COST_DEFAULT COST 1
> #define _COST_LOW COST 50
> #define _COST_MEDIUM COST 500
> #define _COST_HIGH COST 10000
> #else
> #define _COST_DEFAULT COST 1
> #define _COST_LOW COST 1
> #define _COST_MEDIUM COST 10
> #define _COST_HIGH COST 10
> #endif
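For context: those macros get substituted into the CREATE FUNCTION
statements when the SQL script is generated, so on PG 12+ a definition
comes out roughly like this (a sketch; the C symbol name and option
list are from memory, and MODULE_PATHNAME is the placeholder the build
replaces with the library path):

    CREATE OR REPLACE FUNCTION ST_Within(geom1 geometry, geom2 geometry)
      RETURNS boolean
      AS 'MODULE_PATHNAME', 'within'
      LANGUAGE c IMMUTABLE STRICT PARALLEL SAFE
      COST 10000;  -- what _COST_HIGH expands to on PG >= 12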
>
> With PgSQL 12, we were no longer inlining, and we could bump up
> those costs without any apparent downside, with the upside that we
> even got more parallel plans for things like joins and so on. So we
> did that.
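Concretely, the PG 12+ replacement for inlining is a planner support
function attached to the C function, so the definitions gained a
SUPPORT clause, roughly (again a sketch, with the same caveats as
above):

    CREATE OR REPLACE FUNCTION ST_Contains(geom1 geometry, geom2 geometry)
      RETURNS boolean
      AS 'MODULE_PATHNAME', 'contains'
      SUPPORT postgis_index_supportfn  -- injects the && index clause
      LANGUAGE c IMMUTABLE STRICT PARALLEL SAFE
      COST 10000;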
>
> Notwithstanding some attempts to get metrics on the cost of some of
> our functions, we never really generated true metrics behind these
> numbers, except in an order-of-magnitude sense. So there's no moral
> reason not to jimmy with them. What we lack is any way of
> cross-comparing the upsides and downsides of various arbitrary
> numbers. Maybe we should cut _COST_HIGH in half? Why not? I don't
> know. Why? Because in this case it seems to fit better.
>
Maybe. I agree our costing model is not really exact science, so maybe
tweaking it a bit is the right thing to do. OTOH once the costs get
into production, changing them gets harder, because any change may
alter planner behavior (even if the negative cases are rare).
> Since the actual cost of these high-cost functions is in fact
> incredibly variable (depending on the size of the inputs), there's
> no actual right answer. Doing some extra work on function-level
> costing could potentially make things less variable, but we'd still
> have an opaque function cost multiplier in the equation.
>
Yeah.
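FWIW, since the cost is just a per-function property, individual
installations can already experiment without waiting for a release,
e.g.:

    -- Try halving the current _COST_HIGH value (10000) for one
    -- function; 5000 is an arbitrary trial value, not a recommendation.
    ALTER FUNCTION ST_Within(geometry, geometry) COST 5000;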
> I'm tempted to just bump the value down and see what happens,
> navigating by sonar, as it were, since we lack any other basis for
> these numbers at the moment.
Fine with me. I don't have any better idea (or data) at the moment.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company