[postgis-devel] cost of some functions (e.g. st_within) might be too high after 3ddc11483b

Tomas Vondra tomas.vondra at 2ndquadrant.com
Mon Oct 3 09:58:58 PDT 2022



On 9/29/22 23:11, Paul Ramsey wrote:
> So, historically, we have had these functions like
> ST_Contains() that have an implicit index operation within them, so
> you could do
> 
> SELECT * FROM foo WHERE ST_Contains(geom, ...)
> 
> and you'd get an efficient plan, even without adding the magical 
> index operator (&&). This was being friendly to our users and
> following the standard for spatial SQL to the letter.
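
(To spell it out, without that friendliness the user would need to add
the index operator by hand, something like this, with a hypothetical
query geometry:

  SELECT *
  FROM foo
  WHERE foo.geom && 'SRID=4326;POINT(0 0)'::geometry
    AND ST_Contains(foo.geom, 'SRID=4326;POINT(0 0)'::geometry);

where the && bounding-box test is what actually hits the spatial
index.)
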
> 
> This was accomplished with a SQL function ST_Contains that wrapped up
> the C function _ST_Contains() and the magic operator. This trick 
> worked for a long time indeed (and was, in fact, the only solution) 
> but it did force us to not "correctly" cost our (really expensive,
> compared to other database functions) spatial functions. When we
> pushed up costs for functions, the planner began to get angry.
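
(For reference, if I remember the old definitions right, the inlined
SQL wrapper looked roughly like this, give or take the exact flags:

  -- bbox overlap (&&) as the index-assisted prefilter,
  -- exact geometry test second
  CREATE OR REPLACE FUNCTION ST_Contains(geom1 geometry, geom2 geometry)
      RETURNS boolean
      AS 'SELECT $1 && $2 AND _ST_Contains($1, $2)'
      LANGUAGE sql IMMUTABLE;

so once the planner inlined the SQL body it saw the && operator and
could use the spatial index, even though the query only mentioned
ST_Contains().)
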

> /*
>  * High costs can only be used for PostGIS 3/PgSQL 12
>  * where the support functions have been used in
>  * place of index SQL inlining.
>  * See https://trac.osgeo.org/postgis/ticket/3675
>  * for side effects of costing inlined SQL.
>  */
> #if POSTGIS_PGSQL_VERSION >= 120
> #define _COST_DEFAULT COST 1
> #define _COST_LOW COST 50
> #define _COST_MEDIUM COST 500
> #define _COST_HIGH COST 10000
> #else
> #define _COST_DEFAULT COST 1
> #define _COST_LOW COST 1
> #define _COST_MEDIUM COST 10
> #define _COST_HIGH COST 10
> #endif
> 
> With PgSQL 12, we were no longer in-lining, and we could bump up 
> those costs without any apparent downside, with the upside that we
> even got more parallel plans for things like joins and so on. So we
> did that.
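
(And just to make the before/after concrete: with the PgSQL 12 support
functions, ST_Contains() is declared directly against the C function
and gets the index path injected by the support function, something
along these lines, assuming I have the names right:

  CREATE OR REPLACE FUNCTION ST_Contains(geom1 geometry, geom2 geometry)
      RETURNS boolean
      AS 'MODULE_PATHNAME', 'contains'
      SUPPORT postgis_index_supportfn
      LANGUAGE c IMMUTABLE STRICT PARALLEL SAFE
      _COST_HIGH;

so the COST is finally attached to the function the planner actually
costs, instead of to an inlined SQL body.)
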
> 
> Notwithstanding that there were some attempts to get metrics on the
> cost of some of our functions, we never really generated true
> metrics behind these numbers, except in an order-of-magnitude sense.
> So there's no moral reason not to jimmy with them. What we lack is
> any sort of way of cross-comparing the upsides and downsides of
> various arbitrary numbers. Maybe we should cut _COST_HIGH in half?
> Why not? I don't know. Why? Because in this case it seems to fit
> better.
> 

Maybe. I agree our costing model is not really an exact science, so
maybe tweaking it a bit is the right thing to do. OTOH once the costs
get into production it's generally harder to change them, because
doing so may change plan behavior (even if the negative cases are
rare).

> Since the actual cost of these high-cost functions is incredibly
> variable (depending on the size of the inputs), there's no single
> right answer. Doing some extra work on function-level costing could
> potentially make things less variable, but we'd still have an opaque
> function cost multiplier in the equation.
> 

Yeah.

> I'm tempted to just bump the value down, and see what happens, 
> navigating by sonar, as it were, since we lack any other basis for
> these numbers at the moment.

Fine with me. I don't have any better idea (or data) at the moment.
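
If we do go that way, the change itself is trivial, just the macro
quoted above, with a hypothetical halved value along the lines you
suggested:

  /* hypothetical: halve the high cost (currently COST 10000) */
  #define _COST_HIGH COST 5000

and then we see which plans flip.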


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
