[postgis-devel] cost of some functions (e.g. st_within) might be too high after 3ddc11483b

Paul Ramsey pramsey at cleverelephant.ca
Mon Sep 26 09:29:55 PDT 2022


Dynamic costing based on column size (main determinant of costs) would be interesting. However, in the meanwhile it's possible we just need to tweak the constants we have.

I need to check the dates, but one reason function costs got plumped waaaay up is that it was quite hard to get a parallel plan out of PgSQL without hitting it on the head with a hammer, and being CPU-bound a lot of the time, it was/is frequently good to get a parallel plan. The difficulty of getting a parallel plan went down with PgSQL 12, so maybe we need a switch in version, or just a general reduction in some of the more extreme costs. 

How far down did you need to drop costs to get your particular query flipping into a smarter plan again?

P

> On Sep 26, 2022, at 2:31 AM, Raúl Marín <raul at rmr.ninja> wrote:
> 
> Hi,
> 
> The main problem is that the cost of functions are a static value (same for all inputs) and that makes it impossible for Postgis to set it to a value that's ok for all inputs. For example, st_within could be a really fast operation (if the bounding box don't fit), it could be a point in polygon calculation inside Postgis or more complex operation with a multigeometry inside another multigeometry in GEOS. Add this to the fact that it might or might not hit several caches to speed up the process (detoasting tuples or preparing the geometries in GEOS) and the possibilities are endless.
> 
> To support something remotely close to this, Postgis would need to calculate and report costs to Postgresql based on statistics (geometry types, geometry sizes and so on) and, apart from being quite a bit of work it's not possible right now AFAIK. The last time I looked into this I came up with a guesstimation based on whether deserialization happened or not and how fast or slow the operation was against a baseline, but I only did it for unary functions and left variadic ones, like st_within, untouched. You can see it https://github.com/postgis/postgis/pull/556.
> 
> If somebody comes up with a way to improve this in a generic way, that'd be great.
> 
> 
> At some points I added some tweaks to functions costs in https://github.com/postgis/postgis/pull/556. The problem with
> 
> On 25/9/22 0:40, Tomas Vondra wrote:
>> On 9/24/22 21:40, Darafei "Komяpa" Praliaskouski wrote:
>>> Hi,
>>> 
>>> Bitmap scans in the plans are usually a sign of a system's
>>> random_page_cost being too high or/and work_mem being too low.
>>> 
>> Well, work_mem is 2GB and random_page_cost is 4 (i.e. default). Maybe
>> random_page_cost could have been lower, say 2, but that does not really
>> solve anything because that cost was included in the original cost ~100k
>> already, which now increased to 1.1M. The increase alone would be
>> sufficient to make this much more expensive than bitmap index scan, and
>> AFAICS that has nothing to do with random_page_cost.
>> 
>>> In general, I'd say it is more about incorrect costing of Bitmap scans
>>> in postgres itself - there is an assumption buried deep into it that
>>> bitmap scan with per-page bitmap and with per-tuple bitmap will cost
>>> similar, while actually it should be st_within_cost*n_tuples_per_page
>>> times higher. This can sometimes be fought by higher work_mem but if
>>> someone gets this addressed upstream this will be nice.
>> I don't think this applies, because with work_mem=2GB we should have
>> per-tuple bitmap. In fact, this is not dealing with that many tuples so
>> even lower work_mem values should be enough.
>> 
>>> For costing of postgis functions, there was a research that measuret
>>> real cost, unlike the rest of postgres where costs are
>>> guesswork: https://github.com/postgis/postgis/pull/104
>>> <https://github.com/postgis/postgis/pull/104>
>>> 
>> I don't find that pull pull request discussion convincing - it discusses
>> which functions should cost more than other functions, yet what the
>> commit 3ddc11483b is setting most of the costs to 10000 (which is not
>> mentioned in the discussion at all).
>> 
>> Moreover, the commit talks about inlining, which is not discussed in the
>> PR at all, and only enables the higher costs on PG12. But that was
>> released in 2019, and the discussion is from 2017 ...
>> 
>> 
>> regards
>> 
> 
> _______________________________________________
> postgis-devel mailing list
> postgis-devel at lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/postgis-devel



More information about the postgis-devel mailing list