[postgis-devel] cost of some functions (e.g. st_within) might be too high after 3ddc11483b
Raúl Marín
raul at rmr.ninja
Mon Sep 26 02:31:33 PDT 2022
Hi,
The main problem is that a function's cost is a static value (the same
for all inputs), which makes it impossible for PostGIS to set it to a
value that's right for all inputs. For example, st_within could be a
really fast operation (if the bounding boxes don't fit), a point in
polygon calculation inside PostGIS, or a more complex operation with a
multigeometry inside another multigeometry in GEOS. Add to this the fact
that it might or might not hit several caches that speed up the process
(detoasting tuples or preparing the geometries in GEOS) and the
possibilities are endless.
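
To make it concrete: PostgreSQL keeps a single number per function
(pg_proc.procost) and applies it to every call, and the only knob
available is replacing that constant with another constant, e.g.:

    -- one static cost per function, regardless of inputs
    SELECT proname, procost FROM pg_proc WHERE proname = 'st_within';

    -- the only possible adjustment is another constant
    ALTER FUNCTION st_within(geometry, geometry) COST 10000;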
To support anything remotely close to this, PostGIS would need to
calculate and report costs to PostgreSQL based on statistics (geometry
types, geometry sizes and so on) and, apart from being quite a bit of
work, that's not possible right now AFAIK. The last time I looked into
this I came up with a guesstimate based on whether deserialization
happened or not and on how fast or slow the operation was against a
baseline, but I only did it for unary functions and left multi-argument
ones, like st_within, untouched. You can see it at
https://github.com/postgis/postgis/pull/556. If somebody comes up with a
way to improve this in a generic way, that'd be great.
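
For anyone wondering what "against a baseline" means in practice, the
idea is roughly the following kind of micro-benchmark (purely
illustrative, not the exact methodology of the PR): time a pass over
generated data that does nothing interesting, then the same pass with
the function under test, and turn the ratio into a cost hint.

    \timing on

    -- baseline: iterate without calling the function under test
    SELECT count(*) FROM generate_series(1, 100000) AS i;

    -- same iteration plus the PostGIS call being measured
    SELECT count(*)
    FROM generate_series(1, 100000) AS i
    WHERE ST_Within(ST_MakePoint(i, i),
                    ST_Buffer(ST_MakePoint(0, 0), 50));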
On 25/9/22 0:40, Tomas Vondra wrote:
> On 9/24/22 21:40, Darafei "Komяpa" Praliaskouski wrote:
>> Hi,
>>
>> Bitmap scans in the plans are usually a sign of a system's
>> random_page_cost being too high and/or work_mem being too low.
>>
> Well, work_mem is 2GB and random_page_cost is 4 (i.e. the default).
> Maybe random_page_cost could have been lower, say 2, but that does not
> really solve anything because that cost was already included in the
> original cost of ~100k, which has now increased to 1.1M. The increase
> alone would be sufficient to make this much more expensive than the
> bitmap index scan, and AFAICS that has nothing to do with
> random_page_cost.
>
>> In general, I'd say it is more about incorrect costing of bitmap scans
>> in Postgres itself - there is an assumption buried deep inside it that
>> a bitmap scan with a per-page bitmap and one with a per-tuple bitmap
>> will cost about the same, while the per-page case should actually be
>> about st_within_cost*n_tuples_per_page times higher. This can
>> sometimes be fought with a higher work_mem, but it would be nice if
>> someone got this addressed upstream.
> I don't think this applies, because with work_mem=2GB we should have a
> per-tuple bitmap. In fact, this is not dealing with that many tuples,
> so even lower work_mem values should be enough.
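
(Side note for anyone reproducing this: EXPLAIN (ANALYZE, BUFFERS) makes
it easy to confirm whether the bitmap stayed per-tuple; the Bitmap Heap
Scan node reports exact vs lossy heap blocks. Table and column names
below are just placeholders.)

    SET work_mem = '2GB';
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT *
    FROM some_table t   -- placeholder table/column
    WHERE ST_Within(t.geom, ST_MakeEnvelope(0, 0, 10, 10, 4326));
    -- Look for "Heap Blocks: exact=N lossy=M" under the Bitmap Heap
    -- Scan node; lossy blocks mean the bitmap degraded to per-page.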
>
>> For costing of PostGIS functions, there was research that measured
>> the real cost, unlike the rest of Postgres where costs are
>> guesswork: https://github.com/postgis/postgis/pull/104
>>
> I don't find that pull request discussion convincing - it discusses
> which functions should cost more than other functions, yet commit
> 3ddc11483b sets most of the costs to 10000 (a value that is not
> mentioned in the discussion at all).
>
> Moreover, the commit talks about inlining, which is not discussed in the
> PR at all, and only enables the higher costs on PG12. But that was
> released in 2019, and the discussion is from 2017 ...
>
>
> regards
>