[postgis-devel] cost of some functions (e.g. st_within) might be too high after 3ddc11483b

Tomas Vondra tomas.vondra at 2ndquadrant.com
Sat Sep 24 12:13:46 PDT 2022


Hi,

We've been investigating a customer issue, where queries got much slower
due to a query plan change after upgrading from PG 9.5 with PostGIS 2.5
to PG 12 with PostGIS 3.1.

The exact query and query plan are not very complicated or important -
it's a simple non-join query with st_within() in a WHERE condition (and
a LIMIT on top), and the cardinality estimates are pretty spot on.

The interesting bit is that on 9.5 the GIST index is accessed like this:

   ->  Index Scan using my_gist_index on public.my_table sc
       (cost=0.42..168630.72 rows=412 width=64) ...

while on PG12 the index scan is costed like this:

   ->  Index Scan using my_gist_index on public.my_table sc
       (cost=0.42..1165450.26 rows=322 width=72) ...

Yup, that's a 1M difference in cost, because in cost_index we do this:

    cpu_run_cost += cpu_per_tuple * tuples_fetched;

with tuples_fetched being ~40k rows, and cpu_per_tuple jumping from
0.29 to more than 25.0. This happens because of st_within() in the WHERE
condition, whose procost was increased from 100 to 10000 in commit

  commit 3ddc11483b2266d3a914ecfdeb3e2a343e0f6dd4
  Author:     Paul Ramsey <pramsey at cleverelephant.ca>
  AuthorDate: Fri Mar 8 21:48:08 2019 +0000
  CommitDate: Fri Mar 8 21:48:08 2019 +0000

      Add costs to functions in the form of macros:
      _COST_LOW, _COST_MEDIUM, _COST_HIGH
      Macros are set in sqldefines.h.in and are switched
      on PgSQL version. Prior to PG12, costs should be
      small or nonexistent, to avoid SQL inlining issues.
      After Pg12, costs can be higher to hopefully
      force more parallelism in plans.
      Closes #4344
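
For reference, the jump is consistent with PostgreSQL's default planner
cost parameters (cpu_tuple_cost = 0.01, cpu_operator_cost = 0.0025): the
per-tuple cost of a qual is roughly procost * cpu_operator_cost, so a
back-of-the-envelope calculation reproduces both the ~25 per-tuple cost
and the ~1M total:

    -- rough sketch, assuming the default cost GUCs:
    -- cpu_per_tuple ~ cpu_tuple_cost + procost * cpu_operator_cost
    SELECT 0.01 + 100   * 0.0025 AS old_cpu_per_tuple,          -- 0.26
           0.01 + 10000 * 0.0025 AS new_cpu_per_tuple,          -- 25.01
           40000 * (0.01 + 10000 * 0.0025) AS new_cpu_run_cost; -- ~1,000,400

(The remaining gap between 0.26 and the observed 0.29 is presumably
other quals and per-tuple overhead.)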

This, however, makes the index scan too expensive, so the optimizer
instead picks a parallel query with a bitmap index scan. That plan runs
in about 1000 ms, while the original plan (after pushing the right
enable_ GUCs) runs in ~10 ms.
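
In case anyone wants to replicate the comparison: the "original" plan
can be forced along these lines (the exact GUCs depend on which plan
shape the optimizer prefers on a given install - this is just what it
took in our case):

    -- disable the alternatives the PG12 planner prefers,
    -- to get back to the plain (serial) index scan
    SET enable_bitmapscan = off;
    SET max_parallel_workers_per_gather = 0;
    EXPLAIN (ANALYZE) SELECT ... ;  -- same query as above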

I agree the idea behind commit 3ddc11483b might be right in principle,
but I wonder if it went a bit too far - maybe the jump from 100 to 10000
is too much? I tried searching the postgis-devel archives for an
explanation of why this is the right increase, but I haven't found
anything.

Of course, I can do

    ALTER FUNCTION st_within(geometry, geometry) COST 100

to revert to the original cost, but doing that on every PostGIS install
seems a bit impractical. Furthermore, there seem to be ~120 functions
with this cost, and I'm not sure I want to tweak all of them.
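
(If someone does want to revert them wholesale, the affected functions
can at least be enumerated from the catalogs - a sketch, assuming all of
them live in the PostGIS shared library and were set to exactly 10000:

    -- generate ALTER FUNCTION statements for every function whose
    -- cost was raised to 10000 by the PostGIS scripts
    SELECT format('ALTER FUNCTION %s(%s) COST 100;',
                  p.oid::regproc,
                  pg_get_function_identity_arguments(p.oid))
    FROM pg_proc p
    WHERE p.procost = 10000
      AND p.probin LIKE '%postgis%';

Feed the output back through psql. But that's a workaround, not a fix.)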

Opinions?


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

