[postgis-devel] PostGIS function costs

Paul Norman penorman at mac.com
Thu May 5 23:35:37 PDT 2016


Short version: I have a way to get accurate PostGIS function costs and
will be doing a PR.

In an effort to allow setting better function costs, I ran some benchmarks.

PostgreSQL function costs are defined in units of cpu_operator_cost and
default to 1 for C functions and 100 for other language functions. Some
PostGIS functions are set to a cost of 100, but I'm not aware of any
prior efforts to measure the cost.

For a representative dataset, I used a table of
(id bigint, geom geometry(Geometry,3857)) with the geometries being from
  a UK OSM extract. There were 5504680 rows and the planner estimated
  5515609 rows. The test polygons were 1401 MB.

PostgreSQL 9.4.6 on Ubuntu 14.04 was used, and the full PostGIS version
was POSTGIS="2.1.8 r13780" GEOS="3.4.2-CAPI-1.8.2 r3921"
PROJ="Rel. 4.8.0, 6 March 2012" GDAL="GDAL 1.10.1, released 2013/08/26"
LIBXML="2.9.1" LIBJSON="UNKNOWN" RASTER. The machine has 32GB RAM, and
an Intel i7-4770 @ 3.40GHz

To establish costs, I looked at costs and execution times for functions
a known cost, and compared it with PostGIS functions. This was done by
executing the statement EXPLAIN ANALYZE SELECT <function> FROM test_polys;
  five times after first warming up the cache. From the EXPLAIN output,
  I got planner costs and execution times.

First, I had to find out the time/cost for a known function. I started
with bigint addition, comparing the difference in time and cost between
<function> = id and <function> = id + 1∷bigint. I could then get a few 
values

-    Cost (from addition)
-    Time (from addition)
-    Cost per est. row
-    Time per actual row
-    Time per cost (row basis)

Repeating with <function> = geom and <function> = ST_Buffer(geom,0) let
me get time per actual row for ST_Buffer, which could then be divided by
time per cost of addition and cpu_operator_cost, giving me function cost
in units of cpu_operator_cost.

For greater accuracy, I averaged over bigint addition, bigint subtraction,
bigint modulo 3, bigint divided by 3, bigint squared, and square root.

Selected results

ST_Buffer(geom, 0): 740
ST_Buffer(geom, 10): 2102
ST_Boundary(geom): 59
ST_CoordDim(geom): 5
ST_Dimension(geom): 9
ST_IsValid(geom): 903
ST_IsValidDetail(geom): 1043

I’m going to prepare a PR which sets costs for most functions. These
initial costs will probably need adjustment over time, but I’d like to
get some evidence-based costs in place, even if they'll need later
adjustment.




More information about the postgis-devel mailing list