[postgis-devel] Recheck Considered Harmful (or, at least, Slow)

Paul Ramsey pramsey at cleverelephant.ca
Sat Oct 11 15:07:05 PDT 2008


Since we got all that great performance from the prepared geometry, I
wondered to myself, I did, "what's the next biggest CPU burner in that
workload"?

So, for the first time, I bothered to put PostgreSQL/PostGIS under the
profiler.

And for my standard "8000 polygons in 80 larger polygons" use case,
the answer is "memcpy".

In particular, it is the memcpy being called as a result of

-> toast_fetch_datum
  -> heap_tuple_untoast_attr
    -> LWGEOM_gist_consistent

Which accounts for 7% of all CPU ticks, the largest single block
remaining.  (I was actually wondering if the LWGEOM2GEOS conversion
was a big overhead, but it's probably only worth 2%, adding together
all the different ones. One of the side-effects of caching is that a
large % of those PostGIS->GEOS conversions also go away, because the
largest geometry in most comparison pairs is being pulled out of the
cache.)

So:

First, do we really, really, really need to de-toast the geometry to
check consistent? We deliberately expand the bbox outward when we
create the float version, so the index should always return a superset
of what we ask for, which is the point: it's a pre-filter, not a final
filter. The only people we are screwing by not doing a re-check are the
people who expect an exact result set from a bbox query, and most
people just want a "good enough set".
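
To make the "superset" claim concrete, here is a minimal standalone
sketch, not the actual PostGIS code. The struct and function names
(BoxD, BoxF, step_float, box_to_float and friends) are made up for
illustration, and it assumes finite coordinates within float range.
Each double bound is rounded outward to the nearest float, so the float
box always contains the double box, and a float overlap test can give
false positives but never false negatives.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical box types for illustration; not the actual PostGIS structs. */
typedef struct { double xmin, ymin, xmax, ymax; } BoxD;
typedef struct { float  xmin, ymin, xmax, ymax; } BoxF;

/* Step a finite float to the adjacent representable value, up or down. */
float step_float(float f, bool up)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    if (f == 0.0f)
        bits = up ? 0x00000001u : 0x80000001u; /* smallest subnormal on that side */
    else if ((f > 0.0f) == up)
        bits++;                                /* moving away from zero */
    else
        bits--;                                /* moving toward zero */
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Largest float that is <= d. */
float round_down_to_float(double d)
{
    float f = (float) d;
    return ((double) f > d) ? step_float(f, false) : f;
}

/* Smallest float that is >= d. */
float round_up_to_float(double d)
{
    float f = (float) d;
    return ((double) f < d) ? step_float(f, true) : f;
}

/* Expand outward: mins round down, maxes round up. */
BoxF box_to_float(BoxD b)
{
    BoxF out = { round_down_to_float(b.xmin), round_down_to_float(b.ymin),
                 round_up_to_float(b.xmax),   round_up_to_float(b.ymax) };
    return out;
}

bool boxd_overlaps(BoxD a, BoxD b)
{
    return a.xmin <= b.xmax && b.xmin <= a.xmax &&
           a.ymin <= b.ymax && b.ymin <= a.ymax;
}

bool boxf_overlaps(BoxF a, BoxF b)
{
    return a.xmin <= b.xmax && b.xmin <= a.xmax &&
           a.ymin <= b.ymax && b.ymin <= a.ymax;
}
```

Two double boxes separated by a gap much smaller than float precision
would come back as overlapping from the float test. That is the kind of
extra row an un-rechecked bbox query could return.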

Second, if we do, really, really, really need to check consistent for
these damned bboxes, perhaps only pulling the first few bytes, with
pg_detoast_datum_slice
<http://doxygen.postgresql.org/fmgr_8c.html#b149334aa198409ab2dffa65c8538f24>
would do the trick?
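
Roughly what I have in mind, as a standalone sketch rather than real
backend code. The byte layout here (one flag byte, then four floats) is
an assumption for illustration only, and fetch_slice is a made-up
stand-in for the slice fetch. Inside PostgreSQL the prefix would come
from something like PG_DETOAST_DATUM_SLICE(datum, 0, n), with the
varlena header handled by the backend.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical serialized layout, for illustration only:
 *   byte 0      : type/flags byte (bit 0x80 set when a cached bbox follows)
 *   bytes 1..16 : xmin, ymin, xmax, ymax as four floats
 *   bytes 17..  : coordinate data (potentially very large)
 */
#define HAS_BBOX_FLAG   0x80
#define BBOX_PREFIX_LEN (1 + 4 * sizeof(float))

typedef struct { float xmin, ymin, xmax, ymax; } BoxF;

/* Stand-in for a slice fetch: copies at most `count` bytes starting at
 * `first` and returns the number of bytes actually copied. */
size_t fetch_slice(const uint8_t *stored, size_t stored_len,
                   size_t first, size_t count, uint8_t *out)
{
    if (first >= stored_len)
        return 0;
    if (count > stored_len - first)
        count = stored_len - first;
    memcpy(out, stored + first, count);
    return count;
}

/* Read the bbox from the prefix alone. Returns false when the prefix is
 * too short or no bbox is cached; the caller would then have to fall
 * back to fetching the whole geometry. */
bool bbox_from_prefix(const uint8_t *prefix, size_t len, BoxF *box)
{
    if (len < BBOX_PREFIX_LEN || !(prefix[0] & HAS_BBOX_FLAG))
        return false;
    memcpy(box, prefix + 1, sizeof *box); /* floats are unaligned in the buffer */
    return true;
}
```

The point is that consistent would only touch the first 17 bytes in
this layout, however big the geometry is, instead of memcpy'ing the
whole thing.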

Thoughts?

Paul


