[postgis-devel] Recheck Considered Harmful (or, at least, Slow)

David Fuhry dfuhry at acm.org
Sat Oct 11 16:29:38 PDT 2008


Paul,

    Some broader questions.  Does the removal of the RECHECK clause from 
gist_geometry_ops for 8.4+ mean that RECHECK is now defined in the code 
rather than in the SQL opclass definition?  (I'm looking at the *recheck = 
true in LWGEOM_gist_consistent().)

    My understanding of exact GIST behavior here is fuzzy.  I don't 
quite understand what geometries would be rejected only by the 
LWGEOM_gist_consistent() RECHECK, given that the bbox extracted from the 
geometry is of type float (BOX2DFLOAT4), not double precision.  Won't 
the extracted getbox2d_p bbox match exactly the bbox from the index?
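
    To make the float-versus-double question concrete, here is a minimal 
standalone C sketch.  It is not PostGIS code: every sketch_* name is 
hypothetical, and the outward rounding is only assumed to behave roughly 
like this.  It shows that a box rounded outward from double to float 
contains the original, and that re-deriving the box from values that are 
already float changes nothing.

```c
#include <assert.h>
#include <float.h>

/* Hypothetical stand-ins: a double-precision box and a float box
 * (the latter playing the role of BOX2DFLOAT4). */
typedef struct { double xmin, ymin, xmax, ymax; } sketch_box2d;
typedef struct { float  xmin, ymin, xmax, ymax; } sketch_box2df;

/* A step of at least one ulp at f. Over-stepping is harmless here,
 * since the goal is only a superset box. Overflow is not handled. */
static float sketch_ulp_step(float f)
{
    float mag  = (f < 0.0f) ? -f : f;
    float step = mag * FLT_EPSILON;
    return (step < FLT_MIN) ? FLT_MIN : step;
}

/* Largest-ish float that is <= d. */
static float sketch_round_down(double d)
{
    float f = (float) d;
    if ((double) f > d)
        f -= sketch_ulp_step(f);
    return f;
}

/* Smallest-ish float that is >= d. */
static float sketch_round_up(double d)
{
    float f = (float) d;
    if ((double) f < d)
        f += sketch_ulp_step(f);
    return f;
}

/* 1 if the float box fully covers the double box. */
static int sketch_contains(sketch_box2df f, sketch_box2d d)
{
    return (double) f.xmin <= d.xmin && (double) f.ymin <= d.ymin &&
           (double) f.xmax >= d.xmax && (double) f.ymax >= d.ymax;
}

/* Round mins down and maxes up so the result is a superset. */
static sketch_box2df sketch_expand_to_float(sketch_box2d d)
{
    sketch_box2df f;
    f.xmin = sketch_round_down(d.xmin);
    f.ymin = sketch_round_down(d.ymin);
    f.xmax = sketch_round_up(d.xmax);
    f.ymax = sketch_round_up(d.ymax);
    assert(sketch_contains(f, d));
    return f;
}
```

    Feeding the float box back through sketch_expand_to_float() returns 
the same four values, because a float converts to double and back 
without loss.  Under these assumptions a bbox extracted from the stored 
geometry and the bbox held in the index would compare equal.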

    Without RECHECK, could current behavior be preserved by adding an 
"AND ST_Intersects(...)" to the query?  If so, that would be a simple 
justification for the change.  I'm using 8.3 without RECHECK but I 
didn't realize it skipped anything beyond the SRID equality check.

    In the worst case, pg_detoast_datum_slice would have to grab only 
the first VARHDRSZ + sizeof(flags) + sizeof(SRID) + sizeof(BOX2DFLOAT4) 
bytes, is that right?  It seems there's no downside.
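
    The byte count above can be sketched in standalone C.  The sizes 
used here are assumptions, not taken from the PostGIS headers: a 4-byte 
varlena header, a 1-byte flags/type field, a 4-byte SRID, and four 
floats for the box.  All sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed varlena header size; PostgreSQL defines VARHDRSZ itself. */
#define SKETCH_VARHDRSZ 4

/* Hypothetical stand-in for BOX2DFLOAT4: four floats. */
typedef struct { float xmin, ymin, xmax, ymax; } sketch_box2dfloat4;

/* Total prefix bytes: header + flags + SRID + float bbox. */
static size_t sketch_prefix_total_bytes(void)
{
    /* The arithmetic assumes the box struct has no padding. */
    assert(sizeof(sketch_box2dfloat4) == 4 * sizeof(float));
    return SKETCH_VARHDRSZ
         + sizeof(uint8_t)              /* assumed 1-byte flags/type */
         + sizeof(int32_t)              /* assumed 4-byte SRID */
         + sizeof(sketch_box2dfloat4);  /* float bbox */
}

/* Same prefix without the varlena header. */
static size_t sketch_prefix_payload_bytes(void)
{
    return sketch_prefix_total_bytes() - SKETCH_VARHDRSZ;
}
```

    With these assumed sizes the prefix is 25 bytes in total, 21 of them 
after the header.  As far as I recall, the offset and count given to 
pg_detoast_datum_slice are measured from the end of the varlena header, 
so the count passed would be the smaller number; that is worth checking 
against fmgr.h before relying on it.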

-Dave

Paul Ramsey wrote:
> Since we got all that great performance from the prepared geometry, I
> wondered to myself, I did, "what's the next biggest CPU burner in that
> workload"?
> 
> So, for the first time, I bothered to put PostgreSQL/PostGIS under the
> profiler.
> 
> And for my standard "8000 polygons in 80 larger polygons" use case,
> the answer is "memcpy".
> 
> In particular, it is the memcpy being called as a result of
> 
> -> toast_fetch_datum
>   -> heap_tuple_untoast_attr
>     -> LWGEOM_gist_consistent
> 
> Which accounts for 7% of all CPU ticks, the largest single block
> remaining.  (I was actually wondering if the LWGEOM2GEOS conversion
> was a big overhead, but it's probably only worth 2%, adding together
> all the different ones. One of the side-effects of caching is that a
> large % of those PostGIS->GEOS conversions also go away, because the
> largest geometry in most comparison pairs is being pulled out of the
> cache.)
> 
> So:
> 
> First, do we really, really, really need to de-toast the geometry to
> check consistent? We are deliberately expanding to create the float
> bbox so it should always return a proper superset of what we ask for,
> which is the point, it's a pre-filter, not a final filter. The only
> people we are screwing by not doing a re-check are the people who
> expect an exact result set from a bbox query, and most people just
> want a "good enough set".
> 
> Second, if we do, really, really, really need to check consistent for
> these damned bboxes, perhaps only pulling the first few bytes, with
> pg_detoast_datum_slice
> <http://doxygen.postgresql.org/fmgr_8c.html#b149334aa198409ab2dffa65c8538f24>
> would do the trick?
> 
> Thoughts?
> 
> Paul
> _______________________________________________
> postgis-devel mailing list
> postgis-devel at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-devel


