[postgis-devel] Recheck Considered Harmful (or, at least, Slow)
David Fuhry
dfuhry at acm.org
Sat Oct 11 16:29:38 PDT 2008
Paul,
Some broader questions. Does the removal of the RECHECK clause from
gist_geometry_ops for 8.4+ mean that RECHECK is now set in the code
rather than in the SQL opclass definition? (I'm looking at the *recheck =
true in LWGEOM_gist_consistent().)
My understanding of exact GIST behavior here is fuzzy. I don't
quite understand what geometries would be rejected only by the
LWGEOM_gist_consistent() RECHECK, given that the bbox extracted from the
geometry is of type float (BOX2DFLOAT4), not double precision. Won't
the extracted getbox2d_p bbox match exactly the bbox from the index?
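To make the float-versus-double question concrete, here is a minimal standalone sketch, not the PostGIS code. It assumes the float box is built by rounding each double ordinate outward (minimums down, maximums up) and that ordinates fit in float range. The names box2df_sketch, round_down_f, round_up_f and box_from_doubles are hypothetical. Under those assumptions the conversion is deterministic, so extracting the box twice from the same geometry gives bit-identical results, and the float box always contains the double box.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for BOX2DFLOAT4: four single-precision ordinates. */
typedef struct { float xmin, ymin, xmax, ymax; } box2df_sketch;

/* Next representable float above f (finite f assumed, IEEE 754 layout). */
static float step_up(float f)
{
    uint32_t bits;
    if (f == 0.0f) {
        bits = 1u;                      /* smallest positive subnormal */
    } else {
        memcpy(&bits, &f, sizeof bits);
        if (f > 0.0f) bits++;           /* grow magnitude: moves up */
        else          bits--;           /* shrink magnitude: moves up */
    }
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Next representable float below f. */
static float step_down(float f) { return -step_up(-f); }

/* Largest float that is <= d. */
static float round_down_f(double d)
{
    float f = (float)d;                 /* default rounding: to nearest */
    return ((double)f > d) ? step_down(f) : f;
}

/* Smallest float that is >= d. */
static float round_up_f(double d)
{
    float f = (float)d;
    return ((double)f < d) ? step_up(f) : f;
}

/* Build a float box that is guaranteed to contain the double box. */
static box2df_sketch box_from_doubles(double xmin, double ymin,
                                      double xmax, double ymax)
{
    box2df_sketch b;
    assert(xmin <= xmax && ymin <= ymax);
    b.xmin = round_down_f(xmin);
    b.ymin = round_down_f(ymin);
    b.xmax = round_up_f(xmax);
    b.ymax = round_up_f(ymax);
    return b;
}

/* Bitwise comparison of two float boxes. */
static int box_equal(const box2df_sketch *a, const box2df_sketch *b)
{
    return memcmp(a, b, sizeof *a) == 0;
}
```

If the index entry and the recheck both derive the box this way from the same stored geometry, box_equal() on the two results would return 1, which is the situation the question above is asking about.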
Without RECHECK, could current behavior be preserved by adding an
"AND ST_Intersects(...)" to the query? If so, that would be a simple
justification for the change. I'm using 8.3 without RECHECK but I
didn't realize it skipped anything beyond the SRID equality check.
In the worst case, pg_detoast_datum_slice would have to grab only
the first VARHDRSZ + sizeof(flags) + sizeof(SRID) + sizeof(BOX2DFLOAT4)
bytes, is that right? It seems there's no downside.
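For a sense of scale, the arithmetic above can be sketched as a standalone calculation. This is only an illustration under stated assumptions: a 4-byte varlena length word (SKETCH_VARHDRSZ is a hypothetical stand-in for VARHDRSZ), a 1-byte flags field, a 4-byte SRID, and a box of four floats. The names box2df_hdr_sketch and header_slice_len are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4-byte varlena length word, standing in for VARHDRSZ. */
#define SKETCH_VARHDRSZ 4

/* Hypothetical stand-in for BOX2DFLOAT4: four single-precision ordinates. */
typedef struct { float xmin, ymin, xmax, ymax; } box2df_hdr_sketch;

/*
 * Bytes needed to reach the end of the cached bbox, given which
 * optional fields are present in the serialized geometry.
 */
static size_t header_slice_len(int has_bbox, int has_srid)
{
    size_t len = SKETCH_VARHDRSZ + sizeof(uint8_t);   /* length word + flags */

    /* Sanity check: the box struct carries no padding. */
    assert(sizeof(box2df_hdr_sketch) == 4 * sizeof(float));

    if (has_srid)
        len += sizeof(int32_t);                       /* SRID */
    if (has_bbox)
        len += sizeof(box2df_hdr_sketch);             /* four floats */
    return len;
}
```

With both fields present, header_slice_len(1, 1) comes to 4 + 1 + 4 + 16 = 25 bytes, a tiny fraction of a large polygon. As far as I know, the offset and count passed to pg_detoast_datum_slice are measured from the start of the data rather than from the length word, so the VARHDRSZ term may not belong in the count; that is worth checking against the PostgreSQL source.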
-Dave
Paul Ramsey wrote:
> Since we got all that great performance from the prepared geometry, I
> wondered to myself, I did, "what's the next biggest CPU burner in that
> workload"?
>
> So, for the first time, I bothered to put PostgreSQL/PostGIS under the
> profiler.
>
> And for my standard "8000 polygons in 80 larger polygons" use case,
> the answer is "memcpy".
>
> In particular, it is the memcpy being called as a result of
>
> -> toast_fetch_datum
> -> heap_tuple_untoast_attr
> -> LWGEOM_gist_consistent
>
> Which accounts for 7% of all CPU ticks, the largest single block
> remaining. (I was actually wondering if the LWGEOM2GEOS conversion
> was a big overhead, but it's probably only worth 2%, adding together
> all the different ones. One of the side-effects of caching is that a
> large % of those PostGIS->GEOS conversions also go away, because the
> largest geometry in most comparison pairs is being pulled out of the
> cache.)
>
> So:
>
> First, do we really, really, really need to de-toast the geometry to
> check consistent? We are deliberately expanding to create the float
> bbox so it should always return a proper superset of what we ask for,
> which is the point, it's a pre-filter, not a final filter. The only
> people we are screwing by not doing a re-check are the people who
> expect an exact result set from a bbox query, and most people just
> want a "good enough set".
>
> Second, if we do, really, really, really need to check consistent for
> these damned bboxes, perhaps only pulling the first few bytes, with
> pg_detoast_datum_slice
> <http://doxygen.postgresql.org/fmgr_8c.html#b149334aa198409ab2dffa65c8538f24>
> would do the trick?
>
> Thoughts?
>
> Paul