[postgis-devel] Prepared Geometry API

Chris Hodgson chodgson at refractions.net
Mon Oct 6 10:42:11 PDT 2008


The problem with both memcmp and hashing is that the case that we are 
trying to speed up, where the geometry is the same as the previous one, 
invokes the worst case run time. memcmp WILL have to compare every byte, 
and your hashes WILL be equal, so you WILL have to do a memcmp anyway to 
check that it's not due to a collision. In the cases that we are really 
trying to speed up, the number of successful comparisons will be 
hundreds to millions of times more than the unsuccessful comparisons.

The cases that get the biggest speedup from the cached prepared geometry 
are the big geometries, which are also the ones that take the longest to 
memcmp.
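
To make the cost difference concrete, here is a rough sketch in C of the 
two kinds of cache check. It is not the PostGIS code: CacheEntry, 
cache_hit_by_bytes, cache_hit_by_id and bytes_examined are all made-up 
names, and the "id" field simply stands in for whatever synthetic key the 
extended API would pass. bytes_examined counts how far a byte-wise 
comparison has to go before it can answer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical one-entry cache for the last prepared geometry. */
typedef struct {
    const unsigned char *data; /* serialized bytes of the cached geometry */
    size_t               size; /* length of those bytes */
    long                 id;   /* hypothetical synthetic key */
} CacheEntry;

/* Byte-wise check: a hit can only be confirmed after reading all bytes. */
static int cache_hit_by_bytes(const CacheEntry *c,
                              const unsigned char *g, size_t size)
{
    assert(c != NULL && g != NULL);
    return c->size == size && memcmp(c->data, g, size) == 0;
}

/* Key check: one integer comparison, independent of geometry size. */
static int cache_hit_by_id(const CacheEntry *c, long id)
{
    assert(c != NULL);
    return c->id == id;
}

/* How many bytes a byte-wise comparison must look at before deciding:
 * the position of the first difference, or all of them when equal. */
static size_t bytes_examined(const unsigned char *a,
                             const unsigned char *b, size_t size)
{
    size_t i;
    for (i = 0; i < size; i++)
        if (a[i] != b[i])
            return i + 1;
    return size;
}
```

For two equal buffers bytes_examined returns the full size, which is 
exactly the cache-hit case; a mismatch can stop at the first differing 
byte. The id check costs the same no matter how big the geometry is.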

Is there perhaps any way to ask postgresql if we are still working with the 
same tuple? The query engine must know, at some level, which row is on 
the side of the join or condition that isn't changing, and which one is 
being looped through... getting at that info is the only alternative I 
can see to the Id approach that doesn't have most of its performance 
gain destroyed by memcmp. Or can we hack in something like 
generate_series to automatically give us a "tuple id"?

Chris

Mark Cave-Ayland wrote:
> It's not so much an index, just a unique identifier for each geometry 
> that can be used to determine whether it is already in the prepared 
> cache. At the moment, synthetic keys are used with an extended API so 
> as to provide a direct key into the cache. I'm wondering if we could 
> use something else such as a CRC32 (assuming the PostgreSQL hash 
> implementation handles collisions using memcmp() internally).
>
> *thinks*... maybe GEOS should generate a CRC32 hash key as part of the 
> creation of the prepared geometry? Assuming we could access this using 
> the GEOS CAPI, it would just be a case of handling the few 
> collisions using memcmp()...
>



