[postgis-devel] Prepared Geometry API

Martin Davis mbdavis at refractions.net
Wed Oct 8 10:28:43 PDT 2008



Mark Cave-Ayland wrote:
> Martin Davis wrote:
>> Some comments:
>>
>> - The whole point of the geometry cache key is that it checks EXACT 
>> identity.  Would you really trust a hash/CRC to tell you that two 
>> (potentially very large and differing only very slightly) geometries 
>> are different?  I'm not sure I would....   This is a database, after 
>> all - I think there's an expectation that it will return precise, 
>> correct answers.
>> - re Mark's comment that "memcmp exits as soon as it detects a 
>> difference".  In other words, cache misses can be cheap.  True 
>> enough, but the whole point of using PreparedGeometry is that there 
>> is an expectation that the majority of the tests made against the 
>> cache will result in *hits*.  Suppose you have a situation where you 
>> are comparing  M geoms against N geoms.  You'll be accessing the 
>> cache MN times, but you will only get a cache miss M times.  For 
>> large M and N this essentially means that every cache check is a hit.
>> Both of the above are really different aspects of the same 
>> situation.  Methods such as CRC can determine quickly whether two 
>> objects are different.  Sometimes that exactly what you want, because 
>> you don't mind paying a price when you need to check equality.  But 
>> PrepGeom is exactly the opposite - checking equality is the common case.
>
> [Wow, it's impressive how a thread snowballs in your absence, so I'll 
> try and summarise what I've read from various messages...]
>
> Okay so I see now that my thinking was focused on the basis that the 
> majority of the cache accesses are expected to be hits rather than 
> misses. But thinking about the CRC case even more, does it matter if 
> occasionally we return a miss rather than a hit? The worse case is 
> that we have to prepare a geometry once again; for a small dataset the 
> runtime is small and so we don't care, and for a larger dataset the 
> chances of a collision would be N times for an M*N size result which 
> is still not too unreasonable.
Agreed.  The bad situation is where you get a hit which should have been 
a miss.  Then you get the wrong answer. 
>
> To answer the comment that generating a CRC would involve reading the 
> entire geometry anyway, I'm sure we could come up with a way of 
> sampling parts the geometry based upon its deTOASTed size and 
> generating a fixed size key of maybe 16-32 bytes based upon that.
>
>
> ATB,
>
> Mark.
>

-- 
Martin Davis
Senior Technical Architect
Refractions Research, Inc.
(250) 383-3022




More information about the postgis-devel mailing list