[postgis-devel] Prepared Geometry API

Mark Cave-Ayland mark.cave-ayland at siriusit.co.uk
Tue Oct 7 13:23:10 PDT 2008


Martin Davis wrote:
> Some comments:
> 
> - The whole point of the geometry cache key is that it checks EXACT 
> identity.  Would you really trust a hash/CRC to tell you that two 
> (potentially very large and differing only very slightly) geometries are 
> different?  I'm not sure I would....   This is a database, after all - I 
> think there's an expectation that it will return precise, correct answers.
> - re Mark's comment that "memcmp exits as soon as it detects a 
> difference".  In other words, cache misses can be cheap.  True enough, 
> but the whole point of using PreparedGeometry is that there is an 
> expectation that the majority of the tests made against the cache will 
> result in *hits*.  Suppose you have a situation where you are comparing  
> M geoms against N geoms.  You'll be accessing the cache MN times, but 
> you will only get a cache miss M times.  For large M and N this 
> essentially means that every cache check is a hit.
> Both of the above are really different aspects of the same situation.  
> Methods such as CRC can determine quickly whether two objects are 
> different.  Sometimes that's exactly what you want, because you don't mind 
> paying a price when you need to check equality.  But PrepGeom is exactly 
> the opposite - checking equality is the common case.

[Wow, it's impressive how a thread snowballs in your absence, so I'll 
try and summarise what I've read from various messages...]

Okay so I see now that my thinking was focused on the basis that the 
majority of the cache accesses are expected to be hits rather than 
misses. But thinking about the CRC case even more, does it matter if 
occasionally we return a miss rather than a hit? The worst case is that 
we have to prepare a geometry once again; for a small dataset the 
runtime is small and so we don't care, and for a larger dataset the 
chances of a collision would be N times for an M*N size result which is 
still not too unreasonable.
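
To make the hit/miss accounting concrete, here is a rough sketch of a 
single-slot cache that checks exact identity with memcmp and simply 
re-prepares on a miss. This is not PostGIS code: prep_cache and 
prep_cache_lookup are hypothetical names, and the "prepare" step is 
stood in for by copying the bytes and bumping a counter.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical single-slot cache: remembers the last geometry's bytes. */
typedef struct {
    unsigned char *bytes;   /* private copy of the cached geometry */
    size_t size;            /* length of that copy in bytes */
    long prepares;          /* how many times we had to (re)prepare */
    long hits;              /* how many lookups matched the cached copy */
} prep_cache;

/* Returns 1 on a hit, 0 on a miss (after replacing the cached copy). */
static int prep_cache_lookup(prep_cache *c, const unsigned char *g, size_t size)
{
    assert(c != NULL && g != NULL);

    /* Exact identity: sizes must match, then every byte must match. */
    if (c->bytes != NULL && c->size == size && memcmp(c->bytes, g, size) == 0) {
        c->hits++;
        return 1;
    }

    /* Miss: the only cost is preparing again. A real cache would
     * rebuild the prepared geometry here (assumption: not shown). */
    unsigned char *copy = malloc(size ? size : 1);
    assert(copy != NULL);
    memcpy(copy, g, size);
    free(c->bytes);
    c->bytes = copy;
    c->size = size;
    c->prepares++;
    return 0;
}
```

Feeding this M = 3 distinct geometries, each looked up N = 4 times in a 
row, gives 3 prepares and 9 hits, which is the M misses out of M*N 
accesses that Martin describes.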

To answer the comment that generating a CRC would involve reading the 
entire geometry anyway, I'm sure we could come up with a way of sampling 
parts of the geometry based upon its deTOASTed size and generating a fixed 
size key of maybe 16-32 bytes based upon that.
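
As a rough illustration of the sampling idea, the sketch below builds a 
32-byte key from the geometry's size plus 24 bytes taken at evenly 
spaced offsets. Everything here is an assumption for the sake of 
example: geom_key, geom_sample_key, geom_key_equal and the choice of 24 
samples are hypothetical, and the input is treated as an opaque byte 
buffer rather than a real serialized geometry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define KEY_SAMPLES 24  /* assumption: 8-byte size + 24 samples = 32 bytes */

/* Hypothetical fixed-size key for a (deTOASTed) geometry buffer. */
typedef struct {
    uint64_t size;                  /* total length of the buffer */
    uint8_t samples[KEY_SAMPLES];   /* bytes sampled across the buffer */
} geom_key;

/* Fill *key by sampling data[] at evenly spaced offsets, from the
 * first byte (offset 0) to the last byte (offset size - 1). */
static void geom_sample_key(const uint8_t *data, size_t size, geom_key *key)
{
    assert(key != NULL);
    assert(data != NULL || size == 0);

    memset(key, 0, sizeof *key);
    key->size = (uint64_t)size;
    if (size == 0)
        return;

    for (size_t i = 0; i < KEY_SAMPLES; i++) {
        size_t off = (size_t)(((uint64_t)i * (uint64_t)(size - 1))
                              / (KEY_SAMPLES - 1));
        key->samples[i] = data[off];
    }
}

/* Two keys are equal when both the size and all samples agree. */
static int geom_key_equal(const geom_key *a, const geom_key *b)
{
    return a->size == b->size
        && memcmp(a->samples, b->samples, KEY_SAMPLES) == 0;
}
```

Building the key touches only 24 bytes regardless of geometry size. The 
flip side is that two geometries differing only in unsampled bytes get 
the same key, so a key match on its own does not prove the geometries 
are identical.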


ATB,

Mark.

-- 
Mark Cave-Ayland
Sirius Corporation - The Open Source Experts
http://www.siriusit.co.uk
T: +44 870 608 0063


