[postgis-devel] Prepared Geometry API
Martin Davis
mbdavis at refractions.net
Wed Oct 8 10:28:43 PDT 2008
Mark Cave-Ayland wrote:
> Martin Davis wrote:
>> Some comments:
>>
>> - The whole point of the geometry cache key is that it checks EXACT
>> identity. Would you really trust a hash/CRC to tell you that two
>> (potentially very large and differing only very slightly) geometries
>> are different? I'm not sure I would.... This is a database, after
>> all - I think there's an expectation that it will return precise,
>> correct answers.
>> - re Mark's comment that "memcmp exits as soon as it detects a
>> difference". In other words, cache misses can be cheap. True
>> enough, but the whole point of using PreparedGeometry is that there
>> is an expectation that the majority of the tests made against the
>> cache will result in *hits*. Suppose you have a situation where you
>> are comparing M geoms against N geoms. You'll be accessing the
>> cache MN times, but you will only get a cache miss M times. For
>> large M and N this essentially means that every cache check is a hit.
>> Both of the above are really different aspects of the same
>> situation. Methods such as CRC can determine quickly whether two
>> objects are different. Sometimes that exactly what you want, because
>> you don't mind paying a price when you need to check equality. But
>> PrepGeom is exactly the opposite - checking equality is the common case.
>
> [Wow, it's impressive how a thread snowballs in your absence, so I'll
> try and summarise what I've read from various messages...]
>
> Okay so I see now that my thinking was focused on the basis that the
> majority of the cache accesses are expected to be hits rather than
> misses. But thinking about the CRC case even more, does it matter if
> occasionally we return a miss rather than a hit? The worse case is
> that we have to prepare a geometry once again; for a small dataset the
> runtime is small and so we don't care, and for a larger dataset the
> chances of a collision would be N times for an M*N size result which
> is still not too unreasonable.
Agreed. The bad situation is where you get a hit which should have been
a miss. Then you get the wrong answer.
>
> To answer the comment that generating a CRC would involve reading the
> entire geometry anyway, I'm sure we could come up with a way of
> sampling parts the geometry based upon its deTOASTed size and
> generating a fixed size key of maybe 16-32 bytes based upon that.
>
>
> ATB,
>
> Mark.
>
--
Martin Davis
Senior Technical Architect
Refractions Research, Inc.
(250) 383-3022
More information about the postgis-devel
mailing list