API for optimized predicates (was Re: [postgis-devel] 1.3.3)

Ben Jubb benjubb at refractions.net
Mon Mar 31 15:50:51 PDT 2008


Hiya,
I've attached a patch to lwgeom_geos_c.c, modifying its 1st arg caching 
behaviour.
The third argument is used as before, as a surrogate key, and the 
caching will use that as its key;
UNLESS the key is NULL. 
If the key is NULL, the predicates use the memcmp technique to determine 
if the cached prepared geometry is in sync with the first arg.
Note that the two caching approaches have essentially independent caches.
This patch is intended for testing purposes only.
enjoy
b

Paul Ramsey wrote:
> A unique-on-insert ID would be another approach. It would, however,
> involve a disk-format change, so we're talking about pretty big
> hammers here, regardless of whether we did a hash or a uuid.
>
> Ben, maybe just stick some small timing statements into your current
> code... start time, end time, and then do a noop memcmp with start/end
> times as well. That way we can compare the memcmp times to the total
> times.
>
> P.
>
> On Mon, Mar 31, 2008 at 10:17 AM, Martin Davis <mbdavis at refractions.net> wrote:
>   
>> (renaming this thread, since the current one is way overloaded)
>>
>>  I agree with Paul and Mark - there should be a simple function signature
>>  for the fast preds.  The more complex one can be provided as well, but
>>  it will need to be VERY well documented, since it's a tricky thing to grok.
>>
>>  re spatial hash - would you really trust a hash to confirm identity?  I
>>  don't think I would...
>>
>>  Would another alternative would be to assign a hidden unique ID to each
>>  geom entered into the DB.  This could be a honking big integer, or maybe
>>  a UUID.
>>
>>  Paul Ramsey wrote:
>>  > The problem is that the memcmp hit gets worse in exactly the cases
>>  > were we expect better and better performance from the prepared
>>  > algorithm...  still, it would be nice to know what that hit is...
>>  > compared to the original, unprepared time, it will be small, but
>>  > compared to the prepared-with-id-API implementation... hard to say.
>>  >
>>  > Something to resolve before 1.4... It's unfortunate that all the
>>  > *fast* tests can only falsify identity, not confirm it.  I was talking
>>  > to a fellow who has done a spatial db implementation on a proprietary
>>  > system, and he was pleased with the idea of a "geographic hash" that
>>  > he can calculate for each shape and use to test identity.  If we were
>>  > to do something like that, it would have to be optional, like the bbox
>>  > calculation is currently.
>>  >
>>  > P.
>>  >
>>  > On Mon, Mar 31, 2008 at 2:51 AM, Mark Cave-Ayland
>>  > <mark.cave-ayland at siriusit.co.uk> wrote:
>>  >
>>  >> On Friday 28 March 2008 23:53:53 Ben Jubb wrote:
>>  >>  > Howdy,
>>  >>  > In my testing, I did see a performance hit when using the memcmp test,
>>  >>  > although it was noticable only in the largest of my test geometries
>>  >>  > (5000 vertices or so).
>>  >>  > The three parameter form seemed like the best way to go because the
>>  >>  > whole point of the prepared version of the functions was to get the best
>>  >>  > possible performance.  The cases when the performance matters most is
>>  >>  > with large geoms, and then the cost of doing the memcmp is the highest.
>>  >>  > Using a third argument seemed the simplest way to get the best possible
>>  >>  > performance from the predicates, with a minimal increase in the
>>  >>  > complexity of the interface.
>>  >>  > I agree it would be nice to have a single form for those predicates that
>>  >>  > automatically determines the most efficient manner to do the tests, but
>>  >>  > there didn't seem to be any efficient way to accomplish that.
>>  >>  >
>>  >>  > b
>>  >>
>>  >>
>>  >>  Hi Ben,
>>  >>
>>  >>  Well I think it really comes down to what exactly is the performance hit and
>>  >>  how did you measure it? Which platform/OS/C library did you use? Obviously
>>  >>  there will be *some* overhead having the extra memcmp() in there but does it
>>  >>  matter? For example, if the overhead is just 1-2s on a 30s query then that
>>  >>  doesn't really matter. Then again, if the overhead is 1s on a 3s query then
>>  >>  that is significant.
>>  >>
>>  >>  Since this is a new feature then I'd be inclined to say that for a first cut
>>  >>  we should keep the standard API, and depending on the reports we get back,
>>  >>  look at improving it later. That seems a lot more preferable to having a
>>  >>  fairly nasty API hack that will catch a lot of people out :(
>>  >>
>>  >>
>>  >>
>>  >>  ATB,
>>  >>
>>  >>  Mark.
>>  >>
>>  >>  --
>>  >>  Mark Cave-Ayland
>>  >>  Sirius Corporation - The Open Source Experts
>>  >>  http://www.siriusit.co.uk
>>  >>  T: +44 870 608 0063
>>  >>  _______________________________________________
>>  >>  postgis-devel mailing list
>>  >>  postgis-devel at postgis.refractions.net
>>  >>  http://postgis.refractions.net/mailman/listinfo/postgis-devel
>>  >>
>>  >>
>>  > _______________________________________________
>>  > postgis-devel mailing list
>>  > postgis-devel at postgis.refractions.net
>>  > http://postgis.refractions.net/mailman/listinfo/postgis-devel
>>  >
>>  >
>>
>>  --
>>  Martin Davis
>>  Senior Technical Architect
>>  Refractions Research, Inc.
>>  (250) 383-3022
>>
>>  _______________________________________________
>>  postgis-devel mailing list
>>  postgis-devel at postgis.refractions.net
>>  http://postgis.refractions.net/mailman/listinfo/postgis-devel
>>
>>     
> _______________________________________________
> postgis-devel mailing list
> postgis-devel at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-devel
>   
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-devel/attachments/20080331/d2aa5df6/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: memcpy.patch
Type: text/x-patch
Size: 10357 bytes
Desc: not available
URL: <http://lists.osgeo.org/pipermail/postgis-devel/attachments/20080331/d2aa5df6/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: benjubb.vcf
Type: text/x-vcard
Size: 255 bytes
Desc: not available
URL: <http://lists.osgeo.org/pipermail/postgis-devel/attachments/20080331/d2aa5df6/attachment.vcf>


More information about the postgis-devel mailing list