API for optimized predicates (was Re: [postgis-devel] 1.3.3)

Mon Mar 31 10:17:24 PDT 2008

(renaming this thread, since the current one is way overloaded)

I agree with Paul and Mark - there should be a simple function signature 
for the fast preds.  The more complex one can be provided as well, but 
it will need to be VERY well documented, since it's a tricky thing to grok.
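
To make the two forms concrete, here is a rough Python sketch.  It is
not PostGIS code: prepare(), covers(), SimpleForm and KeyedForm are
hypothetical names, and the "prepared geometry" is only a bounding box
standing in for the real, expensive structure.

```python
# Hypothetical sketch; none of these names are PostGIS functions.

def prepare(geom_bytes):
    """Stand-in for the expensive step of building a prepared geometry."""
    prepare.calls += 1
    xs = geom_bytes[0::2]   # bytes are treated as x, y, x, y, ...
    ys = geom_bytes[1::2]
    return (min(xs), min(ys), max(xs), max(ys))  # just a bounding box here

prepare.calls = 0  # counts how often the expensive step runs


def covers(prepared, point):
    """Stand-in predicate: is the point inside the prepared bounding box?"""
    x, y = point
    minx, miny, maxx, maxy = prepared
    return minx <= x <= maxx and miny <= y <= maxy


class SimpleForm:
    """Two-argument form: detects a repeated geometry by comparing bytes."""

    def __init__(self):
        self._bytes = None
        self._prepared = None

    def __call__(self, geom_bytes, point):
        # Equivalent of the memcmp test: its cost grows with geometry size.
        if self._bytes != geom_bytes:
            self._bytes = geom_bytes
            self._prepared = prepare(geom_bytes)
        return covers(self._prepared, point)


class KeyedForm:
    """Three-argument form: trusts a caller-supplied key instead."""

    def __init__(self):
        self._key = None
        self._prepared = None

    def __call__(self, geom_bytes, point, key):
        # Constant-time check, but wrong answers if the caller reuses a key
        # for a different geometry. That is the part that needs documenting.
        if self._prepared is None or self._key != key:
            self._key = key
            self._prepared = prepare(geom_bytes)
        return covers(self._prepared, point)
```

Both forms build the prepared structure once per repeated geometry; the
difference is only in how "repeated" is detected, and in who is
responsible when that detection is wrong.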

re spatial hash - would you really trust a hash to confirm identity?  I 
don't think I would...
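
A small Python illustration of the concern, using a deliberately weak
hash so that a collision is easy to produce.  weak_hash() and
same_geometry() are made-up names for this sketch, not a proposal for
the actual hash function.

```python
def weak_hash(data: bytes) -> int:
    """Deliberately weak 16-bit hash (byte sum) so a collision is easy to show."""
    return sum(data) & 0xFFFF


def same_geometry(a: bytes, b: bytes) -> bool:
    # A differing hash proves the inputs differ (fast falsification)...
    if weak_hash(a) != weak_hash(b):
        return False
    # ...but an equal hash proves nothing, so a full comparison is still needed.
    return a == b


a = bytes([1, 2, 3, 4])
b = bytes([4, 3, 2, 1])  # different bytes, same byte sum
```

Here a and b hash to the same value but are not the same bytes.  A
stronger hash makes collisions far less likely, but any fixed-size hash
of arbitrary-length input must have some, so an equal hash can only
suggest identity, not confirm it.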

Would another alternative be to assign a hidden unique ID to each 
geom entered into the DB?  This could be a honking big integer, or maybe 
a UUID.
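
Roughly what I have in mind, as a Python sketch.  GeomStore is a
hypothetical stand-in for the DB, and the sketch assumes an ID is never
reused and that a modified geom would get a fresh ID.

```python
import itertools
import uuid


class GeomStore:
    """Hypothetical store that tags each inserted geometry with a hidden ID."""

    def __init__(self, use_uuid=False):
        self._counter = itertools.count(1)  # the "big integer" option
        self._use_uuid = use_uuid           # or a random UUID per geometry
        self._rows = {}

    def insert(self, geom_bytes):
        # The ID is assigned once, at insert time, and travels with the geom.
        geom_id = uuid.uuid4() if self._use_uuid else next(self._counter)
        self._rows[geom_id] = geom_bytes
        return geom_id

    def get(self, geom_id):
        return self._rows[geom_id]
```

Under those assumptions an equal ID does confirm identity, and the check
is a fixed-size comparison regardless of vertex count.  Two geoms with
identical bytes still get different IDs, which costs a redundant
preparation but never gives a wrong answer.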

Paul Ramsey wrote:
> The problem is that the memcmp hit gets worse in exactly the cases
> where we expect better and better performance from the prepared
> algorithm...  still, it would be nice to know what that hit is...
> compared to the original, unprepared time, it will be small, but
> compared to the prepared-with-id-API implementation... hard to say.
>
> Something to resolve before 1.4... It's unfortunate that all the
> *fast* tests can only falsify identity, not confirm it.  I was talking
> to a fellow who has done a spatial db implementation on a proprietary
> system, and he was pleased with the idea of a "geographic hash" that
> he can calculate for each shape and use to test identity.  If we were
> to do something like that, it would have to be optional, like the bbox
> calculation is currently.
>
> P.
>
> On Mon, Mar 31, 2008 at 2:51 AM, Mark Cave-Ayland
> <mark.cave-ayland at siriusit.co.uk> wrote:
>   
>> On Friday 28 March 2008 23:53:53 Ben Jubb wrote:
>>  > Howdy,
>>  > In my testing, I did see a performance hit when using the memcmp test,
>>  > although it was noticeable only in the largest of my test geometries
>>  > (5000 vertices or so).
>>  > The three parameter form seemed like the best way to go because the
>>  > whole point of the prepared version of the functions was to get the best
>>  > possible performance.  The cases where performance matters most are
>>  > those with large geoms, which is where the cost of the memcmp is highest.
>>  > Using a third argument seemed the simplest way to get the best possible
>>  > performance from the predicates, with a minimal increase in the
>>  > complexity of the interface.
>>  > I agree it would be nice to have a single form for those predicates that
>>  > automatically determines the most efficient manner to do the tests, but
>>  > there didn't seem to be any efficient way to accomplish that.
>>  >
>>  > b
>>
>>
>>  Hi Ben,
>>
>>  Well I think it really comes down to what exactly is the performance hit and
>>  how did you measure it? Which platform/OS/C library did you use? Obviously
>>  there will be *some* overhead having the extra memcmp() in there but does it
>>  matter? For example, if the overhead is just 1-2s on a 30s query then that
>>  doesn't really matter. Then again, if the overhead is 1s on a 3s query then
>>  that is significant.
>>
>>  Since this is a new feature then I'd be inclined to say that for a first cut
>>  we should keep the standard API, and depending on the reports we get back,
>>  look at improving it later. That seems far preferable to having a
>>  fairly nasty API hack that will catch a lot of people out :(
>>
>>
>>
>>  ATB,
>>
>>  Mark.
>>
>>  --
>>  Mark Cave-Ayland
>>  Sirius Corporation - The Open Source Experts
>>  http://www.siriusit.co.uk
>>  T: +44 870 608 0063
>>  _______________________________________________
>>  postgis-devel mailing list
>>  postgis-devel at postgis.refractions.net
>>  http://postgis.refractions.net/mailman/listinfo/postgis-devel
>>
>>     

-- 
Martin Davis
Senior Technical Architect
Refractions Research, Inc.
(250) 383-3022