API for optimized predicates (was Re: [postgis-devel] 1.3.3)
Martin Davis
mbdavis at refractions.net
Mon Mar 31 10:17:24 PDT 2008
(renaming this thread, since the current one is way overloaded)
I agree with Paul and Mark - there should be a simple function signature
for the fast preds. The more complex one can be provided as well, but
it will need to be VERY well documented, since it's a tricky thing to grok.
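To make the trade-off concrete, here is a minimal C sketch of the two signatures under discussion. It assumes a hypothetical per-query cache: `PreparedCache`, `cache_store`, `cache_hit_by_bytes` and `cache_hit_by_key` are illustrative names, not PostGIS functions. The simple two-argument form has to memcmp() the whole serialized geometry to confirm that it is the one already prepared. The three-argument form trusts a caller-supplied key instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-query cache holding the geometry that was last
 * "prepared". Names are illustrative only, not the PostGIS API. */
typedef struct {
    unsigned char *bytes; /* private copy of the serialized geometry */
    size_t len;
    int32_t key;          /* caller-supplied key (three-argument form) */
    bool valid;
} PreparedCache;

/* Remember a geometry as the prepared one. Returns false if the copy
 * cannot be allocated. */
static bool cache_store(PreparedCache *c, const unsigned char *geom,
                        size_t len, int32_t key)
{
    unsigned char *copy = malloc(len > 0 ? len : 1);
    if (copy == NULL)
        return false;
    if (len > 0)
        memcpy(copy, geom, len);
    free(c->bytes);
    c->bytes = copy;
    c->len = len;
    c->key = key;
    c->valid = true;
    return true;
}

/* Two-argument form: the only way to know the argument is unchanged is
 * to compare every byte, so the cost grows with the vertex count. */
static bool cache_hit_by_bytes(const PreparedCache *c,
                               const unsigned char *geom, size_t len)
{
    assert(c != NULL);
    return c->valid && c->len == len && memcmp(c->bytes, geom, len) == 0;
}

/* Three-argument form: the caller promises that equal keys mean equal
 * geometries, so the check is a single integer comparison. */
static bool cache_hit_by_key(const PreparedCache *c, int32_t key)
{
    assert(c != NULL);
    return c->valid && c->key == key;
}
```

The key check costs the same regardless of geometry size. However, a caller who reuses a key for a different geometry silently gets wrong answers, which is the part that would need the careful documentation.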
Re the spatial hash: would you really trust a hash to confirm identity? I
don't think I would...
Would another alternative be to assign a hidden unique ID to each
geom entered into the DB? This could be a honking big integer, or maybe
a UUID.
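Here is a minimal C sketch of the difference between the two ideas. It assumes a hypothetical stored-geometry header with a `uid` field (the unique ID suggested above) and an optional `hash` field (the "geographic hash"). Neither field exists in any PostGIS format, and `StoredGeom`, `assign_uid` and `same_geometry` are illustrative names. Length and hash comparisons can only falsify identity, so a hash match still has to be confirmed with memcmp(). Equal IDs confirm it directly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stored-geometry header; uid and hash are assumptions
 * for illustration, not part of any PostGIS serialization. */
typedef struct {
    uint64_t uid;               /* unique id assigned on insert, 0 = none */
    uint32_t hash;              /* optional "geographic hash", 0 = not computed */
    size_t len;
    const unsigned char *bytes; /* serialized coordinates */
} StoredGeom;

/* Single-process counter standing in for what a database would more
 * likely do with a sequence; not thread-safe. */
static uint64_t next_uid = 1;

/* Assign a fresh id when a geometry is entered into the database. */
static void assign_uid(StoredGeom *g)
{
    g->uid = next_uid++;
}

/* Returns true only if a and b are certainly the same geometry. */
static bool same_geometry(const StoredGeom *a, const StoredGeom *b)
{
    assert(a != NULL && b != NULL);
    /* Equal ids confirm identity outright, assuming a stored value is
     * never modified without being given a new id. */
    if (a->uid != 0 && a->uid == b->uid)
        return true;
    /* Cheap tests: each can only prove that the geometries differ. */
    if (a->len != b->len)
        return false;
    if (a->hash != 0 && b->hash != 0 && a->hash != b->hash)
        return false;
    /* Matching hashes may be a collision, so confirm byte by byte. */
    return memcmp(a->bytes, b->bytes, a->len) == 0;
}
```

Differing IDs do not prove the geometries differ (the same shape can be inserted twice), so that case still falls through to the byte comparison. For a prepared-geometry cache the worst outcome is then a cache miss, never a wrong answer.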
Paul Ramsey wrote:
> The problem is that the memcmp hit gets worse in exactly the cases
> where we expect better and better performance from the prepared
> algorithm... still, it would be nice to know what that hit is...
> compared to the original, unprepared time, it will be small, but
> compared to the prepared-with-id-API implementation... hard to say.
>
> Something to resolve before 1.4... It's unfortunate that all the
> *fast* tests can only falsify identity, not confirm it. I was talking
> to a fellow who has done a spatial db implementation on a proprietary
> system, and he was pleased with the idea of a "geographic hash" that
> he can calculate for each shape and use to test identity. If we were
> to do something like that, it would have to be optional, like the bbox
> calculation is currently.
>
> P.
>
> On Mon, Mar 31, 2008 at 2:51 AM, Mark Cave-Ayland
> <mark.cave-ayland at siriusit.co.uk> wrote:
>
>> On Friday 28 March 2008 23:53:53 Ben Jubb wrote:
>> > Howdy,
>> > In my testing, I did see a performance hit when using the memcmp test,
>> > although it was noticeable only in the largest of my test geometries
>> > (5000 vertices or so).
>> > The three parameter form seemed like the best way to go because the
>> > whole point of the prepared version of the functions was to get the best
>> > possible performance. The cases where performance matters most are
>> > those with large geoms, and that is when the cost of the memcmp is highest.
>> > Using a third argument seemed the simplest way to get the best possible
>> > performance from the predicates, with a minimal increase in the
>> > complexity of the interface.
>> > I agree it would be nice to have a single form for those predicates that
>> > automatically determines the most efficient manner to do the tests, but
>> > there didn't seem to be any efficient way to accomplish that.
>> >
>> > b
>>
>>
>> Hi Ben,
>>
>> Well, I think it really comes down to what exactly the performance hit is.
>> How did you measure it? Which platform/OS/C library did you use? Obviously
>> there will be *some* overhead having the extra memcmp() in there but does it
>> matter? For example, if the overhead is just 1-2s on a 30s query then that
>> doesn't really matter. Then again, if the overhead is 1s on a 3s query then
>> that is significant.
>>
>> Since this is a new feature, I'd be inclined to say that for a first cut
>> we should keep the standard API and, depending on the reports we get back,
>> look at improving it later. That seems far preferable to having a
>> fairly nasty API hack that will catch a lot of people out :(
>>
>>
>>
>> ATB,
>>
>> Mark.
>>
>> --
>> Mark Cave-Ayland
>> Sirius Corporation - The Open Source Experts
>> http://www.siriusit.co.uk
>> T: +44 870 608 0063
>> _______________________________________________
>> postgis-devel mailing list
>> postgis-devel at postgis.refractions.net
>> http://postgis.refractions.net/mailman/listinfo/postgis-devel
>>
>>
>
>
--
Martin Davis
Senior Technical Architect
Refractions Research, Inc.
(250) 383-3022