API for optimized predicates (was Re: [postgis-devel] 1.3.3)

Tue Apr 1 17:26:32 PDT 2008

Hmm... good point. When you say the "2-param" case, do you mean passing a
NULL to the 3-param version? Because I think the 2-param version IS the
usual un-prepared approach, which would explain your results... unless
I'm misunderstanding Ben's patch...

Chris
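
To make the 2-param vs. 3-param question concrete, here is a minimal C
sketch of the two lookup strategies Ben describes in the quoted patch note
below: an integer surrogate key, or a memcmp of the first argument when
the key is NULL. It is an illustration only and is not taken from the
patch. The names KeyCache, BytesCache and prepared_cache_hit are
hypothetical, and a plain byte buffer stands in for the serialized
geometry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical cache records; NOT the structures used in lwgeom_geos_c.c. */
typedef struct { bool valid; int key; } KeyCache;
typedef struct { bool valid; unsigned char *copy; size_t len; } BytesCache;

/* Key path: trust the caller's surrogate key, never look at the bytes. */
static bool key_cache_hit(KeyCache *c, int key)
{
    assert(c != NULL);
    if (c->valid && c->key == key)
        return true;            /* hit: reuse the prepared geometry */
    c->valid = true;            /* miss: remember the new key */
    c->key = key;
    return false;
}

/* memcmp path: a hit requires the same length and identical bytes. */
static bool bytes_cache_hit(BytesCache *c, const unsigned char *geom, size_t len)
{
    assert(c != NULL && geom != NULL);
    if (c->valid && c->len == len && memcmp(c->copy, geom, len) == 0)
        return true;            /* hit, but only after scanning every byte */

    /* miss: keep a private copy to compare against next time */
    unsigned char *copy = malloc(len ? len : 1);
    free(c->copy);
    c->copy = copy;
    c->len = len;
    c->valid = (copy != NULL);
    if (copy != NULL)
        memcpy(copy, geom, len);
    return false;
}

/* A NULL key selects the memcmp path; the two caches stay independent. */
static bool prepared_cache_hit(KeyCache *kc, BytesCache *bc, const int *key,
                               const unsigned char *geom, size_t len)
{
    return key != NULL ? key_cache_hit(kc, *key)
                       : bytes_cache_hit(bc, geom, len);
}
```

On a miss the caller would rebuild the prepared geometry. Note that the
key path reports a hit even if the bytes changed under the same key, which
is the trade-off behind the API discussion in this thread.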

Ben Jubb wrote:
> For the 3-param version, were you using an integer key, or NULL?
> b
> 
> Paul Ramsey wrote:
>> I gave this a try, but in the three-parameter case it caused the
>> backend to crash and in the two-parameter case provided the same speed
>> as the usual un-prepared approach...
>>
>> I was testing with st_contains(polycolumn, pointcolumn), with 80 polys
>> and 7000 points.
>>
>> P
>>
>> On Mon, Mar 31, 2008 at 3:50 PM, Ben Jubb <benjubb at refractions.net> wrote:
>>   
>>>  Hiya,
>>>  I've attached a patch to lwgeom_geos_c.c, modifying its 1st arg
>>>  caching behaviour.
>>>  The third argument is used as before, as a surrogate key, and the
>>>  caching will use that as its key; UNLESS the key is NULL.
>>>  If the key is NULL, the predicates use the memcmp technique to
>>>  determine if the cached prepared geometry is in sync with the first arg.
>>>  Note that the two caching approaches have essentially independent caches.
>>>  This patch is intended for testing purposes only.
>>>  enjoy
>>>  b
>>>
>>>
>>>
>>>  Paul Ramsey wrote:
>>>  A unique-on-insert ID would be another approach. It would, however,
>>> involve a disk-format change, so we're talking about pretty big
>>> hammers here, regardless of whether we did a hash or a uuid.
>>>
>>> Ben, maybe just stick some small timing statements into your current
>>> code... start time, end time, and then do a noop memcmp with start/end
>>> times as well. That way we can compare the memcmp times to the total
>>> times.
>>>
>>> P.
>>>
>>> On Mon, Mar 31, 2008 at 10:17 AM, Martin Davis <mbdavis at refractions.net>
>>> wrote:
>>>
>>>
>>>  (renaming this thread, since the current one is way overloaded)
>>>
>>>  I agree with Paul and Mark - there should be a simple function signature
>>>  for the fast preds. The more complex one can be provided as well, but
>>>  it will need to be VERY well documented, since it's a tricky thing to grok.
>>>
>>>  re spatial hash - would you really trust a hash to confirm identity? I
>>>  don't think I would...
>>>
>>>  Would another alternative be to assign a hidden unique ID to each
>>>  geom entered into the DB? This could be a honking big integer, or maybe
>>>  a UUID.
>>>
>>>  Paul Ramsey wrote:
>>>  > The problem is that the memcmp hit gets worse in exactly the cases
>>>  > where we expect better and better performance from the prepared
>>>  > algorithm... still, it would be nice to know what that hit is...
>>>  > compared to the original, unprepared time, it will be small, but
>>>  > compared to the prepared-with-id-API implementation... hard to say.
>>>  >
>>>  > Something to resolve before 1.4... It's unfortunate that all the
>>>  > *fast* tests can only falsify identity, not confirm it. I was talking
>>>  > to a fellow who has done a spatial db implementation on a proprietary
>>>  > system, and he was pleased with the idea of a "geographic hash" that
>>>  > he can calculate for each shape and use to test identity. If we were
>>>  > to do something like that, it would have to be optional, like the bbox
>>>  > calculation is currently.
>>>  >
>>>  > P.
>>>  >
>>>  > On Mon, Mar 31, 2008 at 2:51 AM, Mark Cave-Ayland
>>>  > <mark.cave-ayland at siriusit.co.uk> wrote:
>>>  >
>>>  >> On Friday 28 March 2008 23:53:53 Ben Jubb wrote:
>>>  >> > Howdy,
>>>  >> > In my testing, I did see a performance hit when using the memcmp
>>>  >> > test, although it was noticeable only in the largest of my test
>>>  >> > geometries (5000 vertices or so).
>>>  >> > The three-parameter form seemed like the best way to go because the
>>>  >> > whole point of the prepared version of the functions was to get the
>>>  >> > best possible performance. The cases where performance matters most
>>>  >> > are those with large geoms, and that is where the cost of doing the
>>>  >> > memcmp is highest.
>>>  >> > Using a third argument seemed the simplest way to get the best
>>>  >> > possible performance from the predicates, with a minimal increase in
>>>  >> > the complexity of the interface.
>>>  >> > I agree it would be nice to have a single form for those predicates
>>>  >> > that automatically determines the most efficient manner to do the
>>>  >> > tests, but there didn't seem to be any efficient way to accomplish
>>>  >> > that.
>>>  >> >
>>>  >> > b
>>>  >>
>>>  >>
>>>  >> Hi Ben,
>>>  >>
>>>  >> Well, I think it really comes down to what exactly the performance
>>>  >> hit is and how you measured it. Which platform/OS/C library did you
>>>  >> use? Obviously there will be *some* overhead having the extra memcmp()
>>>  >> in there, but does it matter? For example, if the overhead is just
>>>  >> 1-2s on a 30s query then that doesn't really matter. Then again, if
>>>  >> the overhead is 1s on a 3s query then that is significant.
>>>  >>
>>>  >> Since this is a new feature, I'd be inclined to say that for a first
>>>  >> cut we should keep the standard API and, depending on the reports we
>>>  >> get back, look at improving it later. That seems far preferable to
>>>  >> having a fairly nasty API hack that will catch a lot of people out :(
>>>  >>
>>>  >>
>>>  >>
>>>  >> ATB,
>>>  >>
>>>  >> Mark.
>>>  >>
>>>  >> --
>>>  >> Mark Cave-Ayland
>>>  >> Sirius Corporation - The Open Source Experts
>>>  >> http://www.siriusit.co.uk
>>>  >> T: +44 870 608 0063
>>>  >> _______________________________________________
>>>  >> postgis-devel mailing list
>>>  >> postgis-devel at postgis.refractions.net
>>>  >> http://postgis.refractions.net/mailman/listinfo/postgis-devel
>>>  >>
>>>  >>
>>>  >
>>>  >
>>>
>>>  --
>>>  Martin Davis
>>>  Senior Technical Architect
>>>  Refractions Research, Inc.
>>>  (250) 383-3022
>>>
>>>
>>>
>>>     
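
Paul's suggestion in the thread above (time a no-op memcmp and compare it
with the total query time) could be tried standalone with something like
the following C sketch. It is not code from the patch. The function name
time_memcmp is hypothetical, and the buffer size is left to the caller,
since the serialized size of a 5000-vertex geometry is not stated in the
thread. Two identical buffers are compared because a cache hit is the case
where memcmp has to scan every byte.

```c
#include <stdlib.h>
#include <string.h>
#include <time.h>

/*
 * Returns the CPU seconds spent in `repeats` memcmp calls over two
 * identical buffers of `nbytes` bytes, or -1.0 if allocation fails.
 */
static double time_memcmp(size_t nbytes, int repeats)
{
    unsigned char *a = malloc(nbytes);
    unsigned char *b = malloc(nbytes);
    if (a == NULL || b == NULL) {
        free(a);
        free(b);
        return -1.0;
    }
    memset(a, 0x5a, nbytes);
    memset(b, 0x5a, nbytes);

    volatile int sink = 0;      /* keeps the compiler from dropping the loop */
    clock_t start = clock();
    for (int i = 0; i < repeats; i++)
        sink += memcmp(a, b, nbytes);
    clock_t end = clock();

    free(a);
    free(b);
    (void)sink;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling it with the row count of a test query as `repeats` gives a rough
upper bound on the memcmp share of that query's run time, which is the
ratio Mark asks about.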