API for optimized predicates (was Re: [postgis-devel] 1.3.3)
Martin Davis
mbdavis at refractions.net
Wed Apr 2 09:43:49 PDT 2008
So are you going to test the NULL case?
Paul Ramsey wrote:
> Right. So, unsurprisingly, the 2-param case returned the same timing,
> since it *was* the same code line.
>
> The 3-param case I ran was ST_Contains(ed.the_geom, v.centroid,
> ed.gid), so the numeric case, not the NULL case.
>
> P
>
> On Tue, Apr 1, 2008 at 5:26 PM, Chris Hodgson <chodgson at refractions.net> wrote:
>
>> Hmm... good point, when you say "2-param" case do you mean passing a
>> NULL to the 3-param version? Because I think the 2-param version IS the
>> usual un-prepared approach, which would explain your results... unless
>> I'm misunderstanding Ben's patch...
>>
>> Chris
>>
>>
>>
>> Ben Jubb wrote:
>> > For the 3-param version, were you using an integer key, or NULL?
>> > b
>> >
>> > Paul Ramsey wrote:
>> >> I gave this a try, but in the three-parameter case it caused the
>> >> backend to crash and in the two-parameter case provided the same speed
>> >> as the usual un-prepared approach...
>> >>
>> >> I was testing with st_contains(polycolumn, pointcolumn), with 80 polys
>> >> and 7000 points.
>> >>
>> >> P
>> >>
>> >> On Mon, Mar 31, 2008 at 3:50 PM, Ben Jubb <benjubb at refractions.net> wrote:
>> >>
>> >>> Hiya,
>> >>> I've attached a patch to lwgeom_geos_c.c, modifying its 1st arg caching
>> >>> behaviour.
>> >>> The third argument is used as before, as a surrogate key, and the caching
>> >>> will use that as its key;
>> >>> UNLESS the key is NULL.
>> >>> If the key is NULL, the predicates use the memcmp technique to determine if
>> >>> the cached prepared geometry is in sync with the first arg.
>> >>> Note that the two caching approaches have essentially independent caches.
>> >>> This patch is intended for testing purposes only.
>> >>> enjoy
>> >>> b
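
A rough sketch of the lookup rule described above, for anyone following along. This is not the code in the patch: `prep_cache`, `cache_hit` and `cache_store` are hypothetical names, the key is assumed to be a plain int, and a single slot tagged with `has_key` stands in for the two independent caches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical one-slot cache; not the structure used in lwgeom_geos_c.c. */
typedef struct {
    bool   valid;    /* has anything been cached yet? */
    bool   has_key;  /* was the entry stored under a surrogate key? */
    int    key;      /* surrogate key, meaningful only if has_key */
    void  *bytes;    /* private copy of the serialized first argument */
    size_t len;      /* length of that copy in bytes */
} prep_cache;

/* Returns true when the cached prepared geometry can be reused.
 * key == NULL selects the memcmp path; otherwise only the keys are compared. */
bool cache_hit(const prep_cache *c, const int *key,
               const void *geom, size_t len)
{
    assert(c != NULL && geom != NULL);
    if (!c->valid)
        return false;
    if (key != NULL)
        return c->has_key && c->key == *key;
    /* NULL key: the entry must also have been stored without a key,
     * and the bytes must match exactly. */
    return !c->has_key && c->len == len && memcmp(c->bytes, geom, len) == 0;
}

/* Replaces the cached entry. Returns false if the copy cannot be allocated. */
bool cache_store(prep_cache *c, const int *key,
                 const void *geom, size_t len)
{
    assert(c != NULL && geom != NULL);
    void *copy = malloc(len ? len : 1);
    if (copy == NULL)
        return false;
    memcpy(copy, geom, len);
    free(c->bytes);
    c->bytes = copy;
    c->len = len;
    c->has_key = (key != NULL);
    c->key = key ? *key : 0;
    c->valid = true;
    return true;
}
```

Note that on the keyed path the geometry bytes are never inspected, so a caller who reuses a key for a different geometry gets a stale prepared geometry back. That is the part of the 3-param form that needs careful documentation.
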
>> >>>
>> >>>
>> >>>
>> >>> Paul Ramsey wrote:
>> >>> A unique-on-insert ID would be another approach. It would, however,
>> >>> involve a disk-format change, so we're talking about pretty big
>> >>> hammers here, regardless of whether we did a hash or a uuid.
>> >>>
>> >>> Ben, maybe just stick some small timing statements into your current
>> >>> code... start time, end time, and then do a noop memcmp with start/end
>> >>> times as well. That way we can compare the memcmp times to the total
>> >>> times.
>> >>>
>> >>> P.
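
The timing suggestion above could look something like this. It is only a sketch: `time_memcmp` and `memcmp_share` are made-up names, and `clock()` measures CPU time at whatever resolution the platform gives, so the comparison needs enough repetitions to register.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* CPU seconds spent in `reps` memcmp calls over two buffers of `len` bytes. */
double time_memcmp(const void *a, const void *b, size_t len, int reps)
{
    /* Calling through a volatile function pointer and storing into a
     * volatile sink keeps the compiler from hoisting or dropping the calls. */
    int (*volatile cmp)(const void *, const void *, size_t) = memcmp;
    volatile int sink = 0;

    assert(a != NULL && b != NULL && reps > 0);

    clock_t start = clock();
    for (int i = 0; i < reps; i++)
        sink = cmp(a, b, len);
    clock_t end = clock();

    (void)sink;
    return (double)(end - start) / CLOCKS_PER_SEC;
}

/* Fraction of the total run time that went into memcmp (0.0 if total is 0). */
double memcmp_share(double memcmp_secs, double total_secs)
{
    if (total_secs <= 0.0)
        return 0.0;
    return memcmp_secs / total_secs;
}
```

Timing the whole predicate call the same way and feeding both numbers to `memcmp_share` gives the ratio being asked for, rather than an absolute figure that depends on the machine.
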
>> >>>
>> >>> On Mon, Mar 31, 2008 at 10:17 AM, Martin Davis <mbdavis at refractions.net> wrote:
>> >>>
>> >>>
>> >>> (renaming this thread, since the current one is way overloaded)
>> >>>
>> >>> I agree with Paul and Mark - there should be a simple function signature
>> >>> for the fast preds. The more complex one can be provided as well, but
>> >>> it will need to be VERY well documented, since it's a tricky thing to grok.
>> >>>
>> >>> re spatial hash - would you really trust a hash to confirm identity? I
>> >>> don't think I would...
>> >>>
>> >>> Would another alternative be to assign a hidden unique ID to each
>> >>> geom entered into the DB? This could be a honking big integer, or maybe
>> >>> a UUID.
>> >>>
>> >>> Paul Ramsey wrote:
>> >>> > The problem is that the memcmp hit gets worse in exactly the cases
>> >>> > where we expect better and better performance from the prepared
>> >>> > algorithm... still, it would be nice to know what that hit is...
>> >>> > compared to the original, unprepared time, it will be small, but
>> >>> > compared to the prepared-with-id-API implementation... hard to say.
>> >>> >
>> >>> > Something to resolve before 1.4... It's unfortunate that all the
>> >>> > *fast* tests can only falsify identity, not confirm it. I was talking
>> >>> > to a fellow who has done a spatial db implementation on a proprietary
>> >>> > system, and he was pleased with the idea of a "geographic hash" that
>> >>> > he can calculate for each shape and use to test identity. If we were
>> >>> > to do something like that, it would have to be optional, like the bbox
>> >>> > calculation is currently.
>> >>> >
>> >>> > P.
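
To make the "can only falsify identity" point concrete, here is a small sketch. FNV-1a is used purely as a stand-in for the "geographic hash" (nobody in this thread has proposed a specific function), and `fast_check` and `same_bytes` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 64-bit FNV-1a over a byte buffer; a placeholder for a geometry hash. */
uint64_t fnv1a64(const void *data, size_t len)
{
    const unsigned char *p = data;
    uint64_t h = 14695981039346656037ULL;   /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 1099511628211ULL;               /* FNV prime */
    }
    return h;
}

typedef enum { DIFFERENT, MAYBE_SAME } fast_verdict;

/* Cheap test: a mismatch proves the geometries differ, but a match
 * proves nothing, because distinct inputs can share a length and a hash. */
fast_verdict fast_check(uint64_t hash_a, size_t len_a,
                        uint64_t hash_b, size_t len_b)
{
    if (len_a != len_b || hash_a != hash_b)
        return DIFFERENT;
    return MAYBE_SAME;
}

/* Full test: the fast check rejects early, memcmp confirms identity. */
bool same_bytes(const void *a, size_t len_a, uint64_t hash_a,
                const void *b, size_t len_b, uint64_t hash_b)
{
    assert(a != NULL && b != NULL);
    if (fast_check(hash_a, len_a, hash_b, len_b) == DIFFERENT)
        return false;
    return memcmp(a, b, len_a) == 0;
}
```

The hash only saves work when the geometries differ. In the case the prepared predicates care about, the same geometry arriving again and again, the check falls through to the memcmp every time.
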
>> >>> >
>> >>> > On Mon, Mar 31, 2008 at 2:51 AM, Mark Cave-Ayland
>> >>> > <mark.cave-ayland at siriusit.co.uk> wrote:
>> >>> >
>> >>> >> On Friday 28 March 2008 23:53:53 Ben Jubb wrote:
>> >>> >> > Howdy,
>> >>> >> > In my testing, I did see a performance hit when using the memcmp test,
>> >>> >> > although it was noticeable only in the largest of my test geometries
>> >>> >> > (5000 vertices or so).
>> >>> >> > The three parameter form seemed like the best way to go because the
>> >>> >> > whole point of the prepared version of the functions was to get the
>> >>> >> > best possible performance. The cases where performance matters most
>> >>> >> > are those with large geoms, and that is where the cost of doing the
>> >>> >> > memcmp is the highest.
>> >>> >> > Using a third argument seemed the simplest way to get the best
>> >>> >> > possible performance from the predicates, with a minimal increase in
>> >>> >> > the complexity of the interface.
>> >>> >> > I agree it would be nice to have a single form for those predicates
>> >>> >> > that automatically determines the most efficient manner to do the
>> >>> >> > tests, but there didn't seem to be any efficient way to accomplish that.
>> >>> >> >
>> >>> >> > b
>> >>> >>
>> >>> >>
>> >>> >> Hi Ben,
>> >>> >>
>> >>> >> Well I think it really comes down to what exactly is the performance
>> >>> >> hit and how did you measure it? Which platform/OS/C library did you
>> >>> >> use? Obviously there will be *some* overhead having the extra memcmp()
>> >>> >> in there but does it matter? For example, if the overhead is just 1-2s
>> >>> >> on a 30s query then that doesn't really matter. Then again, if the
>> >>> >> overhead is 1s on a 3s query then that is significant.
>> >>> >>
>> >>> >> Since this is a new feature then I'd be inclined to say that for a
>> >>> >> first cut we should keep the standard API, and depending on the
>> >>> >> reports we get back, look at improving it later. That seems a lot more
>> >>> >> preferable to having a fairly nasty API hack that will catch a lot of
>> >>> >> people out :(
>> >>> >>
>> >>> >>
>> >>> >>
>> >>> >> ATB,
>> >>> >>
>> >>> >> Mark.
>> >>> >>
>> >>> >> --
>> >>> >> Mark Cave-Ayland
>> >>> >> Sirius Corporation - The Open Source Experts
>> >>> >> http://www.siriusit.co.uk
>> >>> >> T: +44 870 608 0063
>> >>> >> _______________________________________________
>> >>> >> postgis-devel mailing list
>> >>> >> postgis-devel at postgis.refractions.net
>> >>> >> http://postgis.refractions.net/mailman/listinfo/postgis-devel
>> >>> >>
>> >>> >>
>> >>>
>> >>> --
>> >>> Martin Davis
>> >>> Senior Technical Architect
>> >>> Refractions Research, Inc.
>> >>> (250) 383-3022
>> >>>
--
Martin Davis
Senior Technical Architect
Refractions Research, Inc.
(250) 383-3022