API for optimized predicates (was Re: [postgis-devel] 1.3.3)
Paul Ramsey
pramsey at cleverelephant.ca
Tue Apr 1 18:08:25 PDT 2008
Right. So, unsurprisingly, the 2-param case returned the same timing,
since it *was* the same code line.
The 3-param case I ran was ST_Contains(ed.the_geom, v.centroid,
ed.gid), so the numeric case, not the NULL case.
P
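
For anyone following along, here is a rough sketch in C of the cases in play: a three-argument call with a numeric key, and a three-argument call with a NULL key that falls back to memcmp. It is not Ben's patch. The names KeyCache, BytesCache and lookup_prepared are made up for illustration, the geometry is treated as an opaque byte buffer, and the actual GEOS prepare step is left out. The two-argument form skips all of this and goes down the usual un-prepared path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical cache keyed by a caller-supplied surrogate id. */
typedef struct { int valid; long key; } KeyCache;

/* Hypothetical cache keyed by the serialized bytes of the first argument. */
typedef struct { unsigned char *bytes; size_t len; } BytesCache;

/*
 * Returns 1 when the cached prepared geometry can be reused and 0 when it
 * would have to be prepared again. The two caches are independent, as in
 * the description of the patch.
 */
int lookup_prepared(KeyCache *kc, BytesCache *bc,
                    const unsigned char *geom, size_t len, const long *key)
{
    assert(kc != NULL && bc != NULL && geom != NULL);

    if (key != NULL) {
        /* Numeric key: trust the key and never look at the bytes. */
        if (kc->valid && kc->key == *key)
            return 1;
        kc->valid = 1;
        kc->key = *key;
        return 0;
    }

    /* NULL key: compare the cached bytes with the incoming geometry. */
    if (bc->bytes != NULL && bc->len == len &&
        memcmp(bc->bytes, geom, len) == 0)
        return 1;

    /* Miss: remember a private copy of the new geometry. */
    free(bc->bytes);
    bc->bytes = malloc(len);
    if (bc->bytes == NULL) {
        bc->len = 0;            /* allocation failed, leave the cache empty */
        return 0;
    }
    memcpy(bc->bytes, geom, len);
    bc->len = len;
    return 0;
}
```

Note that the keyed branch reports a hit whenever the key repeats, even if the geometry bytes differ, which is why the key has to be a reliable surrogate for the geometry. The memcmp branch pays a cost proportional to the geometry size on every call.
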
On Tue, Apr 1, 2008 at 5:26 PM, Chris Hodgson <chodgson at refractions.net> wrote:
> Hmm... good point, when you say "2-param" case do you mean passing a
> NULL to the 3-param version? Because I think the 2-param version IS the
> usual un-prepared approach, which would explain your results... unless
> I'm misunderstanding Ben's patch...
>
> Chris
>
>
>
> Ben Jubb wrote:
> > for the 3 param version, were you using an integer key, or NULL?
> > b
> >
> > Paul Ramsey wrote:
> >> I gave this a try, but in the three-parameter case it caused the
> >> backend to crash and in the two-parameter case provided the same speed
> >> as the usual un-prepared approach...
> >>
> >> I was testing with st_contains(polycolumn, pointcolumn), with 80 polys
> >> and 7000 points.
> >>
> >> P
> >>
> >> On Mon, Mar 31, 2008 at 3:50 PM, Ben Jubb <benjubb at refractions.net> wrote:
> >>
> >>> Hiya,
> >>> I've attached a patch to lwgeom_geos_c.c, modifying its 1st arg caching
> >>> behaviour.
> >>> The third argument is used as before, as a surrogate key, and the caching
> >>> will use that as its key;
> >>> UNLESS the key is NULL.
> >>> If the key is NULL, the predicates use the memcmp technique to determine if
> >>> the cached prepared geometry is in sync with the first arg.
> >>> Note that the two caching approaches have essentially independent caches.
> >>> This patch is intended for testing purposes only.
> >>> enjoy
> >>> b
> >>>
> >>>
> >>>
> >>> Paul Ramsey wrote:
> >>> A unique-on-insert ID would be another approach. It would, however,
> >>> involve a disk-format change, so we're talking about pretty big
> >>> hammers here, regardless of whether we did a hash or a uuid.
> >>>
> >>> Ben, maybe just stick some small timing statements into your current
> >>> code... start time, end time, and then do a noop memcmp with start/end
> >>> times as well. That way we can compare the memcmp times to the total
> >>> times.
> >>>
> >>> P.
> >>>
> >>> On Mon, Mar 31, 2008 at 10:17 AM, Martin Davis <mbdavis at refractions.net> wrote:
> >>>
> >>> (renaming this thread, since the current one is way overloaded)
> >>>
> >>> I agree with Paul and Mark - there should be a simple function signature
> >>> for the fast preds. The more complex one can be provided as well, but
> >>> it will need to be VERY well documented, since it's a tricky thing to grok.
> >>>
> >>> re spatial hash - would you really trust a hash to confirm identity? I
> >>> don't think I would...
> >>>
> >>> Would another alternative be to assign a hidden unique ID to each
> >>> geom entered into the DB? This could be a honking big integer, or maybe
> >>> a UUID.
> >>>
> >>> Paul Ramsey wrote:
> >>> > The problem is that the memcmp hit gets worse in exactly the cases
> >>> > where we expect better and better performance from the prepared
> >>> > algorithm... still, it would be nice to know what that hit is...
> >>> > compared to the original, unprepared time, it will be small, but
> >>> > compared to the prepared-with-id-API implementation... hard to say.
> >>> >
> >>> > Something to resolve before 1.4... It's unfortunate that all the
> >>> > *fast* tests can only falsify identity, not confirm it. I was talking
> >>> > to a fellow who has done a spatial db implementation on a proprietary
> >>> > system, and he was pleased with the idea of a "geographic hash" that
> >>> > he can calculate for each shape and use to test identity. If we were
> >>> > to do something like that, it would have to be optional, like the bbox
> >>> > calculation is currently.
> >>> >
> >>> > P.
> >>> >
> >>> > On Mon, Mar 31, 2008 at 2:51 AM, Mark Cave-Ayland
> >>> > <mark.cave-ayland at siriusit.co.uk> wrote:
> >>> >
> >>> >> On Friday 28 March 2008 23:53:53 Ben Jubb wrote:
> >>> >> > Howdy,
> >>> >> > In my testing, I did see a performance hit when using the memcmp test,
> >>> >> > although it was noticeable only in the largest of my test geometries
> >>> >> > (5000 vertices or so).
> >>> >> > The three-parameter form seemed like the best way to go because the
> >>> >> > whole point of the prepared version of the functions was to get the
> >>> >> > best possible performance. The cases where performance matters most
> >>> >> > are those with large geoms, which is exactly where the cost of doing
> >>> >> > the memcmp is highest.
> >>> >> > Using a third argument seemed the simplest way to get the best
> >>> >> > possible performance from the predicates, with a minimal increase in
> >>> >> > the complexity of the interface.
> >>> >> > I agree it would be nice to have a single form for those predicates
> >>> >> > that automatically determines the most efficient manner to do the
> >>> >> > tests, but there didn't seem to be any efficient way to accomplish that.
> >>> >> >
> >>> >> > b
> >>> >>
> >>> >>
> >>> >> Hi Ben,
> >>> >>
> >>> >> Well I think it really comes down to what exactly is the performance
> >>> >> hit and how did you measure it? Which platform/OS/C library did you
> >>> >> use? Obviously there will be *some* overhead having the extra memcmp()
> >>> >> in there but does it matter? For example, if the overhead is just 1-2s
> >>> >> on a 30s query then that doesn't really matter. Then again, if the
> >>> >> overhead is 1s on a 3s query then that is significant.
> >>> >>
> >>> >> Since this is a new feature then I'd be inclined to say that for a
> >>> >> first cut we should keep the standard API, and depending on the reports
> >>> >> we get back, look at improving it later. That seems a lot more
> >>> >> preferable to having a fairly nasty API hack that will catch a lot of
> >>> >> people out :(
> >>> >>
> >>> >>
> >>> >>
> >>> >> ATB,
> >>> >>
> >>> >> Mark.
> >>> >>
> >>> >> --
> >>> >> Mark Cave-Ayland
> >>> >> Sirius Corporation - The Open Source Experts
> >>> >> http://www.siriusit.co.uk
> >>> >> T: +44 870 608 0063
> >>> >> _______________________________________________
> >>> >> postgis-devel mailing list
> >>> >> postgis-devel at postgis.refractions.net
> >>> >> http://postgis.refractions.net/mailman/listinfo/postgis-devel
> >>> >>
> >>> >>
> >>>
> >>> --
> >>> Martin Davis
> >>> Senior Technical Architect
> >>> Refractions Research, Inc.
> >>> (250) 383-3022
>
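
The timing suggestion quoted above (start/end times around a no-op memcmp, to compare against the total query time) could be sketched in C roughly as follows. This is only an illustration under stated assumptions: time_memcmp and overhead_fraction are hypothetical helper names, clock() measures CPU time rather than wall time, and the buffers stand in for serialized geometries. The call goes through a volatile function pointer so an optimizing compiler does not collapse the repeated comparisons into one.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/*
 * Hypothetical helper: runs memcmp over two len-byte buffers `reps` times
 * and returns the CPU seconds spent. The result of the last comparison is
 * stored in *last_result when that pointer is not NULL.
 */
double time_memcmp(const unsigned char *a, const unsigned char *b,
                   size_t len, int reps, int *last_result)
{
    /* Volatile pointer forces a real call on every iteration. */
    int (*volatile cmp)(const void *, const void *, size_t) = memcmp;
    int result = 0;

    clock_t start = clock();
    for (int i = 0; i < reps; i++)
        result = cmp(a, b, len);
    clock_t end = clock();

    if (last_result != NULL)
        *last_result = result;
    return (double)(end - start) / CLOCKS_PER_SEC;
}

/* Hypothetical helper: share of the total time taken by the memcmp calls. */
double overhead_fraction(double memcmp_seconds, double total_seconds)
{
    if (total_seconds <= 0.0)
        return 0.0;
    return memcmp_seconds / total_seconds;
}
```

Calling time_memcmp once per candidate geometry size and feeding the result to overhead_fraction together with the measured query time gives the kind of ratio discussed in the thread, where 1s of overhead on a 3s query matters far more than 1-2s on a 30s query.
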