API for optimized predicates (was Re: [postgis-devel] 1.3.3)
Mark Leslie
mark.leslie at lisasoft.com
Wed Apr 2 15:40:11 PDT 2008
ST_Contains(geometry, geometry, integer) is defined as 'isstrict'. This
is a PostgreSQL optimization that will cause the function to return null
anytime any argument is null.
I'm sure it's screaming fast though.
--
Mark Leslie
Geospatial Software Architect
LISAsoft Pty Ltd
+61 (0)2 8570 5050
Commercial Support for Geospatial Open Source Software
http://www.lisasoft.com/LISAsoft/SupportedProducts.html
Paul Ramsey wrote:
> Correction, it returns NULL all the time.
>
> On Wed, Apr 2, 2008 at 9:45 AM, Paul Ramsey <pramsey at cleverelephant.ca> wrote:
>> Sure, it fails in a less catastrophic way, it just returns false all the time.
>>
>>
>>
>> On Wed, Apr 2, 2008 at 9:43 AM, Martin Davis <mbdavis at refractions.net> wrote:
>> > So are you going to test the NULL case?
>> >
>> >
>> >
>> > Paul Ramsey wrote:
>> >
>> > > Right. So, unsurprisingly, the 2-param case returned the same timing,
>> > > since it *was* the same code line.
>> > >
>> > > The 3-param case I ran was ST_Contains(ed.the_geom, v.centroid,
>> > > ed.gid), so the numeric case, not the NULL case.
>> > >
>> > > P
>> > >
>> > > On Tue, Apr 1, 2008 at 5:26 PM, Chris Hodgson <chodgson at refractions.net>
>> > wrote:
>> > >
>> > >
>> > > > Hmm... good point, when you say "2-param" case do you mean passing a
>> > > > NULL to the 3-param version? Because I think the 2-param version IS the
>> > > > usual un-prepared approach, which would explain your results... unless
>> > > > I'm misunderstanding Ben's patch...
>> > > >
>> > > > Chris
>> > > >
>> > > >
>> > > >
>> > > > Ben Jubb wrote:
>> > > > > for the 3 param version, where you using an integer key, or NULL?
>> > > > > b
>> > > > >
>> > > > > Paul Ramsey wrote:
>> > > > >> I gave this a try, but in the three-parameter case it caused the
>> > > > >> backend to crash and in the two-parameter case provided the same
>> > speed
>> > > > >> as the usual un-prepared approach...
>> > > > >>
>> > > > >> I was testing with st_contains(polycolumn, pointcolumn), with 80
>> > polys
>> > > > >> and 7000 points.
>> > > > >>
>> > > > >> P
>> > > > >>
>> > > > >> On Mon, Mar 31, 2008 at 3:50 PM, Ben Jubb <benjubb at refractions.net>
>> > wrote:
>> > > > >>
>> > > > >>> Hiya,
>> > > > >>> I've attached a patch to lwgeom_geos_c.c, modifying its 1st arg
>> > caching
>> > > > >>> behaviour.
>> > > > >>> The third argument is used as before, as a surrogate key, and the
>> > caching
>> > > > >>> will use that as its key;
>> > > > >>> UNLESS the key is NULL.
>> > > > >>> If the key is NULL, the predicates use the memcmp technique to
>> > determine if
>> > > > >>> the cached prepared geometry is in sync with the first arg.
>> > > > >>> Note that the two caching approaches have essentially independent
>> > caches.
>> > > > >>> This patch is intended for testing purposes only.
>> > > > >>> enjoy
>> > > > >>> b
>> > > > >>>
>> > > > >>>
>> > > > >>>
>> > > > >>> Paul Ramsey wrote:
>> > > > >>> A unique-on-insert ID would be another approach. It would,
>> > however,
>> > > > >>> involve a disk-format change, so we're talking about pretty big
>> > > > >>> hammers here, regardless of whether we did a hash or a uuid.
>> > > > >>>
>> > > > >>> Ben, maybe just stick some small timing statements into your
>> > current
>> > > > >>> code... start time, end time, and then do a noop memcmp with
>> > start/end
>> > > > >>> times as well. That way we can compare the memcmp times to the
>> > total
>> > > > >>> times.
>> > > > >>>
>> > > > >>> P.
>> > > > >>>
>> > > > >>> On Mon, Mar 31, 2008 at 10:17 AM, Martin Davis
>> > <mbdavis at refractions.net>
>> > > > >>> wrote:
>> > > > >>>
>> > > > >>>
>> > > > >>> (renaming this thread, since the current one is way overloaded)
>> > > > >>>
>> > > > >>> I agree with Paul and Mark - there should be a simple function
>> > signature
>> > > > >>> for the fast preds. The more complex one can be provided as well,
>> > but
>> > > > >>> it will need to be VERY well documented, since it's a tricky thing
>> > to grok.
>> > > > >>>
>> > > > >>> re spatial hash - would you really trust a hash to confirm
>> > identity? I
>> > > > >>> don't think I would...
>> > > > >>>
>> > > > >>> Would another alternative would be to assign a hidden unique ID to
>> > each
>> > > > >>> geom entered into the DB. This could be a honking big integer, or
>> > maybe
>> > > > >>> a UUID.
>> > > > >>>
>> > > > >>> Paul Ramsey wrote:
>> > > > >>> > The problem is that the memcmp hit gets worse in exactly the
>> > cases
>> > > > >>> > were we expect better and better performance from the prepared
>> > > > >>> > algorithm... still, it would be nice to know what that hit is...
>> > > > >>> > compared to the original, unprepared time, it will be small, but
>> > > > >>> > compared to the prepared-with-id-API implementation... hard to
>> > say.
>> > > > >>> >
>> > > > >>> > Something to resolve before 1.4... It's unfortunate that all the
>> > > > >>> > *fast* tests can only falsify identity, not confirm it. I was
>> > talking
>> > > > >>> > to a fellow who has done a spatial db implementation on a
>> > proprietary
>> > > > >>> > system, and he was pleased with the idea of a "geographic hash"
>> > that
>> > > > >>> > he can calculate for each shape and use to test identity. If we
>> > were
>> > > > >>> > to do something like that, it would have to be optional, like
>> > the bbox
>> > > > >>> > calculation is currently.
>> > > > >>> >
>> > > > >>> > P.
>> > > > >>> >
>> > > > >>> > On Mon, Mar 31, 2008 at 2:51 AM, Mark Cave-Ayland
>> > > > >>> > <mark.cave-ayland at siriusit.co.uk> wrote:
>> > > > >>> >
>> > > > >>> >> On Friday 28 March 2008 23:53:53 Ben Jubb wrote:
>> > > > >>> >> > Howdy,
>> > > > >>> >> > In my testing, I did see a performance hit when using the
>> > memcmp test,
>> > > > >>> >> > although it was noticable only in the largest of my test
>> > geometries
>> > > > >>> >> > (5000 vertices or so).
>> > > > >>> >> > The three parameter form seemed like the best way to go
>> > because the
>> > > > >>> >> > whole point of the prepared version of the functions was to
>> > get the
>> > > > >>> best
>> > > > >>> >> > possible performance. The cases when the performance matters
>> > most is
>> > > > >>> >> > with large geoms, and then the cost of doing the memcmp is
>> > the
>> > > > >>> highest.
>> > > > >>> >> > Using a third argument seemed the simplest way to get the
>> > best
>> > > > >>> possible
>> > > > >>> >> > performance from the predicates, with a minimal increase in
>> > the
>> > > > >>> >> > complexity of the interface.
>> > > > >>> >> > I agree it would be nice to have a single form for those
>> > predicates
>> > > > >>> that
>> > > > >>> >> > automatically determines the most efficient manner to do the
>> > tests,
>> > > > >>> but
>> > > > >>> >> > there didn't seem to be any efficient way to accomplish that.
>> > > > >>> >> >
>> > > > >>> >> > b
>> > > > >>> >>
>> > > > >>> >>
>> > > > >>> >> Hi Ben,
>> > > > >>> >>
>> > > > >>> >> Well I think it really comes down to what exactly is the
>> > performance hit
>> > > > >>> and
>> > > > >>> >> how did you measure it? Which platform/OS/C library did you
>> > use?
>> > > > >>> Obviously
>> > > > >>> >> there will be *some* overhead having the extra memcmp() in
>> > there but
>> > > > >>> does it
>> > > > >>> >> matter? For example, if the overhead is just 1-2s on a 30s
>> > query then
>> > > > >>> that
>> > > > >>> >> doesn't really matter. Then again, if the overhead is 1s on a
>> > 3s query
>> > > > >>> then
>> > > > >>> >> that is significant.
>> > > > >>> >>
>> > > > >>> >> Since this is a new feature then I'd be inclined to say that
>> > for a first
>> > > > >>> cut
>> > > > >>> >> we should keep the standard API, and depending on the reports
>> > we get
>> > > > >>> back,
>> > > > >>> >> look at improving it later. That seems a lot more preferable to
>> > having a
>> > > > >>> >> fairly nasty API hack that will catch a lot of people out :(
>> > > > >>> >>
>> > > > >>> >>
>> > > > >>> >>
>> > > > >>> >> ATB,
>> > > > >>> >>
>> > > > >>> >> Mark.
>> > > > >>> >>
>> > > > >>> >> --
>> > > > >>> >> Mark Cave-Ayland
>> > > > >>> >> Sirius Corporation - The Open Source Experts
>> > > > >>> >> http://www.siriusit.co.uk
>> > > > >>> >> T: +44 870 608 0063
>> > > > >>> >> _______________________________________________
>> > > > >>> >> postgis-devel mailing list
>> > > > >>> >> postgis-devel at postgis.refractions.net
>> > > > >>> >> http://postgis.refractions.net/mailman/listinfo/postgis-devel
>> > > > >>> >>
>> > > > >>> >>
>> > > > >>> > _______________________________________________
>> > > > >>> > postgis-devel mailing list
>> > > > >>> > postgis-devel at postgis.refractions.net
>> > > > >>> > http://postgis.refractions.net/mailman/listinfo/postgis-devel
>> > > > >>> >
>> > > > >>> >
>> > > > >>>
>> > > > >>> --
>> > > > >>> Martin Davis
>> > > > >>> Senior Technical Architect
>> > > > >>> Refractions Research, Inc.
>> > > > >>> (250) 383-3022
>> > > > >>>
>> > > > >>> _______________________________________________
>> > > > >>> postgis-devel mailing list
>> > > > >>> postgis-devel at postgis.refractions.net
>> > > > >>> http://postgis.refractions.net/mailman/listinfo/postgis-devel
>> > > > >>>
>> > > > >>>
>> > > > >>> _______________________________________________
>> > > > >>> postgis-devel mailing list
>> > > > >>> postgis-devel at postgis.refractions.net
>> > > > >>> http://postgis.refractions.net/mailman/listinfo/postgis-devel
>> > > > >>>
>> > > > >>>
>> > > > >>> _______________________________________________
>> > > > >>> postgis-devel mailing list
>> > > > >>> postgis-devel at postgis.refractions.net
>> > > > >>> http://postgis.refractions.net/mailman/listinfo/postgis-devel
>> > > > >>>
>> > > > >>>
>> > > > >>>
>> > > > >> _______________________________________________
>> > > > >> postgis-devel mailing list
>> > > > >> postgis-devel at postgis.refractions.net
>> > > > >> http://postgis.refractions.net/mailman/listinfo/postgis-devel
>> > > > >>
>> > > > > _______________________________________________
>> > > > > postgis-devel mailing list
>> > > > > postgis-devel at postgis.refractions.net
>> > > > > http://postgis.refractions.net/mailman/listinfo/postgis-devel
>> > > >
>> > > > _______________________________________________
>> > > > postgis-devel mailing list
>> > > > postgis-devel at postgis.refractions.net
>> > > > http://postgis.refractions.net/mailman/listinfo/postgis-devel
>> > > >
>> > > >
>> > > >
>> > > _______________________________________________
>> > > postgis-devel mailing list
>> > > postgis-devel at postgis.refractions.net
>> > > http://postgis.refractions.net/mailman/listinfo/postgis-devel
>> > >
>> > >
>> > >
>> >
>> > --
>> > Martin Davis
>> > Senior Technical Architect
>> > Refractions Research, Inc.
>> > (250) 383-3022
>> >
>> > _______________________________________________
>> > postgis-devel mailing list
>> > postgis-devel at postgis.refractions.net
>> > http://postgis.refractions.net/mailman/listinfo/postgis-devel
>> >
>>
> _______________________________________________
> postgis-devel mailing list
> postgis-devel at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-devel
>
More information about the postgis-devel
mailing list