API for optimized predicates (was Re: [postgis-devel] 1.3.3)

Mark Leslie mark.leslie at lisasoft.com
Wed Apr 2 15:40:11 PDT 2008


ST_Contains(geometry, geometry, integer) is defined as 'isstrict'.  This 
is a PostgreSQL optimization that will cause the function to return null 
anytime any argument is null.

I'm sure it's screaming fast though.

-- 
Mark Leslie
Geospatial Software Architect
LISAsoft Pty Ltd
+61 (0)2 8570 5050

Commercial Support for Geospatial Open Source Software
http://www.lisasoft.com/LISAsoft/SupportedProducts.html


Paul Ramsey wrote:
> Correction, it returns NULL all the time.
> 
> On Wed, Apr 2, 2008 at 9:45 AM, Paul Ramsey <pramsey at cleverelephant.ca> wrote:
>> Sure, it fails in a less catastrophic way, it just returns false all the time.
>>
>>
>>
>>  On Wed, Apr 2, 2008 at 9:43 AM, Martin Davis <mbdavis at refractions.net> wrote:
>>  > So are you going to test the NULL case?
>>  >
>>  >
>>  >
>>  >  Paul Ramsey wrote:
>>  >
>>  > > Right. So, unsurprisingly, the 2-param case returned the same timing,
>>  > > since it *was* the same code line.
>>  > >
>>  > > The 3-param case I ran was ST_Contains(ed.the_geom, v.centroid,
>>  > > ed.gid), so the numeric case, not the NULL case.
>>  > >
>>  > > P
>>  > >
>>  > > On Tue, Apr 1, 2008 at 5:26 PM, Chris Hodgson <chodgson at refractions.net>
>>  > wrote:
>>  > >
>>  > >
>>  > > > Hmm... good point, when you say "2-param" case do you mean passing a
>>  > > >  NULL to the 3-param version? Because I think the 2-param version IS the
>>  > > >  usual un-prepared approach, which would explain your results... unless
>>  > > >  I'm misunderstanding Ben's patch...
>>  > > >
>>  > > >  Chris
>>  > > >
>>  > > >
>>  > > >
>>  > > >  Ben Jubb wrote:
>>  > > >  > for the 3 param version, where you using an integer key, or NULL?
>>  > > >  > b
>>  > > >  >
>>  > > >  > Paul Ramsey wrote:
>>  > > >  >> I gave this a try, but in the three-parameter case it caused the
>>  > > >  >> backend to crash and in the two-parameter case provided the same
>>  > speed
>>  > > >  >> as the usual un-prepared approach...
>>  > > >  >>
>>  > > >  >> I was testing with st_contains(polycolumn, pointcolumn), with 80
>>  > polys
>>  > > >  >> and 7000 points.
>>  > > >  >>
>>  > > >  >> P
>>  > > >  >>
>>  > > >  >> On Mon, Mar 31, 2008 at 3:50 PM, Ben Jubb <benjubb at refractions.net>
>>  > wrote:
>>  > > >  >>
>>  > > >  >>>  Hiya,
>>  > > >  >>>  I've attached a patch to lwgeom_geos_c.c, modifying its 1st arg
>>  > caching
>>  > > >  >>> behaviour.
>>  > > >  >>>  The third argument is used as before, as a surrogate key, and the
>>  > caching
>>  > > >  >>> will use that as its key;
>>  > > >  >>>  UNLESS the key is NULL.
>>  > > >  >>>  If the key is NULL, the predicates use the memcmp technique to
>>  > determine if
>>  > > >  >>> the cached prepared geometry is in sync with the first arg.
>>  > > >  >>>  Note that the two caching approaches have essentially independent
>>  > caches.
>>  > > >  >>>  This patch is intended for testing purposes only.
>>  > > >  >>>  enjoy
>>  > > >  >>>  b
>>  > > >  >>>
>>  > > >  >>>
>>  > > >  >>>
>>  > > >  >>>  Paul Ramsey wrote:
>>  > > >  >>>  A unique-on-insert ID would be another approach. It would,
>>  > however,
>>  > > >  >>> involve a disk-format change, so we're talking about pretty big
>>  > > >  >>> hammers here, regardless of whether we did a hash or a uuid.
>>  > > >  >>>
>>  > > >  >>> Ben, maybe just stick some small timing statements into your
>>  > current
>>  > > >  >>> code... start time, end time, and then do a noop memcmp with
>>  > start/end
>>  > > >  >>> times as well. That way we can compare the memcmp times to the
>>  > total
>>  > > >  >>> times.
>>  > > >  >>>
>>  > > >  >>> P.
>>  > > >  >>>
>>  > > >  >>> On Mon, Mar 31, 2008 at 10:17 AM, Martin Davis
>>  > <mbdavis at refractions.net>
>>  > > >  >>> wrote:
>>  > > >  >>>
>>  > > >  >>>
>>  > > >  >>>  (renaming this thread, since the current one is way overloaded)
>>  > > >  >>>
>>  > > >  >>>  I agree with Paul and Mark - there should be a simple function
>>  > signature
>>  > > >  >>>  for the fast preds. The more complex one can be provided as well,
>>  > but
>>  > > >  >>>  it will need to be VERY well documented, since it's a tricky thing
>>  > to grok.
>>  > > >  >>>
>>  > > >  >>>  re spatial hash - would you really trust a hash to confirm
>>  > identity? I
>>  > > >  >>>  don't think I would...
>>  > > >  >>>
>>  > > >  >>>  Would another alternative would be to assign a hidden unique ID to
>>  > each
>>  > > >  >>>  geom entered into the DB. This could be a honking big integer, or
>>  > maybe
>>  > > >  >>>  a UUID.
>>  > > >  >>>
>>  > > >  >>>  Paul Ramsey wrote:
>>  > > >  >>>  > The problem is that the memcmp hit gets worse in exactly the
>>  > cases
>>  > > >  >>>  > were we expect better and better performance from the prepared
>>  > > >  >>>  > algorithm... still, it would be nice to know what that hit is...
>>  > > >  >>>  > compared to the original, unprepared time, it will be small, but
>>  > > >  >>>  > compared to the prepared-with-id-API implementation... hard to
>>  > say.
>>  > > >  >>>  >
>>  > > >  >>>  > Something to resolve before 1.4... It's unfortunate that all the
>>  > > >  >>>  > *fast* tests can only falsify identity, not confirm it. I was
>>  > talking
>>  > > >  >>>  > to a fellow who has done a spatial db implementation on a
>>  > proprietary
>>  > > >  >>>  > system, and he was pleased with the idea of a "geographic hash"
>>  > that
>>  > > >  >>>  > he can calculate for each shape and use to test identity. If we
>>  > were
>>  > > >  >>>  > to do something like that, it would have to be optional, like
>>  > the bbox
>>  > > >  >>>  > calculation is currently.
>>  > > >  >>>  >
>>  > > >  >>>  > P.
>>  > > >  >>>  >
>>  > > >  >>>  > On Mon, Mar 31, 2008 at 2:51 AM, Mark Cave-Ayland
>>  > > >  >>>  > <mark.cave-ayland at siriusit.co.uk> wrote:
>>  > > >  >>>  >
>>  > > >  >>>  >> On Friday 28 March 2008 23:53:53 Ben Jubb wrote:
>>  > > >  >>>  >> > Howdy,
>>  > > >  >>>  >> > In my testing, I did see a performance hit when using the
>>  > memcmp test,
>>  > > >  >>>  >> > although it was noticable only in the largest of my test
>>  > geometries
>>  > > >  >>>  >> > (5000 vertices or so).
>>  > > >  >>>  >> > The three parameter form seemed like the best way to go
>>  > because the
>>  > > >  >>>  >> > whole point of the prepared version of the functions was to
>>  > get the
>>  > > >  >>> best
>>  > > >  >>>  >> > possible performance. The cases when the performance matters
>>  > most is
>>  > > >  >>>  >> > with large geoms, and then the cost of doing the memcmp is
>>  > the
>>  > > >  >>> highest.
>>  > > >  >>>  >> > Using a third argument seemed the simplest way to get the
>>  > best
>>  > > >  >>> possible
>>  > > >  >>>  >> > performance from the predicates, with a minimal increase in
>>  > the
>>  > > >  >>>  >> > complexity of the interface.
>>  > > >  >>>  >> > I agree it would be nice to have a single form for those
>>  > predicates
>>  > > >  >>> that
>>  > > >  >>>  >> > automatically determines the most efficient manner to do the
>>  > tests,
>>  > > >  >>> but
>>  > > >  >>>  >> > there didn't seem to be any efficient way to accomplish that.
>>  > > >  >>>  >> >
>>  > > >  >>>  >> > b
>>  > > >  >>>  >>
>>  > > >  >>>  >>
>>  > > >  >>>  >> Hi Ben,
>>  > > >  >>>  >>
>>  > > >  >>>  >> Well I think it really comes down to what exactly is the
>>  > performance hit
>>  > > >  >>> and
>>  > > >  >>>  >> how did you measure it? Which platform/OS/C library did you
>>  > use?
>>  > > >  >>> Obviously
>>  > > >  >>>  >> there will be *some* overhead having the extra memcmp() in
>>  > there but
>>  > > >  >>> does it
>>  > > >  >>>  >> matter? For example, if the overhead is just 1-2s on a 30s
>>  > query then
>>  > > >  >>> that
>>  > > >  >>>  >> doesn't really matter. Then again, if the overhead is 1s on a
>>  > 3s query
>>  > > >  >>> then
>>  > > >  >>>  >> that is significant.
>>  > > >  >>>  >>
>>  > > >  >>>  >> Since this is a new feature then I'd be inclined to say that
>>  > for a first
>>  > > >  >>> cut
>>  > > >  >>>  >> we should keep the standard API, and depending on the reports
>>  > we get
>>  > > >  >>> back,
>>  > > >  >>>  >> look at improving it later. That seems a lot more preferable to
>>  > having a
>>  > > >  >>>  >> fairly nasty API hack that will catch a lot of people out :(
>>  > > >  >>>  >>
>>  > > >  >>>  >>
>>  > > >  >>>  >>
>>  > > >  >>>  >> ATB,
>>  > > >  >>>  >>
>>  > > >  >>>  >> Mark.
>>  > > >  >>>  >>
>>  > > >  >>>  >> --
>>  > > >  >>>  >> Mark Cave-Ayland
>>  > > >  >>>  >> Sirius Corporation - The Open Source Experts
>>  > > >  >>>  >> http://www.siriusit.co.uk
>>  > > >  >>>  >> T: +44 870 608 0063
>>  > > >  >>>  >> _______________________________________________
>>  > > >  >>>  >> postgis-devel mailing list
>>  > > >  >>>  >> postgis-devel at postgis.refractions.net
>>  > > >  >>>  >> http://postgis.refractions.net/mailman/listinfo/postgis-devel
>>  > > >  >>>  >>
>>  > > >  >>>  >>
>>  > > >  >>>  > _______________________________________________
>>  > > >  >>>  > postgis-devel mailing list
>>  > > >  >>>  > postgis-devel at postgis.refractions.net
>>  > > >  >>>  > http://postgis.refractions.net/mailman/listinfo/postgis-devel
>>  > > >  >>>  >
>>  > > >  >>>  >
>>  > > >  >>>
>>  > > >  >>>  --
>>  > > >  >>>  Martin Davis
>>  > > >  >>>  Senior Technical Architect
>>  > > >  >>>  Refractions Research, Inc.
>>  > > >  >>>  (250) 383-3022
>>  > > >  >>>
>>  > > >  >>>  _______________________________________________
>>  > > >  >>>  postgis-devel mailing list
>>  > > >  >>>  postgis-devel at postgis.refractions.net
>>  > > >  >>>  http://postgis.refractions.net/mailman/listinfo/postgis-devel
>>  > > >  >>>
>>  > > >  >>>
>>  > > >  >>>  _______________________________________________
>>  > > >  >>> postgis-devel mailing list
>>  > > >  >>> postgis-devel at postgis.refractions.net
>>  > > >  >>> http://postgis.refractions.net/mailman/listinfo/postgis-devel
>>  > > >  >>>
>>  > > >  >>>
>>  > > >  >>> _______________________________________________
>>  > > >  >>>  postgis-devel mailing list
>>  > > >  >>>  postgis-devel at postgis.refractions.net
>>  > > >  >>>  http://postgis.refractions.net/mailman/listinfo/postgis-devel
>>  > > >  >>>
>>  > > >  >>>
>>  > > >  >>>
>>  > > >  >> _______________________________________________
>>  > > >  >> postgis-devel mailing list
>>  > > >  >> postgis-devel at postgis.refractions.net
>>  > > >  >> http://postgis.refractions.net/mailman/listinfo/postgis-devel
>>  > > >  >>
>>  > > >  > _______________________________________________
>>  > > >  > postgis-devel mailing list
>>  > > >  > postgis-devel at postgis.refractions.net
>>  > > >  > http://postgis.refractions.net/mailman/listinfo/postgis-devel
>>  > > >
>>  > > >  _______________________________________________
>>  > > >  postgis-devel mailing list
>>  > > >  postgis-devel at postgis.refractions.net
>>  > > >  http://postgis.refractions.net/mailman/listinfo/postgis-devel
>>  > > >
>>  > > >
>>  > > >
>>  > > _______________________________________________
>>  > > postgis-devel mailing list
>>  > > postgis-devel at postgis.refractions.net
>>  > > http://postgis.refractions.net/mailman/listinfo/postgis-devel
>>  > >
>>  > >
>>  > >
>>  >
>>  >  --
>>  >  Martin Davis
>>  >  Senior Technical Architect
>>  >  Refractions Research, Inc.
>>  >  (250) 383-3022
>>  >
>>  >  _______________________________________________
>>  >  postgis-devel mailing list
>>  >  postgis-devel at postgis.refractions.net
>>  >  http://postgis.refractions.net/mailman/listinfo/postgis-devel
>>  >
>>
> _______________________________________________
> postgis-devel mailing list
> postgis-devel at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-devel
> 



More information about the postgis-devel mailing list