[postgis-devel] Caching function calls with CTEs

nicklas.aven at jordogskog.no nicklas.aven at jordogskog.no
Mon Jul 20 07:08:31 PDT 2009


I have tried to make an example of what I mean but I think there is som way to do it more efficient. the goal is to get a dynamic value for st_dwithin in a query like:SELECT DISTINCT ON(g1.gid)  g1.gid As gref_gid, g1.description As gref_description, g2.gid As gnn_gid, 
        g2.description As gnn_description  
    FROM sometable As g1, sometable As g2   
    WHERE g1.gid <> g2.gid AND ST_DWithin(g1.the_geom, g2.the_geom, 300)   
    ORDER BY g1.gid, ST_Distance(g1.the_geom,g2.the_geom) 
to get the smallest maxdistance:
SELECT DISTINCT ON(g1.gid)  g1.gid As gref_gid, g2.gid As gnn_gid 
    FROM sometable  As g1, sometable  As g2   
    WHERE g1.gid <> g2.gid
    ORDER BY g1.gid, ST_max_Distance(st_envelope(g1.the_geom),st_envelope(g2.the_geom)) but I think it's worth the effort to go one step further and look for st_distance to the found geometry:
Select gref_gid, st_distance(t1.geom1, t1.geom2) as dist from
(SELECT DISTINCT ON(g1.gid)  g1.gid As gref_gid, g2.gid As gnn_gid , g1.the_geom as geom1, g2.the_geom as geom2
    FROM sometable As g1, sometable   As g2   
    WHERE g1.gid <> g2.gid
    ORDER BY g1.gid, ST_max_Distance(st_envelope(g1.the_geom),st_envelope(g2.the_geom)) ) t1
I tried SELECT DISTINCT ON(g1.gid)  g1.gid As gref_gid, g2.gid As gnn_gid , st_distance(g1.the_geom, g2.the_geom)
    FROM sometable As g1, sometable As g2   
    WHERE g1.gid <> g2.gid
    ORDER BY g1.gid, ST_max_Distance(st_envelope(g1.the_geom),st_envelope(g2.the_geom))but then it calculates st_distance for ALL combinations not just the distinct. now we have got our dynamic value for st_dwithin and I use it like this:
with dl as
(select gref_gid, st_distance(t1.geom1, t1.geom2) as dist from
(SELECT DISTINCT ON(g1.gid)  g1.gid As gref_gid, g2.gid As gnn_gid , g1.the_geom as geom1, g2.the_geom as geom2
    FROM sometable As g1, sometable As g2   
    WHERE g1.gid <> g2.gid
    ORDER BY g1.gid, ST_max_Distance(st_envelope(g1.the_geom),st_envelope(g2.the_geom)) ) t1)   
SELECT DISTINCT ON(g1.gid)  g1.gid As gref_gid, g2.gid As gnn_gid  
    FROM sometable As g1, sometable As g2, dl   
    WHERE dl.gref_gid= g1.gid and g1.gid <> g2.gid AND ST_DWithin(g1.the_geom, g2.the_geom, dl.dist)   
    ORDER BY g1.gid, ST_Distance(g1.the_geom,g2.the_geom) ; It seems to work and it seems fast, but there is something very strange about the behavior of st_dwithin in my functions. My functions is faster than the ones used today in every test I have done except one clean point to point where it was som percent slower and that is understandable.Butst_dwithin is slower but _st_dwithin is fasterI can't explain it and I think it has to do with some bad handling of memory because I don't know enough.I will make a separate opst of that, but I wanted to mention it here because I think when that is working the above will be quite a lot faster. /Nicklas  

2009-07-19 nicklas.aven at jordogskog.no wrote:

No, dfullywithin will giva a too narrow selection since there could be bigger geometrie that wouldn't be "fullywithin" but anyway be closer.
>
> The idea is if I use your example fron bostongis, to find a approx for 300 without knowing anything about the dataset. 
>  SELECT DISTINCT ON(g1.gid)  g1.gid As gref_gid, g1.description As gref_description, g2.gid As gnn_gid, 
>g2.description As gnn_description 
>FROM sometable As g1, sometable As g2 
>WHERE g1.gid <> g2.gid AND ST_DWithin(g1.the_geom, g2.the_geom, 300) 
>ORDER BY g1.gid, ST_Distance(g1.the_geom,g2.the_geom) 
>The first step could then be to find the one with smalest max_distance to the boundingbox (because the boundingbox probably is much faster to calculate than the geometry itself). Then we take the st_distance from the geometry in that box and I think we get the smallest value in st_dwithin that we can use without knowing anything about the data.
>
> But I don't know if it's worth the effort. I guess it is if it is a widely spread dataset, with complex geometries.
>
> /Nicklas
>
>
> 2009-07-19 Paragon Corporation wrote:
>
> > >Nicklas,> > > > I can't quite see how max distance would help ST_DWithin.  It seems it would help the Fully_DWithin, but don't see how it helps the ST_DWithin.  Granted I haven't given it much thought.  > > > >Thanks,> >Regina
> >> >From: postgis-devel-bounces at postgis.refractions.net [mailto:postgis-devel-bounces at postgis.refractions.net] On Behalf Of nicklas.aven at jordogskog.no
> >Sent: Sunday, July 19, 2009 2:39 PM
> >To: PostGIS Development Discussion
> >Subject: Re: [postgis-devel] Caching function calls with CTEs
> >
> >> >
 Yes,that one I can see.
> >
> >About the nearest neighbour I have thought about using st_max_distance against the bounding boxes to find the one with smallest max_distance. That value ought to be good to use in the st_dwithin part of the query. 
> >
> >Maybe it would even be worth the effort to use st_distance from the geometry found with the "max_distance" boundingbox to usi in in st_dwithin. 
> >
> >Hope it is possible to understand what I mean. It is at least one release ahead but possible to test :-)
> >
> >
> >/Nicklas
> >
> >
>>
> >2009-07-19 Paragon Corporation wrote:
> >
> >> > >Windowing functions come in especially handy for nearest neighbor search getting top n per location.  We are going to demonstrate that in our OSCON conference.  Slides and materials will of course will be available after the conference is over.
> >>> > >From: postgis-devel-bounces at postgis.refractions.net [mailto:postgis-devel-bounces at postgis.refractions.net] On Behalf Of nicklas.aven at jordogskog.no
> >>Sent: Sunday, July 19, 2009 1:52 PM
> >>To: PostGIS Development Discussion
> >>Subject: Re: [postgis-devel] Caching function calls with CTEs
> >>
> >>> > >
 This is really cool Regina and Leo
> >>
> >>It's a new way of thinking for me to to use this recursive thing, but I can see that it is very useful.
> >>I hope I will get some time in the near future to try to understand it better.
> >>
> >>Also the windowing functions together with postgis aggregate function will be fun to try.
> >>
> >>/Nicklas
> >>
> >>
> >>
> >>
> >>
> >>2009-07-17 Paragon Corporation wrote:
> >>
> >>Nicklas,
> >>>
> >>>We tried CTEs against that annoying issue of functions not being cached in
> >>>the same row. Preliminary tests seem to suggest it might be the ideal
> >>>solution for this.
> >>>We suspect it will also be good in many other scenarios where the planner
> >>>refuses to use indexes in the right order etc.
> >>>
> >>>www.postgresonline.com/journal/index.php?/archives/127-guid.html
> >>>
> >>>We are going to test with real spatial queries next in preparation for our
> >>>OSCON presentation next week.
> >>>
> >>>en.oreilly.com/oscon2009/public/schedule/detail/7859
> >>>Sadly it looks like we probably won't have 1.4 out in time for our
> >>>presentation - oh well :(
> >>>
> >>>
> >>>Anyrate take a look and test with your data to see if it speeds up your
> >>>queries.
> >>>
> >>>L & R
> >>>
> >>>
> >>>_______________________________________________
> >>>postgis-devel mailing list
> >>>postgis-devel at postgis.refractions.net
> >>>postgis.refractions.net/mailman/listinfo/postgis-devel
> >>>
> >>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-devel/attachments/20090720/e35a4f4c/attachment.html>


More information about the postgis-devel mailing list