[postgis-users] Improving performance of query

Kevin Neufeld kneufeld at refractions.net
Thu May 15 23:55:27 PDT 2008


Hi Bryan,

I don't think this query will do what you are expecting it to do.  I 
think you are missing a reference from your outer table to your inner 
query (as well as a geometry index reference).
ie.

insert into tptemp (id, the_geom)
  select a.gid,
    (select geomunion(b.the_geom) from parcels x, trees y
     where x.the_geom && y.the_geom
     and overlaps(x.the_geom, y.the_geom)
     and a.gid = x.gid
    )
  from parcels a group by gid;

Are you convinced the sub-query is the way to go here?  What does the 
query planner say about changing this to a simple join query?
i.e.
insert into tptemp (id, the_geom)
  select a.gid, geomunion(b.the_geom)
  from parcels a, trees b
  where a.the_geom && b.the_geom
  and overlaps(a.the_geom, b.the_geom)
  group by a.gid;


Also, about many and how large are your 'trees' dataset?  In the past, I 
found that geomunion has serious performance issues when merging more 
than a few geometries at a time.  What are the first few results of this 
query? (If this runs relatively fast, then at least you know that your 
performance loss if happening in the geomunion aggregation).

  select a.gid, count(*)
  from parcels a, trees b
  where a.the_geom && b.the_geom
  and overlaps(a.the_geom, b.the_geom)
  group by a.gid
  order by count desc;

Have you tried using the buffer(geom, 0) hack/trick that's out there?  
This is probably not the best approach since buffering by zero is really 
exploiting a side-effect of the method (which doesn't work in all 
cases), but if it works for you, great.

insert into tptemp (id, the_geom)
  select a.gid, buffer(collect(b.the_geom), 0)
  from parcels a, trees b
  where a.the_geom && b.the_geom
  and overlaps(a.the_geom, b.the_geom)
  group by a.gid;


And yes, ST_Distance=0 may give you different results than ST_Overlaps() 
in the cases where the boundaries of two geometries touch (ie. at a 
point or on a line) and not actually overlap.

To answer your question, yes, when in doubt, it is better to invoke the 
&& operator and use the index for spatial queries.  The query planner 
should properly decide whether to use the index or not.

Hope this helps,
Kevin


Bryan K R Manuel wrote:
> Hi
>
> I have been working on improving the performance of the following query:
>
> insert into tptemp (id, the_geom)
>   select a.gid,
>     (select geomunion(b.the_geom) from parcels a, trees b
>       where overlaps(a.the_geom, b.the_geom)
>     )
>   from parcels a group by gid;
>
> I have tried changing the where clause to ST_Distance(a.the_geom, b.the_geom) = 0, which gave me slightly different results and a very slightly better query time.  
>
> Looking through the PostGIS Reference, ST_Distance()=0, ST_Intersects(), ST_Touches() and ST_Overlaps() all appear to be able to work in the where clause.  Will any of these work faster or better?  My results have been mixed.
>
> I also see that ST_Distance() does not use indexes, whereas the other do use indexes - is it better to use indexes or not to use indexes for larger spatial queries?
>
> Thanks
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users
>   



More information about the postgis-users mailing list