[postgis-users] Query performance issue

Fri Oct 14 07:55:35 PDT 2005

I left out query 4:
select a.lrseg, b.coseg, b.stseg, b.segment, b.subbasin,
   intersects(b.the_geom, a.the_geom) as ovp
from p5lrsegs as a join p4cosegs as b
ON (a.the_geom && b.the_geom)
and a.landseg like 'A24%';

TIME: 1.5 minutes


On Fri, 2005-10-14 at 10:45, Robert Burgholzer wrote:
> Stephen,
> My tests have been on two polygons, not a polygon with a line, but I
> believe they may be valid. I believe the majority of the performance hit
> is on the distance function.   The query planner (EXPLAIN) cannot
> predict the performance hit that will be caused by calling another
> function, which is why it becomes less than revealing. In a nutshell,
> the my findings were as follows: intersects() is far superior to
> distance(). If you want to read more, or see if I screwed up my analysis
> in some way, read on.
> 
> I tested 4 queries (shown below):
> 1) the distance function plus a spatial overlap &&, 
> 2) only a spatial overlap &&,
> 3) a spatial overlap && and an area2d(intersection())/area2d (to
> determine if the percent overlap was above some threshold)
> 4) using a spatial overlap &&  and a call to intersects()
> 
> My findings were that:
> 1) using the distance function took 10 times longer than the combined
> area2d(intersection())/area2d, and was 250 times slower than the &&
> alone
> 2) the combined area/intersection function took 25 times longer than the
> indexed spatial && overlap by itself 
> 3) the intersects() function was about 2 X faster than the
> area2d(intersection)
> 
> These results suggest that you may speed up your query several fold by
> dumping the distance() function for an intersects() (if that applies for
> a line and polygon, which I don;t know).
> Clearly the distance function itself is cumbersome, however, the
> intersection functions are also slowin comparison to only the spatial
> overlap. I wonder if:
> 
> 1) the pl/pgsql function interface causes an overhead hit?
> 2) Since many people have suggested that the distance function is the
> way to go, has some change crept into this function that has degraded
> its performnance substantially?
> 
> 
> TEST RESULTS:
> Result set has about 2,000 rows.
> 
> QUERIES:
> Query #1:
> select a.lrseg, b.coseg, b.stseg, b.segment, b.subbasin,
>     distance(b.the_geom, a.the_geom) as ovp
> from p5lrsegs as a join p4cosegs as b
> ON (a.the_geom && b.the_geom)
> and a.landseg like 'A24%';
> 
> TIME: 3 minutes
> 
> Query #2:
> select a.lrseg, b.coseg, b.stseg, b.segment, b.subbasin
> from p5lrsegs as a join p4cosegs as b
> ON (a.the_geom && b.the_geom)
> and a.landseg like 'A24%';
> 
> TIME: 7 seconds
> 
> Query #3:
> select a.lrseg, b.coseg, b.stseg, b.segment, b.subbasin,
>    distance(b.the_geom, a.the_geom) as dist
> from p5lrsegs as a join p4cosegs as b
> ON (a.the_geom && b.the_geom)
> and a.landseg like 'A24%';
> 
> TIME: 34 minutes
> 
> Query 4:
> 
> 
> On Thu, 2005-10-13 at 17:47, Stephen Woodbridge wrote:
> > So here are the variations on the UPDATE query and the respective query 
> > plans. Given that Plan 1: ran for over 16 hrs and the estimated runtime 
> > based on number of rows that had been updated is 261 hrs, does anyone 
> > have any suggestion as to which might be the next best to try?
> > 
> > -Steve
> > 
> > ### Plan 1: ran for 16 hrs and estimate runtime is 261 hrs
> > 
> > explain update streets set prov=prov_ab from province b where
> > streets.the_geom && b.the_geom and distance(streets.the_geom, 
> > b.the_geom) = 0.0
> > 
> > Nested Loop  (cost=0.00..581.99 rows=17990 width=307)
> >    Join Filter: (distance("inner".the_geom, "outer".the_geom) = 
> > 0::double precision)
> >    ->  Seq Scan on province b  (cost=0.00..1.14 rows=14 width=92196)
> >    ->  Index Scan using streets_gidx on streets  (cost=0.00..38.12 
> > rows=9 width=301)
> >          Index Cond: (streets.the_geom && "outer".the_geom)
> > 
> > ### Plan 2:
> > 
> > explain UPDATE streets SET prov = (
> >    SELECT prov_ab
> >    FROM province b
> >    WHERE streets.the_geom && b.the_geom AND intersects(streets.the_geom,
> >    b.the_geom)
> >    ORDER BY prov_ab
> >    LIMIT 1
> > );
> > 
> > Seq Scan on streets  (cost=0.00..2296390.04 rows=1798897 width=301)
> >    SubPlan
> >      ->  Limit  (cost=1.22..1.22 rows=1 width=6)
> >            ->  Sort  (cost=1.22..1.22 rows=1 width=6)
> >                  Sort Key: prov_ab
> >                  ->  Seq Scan on province b  (cost=0.00..1.21 rows=1 
> > width=6)
> >                        Filter: (($0 && the_geom) AND intersects($0, 
> > the_geom))
> > 
> > ### Plan 3:
> > 
> > explain UPDATE streets SET prov = (
> >    SELECT prov_ab
> >    FROM streets JOIN province b
> >    ON streets.the_geom && b.the_geom WHERE intersects(streets.the_geom,
> >    b.the_geom)
> >    ORDER BY prov_ab
> >    LIMIT 1
> > );
> > 
> > 
> > Seq Scan on streets  (cost=171335.34..264076.56 rows=1798897 width=301)
> >    InitPlan
> >      ->  Limit  (cost=171335.34..171335.34 rows=1 width=6)
> >            ->  Sort  (cost=171335.34..174333.52 rows=1199270 width=6)
> >                  Sort Key: b.prov_ab
> >                  ->  Nested Loop  (cost=0.00..536.70 rows=1199270 width=6)
> >                        Join Filter: intersects("inner".the_geom, 
> > "outer".the_geom)
> >                        ->  Seq Scan on province b  (cost=0.00..1.14 
> > rows=14 width=92196)
> >                        ->  Index Scan using streets_gidx on streets 
> > (cost=0.00..38.12 rows=9 width=160)
> >                              Index Cond: (streets.the_geom && 
> > "outer".the_geom)
> > 
> > 
> > ### Plan 4:
> > 
> > explain update streets set prov=prov_ab from province b where
> >    ( select count(*) from streets, province
> >       where streets.the_geom && province.the_geom )=1
> > or
> >    ( streets.the_geom && b.the_geom
> >      and distance(streets.the_geom, b.the_geom) = 0.0 );
> > 
> > Nested Loop  (cost=9530.91..1748998.34 rows=143822 width=307)
> >    Join Filter: (($0 = 1) OR (("inner".the_geom && "outer".the_geom) AND 
> > (distance("inner".the_geom, "outer".the_geom) = 0::double precision)))
> >    InitPlan
> >      ->  Aggregate  (cost=9530.91..9530.91 rows=1 width=0)
> >            ->  Nested Loop  (cost=0.00..536.38 rows=3597808 width=0)
> >                  ->  Seq Scan on province  (cost=0.00..1.14 rows=14 
> > width=92190)
> >                  ->  Index Scan using streets_gidx on streets 
> > (cost=0.00..38.12 rows=9 width=160)
> >                        Index Cond: (streets.the_geom && "outer".the_geom)
> >    ->  Seq Scan on province b  (cost=0.00..1.14 rows=14 width=92196)
> >    ->  Seq Scan on streets  (cost=0.00..88243.97 rows=1798897 width=301)
> > 
> > _______________________________________________
> > postgis-users mailing list
> > postgis-users at postgis.refractions.net
> > http://postgis.refractions.net/mailman/listinfo/postgis-users
-- 
Non-point Source Data Analyst
University of Maryland, College Park
Chesapeake Bay Program Office
410 Severn Avenue, Suite 305B
Annapolis, MD, 21403
Phone: (410) 267-5779

rburghol at chesapeakebay.net