[postgis-users] Query performance issue

Robert Burgholzer rburghol at chesapeakebay.net
Fri Oct 14 07:45:52 PDT 2005


Stephen,
My tests have been on two polygons, not a polygon with a line, but I
believe they may be valid. I believe the majority of the performance hit
is on the distance function.   The query planner (EXPLAIN) cannot
predict the performance hit that will be caused by calling another
function, which is why it becomes less than revealing. In a nutshell,
the my findings were as follows: intersects() is far superior to
distance(). If you want to read more, or see if I screwed up my analysis
in some way, read on.

I tested 4 queries (shown below):
1) the distance function plus a spatial overlap &&, 
2) only a spatial overlap &&,
3) a spatial overlap && and an area2d(intersection())/area2d (to
determine if the percent overlap was above some threshold)
4) using a spatial overlap &&  and a call to intersects()

My findings were that:
1) using the distance function took 10 times longer than the combined
area2d(intersection())/area2d, and was 250 times slower than the &&
alone
2) the combined area/intersection function took 25 times longer than the
indexed spatial && overlap by itself 
3) the intersects() function was about 2 X faster than the
area2d(intersection)

These results suggest that you may speed up your query several fold by
dumping the distance() function for an intersects() (if that applies for
a line and polygon, which I don;t know).
Clearly the distance function itself is cumbersome, however, the
intersection functions are also slowin comparison to only the spatial
overlap. I wonder if:

1) the pl/pgsql function interface causes an overhead hit?
2) Since many people have suggested that the distance function is the
way to go, has some change crept into this function that has degraded
its performnance substantially?


TEST RESULTS:
Result set has about 2,000 rows.

QUERIES:
Query #1:
select a.lrseg, b.coseg, b.stseg, b.segment, b.subbasin,
    distance(b.the_geom, a.the_geom) as ovp
from p5lrsegs as a join p4cosegs as b
ON (a.the_geom && b.the_geom)
and a.landseg like 'A24%';

TIME: 3 minutes

Query #2:
select a.lrseg, b.coseg, b.stseg, b.segment, b.subbasin
from p5lrsegs as a join p4cosegs as b
ON (a.the_geom && b.the_geom)
and a.landseg like 'A24%';

TIME: 7 seconds

Query #3:
select a.lrseg, b.coseg, b.stseg, b.segment, b.subbasin,
   distance(b.the_geom, a.the_geom) as dist
from p5lrsegs as a join p4cosegs as b
ON (a.the_geom && b.the_geom)
and a.landseg like 'A24%';

TIME: 34 minutes

Query 4:


On Thu, 2005-10-13 at 17:47, Stephen Woodbridge wrote:
> So here are the variations on the UPDATE query and the respective query 
> plans. Given that Plan 1: ran for over 16 hrs and the estimated runtime 
> based on number of rows that had been updated is 261 hrs, does anyone 
> have any suggestion as to which might be the next best to try?
> 
> -Steve
> 
> ### Plan 1: ran for 16 hrs and estimate runtime is 261 hrs
> 
> explain update streets set prov=prov_ab from province b where
> streets.the_geom && b.the_geom and distance(streets.the_geom, 
> b.the_geom) = 0.0
> 
> Nested Loop  (cost=0.00..581.99 rows=17990 width=307)
>    Join Filter: (distance("inner".the_geom, "outer".the_geom) = 
> 0::double precision)
>    ->  Seq Scan on province b  (cost=0.00..1.14 rows=14 width=92196)
>    ->  Index Scan using streets_gidx on streets  (cost=0.00..38.12 
> rows=9 width=301)
>          Index Cond: (streets.the_geom && "outer".the_geom)
> 
> ### Plan 2:
> 
> explain UPDATE streets SET prov = (
>    SELECT prov_ab
>    FROM province b
>    WHERE streets.the_geom && b.the_geom AND intersects(streets.the_geom,
>    b.the_geom)
>    ORDER BY prov_ab
>    LIMIT 1
> );
> 
> Seq Scan on streets  (cost=0.00..2296390.04 rows=1798897 width=301)
>    SubPlan
>      ->  Limit  (cost=1.22..1.22 rows=1 width=6)
>            ->  Sort  (cost=1.22..1.22 rows=1 width=6)
>                  Sort Key: prov_ab
>                  ->  Seq Scan on province b  (cost=0.00..1.21 rows=1 
> width=6)
>                        Filter: (($0 && the_geom) AND intersects($0, 
> the_geom))
> 
> ### Plan 3:
> 
> explain UPDATE streets SET prov = (
>    SELECT prov_ab
>    FROM streets JOIN province b
>    ON streets.the_geom && b.the_geom WHERE intersects(streets.the_geom,
>    b.the_geom)
>    ORDER BY prov_ab
>    LIMIT 1
> );
> 
> 
> Seq Scan on streets  (cost=171335.34..264076.56 rows=1798897 width=301)
>    InitPlan
>      ->  Limit  (cost=171335.34..171335.34 rows=1 width=6)
>            ->  Sort  (cost=171335.34..174333.52 rows=1199270 width=6)
>                  Sort Key: b.prov_ab
>                  ->  Nested Loop  (cost=0.00..536.70 rows=1199270 width=6)
>                        Join Filter: intersects("inner".the_geom, 
> "outer".the_geom)
>                        ->  Seq Scan on province b  (cost=0.00..1.14 
> rows=14 width=92196)
>                        ->  Index Scan using streets_gidx on streets 
> (cost=0.00..38.12 rows=9 width=160)
>                              Index Cond: (streets.the_geom && 
> "outer".the_geom)
> 
> 
> ### Plan 4:
> 
> explain update streets set prov=prov_ab from province b where
>    ( select count(*) from streets, province
>       where streets.the_geom && province.the_geom )=1
> or
>    ( streets.the_geom && b.the_geom
>      and distance(streets.the_geom, b.the_geom) = 0.0 );
> 
> Nested Loop  (cost=9530.91..1748998.34 rows=143822 width=307)
>    Join Filter: (($0 = 1) OR (("inner".the_geom && "outer".the_geom) AND 
> (distance("inner".the_geom, "outer".the_geom) = 0::double precision)))
>    InitPlan
>      ->  Aggregate  (cost=9530.91..9530.91 rows=1 width=0)
>            ->  Nested Loop  (cost=0.00..536.38 rows=3597808 width=0)
>                  ->  Seq Scan on province  (cost=0.00..1.14 rows=14 
> width=92190)
>                  ->  Index Scan using streets_gidx on streets 
> (cost=0.00..38.12 rows=9 width=160)
>                        Index Cond: (streets.the_geom && "outer".the_geom)
>    ->  Seq Scan on province b  (cost=0.00..1.14 rows=14 width=92196)
>    ->  Seq Scan on streets  (cost=0.00..88243.97 rows=1798897 width=301)
> 
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users
-- 
Non-point Source Data Analyst
University of Maryland, College Park
Chesapeake Bay Program Office
410 Severn Avenue, Suite 305B
Annapolis, MD, 21403
Phone: (410) 267-5779

rburghol at chesapeakebay.net




More information about the postgis-users mailing list