[postgis-users] Query performance issue
Robert Burgholzer
rburghol at chesapeakebay.net
Fri Oct 14 07:55:35 PDT 2005
I left out query 4:
select a.lrseg, b.coseg, b.stseg, b.segment, b.subbasin,
intersects(b.the_geom, a.the_geom) as ovp
from p5lrsegs as a join p4cosegs as b
ON (a.the_geom && b.the_geom)
and a.landseg like 'A24%';
TIME: 1.5 minutes
On Fri, 2005-10-14 at 10:45, Robert Burgholzer wrote:
> Stephen,
> My tests have been on two polygons, not a polygon with a line, but I
> believe they may be valid. I believe the majority of the performance hit
> is on the distance function. The query planner (EXPLAIN) cannot
> predict the performance hit that will be caused by calling another
> function, which is why it becomes less than revealing. In a nutshell,
> the my findings were as follows: intersects() is far superior to
> distance(). If you want to read more, or see if I screwed up my analysis
> in some way, read on.
>
> I tested 4 queries (shown below):
> 1) the distance function plus a spatial overlap &&,
> 2) only a spatial overlap &&,
> 3) a spatial overlap && and an area2d(intersection())/area2d (to
> determine if the percent overlap was above some threshold)
> 4) using a spatial overlap && and a call to intersects()
>
> My findings were that:
> 1) using the distance function took 10 times longer than the combined
> area2d(intersection())/area2d, and was 250 times slower than the &&
> alone
> 2) the combined area/intersection function took 25 times longer than the
> indexed spatial && overlap by itself
> 3) the intersects() function was about 2 X faster than the
> area2d(intersection)
>
> These results suggest that you may speed up your query several fold by
> dumping the distance() function for an intersects() (if that applies for
> a line and polygon, which I don;t know).
> Clearly the distance function itself is cumbersome, however, the
> intersection functions are also slowin comparison to only the spatial
> overlap. I wonder if:
>
> 1) the pl/pgsql function interface causes an overhead hit?
> 2) Since many people have suggested that the distance function is the
> way to go, has some change crept into this function that has degraded
> its performnance substantially?
>
>
> TEST RESULTS:
> Result set has about 2,000 rows.
>
> QUERIES:
> Query #1:
> select a.lrseg, b.coseg, b.stseg, b.segment, b.subbasin,
> distance(b.the_geom, a.the_geom) as ovp
> from p5lrsegs as a join p4cosegs as b
> ON (a.the_geom && b.the_geom)
> and a.landseg like 'A24%';
>
> TIME: 3 minutes
>
> Query #2:
> select a.lrseg, b.coseg, b.stseg, b.segment, b.subbasin
> from p5lrsegs as a join p4cosegs as b
> ON (a.the_geom && b.the_geom)
> and a.landseg like 'A24%';
>
> TIME: 7 seconds
>
> Query #3:
> select a.lrseg, b.coseg, b.stseg, b.segment, b.subbasin,
> distance(b.the_geom, a.the_geom) as dist
> from p5lrsegs as a join p4cosegs as b
> ON (a.the_geom && b.the_geom)
> and a.landseg like 'A24%';
>
> TIME: 34 minutes
>
> Query 4:
>
>
> On Thu, 2005-10-13 at 17:47, Stephen Woodbridge wrote:
> > So here are the variations on the UPDATE query and the respective query
> > plans. Given that Plan 1: ran for over 16 hrs and the estimated runtime
> > based on number of rows that had been updated is 261 hrs, does anyone
> > have any suggestion as to which might be the next best to try?
> >
> > -Steve
> >
> > ### Plan 1: ran for 16 hrs and estimate runtime is 261 hrs
> >
> > explain update streets set prov=prov_ab from province b where
> > streets.the_geom && b.the_geom and distance(streets.the_geom,
> > b.the_geom) = 0.0
> >
> > Nested Loop (cost=0.00..581.99 rows=17990 width=307)
> > Join Filter: (distance("inner".the_geom, "outer".the_geom) =
> > 0::double precision)
> > -> Seq Scan on province b (cost=0.00..1.14 rows=14 width=92196)
> > -> Index Scan using streets_gidx on streets (cost=0.00..38.12
> > rows=9 width=301)
> > Index Cond: (streets.the_geom && "outer".the_geom)
> >
> > ### Plan 2:
> >
> > explain UPDATE streets SET prov = (
> > SELECT prov_ab
> > FROM province b
> > WHERE streets.the_geom && b.the_geom AND intersects(streets.the_geom,
> > b.the_geom)
> > ORDER BY prov_ab
> > LIMIT 1
> > );
> >
> > Seq Scan on streets (cost=0.00..2296390.04 rows=1798897 width=301)
> > SubPlan
> > -> Limit (cost=1.22..1.22 rows=1 width=6)
> > -> Sort (cost=1.22..1.22 rows=1 width=6)
> > Sort Key: prov_ab
> > -> Seq Scan on province b (cost=0.00..1.21 rows=1
> > width=6)
> > Filter: (($0 && the_geom) AND intersects($0,
> > the_geom))
> >
> > ### Plan 3:
> >
> > explain UPDATE streets SET prov = (
> > SELECT prov_ab
> > FROM streets JOIN province b
> > ON streets.the_geom && b.the_geom WHERE intersects(streets.the_geom,
> > b.the_geom)
> > ORDER BY prov_ab
> > LIMIT 1
> > );
> >
> >
> > Seq Scan on streets (cost=171335.34..264076.56 rows=1798897 width=301)
> > InitPlan
> > -> Limit (cost=171335.34..171335.34 rows=1 width=6)
> > -> Sort (cost=171335.34..174333.52 rows=1199270 width=6)
> > Sort Key: b.prov_ab
> > -> Nested Loop (cost=0.00..536.70 rows=1199270 width=6)
> > Join Filter: intersects("inner".the_geom,
> > "outer".the_geom)
> > -> Seq Scan on province b (cost=0.00..1.14
> > rows=14 width=92196)
> > -> Index Scan using streets_gidx on streets
> > (cost=0.00..38.12 rows=9 width=160)
> > Index Cond: (streets.the_geom &&
> > "outer".the_geom)
> >
> >
> > ### Plan 4:
> >
> > explain update streets set prov=prov_ab from province b where
> > ( select count(*) from streets, province
> > where streets.the_geom && province.the_geom )=1
> > or
> > ( streets.the_geom && b.the_geom
> > and distance(streets.the_geom, b.the_geom) = 0.0 );
> >
> > Nested Loop (cost=9530.91..1748998.34 rows=143822 width=307)
> > Join Filter: (($0 = 1) OR (("inner".the_geom && "outer".the_geom) AND
> > (distance("inner".the_geom, "outer".the_geom) = 0::double precision)))
> > InitPlan
> > -> Aggregate (cost=9530.91..9530.91 rows=1 width=0)
> > -> Nested Loop (cost=0.00..536.38 rows=3597808 width=0)
> > -> Seq Scan on province (cost=0.00..1.14 rows=14
> > width=92190)
> > -> Index Scan using streets_gidx on streets
> > (cost=0.00..38.12 rows=9 width=160)
> > Index Cond: (streets.the_geom && "outer".the_geom)
> > -> Seq Scan on province b (cost=0.00..1.14 rows=14 width=92196)
> > -> Seq Scan on streets (cost=0.00..88243.97 rows=1798897 width=301)
> >
> > _______________________________________________
> > postgis-users mailing list
> > postgis-users at postgis.refractions.net
> > http://postgis.refractions.net/mailman/listinfo/postgis-users
--
Non-point Source Data Analyst
University of Maryland, College Park
Chesapeake Bay Program Office
410 Severn Avenue, Suite 305B
Annapolis, MD, 21403
Phone: (410) 267-5779
rburghol at chesapeakebay.net
More information about the postgis-users
mailing list