[postgis-users] Performance tuning PostGIS and understanding the explain statement

Thu Jun 12 16:25:56 PDT 2003

Hi list,

Before I start, please excuse what might be a trivial question. It's
regarding the explain statement. I've never really understood how to
read it correctly. I know that it shows the query execution plan, but
how am I meant to understand the output? For example could someone give
me some pointers as to what the execution plan is on following queries?

EXPLAIN

explain SELECT feature.id,feature.name 

FROM feature,region 

WHERE region.id=28 

AND (feature.the_geom && region.the_geom AND distance(feature.the_geom,
region.the_geom)=0);

NOTICE:  QUERY PLAN:

Nested Loop  (cost=0.00..9.04 rows=44 width=80)

  ->  Index Scan using region_pkey on region  (cost=0.00..3.01 rows=1
width=32)

  ->  Index Scan using feature_the_geom_idx on feature  (cost=0.00..6.01
rows=1 width=48)

EXPLAIN ANALYZE

explain analyze SELECT feature.id,feature.name 

FROM feature,region 

WHERE region.id=28 

AND (feature.the_geom && region.the_geom AND distance(feature.the_geom,
region.the_geom)=0);

NOTICE:  QUERY PLAN:

Nested Loop  (cost=0.00..9.04 rows=44 width=80) (actual
time=114.00..30210.00 rows=4992 loops=1)

  ->  Index Scan using region_pkey on region  (cost=0.00..3.01 rows=1
width=32) (actual time=0.00..0.00 rows=1 loops=1)

  ->  Index Scan using feature_the_geom_idx on feature  (cost=0.00..6.01
rows=1 width=48) (actual time=105.00..21179.00 rows=5133 loops=1)

Total runtime: 30238.00 msec

Is this saying that it uses a loop and each loop it gets the region and
compares it to a feature? Is the time killing in the comparison or the
features?

If I change the query slightly by removing some brackets and adding an
order by I get the following results:

EXPLAIN

explain SELECT feature.id,feature.name 

FROM feature,region 

WHERE feature.the_geom && region.the_geom AND distance(feature.the_geom,
region.the_geom)=0 

AND region.id=28 

ORDER BY feature.name;

NOTICE:  QUERY PLAN:

Sort  (cost=10.25..10.25 rows=44 width=80)

  ->  Nested Loop  (cost=0.00..9.04 rows=44 width=80)

        ->  Index Scan using region_pkey on region  (cost=0.00..3.01
rows=1 width=32)

        ->  Index Scan using feature_the_geom_idx on feature
(cost=0.00..6.01 rows=1 width=48)

EXPLAIN ANALYZE

explain analyze SELECT feature.id,feature.name 

FROM feature,region

WHERE feature.the_geom && region.the_geom AND distance(feature.the_geom,
region.the_geom)=0 

AND region.id=28 

ORDER BY feature.name;

NOTICE:  QUERY PLAN:

Sort  (cost=10.25..10.25 rows=44 width=80) (actual
time=30720.00..30721.00 rows=4992 loops=1)

  ->  Nested Loop  (cost=0.00..9.04 rows=44 width=80) (actual
time=113.00..30678.00 rows=4992 loops=1)

        ->  Index Scan using region_pkey on region  (cost=0.00..3.01
rows=1 width=32) (actual time=0.00..0.00 rows=1 loops=1)

        ->  Index Scan using feature_the_geom_idx on feature
(cost=0.00..6.01 rows=1 width=48) (actual time=105.00..21584.00
rows=5133 loops=1)

Total runtime: 30725.00 msec

What is the order of things here? Is it first of all sorting all the
features by there name then looping through each feature comparing it to
the region?

Thanks,

Benjamin

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20030613/b5078b1a/attachment.html>