[Mapserver-users] RE: [postgis-users] Performance tuning PostGIS and understanding the explain statement

Benjamin Wragg bwragg at tpg.com.au
Tue Jun 17 03:26:58 EDT 2003


This is a multi-part message in MIME format.

------=_NextPart_000_000A_01C334F1.784A3E00
Content-Type: text/plain;
	charset="us-ascii"
Content-Transfer-Encoding: 7bit

No I don't think it did. Below is the outcomes compared. Here's the
query that I ran:

explain analyze SELECT feature.id,feature.name 
FROM feature,region
WHERE feature.the_geom && region.the_geom 
AND distance(feature.the_geom, region.the_geom)=0 
AND region.id=28 ORDER BY feature.name;

And the explain analyze without the spatial index on the feature
geometry, as you suggested:

NOTICE:  QUERY PLAN:
Sort  (cost=4612.10..4612.10 rows=44 width=80) (actual
time=203328.00..203329.00 rows=4992 loops=1)
  ->  Nested Loop  (cost=0.00..4610.89 rows=44 width=80) (actual
time=11074.00..203264.00 rows=4992loops=1)
        ->  Index Scan using region_pkey on region  (cost=0.00..3.13
rows=1 width=32) (actual time=0.00..0.00 rows=1 loops=1)
        ->  Seq Scan on feature  (cost=0.00..3063.46 rows=88246
width=48) (actual time=0.00..3023.00 rows=88246 loops=1)
Total runtime: 203332.00 msec

with a spatial index on the feature geometry

NOTICE:  QUERY PLAN:
Sort  (cost=10.36..10.36 rows=44 width=80) (actual
time=38627.00..38628.00 rows=4992 loops=1)
  ->  Nested Loop  (cost=0.00..9.16 rows=44 width=80) (actual
time=286.00..38502.00 rows=4992 loops=1)
        ->  Index Scan using region_pkey on region  (cost=0.00..3.13
rows=1 width=32) (actual time=0.00..0.00 rows=1 loops=1)
        ->  Index Scan using feature_the_geom_idx on feature
(cost=0.00..6.01 rows=1 width=48) (actual time=264.00..27370.00
rows=5133 loops=1)
Total runtime: 38637.00 msec

after running select update_geometry_stats();

NOTICE:  QUERY PLAN:
Sort  (cost=10.36..10.36 rows=44 width=80) (actual
time=39572.00..39573.00 rows=4992 loops=1)
  ->  Nested Loop  (cost=0.00..9.16 rows=44 width=80) (actual
time=145.00..39518.00 rows=4992 loops=1)
        ->  Index Scan using region_pkey on region  (cost=0.00..3.13
rows=1 width=32) (actual time=0.00..0.00 rows=1 loops=1)
        ->  Index Scan using feature_the_geom_idx on feature
(cost=0.00..6.01 rows=1 width=48) (actual time=136.00..27282.00
rows=5133 loops=1)
Total runtime: 39577.00 msec

As a test I inserted the region I'm using, which is quite complex, into
its own table and changed the query as follows:

explain analyze SELECT feature.id,feature.name 
FROM feature,region2
WHERE feature.the_geom && region2.the_geom 
AND distance(feature.the_geom, region2.the_geom)=0 
AND region2.id=0 ORDER BY feature.name;

Here is the explain analyze report:

NOTICE:  QUERY PLAN:
Sort  (cost=12.06..12.06 rows=44 width=80) (actual
time=36331.00..36331.00 rows=4992 loops=1)
  ->  Nested Loop  (cost=0.00..10.85 rows=44 width=80) (actual
time=167.00..36277.00 rows=4992 loops=1)
        ->  Index Scan using region2_pkey on region2  (cost=0.00..4.82
rows=1 width=32) (actual time=0.00..0.00 rows=1 loops=1)
        ->  Index Scan using feature_the_geom_idx on feature
(cost=0.00..6.01 rows=1 width=48) (actual time=149.00..25494.00
rows=5133 loops=1)
Total runtime: 36338.00 msec

So even with only one region it runs slow. Could it simple be that the
region is to complex?

Thanks,

Benjamin Wragg

-----Original Message-----
From: David Blasby [ <mailto:dblasby at refractions.net>
mailto:dblasby at refractions.net]
Sent: Tuesday, 17 June 2003 8:57 AM
To: bwragg at tpg.com.au
Subject: Re: [postgis-users] Performance tuning PostGIS and
understanding the explain statement


Benjamin Wragg wrote:
> I checked the list for the histogram2d and I found the treads I think
> you were talking about. So the statement SELECT
> UPDATE_GEOMETRY_STATS(); runs a function which fills the content of
> the histogram2d field in the geometry_columns table. This is then used
> by the query planner when a query is executed to better estimate which
> path to take? E.g which index to use first

Exactly right.  Did you run it and did it do a better job?

dave




------=_NextPart_000_000A_01C334F1.784A3E00
Content-Type: text/html;
	charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META HTTP-EQUIV=3D"Content-Type" CONTENT=3D"text/html; =
charset=3Dus-ascii">
<TITLE>Message</TITLE>

<META content=3D"MSHTML 6.00.2800.1106" name=3DGENERATOR></HEAD>
<BODY><!-- Converted from text/plain format -->
<P><FONT face=3DArial size=3D2>No I don't think it did. Below is the =
outcomes=20
compared. Here's the query that I ran:<BR><BR></FONT><FONT face=3DArial=20
size=3D2>explain analyze SELECT feature.id,feature.name <BR></FONT><FONT =

face=3DArial size=3D2>FROM feature,region<BR>WHERE feature.the_geom =
&amp;&amp;=20
region.the_geom <BR>AND distance(feature.the_geom, region.the_geom)=3D0 =
<BR>AND=20
region.id=3D28 ORDER BY feature.name;</FONT></P>
<P><FONT face=3DArial size=3D2>And the explain analyze without the =
spatial index on=20
the feature geometry, as you suggested:</FONT></P>
<BLOCKQUOTE dir=3Dltr style=3D"MARGIN-RIGHT: 0px">
  <P><FONT face=3DArial size=3D2>NOTICE:&nbsp; QUERY =
PLAN:<BR></FONT><FONT=20
  face=3DArial size=3D2>Sort&nbsp; (cost=3D4612.10..4612.10 rows=3D44 =
width=3D80) (actual=20
  time=3D203328.00..203329.00 rows=3D4992 loops=3D1)<BR>&nbsp; =
-&gt;&nbsp; Nested=20
  Loop&nbsp; (cost=3D0.00..4610.89 rows=3D44 width=3D80) (actual=20
  time=3D11074.00..203264.00=20
  rows=3D4992loops=3D1)<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; =
-&gt;&nbsp;=20
  Index Scan using region_pkey on region&nbsp; (cost=3D0.00..3.13 =
rows=3D1 width=3D32)=20
  (actual time=3D0.00..0.00 rows=3D1=20
  loops=3D1)<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; -&gt;&nbsp; =
Seq Scan on=20
  feature&nbsp; (cost=3D0.00..3063.46 rows=3D88246 width=3D48) (actual=20
  time=3D0.00..3023.00&nbsp;rows=3D88246 loops=3D1)<BR>Total runtime: =
203332.00=20
  msec</FONT></P></BLOCKQUOTE>
<P><FONT face=3DArial size=3D2>with a spatial index on the feature=20
geometry</FONT></P>
<BLOCKQUOTE dir=3Dltr style=3D"MARGIN-RIGHT: 0px">
  <P><FONT face=3DArial size=3D2>NOTICE:&nbsp; QUERY PLAN:<BR>Sort&nbsp; =

  (cost=3D10.36..10.36 rows=3D44 width=3D80) (actual =
time=3D38627.00..38628.00 rows=3D4992=20
  loops=3D1)<BR>&nbsp; -&gt;&nbsp; Nested Loop&nbsp; (cost=3D0.00..9.16 =
rows=3D44=20
  width=3D80) (actual time=3D286.00..38502.00 rows=3D4992=20
  loops=3D1)<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; -&gt;&nbsp; =
Index Scan=20
  using region_pkey on region&nbsp; (cost=3D0.00..3.13 rows=3D1 =
width=3D32) (actual=20
  time=3D0.00..0.00 rows=3D1 =
loops=3D1)<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;=20
  -&gt;&nbsp; Index Scan using feature_the_geom_idx on feature&nbsp;=20
  (cost=3D0.00..6.01 rows=3D1 width=3D48) (actual =
time=3D264.00..27370.00 rows=3D5133=20
  loops=3D1)<BR>Total runtime: 38637.00 msec</FONT></P></BLOCKQUOTE>
<DIV><FONT face=3DArial size=3D2>after running select=20
update_geometry_stats();</FONT></DIV>
<BLOCKQUOTE dir=3Dltr style=3D"MARGIN-RIGHT: 0px">
  <DIV><FONT face=3DArial size=3D2>NOTICE:&nbsp; QUERY =
PLAN:</FONT></DIV>
  <DIV><FONT><FONT face=3DArial size=3D2>Sort&nbsp; (cost=3D10.36..10.36 =
rows=3D44=20
  width=3D80) (actual time=3D39572.00..39573.00 rows=3D4992 =
loops=3D1)<BR>&nbsp;=20
  -&gt;&nbsp; Nested Loop&nbsp; (cost=3D0.00..9.16 rows=3D44 width=3D80) =
(actual=20
  time=3D145.00..39518.00 rows=3D4992=20
  loops=3D1)<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; -&gt;&nbsp; =
Index Scan=20
  using region_pkey on region&nbsp; (cost=3D0.00..3.13 rows=3D1 =
width=3D32) (actual=20
  time=3D0.00..0.00 rows=3D1 =
loops=3D1)<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;=20
  -&gt;&nbsp; Index Scan using feature_the_geom_idx on feature&nbsp;=20
  (cost=3D0.00..6.01 rows=3D1 width=3D48) (actual =
time=3D136.00..27282.00 rows=3D5133=20
  loops=3D1)<BR>Total runtime: 39577.00 msec</FONT></DIV></BLOCKQUOTE>
<P><FONT face=3DArial size=3D2>As a test I inserted the region I'm =
using, which is=20
quite complex, into its own table and changed the query as =
follows:</FONT></P>
<BLOCKQUOTE dir=3Dltr style=3D"MARGIN-RIGHT: 0px">
  <P><FONT face=3DArial size=3D2>explain analyze SELECT =
feature.id,feature.name=20
  <BR></FONT><FONT face=3DArial size=3D2>FROM feature,region2<BR>WHERE=20
  feature.the_geom &amp;&amp; region2.the_geom <BR>AND=20
  distance(feature.the_geom, region2.the_geom)=3D0 <BR>AND =
region2.id=3D0 ORDER BY=20
  feature.name;</FONT></P></BLOCKQUOTE>
<P dir=3Dltr><FONT face=3DArial size=3D2>Here is the explain analyze=20
report:</FONT></P>
<BLOCKQUOTE dir=3Dltr style=3D"MARGIN-RIGHT: 0px"><FONT face=3DArial=20
  size=3D2></FONT><FONT face=3DArial size=3D2>
  <P>NOTICE:&nbsp; QUERY PLAN:<BR></FONT><FONT face=3DArial =
size=3D2>Sort&nbsp;=20
  (cost=3D12.06..12.06 rows=3D44 width=3D80) (actual =
time=3D36331.00..36331.00 rows=3D4992=20
  loops=3D1)<BR>&nbsp; -&gt;&nbsp; Nested Loop&nbsp; (cost=3D0.00..10.85 =
rows=3D44=20
  width=3D80) (actual time=3D167.00..36277.00 rows=3D4992=20
  loops=3D1)<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; -&gt;&nbsp; =
Index Scan=20
  using region2_pkey on region2&nbsp; (cost=3D0.00..4.82 rows=3D1 =
width=3D32) (actual=20
  time=3D0.00..0.00 rows=3D1 =
loops=3D1)<BR>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;=20
  -&gt;&nbsp; Index Scan using feature_the_geom_idx on feature&nbsp;=20
  (cost=3D0.00..6.01 rows=3D1 width=3D48) (actual =
time=3D149.00..25494.00 rows=3D5133=20
  loops=3D1)<BR>Total runtime: 36338.00 msec</FONT></P></BLOCKQUOTE>
<P dir=3Dltr><FONT face=3DArial size=3D2>So even with only one region it =
runs slow.=20
Could it simple be that the region is to complex?</FONT></P>
<P dir=3Dltr><FONT face=3DArial size=3D2>Thanks,</FONT></P>
<DIV><FONT face=3DArial size=3D2>Benjamin Wragg</FONT></DIV>
<DIV><FONT face=3DArial color=3D#0000ff size=3D2></FONT></FONT><BR><FONT =
face=3DArial=20
size=3D2>-----Original Message-----<BR>From: David Blasby [</FONT><A=20
href=3D"mailto:dblasby at refractions.net"><FONT face=3DArial =
color=3D#000000=20
size=3D2>mailto:dblasby at refractions.net</FONT></A><FONT face=3DArial=20
size=3D2>]<BR>Sent: Tuesday, 17 June 2003 8:57 AM<BR>To:=20
bwragg at tpg.com.au<BR>Subject: Re: [postgis-users] Performance tuning =
PostGIS and=20
understanding the explain statement<BR><BR><BR>Benjamin Wragg =
wrote:<BR>&gt; I=20
checked the list for the histogram2d and I found the treads I =
think<BR>&gt; you=20
were talking about. So the statement SELECT<BR>&gt; =
UPDATE_GEOMETRY_STATS();=20
runs a function which fills the content of<BR>&gt; the histogram2d field =
in the=20
geometry_columns table. This is then used<BR>&gt; by the query planner =
when a=20
query is executed to better estimate which<BR>&gt; path to take? E.g =
which index=20
to use first<BR><BR>Exactly right.&nbsp; Did you run it and did it do a =
better=20
job?<BR><BR>dave<BR><BR><BR></FONT></DIV></BODY></HTML>

------=_NextPart_000_000A_01C334F1.784A3E00--




More information about the mapserver-users mailing list