[postgis-users] Query performance help

Wed Mar 23 08:33:04 PST 2005

Another thought on improving speed.  If you are joining on a field or
field(s) that you know has unique data - make sure to use a unique or
primary key index for that field (or a compound unique index if more than
one field) - not just a regular old index

E.g. if your logrecno in sf3geo is unique declare it as a unique or primary
index not just a regular old index.

I think this helps the optimizer to know when to stop an indexscan - as once
its got one record it knows it doesn't need to look for anymore.

-----Original Message-----
From: Stephen Woodbridge [mailto:woodbri at swoodbridge.com] 
Sent: Wednesday, March 23, 2005 9:55 AM
To: PostGIS Users Discussion
Subject: [postgis-users] Query performance help

Hi all,

This is slightly OT, but I just migrated from MySQL to postgres/postgis 
and I'm having a problem with query performance. Any thoughts would be 
appreciated as I feel this is just something dumb because don't know the 
ins and outs of postgres yet.

I have three tables abbreviated but like (if you are familiar with it, 
these are from the census summary file info) and my goal is to generate 
thematic maps. In this case of zipcode tabulation areas (zcta):

table: sf3geo
   zcta5
   sumlev
   logrecno

table: dp3
   logrecno
   lots of attribute cols

table: zcta
   zcta5
   the_geom

So I need to select all the sf3geo.sumlev=871 to just get the zcta 
records and join it to dp3 based on logrecno and join it to zcta based 
on zcta5.

Just trying to do the first join is a killer!

explain select b.zcta5, round(a.dp30063/200001*8) as income from sf3geo 
b, dp3 a where b.sumlev=871 and b.logrecno=a.logrecno;

Nested Loop  (cost=0.00..460665.14 rows=132237 width=8)
   ->  Index Scan using sf3geo_sumlev_idx on sf3geo b 
(cost=0.00..29652.79 rows=7491 width=15)
         Index Cond: ((sumlev)::text = '871'::text)
   ->  Index Scan using dp3_logrecno_idx on dp3 a  (cost=0.00..57.14 
rows=18 width=15)
         Index Cond: (("outer".logrecno)::text = (a.logrecno)::text)

In fact, just counting the rows is slow:

explain select count(*) from sf3geo where sumlev=871;

Aggregate  (cost=29671.52..29671.52 rows=1 width=0)
   ->  Index Scan using sf3geo_sumlev_idx on sf3geo 
(cost=0.00..29652.79 rows=7491 width=0)
         Index Cond: ((sumlev)::text = '871'::text)

So I am wondering what am I missing that make this go faster? Can using 
view improve performance? For example by pre-filtering the sf3geo to 
only show the sumlev=871 records?

I did vaccuum analyze all the tables, but I changed some indexes so may 
be I need to do that again.

-Steve

_______________________________________________
postgis-users mailing list postgis-users at postgis.refractions.net
http://postgis.refractions.net/mailman/listinfo/postgis-users