[postgis-users] Query performance help

Paul Ramsey pramsey at refractions.net
Wed Mar 23 08:10:11 PST 2005


Some potential gotchas:
- I have been suckered in the past when it turned out I had mismatched 
types in the columns I was joining (an int8 on one side and an int4 on 
the other). This caused PostgreSQL to choose a terrible query plan.
- select count(*) is hacked (as are many things) in MySQL to make it 
"fast": MyISAM keeps an exact row count on hand. PostgreSQL's MVCC 
design means any count has to actually visit rows, so whole-table 
counts will always be comparatively slow.
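For the type-mismatch gotcha, a quick check is to compare the join columns' declared types directly. A sketch (table and column names are taken from the schema quoted below; the varchar(7) target is purely illustrative, not the actual type):

```sql
-- Do the join columns actually share a type?
SELECT table_name, column_name, data_type
  FROM information_schema.columns
 WHERE table_name IN ('sf3geo', 'dp3')
   AND column_name = 'logrecno';

-- If they differ, alter one to match rather than casting in every
-- query (the varchar(7) here is an assumption for illustration):
ALTER TABLE dp3 ALTER COLUMN logrecno TYPE varchar(7);
```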

Finally, useful testing:
- use "explain analyze <query>" to compare the estimated and actual 
row counts and times for the various parts of your query. Where the 
planner is making bad choices, that will either (a) show you a place 
you screwed up your data load or (b) give you hints about what to 
optimize around.
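For example, on the slow count from the quoted message below (an untested sketch):

```sql
-- Compare the planner's "rows=" estimate against the
-- "actual ... rows=" figure in the output; a large gap usually
-- means stale statistics, which VACUUM ANALYZE will refresh.
EXPLAIN ANALYZE
SELECT count(*) FROM sf3geo WHERE sumlev = '871';
```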

P

Stephen Woodbridge wrote:

> This is slightly OT, but I just migrated from MySQL to postgres/postgis 
> and I'm having a problem with query performance. Any thoughts would be 
> appreciated, as I suspect this is just something dumb because I don't 
> know the ins and outs of postgres yet.
> 
> I have three tables, abbreviated below (if you are familiar with it, 
> these are from the Census summary file data), and my goal is to generate 
> thematic maps, in this case of zipcode tabulation areas (zcta):
> 
> table: sf3geo
>   zcta5
>   sumlev
>   logrecno
> 
> table: dp3
>   logrecno
>   lots of attribute cols
> 
> table: zcta
>   zcta5
>   the_geom
> 
> So I need to select all the sf3geo rows with sumlev=871 to get just the 
> zcta records, join them to dp3 on logrecno, and join to zcta on zcta5.
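(The full three-way join described above would look roughly like this; untested, with column names taken from the abbreviated table listings:)

```sql
SELECT z.zcta5, z.the_geom, d.*      -- plus whatever dp3 attributes you need
  FROM sf3geo g
  JOIN dp3  d ON d.logrecno = g.logrecno
  JOIN zcta z ON z.zcta5    = g.zcta5
 WHERE g.sumlev = '871';
```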

> Just trying to do the first join is a killer!
> 
> explain select b.zcta5, round(a.dp30063/200001*8) as income from sf3geo 
> b, dp3 a where b.sumlev=871 and b.logrecno=a.logrecno;
> 
> Nested Loop  (cost=0.00..460665.14 rows=132237 width=8)
>   ->  Index Scan using sf3geo_sumlev_idx on sf3geo b 
> (cost=0.00..29652.79 rows=7491 width=15)
>         Index Cond: ((sumlev)::text = '871'::text)
>   ->  Index Scan using dp3_logrecno_idx on dp3 a  (cost=0.00..57.14 
> rows=18 width=15)
>         Index Cond: (("outer".logrecno)::text = (a.logrecno)::text)
> 
> In fact, just counting the rows is slow:
> 
> explain select count(*) from sf3geo where sumlev=871;
> 
> Aggregate  (cost=29671.52..29671.52 rows=1 width=0)
>   ->  Index Scan using sf3geo_sumlev_idx on sf3geo (cost=0.00..29652.79 
> rows=7491 width=0)
>         Index Cond: ((sumlev)::text = '871'::text)
> 
> So I am wondering: what am I missing to make this go faster? Can using 
> a view improve performance? For example, by pre-filtering sf3geo to 
> only show the sumlev=871 records?
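A plain view won't help by itself: PostgreSQL expands views into the query at plan time, so the filter runs exactly the same either way. A partial index over the filtered rows is more likely to pay off. A sketch (the index name is made up):

```sql
-- Indexes only the sumlev=871 rows, so both the filter and the
-- subsequent logrecno join lookups stay small:
CREATE INDEX sf3geo_zcta_logrecno_idx ON sf3geo (logrecno)
  WHERE sumlev = '871';
```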
> 
> I did vacuum analyze all the tables, but I have changed some indexes 
> since, so maybe I need to do that again.
> 
> -Steve
> 
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users



