[postgis-users] OT Understanding slow queries

Thu Aug 25 06:41:27 PDT 2011

If this is too off topic, please let me know and I'll sign up on a postgres list to get help. But this is related to my use of postgis and If anyone knows this stuff, it's you guys.

I have an example query that I expect to be much faster, but my main concern is we are about to do some visualization of historical congestion data which will require queries across much larger data sets - like 150 million records a day. We are about to test using partitions but the number per table will still be much larger than what I am dealing with now.

So here is a query I would think would be much faster than 43 seconds for two tables, one with about 97k rows, and the other 3.2 million.

explain select count(l.*) 
from links l, source_link ld where l.link_id = ld.link_id;
/*
'Aggregate  (cost=174731.72..174731.73 rows=1 width=32)'
'  ->  Hash Join  (cost=13024.27..166784.14 rows=3179029 width=32)'
'        Hash Cond: (ld.link_id = l.link_id)'
'        ->  Seq Scan on source_link ld  (cost=0.00..58282.29 rows=3179029 width=10)'
'        ->  Hash  (cost=10963.12..10963.12 rows=96812 width=42)'
'              ->  Seq Scan on links l  (cost=0.00..10963.12 rows=96812 width=42)'
*/

Each table has an index on link_id, defined like this

CREATE INDEX links_link_id_idx
  ON links
  USING btree
  (link_id);

CREATE INDEX source_link_link_id_idx
  ON source_link
  USING btree
  (link_id);

Shouldn't this index prevent these sequential scans, or am I misreading this?  Should this really take 43 seconds?

thanks for any advice,
charles