[postgis-users] OT Understanding slow queries
Charles Galpin
cgalpin at lhsw.com
Thu Aug 25 06:41:27 PDT 2011
If this is too off topic, please let me know and I'll sign up on a postgres list to get help. But this is related to my use of postgis and If anyone knows this stuff, it's you guys.
I have an example query that I expect to be much faster, but my main concern is we are about to do some visualization of historical congestion data which will require queries across much larger data sets - like 150 million records a day. We are about to test using partitions but the number per table will still be much larger than what I am dealing with now.
So here is a query I would think would be much faster than 43 seconds for two tables, one with about 97k rows, and the other 3.2 million.
explain select count(l.*)
from links l, source_link ld where l.link_id = ld.link_id;
/*
'Aggregate (cost=174731.72..174731.73 rows=1 width=32)'
' -> Hash Join (cost=13024.27..166784.14 rows=3179029 width=32)'
' Hash Cond: (ld.link_id = l.link_id)'
' -> Seq Scan on source_link ld (cost=0.00..58282.29 rows=3179029 width=10)'
' -> Hash (cost=10963.12..10963.12 rows=96812 width=42)'
' -> Seq Scan on links l (cost=0.00..10963.12 rows=96812 width=42)'
*/
Each table has an index on link_id, defined like this
CREATE INDEX links_link_id_idx
ON links
USING btree
(link_id);
CREATE INDEX source_link_link_id_idx
ON source_link
USING btree
(link_id);
Shouldn't this index prevent these sequential scans, or am I misreading this? Should this really take 43 seconds?
thanks for any advice,
charles
More information about the postgis-users
mailing list