[postgis-users] OT Understanding slow queries
Stephen Woodbridge
woodbri at swoodbridge.com
Thu Aug 25 09:01:27 PDT 2011
The issue here is the count(*) which forces a full table scan in
postgresql as fork mentioned. You need to look at a real query, unless
you are really doing a count(*).
-Steve
On 8/25/2011 11:49 AM, Ben Madin wrote:
> I'm no expert at this, but my understanding (which is limited) was
> that you are asking for the whole table, so indexing doesn't really
> get used (my understanding is that indexing helps to find the page
> for a subset of data more quickly than scanning through the whole
> lot).
>
> Also, you might be able to get some speed up by using a different
> join type (outer join and not null where clause)?
>
> cheers
>
> Ben
>
>
> On 25/08/2011, at 9:41 PM, Charles Galpin wrote:
>
>> If this is too off topic, please let me know and I'll sign up on a
>> postgres list to get help. But this is related to my use of postgis
>> and If anyone knows this stuff, it's you guys.
>>
>> I have an example query that I expect to be much faster, but my
>> main concern is we are about to do some visualization of historical
>> congestion data which will require queries across much larger data
>> sets - like 150 million records a day. We are about to test using
>> partitions but the number per table will still be much larger than
>> what I am dealing with now.
>>
>> So here is a query I would think would be much faster than 43
>> seconds for two tables, one with about 97k rows, and the other 3.2
>> million.
>>
>> explain select count(l.*) from links l, source_link ld where
>> l.link_id = ld.link_id; /* 'Aggregate (cost=174731.72..174731.73
>> rows=1 width=32)' ' -> Hash Join (cost=13024.27..166784.14
>> rows=3179029 width=32)' ' Hash Cond: (ld.link_id =
>> l.link_id)' ' -> Seq Scan on source_link ld
>> (cost=0.00..58282.29 rows=3179029 width=10)' ' -> Hash
>> (cost=10963.12..10963.12 rows=96812 width=42)' ' ->
>> Seq Scan on links l (cost=0.00..10963.12 rows=96812 width=42)' */
>>
>> Each table has an index on link_id, defined like this
>>
>> CREATE INDEX links_link_id_idx ON links USING btree (link_id);
>>
>> CREATE INDEX source_link_link_id_idx ON source_link USING btree
>> (link_id);
>>
>> Shouldn't this index prevent these sequential scans, or am I
>> misreading this? Should this really take 43 seconds?
>>
>> thanks for any advice, charles
>>
>> _______________________________________________ postgis-users
>> mailing list postgis-users at postgis.refractions.net
>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>
> _______________________________________________ postgis-users mailing
> list postgis-users at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users
More information about the postgis-users
mailing list