[postgis-users] OT Understanding slow queries

Thu Aug 25 10:21:00 PDT 2011

Steve,

does this just apply to count(*), or is count(id) just as bad? I was originally a MySQL user and count(*) could be very efficient there.

cheers

Ben

On 26/08/2011, at 12:01 AM, Stephen Woodbridge wrote:

> The issue here is the count(*) which forces a full table scan in postgresql as fork mentioned. You need to look at a real query, unless you are really doing a count(*).
> 
> -Steve
> 
> On 8/25/2011 11:49 AM, Ben Madin wrote:
>> I'm no expert at this, but my understanding (which is limited) was
>> that you are asking for the whole table, so indexing doesn't really
>> get used (my understanding is that indexing helps to find the page
>> for a subset of data more quickly than scanning through the whole
>> lot).
>> 
>> Also, you might be able to get some speed up by using a different
>> join type (outer join and not null where clause)?
>> 
>> cheers
>> 
>> Ben
>> 
>> 
>> On 25/08/2011, at 9:41 PM, Charles Galpin wrote:
>> 
>>> If this is too off topic, please let me know and I'll sign up on a
>>> postgres list to get help. But this is related to my use of postgis
>>> and If anyone knows this stuff, it's you guys.
>>> 
>>> I have an example query that I expect to be much faster, but my
>>> main concern is we are about to do some visualization of historical
>>> congestion data which will require queries across much larger data
>>> sets - like 150 million records a day. We are about to test using
>>> partitions but the number per table will still be much larger than
>>> what I am dealing with now.
>>> 
>>> So here is a query I would think would be much faster than 43
>>> seconds for two tables, one with about 97k rows, and the other 3.2
>>> million.
>>> 
>>> explain select count(l.*) from links l, source_link ld where
>>> l.link_id = ld.link_id; /* 'Aggregate  (cost=174731.72..174731.73
>>> rows=1 width=32)' '  ->   Hash Join  (cost=13024.27..166784.14
>>> rows=3179029 width=32)' '        Hash Cond: (ld.link_id =
>>> l.link_id)' '        ->   Seq Scan on source_link ld
>>> (cost=0.00..58282.29 rows=3179029 width=10)' '        ->   Hash
>>> (cost=10963.12..10963.12 rows=96812 width=42)' '              ->
>>> Seq Scan on links l  (cost=0.00..10963.12 rows=96812 width=42)' */
>>> 
>>> Each table has an index on link_id, defined like this
>>> 
>>> CREATE INDEX links_link_id_idx ON links USING btree (link_id);
>>> 
>>> CREATE INDEX source_link_link_id_idx ON source_link USING btree
>>> (link_id);
>>> 
>>> Shouldn't this index prevent these sequential scans, or am I
>>> misreading this?  Should this really take 43 seconds?
>>> 
>>> thanks for any advice, charles
>>> 
>>> _______________________________________________ postgis-users
>>> mailing list postgis-users at postgis.refractions.net
>>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>> 
>> _______________________________________________ postgis-users mailing
>> list postgis-users at postgis.refractions.net
>> http://postgis.refractions.net/mailman/listinfo/postgis-users
> 
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users