[postgis-users] OT Understanding slow queries
Ben Madin
lists at remoteinformation.com.au
Thu Aug 25 10:21:00 PDT 2011
Steve,
does this just apply to count(*), or is count(id) just as bad? I was originally a MySQL user and count(*) could be very efficient there.
cheers
Ben
On 26/08/2011, at 12:01 AM, Stephen Woodbridge wrote:
> The issue here is the count(*) which forces a full table scan in postgresql as fork mentioned. You need to look at a real query, unless you are really doing a count(*).
>
> -Steve
>
> On 8/25/2011 11:49 AM, Ben Madin wrote:
>> I'm no expert at this, but my understanding (which is limited) was
>> that you are asking for the whole table, so indexing doesn't really
>> get used (my understanding is that indexing helps to find the page
>> for a subset of data more quickly than scanning through the whole
>> lot).
>>
>> Also, you might be able to get some speed up by using a different
>> join type (outer join and not null where clause)?
>>
>> cheers
>>
>> Ben
>>
>>
>> On 25/08/2011, at 9:41 PM, Charles Galpin wrote:
>>
>>> If this is too off topic, please let me know and I'll sign up on a
>>> postgres list to get help. But this is related to my use of postgis
>>> and If anyone knows this stuff, it's you guys.
>>>
>>> I have an example query that I expect to be much faster, but my
>>> main concern is we are about to do some visualization of historical
>>> congestion data which will require queries across much larger data
>>> sets - like 150 million records a day. We are about to test using
>>> partitions but the number per table will still be much larger than
>>> what I am dealing with now.
>>>
>>> So here is a query I would think would be much faster than 43
>>> seconds for two tables, one with about 97k rows, and the other 3.2
>>> million.
>>>
>>> explain select count(l.*) from links l, source_link ld where
>>> l.link_id = ld.link_id; /* 'Aggregate (cost=174731.72..174731.73
>>> rows=1 width=32)' ' -> Hash Join (cost=13024.27..166784.14
>>> rows=3179029 width=32)' ' Hash Cond: (ld.link_id =
>>> l.link_id)' ' -> Seq Scan on source_link ld
>>> (cost=0.00..58282.29 rows=3179029 width=10)' ' -> Hash
>>> (cost=10963.12..10963.12 rows=96812 width=42)' ' ->
>>> Seq Scan on links l (cost=0.00..10963.12 rows=96812 width=42)' */
>>>
>>> Each table has an index on link_id, defined like this
>>>
>>> CREATE INDEX links_link_id_idx ON links USING btree (link_id);
>>>
>>> CREATE INDEX source_link_link_id_idx ON source_link USING btree
>>> (link_id);
>>>
>>> Shouldn't this index prevent these sequential scans, or am I
>>> misreading this? Should this really take 43 seconds?
>>>
>>> thanks for any advice, charles
>>>
>>> _______________________________________________ postgis-users
>>> mailing list postgis-users at postgis.refractions.net
>>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>>
>> _______________________________________________ postgis-users mailing
>> list postgis-users at postgis.refractions.net
>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users
More information about the postgis-users
mailing list