[postgis-users] [GENERAL] How to improve this query?

Tue Feb 8 04:12:42 PST 2011

On Tue, Feb 8, 2011 at 12:42 PM, Mark Cave-Ayland
<mark.cave-ayland at siriusit.co.uk> wrote:
> On 08/02/11 11:05, Jorge Arévalo wrote:
>
>> 2011/2/8 Stefan Keller<sfkeller at gmail.com>:
>>>
>>> "  Hash Cond: (TABLE_C.STR_FIELD = TABLE_B.STR_FIELD)"
>>> tells me that these two longish strings are compared first.
>>> Did you already try to index TABLE_C.STR_FIELD and TABLE_B.STR_FIELD
>>> as hash instead btree?
>>
>> I thought hash indexes were discouraged over btree. Anyway, I'll test it.
>
> Absolutely. You do not want to be using Hash Indexes these days - stick with
> B-Tree. The hash refers to the method by which the executor matches the two
> column fields and is not related to the underlying index type.
>

Sorry, buy I'm a bit lost here. Then, should I create indexes using
btree as I did? How could I intervene in the way executor matches
columns?

Apart from this, I've modified some config parameters in postgresql.conf:

- autovacuum set to OFF
- shared_buffers set to 256MB
- random_page_cost set to 2.0
- effective_cache_size set to 1024MB

For changing shared buffers I needed to set the kernel.shmmax
parameter to 1024MB (was 32MB) in my OS (Ubuntu 10.04) with sysctl. My
machine has 4GB RAM, and I pretend to use it as a postgresql dedicated
server.

With these changes, now the query plan looks like this:

"Hash Join  (cost=3.98..211606.88 rows=209816 width=497)"
"  Hash Cond: (TABLE_C.STR_FIELD = TABLE_B.STR_FIELD)"
"  Join Filter: ((TABLE_A.INT_FIELD2 >= TABLE_B.min) AND
(TABLE_A.INT_FIELD2 <= TABLE_B.max))"
"  ->  Nested Loop  (cost=0.00..179737.10 rows=472086 width=543)"
"        Join Filter: _st_intersects(TABLE_A.wkb_geometry, TABLE_C.the_geom)"
"        ->  Seq Scan using TABLE_C_string_idx on TABLE_C
(cost=0.00..3067.52 rows=69352 width=1275)"
"        ->  Index Scan using TABLE_A_geom_idx on TABLE_A
(cost=0.00..1.50 rows=4 width=493)"
"              Index Cond: (TABLE_A.wkb_geometry && TABLE_C.the_geom)"
"  ->  Hash  (cost=2.88..2.88 rows=88 width=63)"
"        ->  Seq Scan on TABLE_B  (cost=0.00..2.88 rows=88 width=63)"

So, the total cost in join and scans is lower (I guess due to
random_page_cost changing), and now a seq scan is planned in the
string comparison against the index scan planned before.

I don't know if I'm really improving something or I'm flying blind...
The query is running again.

Thanks!

-- 
Jorge Arévalo
Internet & Mobilty Division, DEIMOS
jorge.arevalo at deimos-space.com
http://es.linkedin.com/in/jorgearevalo80
http://mobility.grupodeimos.com/
http://www.twitter.com/jorgeas80
http://gis4free.wordpress.com
http://geohash.org/ezjqgrgzz0g

>
> ATB,
>
> Mark.
>
> --
> Mark Cave-Ayland - Senior Technical Architect
> PostgreSQL - PostGIS
> Sirius Corporation plc - control through freedom
> http://www.siriusit.co.uk
> t: +44 870 608 0063
>
> Sirius Labs: http://www.siriusit.co.uk/labs
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users
>