[postgis-users] How could I improve this query?

Wed Nov 24 10:31:28 PST 2010

If it's difficult to read the analyze output, here a screenshot:

http://dl.dropbox.com/u/6599273/explain_anlyze.png

2010/11/24 Jorge Arévalo <jorge.arevalo at deimos-space.com>:
> On Tue, Nov 23, 2010 at 7:26 PM, Kevin Neufeld <kneufeld at refractions.net> wrote:
>> Interesting thought.  Note though, Benjamin, that Jorge specified "a.min",
>> not "b.min" in the filter clause.
>> -- Kevin
>>
>>
>> On 11/23/2010 10:07 AM, Benjamin Juhn wrote:
>>
>> You're current query is likely exploding your result set when table_a
>> records intersect with more than one record in table_b.  Assuming there is
>> no overlap of min & max values in table_b the following query will stop
>> testing for intersection between a & c records after find one so it should
>> run faster:
>> UPDATE table_a a SET integer_field = b.integer_field
>> FROM table_b b
>> WHERE a.integer_field2 <= b.max AND a.integer_field2 >= b.min
>>     AND EXISTS (SELECT true FROM table_c c WHERE st_intersects(a.geom,
>> c.geom) LIMIT 1);
>> also I'm assuming you meant to reference b.min here:
>>>
>>> table_a.integer_field2 >=
>>> table_a.min
>>
>> On Nov 23, 2010, at 9:48 AM, David William Bitner wrote:
>>
>> Jorge,
>> What does running "explain analyze" give you for that query?
>> db
>>
>
> Hello,
>
> The explain analyze output for one example query with that structure is:
>
>  Nested Loop  (cost=2.58..3167.07 rows=378524 width=496) (actual
> time=0.789..1098006.873 rows=1165412 loops=1)
>   Join Filter: ((table_a.integer_field2 >= table_b.min) AND
> (table_a.integer_field2 <= table_b.max) AND
> _st_intersects(table_a.geom, table_c.geom))
>   ->  Hash Join  (cost=2.58..9.59 rows=104 width=10227) (actual
> time=0.066..0.235 rows=104 loops=1)
>         Hash Cond: (table_b.string_field = table_c.string_field)
>         ->  Seq Scan on table_b  (cost=0.00..2.88 rows=88 width=63)
> (actual time=0.009..0.033 rows=88 loops=1)
>         ->  Hash  (cost=2.26..2.26 rows=26 width=10266) (actual
> time=0.031..0.031 rows=26 loops=1)
>               ->  Seq Scan on table_c  (cost=0.00..2.26 rows=26
> width=10266) (actual time=0.006..0.014 rows=26 loops=1)
>   ->  Index Scan using geom_idx on table_a  (cost=0.00..28.76 rows=6
> width=492) (actual time=0.061..61.337 rows=49146 loops=104)
>         Index Cond: (table_a.geom && table_c.geom)
>  Total runtime: 1147446.858 ms
>
> So, the query took 19min. And maybe I'm misunderstanding the analyze
> output but seems that I need an index on string fields, to avoid
> sequential scan. Am I right? What's the best index type for string
> fields?
>
> Benjamin, I'm going to test your query. Many thanks. And yes, it was
> b.min, instead of a.min. A typo.
>
>
> --
> Jorge Arévalo
> Internet & Mobilty Division, DEIMOS
> jorge.arevalo at deimos-space.com
> http://mobility.grupodeimos.com/
> http://gis4free.wordpress.com
>

-- 
Jorge Arévalo
Internet & Mobilty Division, DEIMOS
jorge.arevalo at deimos-space.com
http://mobility.grupodeimos.com/
http://gis4free.wordpress.com