[postgis-users] How could I improve this query?

Jorge Arévalo jorge.arevalo at deimos-space.com
Wed Nov 24 11:09:40 PST 2010


And here the explain analyze of the second query (the one suggested by Benjamin)

http://dl.dropbox.com/u/6599273/explain_anlyze2.png

As you can see, it took 21min. Is strange, I thought it would take less time...

2010/11/24 Jorge Arévalo <jorge.arevalo at deimos-space.com>:
> If it's difficult to read the analyze output, here a screenshot:
>
> http://dl.dropbox.com/u/6599273/explain_anlyze.png
>
> 2010/11/24 Jorge Arévalo <jorge.arevalo at deimos-space.com>:
>> On Tue, Nov 23, 2010 at 7:26 PM, Kevin Neufeld <kneufeld at refractions.net> wrote:
>>> Interesting thought.  Note though, Benjamin, that Jorge specified "a.min",
>>> not "b.min" in the filter clause.
>>> -- Kevin
>>>
>>>
>>> On 11/23/2010 10:07 AM, Benjamin Juhn wrote:
>>>
>>> You're current query is likely exploding your result set when table_a
>>> records intersect with more than one record in table_b.  Assuming there is
>>> no overlap of min & max values in table_b the following query will stop
>>> testing for intersection between a & c records after find one so it should
>>> run faster:
>>> UPDATE table_a a SET integer_field = b.integer_field
>>> FROM table_b b
>>> WHERE a.integer_field2 <= b.max AND a.integer_field2 >= b.min
>>>     AND EXISTS (SELECT true FROM table_c c WHERE st_intersects(a.geom,
>>> c.geom) LIMIT 1);
>>> also I'm assuming you meant to reference b.min here:
>>>>
>>>> table_a.integer_field2 >=
>>>> table_a.min
>>>
>>> On Nov 23, 2010, at 9:48 AM, David William Bitner wrote:
>>>
>>> Jorge,
>>> What does running "explain analyze" give you for that query?
>>> db
>>>
>>
>> Hello,
>>
>> The explain analyze output for one example query with that structure is:
>>
>>  Nested Loop  (cost=2.58..3167.07 rows=378524 width=496) (actual
>> time=0.789..1098006.873 rows=1165412 loops=1)
>>   Join Filter: ((table_a.integer_field2 >= table_b.min) AND
>> (table_a.integer_field2 <= table_b.max) AND
>> _st_intersects(table_a.geom, table_c.geom))
>>   ->  Hash Join  (cost=2.58..9.59 rows=104 width=10227) (actual
>> time=0.066..0.235 rows=104 loops=1)
>>         Hash Cond: (table_b.string_field = table_c.string_field)
>>         ->  Seq Scan on table_b  (cost=0.00..2.88 rows=88 width=63)
>> (actual time=0.009..0.033 rows=88 loops=1)
>>         ->  Hash  (cost=2.26..2.26 rows=26 width=10266) (actual
>> time=0.031..0.031 rows=26 loops=1)
>>               ->  Seq Scan on table_c  (cost=0.00..2.26 rows=26
>> width=10266) (actual time=0.006..0.014 rows=26 loops=1)
>>   ->  Index Scan using geom_idx on table_a  (cost=0.00..28.76 rows=6
>> width=492) (actual time=0.061..61.337 rows=49146 loops=104)
>>         Index Cond: (table_a.geom && table_c.geom)
>>  Total runtime: 1147446.858 ms
>>
>> So, the query took 19min. And maybe I'm misunderstanding the analyze
>> output but seems that I need an index on string fields, to avoid
>> sequential scan. Am I right? What's the best index type for string
>> fields?
>>
>> Benjamin, I'm going to test your query. Many thanks. And yes, it was
>> b.min, instead of a.min. A typo.
>>
>>
>> --
>> Jorge Arévalo
>> Internet & Mobilty Division, DEIMOS
>> jorge.arevalo at deimos-space.com
>> http://mobility.grupodeimos.com/
>> http://gis4free.wordpress.com
>>
>
>
>
> --
> Jorge Arévalo
> Internet & Mobilty Division, DEIMOS
> jorge.arevalo at deimos-space.com
> http://mobility.grupodeimos.com/
> http://gis4free.wordpress.com
>



-- 
Jorge Arévalo
Internet & Mobilty Division, DEIMOS
jorge.arevalo at deimos-space.com
http://mobility.grupodeimos.com/
http://gis4free.wordpress.com



More information about the postgis-users mailing list