[postgis-users] How could I improve this query?

Wed Nov 24 10:27:21 PST 2010

On Tue, Nov 23, 2010 at 7:26 PM, Kevin Neufeld <kneufeld at refractions.net> wrote:
> Interesting thought.  Note though, Benjamin, that Jorge specified "a.min",
> not "b.min" in the filter clause.
> -- Kevin
>
>
> On 11/23/2010 10:07 AM, Benjamin Juhn wrote:
>
> You're current query is likely exploding your result set when table_a
> records intersect with more than one record in table_b.  Assuming there is
> no overlap of min & max values in table_b the following query will stop
> testing for intersection between a & c records after find one so it should
> run faster:
> UPDATE table_a a SET integer_field = b.integer_field
> FROM table_b b
> WHERE a.integer_field2 <= b.max AND a.integer_field2 >= b.min
>     AND EXISTS (SELECT true FROM table_c c WHERE st_intersects(a.geom,
> c.geom) LIMIT 1);
> also I'm assuming you meant to reference b.min here:
>>
>> table_a.integer_field2 >=
>> table_a.min
>
> On Nov 23, 2010, at 9:48 AM, David William Bitner wrote:
>
> Jorge,
> What does running "explain analyze" give you for that query?
> db
>

Hello,

The explain analyze output for one example query with that structure is:

 Nested Loop  (cost=2.58..3167.07 rows=378524 width=496) (actual
time=0.789..1098006.873 rows=1165412 loops=1)
   Join Filter: ((table_a.integer_field2 >= table_b.min) AND
(table_a.integer_field2 <= table_b.max) AND
_st_intersects(table_a.geom, table_c.geom))
   ->  Hash Join  (cost=2.58..9.59 rows=104 width=10227) (actual
time=0.066..0.235 rows=104 loops=1)
         Hash Cond: (table_b.string_field = table_c.string_field)
         ->  Seq Scan on table_b  (cost=0.00..2.88 rows=88 width=63)
(actual time=0.009..0.033 rows=88 loops=1)
         ->  Hash  (cost=2.26..2.26 rows=26 width=10266) (actual
time=0.031..0.031 rows=26 loops=1)
               ->  Seq Scan on table_c  (cost=0.00..2.26 rows=26
width=10266) (actual time=0.006..0.014 rows=26 loops=1)
   ->  Index Scan using geom_idx on table_a  (cost=0.00..28.76 rows=6
width=492) (actual time=0.061..61.337 rows=49146 loops=104)
         Index Cond: (table_a.geom && table_c.geom)
 Total runtime: 1147446.858 ms

So, the query took 19min. And maybe I'm misunderstanding the analyze
output but seems that I need an index on string fields, to avoid
sequential scan. Am I right? What's the best index type for string
fields?

Benjamin, I'm going to test your query. Many thanks. And yes, it was
b.min, instead of a.min. A typo.

-- 
Jorge Arévalo
Internet & Mobilty Division, DEIMOS
jorge.arevalo at deimos-space.com
http://mobility.grupodeimos.com/
http://gis4free.wordpress.com