[postgis-users] Why does this take almost 3 hours?

Mon Aug 29 14:21:59 PDT 2005

On Mon, 29 Aug 2005, Ethan Alpert wrote:
...
> 
> nga_reg is basically the world admin divided up into 11 distinct
> regions. My new_snaps table has 13543 records.
> 
> 2 hours and 39 minutes later pgsql2shp of the above query finished
> 
> The above query's explain analyze is:
> 
> spatialdb_sil0=> explain analyze  SELECT a.catalogid, b.cii_region,
> a.the_geom  FROM new_snaps a, nga_reg b WHERE a.the_geom && b.the_geom
> AND intersects(a.the_geom, b.the_geom);
>                                                                   QUERY
> PLAN                                                                   
> ------------------------------------------------------------------------
> -----------------------------------------------------------------------
>  Nested Loop  (cost=0.00..67.41 rows=4966 width=273) (actual
> time=11.040..2397944.885 rows=13416 loops=1)
>    Join Filter: intersects("inner".the_geom, "outer".the_geom)
>    ->  Seq Scan on nga_reg b  (cost=0.00..1.11 rows=11 width=37) (actual
> time=0.004..147.586 rows=11 loops=1)
>    ->  Index Scan using new_snaps_spatial on new_snaps a
> (cost=0.00..6.01 rows=1 width=268) (actual time=68.748..9599.432
> rows=3892 loops=11)
>          Index Cond: (a.the_geom && "outer".the_geom)
>          Filter: (a.the_geom && "outer".the_geom)
>  Total runtime: 2397998.833 ms
> 

How many shapes are in nga_reg?  When you say "basically the world admin 
divided into 11 districts", do you mean there are only 11 shapes?  If so, 
how complex are the shapes?

There are two possibilities I can see where PostGIS would fail you in this 
way.

One is that the nga_reg table is huge (lots of shapes). If that's the 
case, then PostGIS should be driving the other direction (doing a seq scan 
on new_snaps and using the index on nga_reg).  If it's not, it is a poor 
query plan and perhaps the PostGIS developers can help track that down.  
An easy way to test this is to remove the index on new_snaps and redo your 
explain (and your timing test).

The alternative is that you have 11 (or some other small number of) very 
large, very complicated shapes.  If that's the case, your indexes are not 
going to help you since you'll end up doing an expensive intersect() 
operation for each pair, and most new_snaps's bouding boxes will fall 
within the large shape's bounding boxes.  If the shapes are Collections, 
you might try "exploding" them so that you can get better use of the 
bounding boxes.

I hope this helps
Bill