[postgis-users] Why does this take almost 3 hours?

Ethan Alpert ealpert at digitalglobe.com
Mon Aug 29 13:57:05 PDT 2005



I have two shapefiles I want to interesect. I shp2pgsql'ed them, loaded
them into the database and created spatial indexes for them (vacuum
analyze'ed for good measure):

                                  Table "public.nga_reg"
   Column   |       Type        |                        Modifiers

------------+-------------------+---------------------------------------
-------------------
 gid        | integer           | not null default
nextval('public.nga_reg_gid_seq'::text)
 cii_region | character varying | 
 the_geom   | geometry          | 
Indexes:
    "nga_reg_pkey" primary key, btree (gid)
    "nga_reg_spatial" gist (the_geom)
Check constraints:
    "enforce_srid_the_geom" CHECK (srid(the_geom) = 4326)
    "enforce_geotype_the_geom" CHECK (geometrytype(the_geom) =
'MULTIPOLYGON'::text OR the_geom IS NULL)

                                  Table "public.new_snaps"
   Column   |       Type        |                         Modifiers

------------+-------------------+---------------------------------------
---------------------
 gid        | integer           | not null default
nextval('public.new_snaps_gid_seq'::text)
 catalogid  | character varying | 
 the_geom   | geometry          | 
Indexes:
    "new_snaps_pkey" primary key, btree (gid)
    "new_snaps_spatial" gist (the_geom)
Check constraints:
    "enforce_srid_the_geom" CHECK (srid(the_geom) = 4326)
    "enforce_geotype_the_geom" CHECK (geometrytype(the_geom) =
'MULTIPOLYGON'::text OR the_geom IS NULL)

 Then called the following query:


 SELECT a.catalogid, b.cii_region, a.the_geom
   FROM new_snaps a, nga_reg b
  WHERE a.the_geom && b.the_geom AND intersects(a.the_geom, b.the_geom);



nga_reg is basically the world admin divided up into 11 distinct
regions. My new_snaps table has 13543 records.

2 hours and 39 minutes later pgsql2shp of the above query finished

The above query's explain analyze is:

spatialdb_sil0=> explain analyze  SELECT a.catalogid, b.cii_region,
a.the_geom  FROM new_snaps a, nga_reg b WHERE a.the_geom && b.the_geom
AND intersects(a.the_geom, b.the_geom);
                                                                  QUERY
PLAN                                                                   
------------------------------------------------------------------------
-----------------------------------------------------------------------
 Nested Loop  (cost=0.00..67.41 rows=4966 width=273) (actual
time=11.040..2397944.885 rows=13416 loops=1)
   Join Filter: intersects("inner".the_geom, "outer".the_geom)
   ->  Seq Scan on nga_reg b  (cost=0.00..1.11 rows=11 width=37) (actual
time=0.004..147.586 rows=11 loops=1)
   ->  Index Scan using new_snaps_spatial on new_snaps a
(cost=0.00..6.01 rows=1 width=268) (actual time=68.748..9599.432
rows=3892 loops=11)
         Index Cond: (a.the_geom && "outer".the_geom)
         Filter: (a.the_geom && "outer".the_geom)
 Total runtime: 2397998.833 ms


In that 2+ hours I had time to write a script that uses mapserver's
mapscript. It pulls each of the 13500 record's geometries, with DBI, out
of postgres, converts the WKT of the geometry to a mapserver shapObj and
calls mapscripts queryByShape() method. This script runs in 2 minutes!!!
How is it possible that postgis with spatial indexes is so bad?

I surely must be doing something wrong.

-e



More information about the postgis-users mailing list