[postgis-users] GIST index speed

Fri Jun 6 09:37:16 PDT 2008

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Steve Kondik wrote:
> Mark Cave-Ayland wrote:
>> Steve Kondik wrote:
>>> -----BEGIN PGP SIGNED MESSAGE-----
>>> Hash: SHA1
>>>
>>> I'm seeing this same kind of slowness.  I am in the process of migrating
>>> an application from Oracle to Postgres.  One query that seems to take
>>> longer than it should is a very simple point-in-bbox query which takes 3
>>> seconds.  There are 32000 rows in the table, and the query is as simple
>>> as it can be:
>>>
>>> select count(*) from geography where type='Z' and centroid && (select
>>> geometry from geography where id=69495);
>>>
>>> The second geography is the USA, so this query gets all zipcodes inside
>>> the bbox, using precalculated centroids.  The query plan looks fine as
>>> well:
>>>
>>>  Aggregate  (cost=16.56..16.57 rows=1 width=0) (actual
>>> time=2407.123..2407.125 rows=1 loops=1)
>>>    InitPlan
>>>      ->  Index Scan using geography_pkey on geography  (cost=0.00..8.28
>>> rows=1 width=4528) (actual time=0.027..0.030 rows=1 loops=1)
>>>            Index Cond: (id = 69495)
>>>    ->  Index Scan using idx_geography_centroid_z on geography
>>> (cost=0.00..8.28 rows=1 width=0) (actual time=1.417..2355.049 rows=29687
>>> loops=1)
>>>          Index Cond: (centroid && $0)
>>>          Filter: (centroid && $0)
>>>  Total runtime: 2407.194 ms
>>> (8 rows)
>>>
>>>
>>> Any thoughts?  This query takes 200ms on Oracle.  It seems like
>>> something is wrong for it to take 3 seconds on PostGIS.  It's faster for
>>> smaller geometries (like a state), but I don't understand why since its
>>> just point-inside-bbox.
> 
>> Hi Steve,
> 
>> Well from the above you can see fairly easily that the time is
>> disappearing into the second index scan. The above output doesn't look
>> quite right somehow (it's missing at least one node indicator) - can you
>> try cutting and pasting the exact output from EXPLAIN into an email
>> again? Also we'll need the output of "\d geography" from within psql.
> 
> 
>> ATB,
> 
>> Mark.
> 
> 
> 
> Sure thing.  It's taking quite a bit longer now because we're doing some
> bulk loading.  When nothing else is running, the query takes about 4
> seconds.
> 
> 
> health_central_22=> explain analyze select count(*) from geography where
> type='Z' and centroid && (select geometry from geography where id=69495);
> 
> QUERY PLAN
> 
> -
> ---------------------------------------------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=16.56..16.57 rows=1 width=0) (actual
> time=16534.739..16534.741 rows=1 loops=1)
>    InitPlan
>      ->  Index Scan using geography_pkey on geography  (cost=0.00..8.28
> rows=1 width=4528) (actual time=0.076..0.080 rows=1 loops=1)
>            Index Cond: (id = 69495)
>    ->  Index Scan using idx_geography_centroid_z on geography
> (cost=0.00..8.28 rows=1 width=0) (actual time=25.453..16192.427
> rows=29687 loops=1)
>          Index Cond: (centroid && $0)
>          Filter: (centroid && $0)
>  Total runtime: 16534.809 ms
> (8 rows)
> 
> health_central_22=> \d geography
>              Table "public.geography"
>     Column    |          Type          | Modifiers
> --------------+------------------------+-----------
>  type         | character(1)           | not null
>  id           | bigint                 | not null
>  centroid     | geometry               |
>  geometry     | geometry               |
>  name         | character varying(32)  | not null
>  population   | bigint                 |
>  abbreviation | character varying(2)   |
>  po_name      | character varying(100) |
>  id_geo_state | bigint                 |
> Indexes:
>     "geography_pkey" PRIMARY KEY, btree (id)
>     "idx_geography_2" btree (name)
>     "idx_geography_3" btree (abbreviation)
>     "idx_geography_4" btree (id_geo_state)
>     "idx_geography_centroid" gist (centroid)
>     "idx_geography_centroid_c" gist (centroid) WHERE type = 'C'::bpchar
>     "idx_geography_centroid_s" gist (centroid) WHERE type = 'S'::bpchar
>     "idx_geography_centroid_z" gist (centroid) WHERE type = 'Z'::bpchar
>     "idx_geography_geometry" gist (geometry) CLUSTER
>     "idx_geography_geometry_c" gist (geometry) WHERE type = 'C'::bpchar
>     "idx_geography_geometry_r" gist (geometry) WHERE type = 'R'::bpchar
>     "idx_geography_geometry_s" gist (geometry) WHERE type = 'S'::bpchar
>     "idx_geography_geometry_z" gist (geometry) WHERE type = 'Z'::bpchar
> Foreign-key constraints:
>     "fk_geography_1" FOREIGN KEY (id_geo_state) REFERENCES geography(id)
> 

Adding force_2d speeds this up immensely.  Not sure I understand why
this is necessary, but I'm still getting used to PostGIS.

health_central_22=> explain analyze select id,name from geography where
type='Z' and centroid && (select force_2d(geometry) from geography where
id=69495);
                                                                QUERY
PLAN
-
------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_geography_centroid on geography  (cost=8.29..16.57
rows=1 width=14) (actual time=0.263..125.338 rows=29687 loops=1)
   Index Cond: (centroid && $0)
   Filter: ((centroid && $0) AND (type = 'Z'::bpchar))
   InitPlan
     ->  Index Scan using geography_pkey on geography  (cost=0.00..8.29
rows=1 width=4441) (actual time=0.158..0.161 rows=1 loops=1)
           Index Cond: (id = 69495)
 Total runtime: 171.229 ms
(7 rows)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFISWe8MrBfzfMVwMcRAkUBAJ9uLeK7kLotz6+CUuKd2Tg5pNj2XwCeLVXV
RlUY5e2Dwyx1EWa3ln0Ky6M=
=B0/W
-----END PGP SIGNATURE-----