[postgis-users] GIST index speed

Paul Ramsey pramsey at cleverelephant.ca
Fri Jun 6 15:58:47 PDT 2008


18% of the time is spent in heap_tuple_untoast_attr
18% of the time is spent in IndexNext
much of the remaining time is in kernel functions servicing the data
being pushed through the pipeline
it's like something in your query plan ended up forcing the USA
polygon to be read out over and over and over (once for each centroid
tested, perhaps?) so the query was being driven by a very expensive
route

P.

On Fri, Jun 6, 2008 at 3:52 PM, Paul Ramsey <pramsey at cleverelephant.ca> wrote:
> OK, I'm seeing the same problem here...
>
> I can fix it by re-writing your query:
>
> select count(*) from geography a join geography b on (b.the_geom &&
> b.centroid) where a.id = 69495 and b.type='Z';
>
> However, reading the profile is interesting... all the time is in
> memcpy, something about the way your query executes forces a lot more
> reading than my way...
>
> My explain is this:
>                                                           QUERY PLAN
> --------------------------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=3724.07..3724.08 rows=1 width=0) (actual
> time=178.402..178.402 rows=1 loops=1)
>   ->  Nested Loop  (cost=0.00..3724.06 rows=1 width=0) (actual
> time=0.210..172.890 rows=29687 loops=1)
>         ->  Index Scan using id_idx on geography a  (cost=0.00..8.27
> rows=1 width=0) (actual time=0.036..0.038 rows=1 loops=1)
>               Index Cond: (id = 69495)
>         ->  Seq Scan on geography b  (cost=0.00..3715.78 rows=1
> width=0) (actual time=0.168..165.689 rows=29687 loops=1)
>               Filter: ((b.the_geom && b.centroid) AND ((b.type)::text
> = 'Z'::text))
>
> which is also interesting, since my query allows the plan to AVOID the
> spatial index, which is wise, since so many rows are being fetched.
>
> P.
>
> On Fri, Jun 6, 2008 at 12:30 PM, Paul Ramsey <pramsey at cleverelephant.ca> wrote:
>> I just want to see if I can duplicate the result, having the original
>> data makes that a faster process, since I can copy your use case
>> exactly.
>>
>> P
>>
>> On Fri, Jun 6, 2008 at 10:53 AM, Steve Kondik <shade at chemlab.org> wrote:
>>> -----BEGIN PGP SIGNED MESSAGE-----
>>> Hash: SHA1
>>>
>>> Paul Ramsey wrote:
>>>> Steve,
>>>>
>>>> Any chance you could share your data with the developers? this is a
>>>> very strange result.
>>>>
>>>> P
>>>>
>>>
>>> Sure, I exported the table with pgsql2shp.
>>>
>>> http://n0rp.chemlab.org/geography.tbz2
>>>
>>> There's nothing spectacular about the data, it's just
>>> state/county/zipcode from the US census.
>>>
>>>
>>>> On Fri, Jun 6, 2008 at 9:37 AM, Steve Kondik <shade at chemlab.org> wrote:
>>>> Steve Kondik wrote:
>>>>>>> Mark Cave-Ayland wrote:
>>>>>>>> Steve Kondik wrote:
>>>>>>>>> -----BEGIN PGP SIGNED MESSAGE-----
>>>>>>>>> Hash: SHA1
>>>>>>>>>
>>>>>>>>> I'm seeing this same kind of slowness.  I am in the process of migrating
>>>>>>>>> an application from Oracle to Postgres.  One query that seems to take
>>>>>>>>> longer than it should is a very simple point-in-bbox query which takes 3
>>>>>>>>> seconds.  There are 32000 rows in the table, and the query is as simple
>>>>>>>>> as it can be:
>>>>>>>>>
>>>>>>>>> select count(*) from geography where type='Z' and centroid && (select
>>>>>>>>> geometry from geography where id=69495);
>>>>>>>>>
>>>>>>>>> The second geography is the USA, so this query gets all zipcodes inside
>>>>>>>>> the bbox, using precalculated centroids.  The query plan looks fine as
>>>>>>>>> well:
>>>>>>>>>
>>>>>>>>>  Aggregate  (cost=16.56..16.57 rows=1 width=0) (actual
>>>>>>>>> time=2407.123..2407.125 rows=1 loops=1)
>>>>>>>>>    InitPlan
>>>>>>>>>      ->  Index Scan using geography_pkey on geography  (cost=0.00..8.28
>>>>>>>>> rows=1 width=4528) (actual time=0.027..0.030 rows=1 loops=1)
>>>>>>>>>            Index Cond: (id = 69495)
>>>>>>>>>    ->  Index Scan using idx_geography_centroid_z on geography
>>>>>>>>> (cost=0.00..8.28 rows=1 width=0) (actual time=1.417..2355.049 rows=29687
>>>>>>>>> loops=1)
>>>>>>>>>          Index Cond: (centroid && $0)
>>>>>>>>>          Filter: (centroid && $0)
>>>>>>>>>  Total runtime: 2407.194 ms
>>>>>>>>> (8 rows)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Any thoughts?  This query takes 200ms on Oracle.  It seems like
>>>>>>>>> something is wrong for it to take 3 seconds on PostGIS.  It's faster for
>>>>>>>>> smaller geometries (like a state), but I don't understand why since its
>>>>>>>>> just point-inside-bbox.
>>>>>>>> Hi Steve,
>>>>>>>> Well from the above you can see fairly easily that the time is
>>>>>>>> disappearing into the second index scan. The above output doesn't look
>>>>>>>> quite right somehow (it's missing at least one node indicator) - can you
>>>>>>>> try cutting and pasting the exact output from EXPLAIN into an email
>>>>>>>> again? Also we'll need the output of "\d geography" from within psql.
>>>>>>>
>>>>>>>> ATB,
>>>>>>>> Mark.
>>>>>>>
>>>>>>>
>>>>>>> Sure thing.  It's taking quite a bit longer now because we're doing some
>>>>>>> bulk loading.  When nothing else is running, the query takes about 4
>>>>>>> seconds.
>>>>>>>
>>>>>>>
>>>>>>> health_central_22=> explain analyze select count(*) from geography where
>>>>>>> type='Z' and centroid && (select geometry from geography where id=69495);
>>>>>>>
>>>>>>> QUERY PLAN
>>>>>>>
>>>>>>> -
>>>>>>> ---------------------------------------------------------------------------------------------------------------------------------------------------
>>>>>>>  Aggregate  (cost=16.56..16.57 rows=1 width=0) (actual
>>>>>>> time=16534.739..16534.741 rows=1 loops=1)
>>>>>>>    InitPlan
>>>>>>>      ->  Index Scan using geography_pkey on geography  (cost=0.00..8.28
>>>>>>> rows=1 width=4528) (actual time=0.076..0.080 rows=1 loops=1)
>>>>>>>            Index Cond: (id = 69495)
>>>>>>>    ->  Index Scan using idx_geography_centroid_z on geography
>>>>>>> (cost=0.00..8.28 rows=1 width=0) (actual time=25.453..16192.427
>>>>>>> rows=29687 loops=1)
>>>>>>>          Index Cond: (centroid && $0)
>>>>>>>          Filter: (centroid && $0)
>>>>>>>  Total runtime: 16534.809 ms
>>>>>>> (8 rows)
>>>>>>>
>>>>>>> health_central_22=> \d geography
>>>>>>>              Table "public.geography"
>>>>>>>     Column    |          Type          | Modifiers
>>>>>>> --------------+------------------------+-----------
>>>>>>>  type         | character(1)           | not null
>>>>>>>  id           | bigint                 | not null
>>>>>>>  centroid     | geometry               |
>>>>>>>  geometry     | geometry               |
>>>>>>>  name         | character varying(32)  | not null
>>>>>>>  population   | bigint                 |
>>>>>>>  abbreviation | character varying(2)   |
>>>>>>>  po_name      | character varying(100) |
>>>>>>>  id_geo_state | bigint                 |
>>>>>>> Indexes:
>>>>>>>     "geography_pkey" PRIMARY KEY, btree (id)
>>>>>>>     "idx_geography_2" btree (name)
>>>>>>>     "idx_geography_3" btree (abbreviation)
>>>>>>>     "idx_geography_4" btree (id_geo_state)
>>>>>>>     "idx_geography_centroid" gist (centroid)
>>>>>>>     "idx_geography_centroid_c" gist (centroid) WHERE type = 'C'::bpchar
>>>>>>>     "idx_geography_centroid_s" gist (centroid) WHERE type = 'S'::bpchar
>>>>>>>     "idx_geography_centroid_z" gist (centroid) WHERE type = 'Z'::bpchar
>>>>>>>     "idx_geography_geometry" gist (geometry) CLUSTER
>>>>>>>     "idx_geography_geometry_c" gist (geometry) WHERE type = 'C'::bpchar
>>>>>>>     "idx_geography_geometry_r" gist (geometry) WHERE type = 'R'::bpchar
>>>>>>>     "idx_geography_geometry_s" gist (geometry) WHERE type = 'S'::bpchar
>>>>>>>     "idx_geography_geometry_z" gist (geometry) WHERE type = 'Z'::bpchar
>>>>>>> Foreign-key constraints:
>>>>>>>     "fk_geography_1" FOREIGN KEY (id_geo_state) REFERENCES geography(id)
>>>>>>>
>>>> Adding force_2d speeds this up immensely.  Not sure I understand why
>>>> this is necessary, but I'm still getting used to PostGIS.
>>>>
>>>>
>>>> health_central_22=> explain analyze select id,name from geography where
>>>> type='Z' and centroid && (select force_2d(geometry) from geography where
>>>> id=69495);
>>>>                                                                QUERY
>>>> PLAN
>>>> -
>>>> ------------------------------------------------------------------------------------------------------------------------------------------
>>>>  Index Scan using idx_geography_centroid on geography  (cost=8.29..16.57
>>>> rows=1 width=14) (actual time=0.263..125.338 rows=29687 loops=1)
>>>>   Index Cond: (centroid && $0)
>>>>   Filter: ((centroid && $0) AND (type = 'Z'::bpchar))
>>>>   InitPlan
>>>>     ->  Index Scan using geography_pkey on geography  (cost=0.00..8.29
>>>> rows=1 width=4441) (actual time=0.158..0.161 rows=1 loops=1)
>>>>           Index Cond: (id = 69495)
>>>>  Total runtime: 171.229 ms
>>>> (7 rows)
>>> _______________________________________________
>>> postgis-users mailing list
>>> postgis-users at postgis.refractions.net
>>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>>>>>
>>>> _______________________________________________
>>>> postgis-users mailing list
>>>> postgis-users at postgis.refractions.net
>>>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>>>
>>> -----BEGIN PGP SIGNATURE-----
>>> Version: GnuPG v1.4.6 (GNU/Linux)
>>> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
>>>
>>> iD8DBQFISXmUMrBfzfMVwMcRAk0/AJsHZwP3p8AUVgSvWpPHFoGW/KCnTQCfY1QN
>>> ZpmBnxOzQ3KQX2TG+03yVxM=
>>> =z/t9
>>> -----END PGP SIGNATURE-----
>>> _______________________________________________
>>> postgis-users mailing list
>>> postgis-users at postgis.refractions.net
>>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>>>
>>
>



More information about the postgis-users mailing list