[postgis-users] GIST index speed

Paul Ramsey pramsey at cleverelephant.ca
Fri Jun 6 17:01:28 PDT 2008


Interesting side note: removing RECHECK changed the worst performance
(subquery, without the force_2d) from 8000ms to 4000ms. It changed the
performance of the subquery syntax with force_2d from 363ms to 318ms.
It didn't change the performance of the fast joined query syntax
measurably: still 180ms.

P.

On Fri, Jun 6, 2008 at 3:58 PM, Paul Ramsey <pramsey at cleverelephant.ca> wrote:
> 18% of the time is spent in heap_tuple_untoast_attr
> 18% of the time is spent in IndexNext
> much of the remaining time is in kernel functions servicing the data
> being pushed through the pipeline
> it's like something in your query plan ended up forcing the USA
> polygon to be read out over and over and over (once for each centroid
> tested, perhaps?) so the query was being driven by a very expensive
> route
>
> P.
>
> On Fri, Jun 6, 2008 at 3:52 PM, Paul Ramsey <pramsey at cleverelephant.ca> wrote:
>> OK, I'm seeing the same problem here...
>>
>> I can fix it by re-writing your query:
>>
>> select count(*) from geography a join geography b on (b.the_geom &&
>> b.centroid) where a.id = 69495 and b.type='Z';
>>
>> However, reading the profile is interesting... all the time is in
>> memcpy, something about the way your query executes forces a lot more
>> reading than my way...
>>
>> My explain is this:
>>                                                           QUERY PLAN
>> --------------------------------------------------------------------------------------------------------------------------------
>>  Aggregate  (cost=3724.07..3724.08 rows=1 width=0) (actual
>> time=178.402..178.402 rows=1 loops=1)
>>   ->  Nested Loop  (cost=0.00..3724.06 rows=1 width=0) (actual
>> time=0.210..172.890 rows=29687 loops=1)
>>         ->  Index Scan using id_idx on geography a  (cost=0.00..8.27
>> rows=1 width=0) (actual time=0.036..0.038 rows=1 loops=1)
>>               Index Cond: (id = 69495)
>>         ->  Seq Scan on geography b  (cost=0.00..3715.78 rows=1
>> width=0) (actual time=0.168..165.689 rows=29687 loops=1)
>>               Filter: ((b.the_geom && b.centroid) AND ((b.type)::text
>> = 'Z'::text))
>>
>> which is also interesting, since my query allows the plan to AVOID the
>> spatial index, which is wise, since so many rows are being fetched.
>>
>> P.
>>
>> On Fri, Jun 6, 2008 at 12:30 PM, Paul Ramsey <pramsey at cleverelephant.ca> wrote:
>>> I just want to see if I can duplicate the result, having the original
>>> data makes that a faster process, since I can copy your use case
>>> exactly.
>>>
>>> P
>>>
>>> On Fri, Jun 6, 2008 at 10:53 AM, Steve Kondik <shade at chemlab.org> wrote:
>>>> -----BEGIN PGP SIGNED MESSAGE-----
>>>> Hash: SHA1
>>>>
>>>> Paul Ramsey wrote:
>>>>> Steve,
>>>>>
>>>>> Any chance you could share your data with the developers? this is a
>>>>> very strange result.
>>>>>
>>>>> P
>>>>>
>>>>
>>>> Sure, I exported the table with pgsql2shp.
>>>>
>>>> http://n0rp.chemlab.org/geography.tbz2
>>>>
>>>> There's nothing spectacular about the data, it's just
>>>> state/county/zipcode from the US census.
>>>>
>>>>
>>>>> On Fri, Jun 6, 2008 at 9:37 AM, Steve Kondik <shade at chemlab.org> wrote:
>>>>> Steve Kondik wrote:
>>>>>>>> Mark Cave-Ayland wrote:
>>>>>>>>> Steve Kondik wrote:
>>>>>>>>>> -----BEGIN PGP SIGNED MESSAGE-----
>>>>>>>>>> Hash: SHA1
>>>>>>>>>>
>>>>>>>>>> I'm seeing this same kind of slowness.  I am in the process of migrating
>>>>>>>>>> an application from Oracle to Postgres.  One query that seems to take
>>>>>>>>>> longer than it should is a very simple point-in-bbox query which takes 3
>>>>>>>>>> seconds.  There are 32000 rows in the table, and the query is as simple
>>>>>>>>>> as it can be:
>>>>>>>>>>
>>>>>>>>>> select count(*) from geography where type='Z' and centroid && (select
>>>>>>>>>> geometry from geography where id=69495);
>>>>>>>>>>
>>>>>>>>>> The second geography is the USA, so this query gets all zipcodes inside
>>>>>>>>>> the bbox, using precalculated centroids.  The query plan looks fine as
>>>>>>>>>> well:
>>>>>>>>>>
>>>>>>>>>>  Aggregate  (cost=16.56..16.57 rows=1 width=0) (actual
>>>>>>>>>> time=2407.123..2407.125 rows=1 loops=1)
>>>>>>>>>>    InitPlan
>>>>>>>>>>      ->  Index Scan using geography_pkey on geography  (cost=0.00..8.28
>>>>>>>>>> rows=1 width=4528) (actual time=0.027..0.030 rows=1 loops=1)
>>>>>>>>>>            Index Cond: (id = 69495)
>>>>>>>>>>    ->  Index Scan using idx_geography_centroid_z on geography
>>>>>>>>>> (cost=0.00..8.28 rows=1 width=0) (actual time=1.417..2355.049 rows=29687
>>>>>>>>>> loops=1)
>>>>>>>>>>          Index Cond: (centroid && $0)
>>>>>>>>>>          Filter: (centroid && $0)
>>>>>>>>>>  Total runtime: 2407.194 ms
>>>>>>>>>> (8 rows)
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Any thoughts?  This query takes 200ms on Oracle.  It seems like
>>>>>>>>>> something is wrong for it to take 3 seconds on PostGIS.  It's faster for
>>>>>>>>>> smaller geometries (like a state), but I don't understand why since its
>>>>>>>>>> just point-inside-bbox.
>>>>>>>>> Hi Steve,
>>>>>>>>> Well from the above you can see fairly easily that the time is
>>>>>>>>> disappearing into the second index scan. The above output doesn't look
>>>>>>>>> quite right somehow (it's missing at least one node indicator) - can you
>>>>>>>>> try cutting and pasting the exact output from EXPLAIN into an email
>>>>>>>>> again? Also we'll need the output of "\d geography" from within psql.
>>>>>>>>
>>>>>>>>> ATB,
>>>>>>>>> Mark.
>>>>>>>>
>>>>>>>>
>>>>>>>> Sure thing.  It's taking quite a bit longer now because we're doing some
>>>>>>>> bulk loading.  When nothing else is running, the query takes about 4
>>>>>>>> seconds.
>>>>>>>>
>>>>>>>>
>>>>>>>> health_central_22=> explain analyze select count(*) from geography where
>>>>>>>> type='Z' and centroid && (select geometry from geography where id=69495);
>>>>>>>>
>>>>>>>> QUERY PLAN
>>>>>>>>
>>>>>>>> -
>>>>>>>> ---------------------------------------------------------------------------------------------------------------------------------------------------
>>>>>>>>  Aggregate  (cost=16.56..16.57 rows=1 width=0) (actual
>>>>>>>> time=16534.739..16534.741 rows=1 loops=1)
>>>>>>>>    InitPlan
>>>>>>>>      ->  Index Scan using geography_pkey on geography  (cost=0.00..8.28
>>>>>>>> rows=1 width=4528) (actual time=0.076..0.080 rows=1 loops=1)
>>>>>>>>            Index Cond: (id = 69495)
>>>>>>>>    ->  Index Scan using idx_geography_centroid_z on geography
>>>>>>>> (cost=0.00..8.28 rows=1 width=0) (actual time=25.453..16192.427
>>>>>>>> rows=29687 loops=1)
>>>>>>>>          Index Cond: (centroid && $0)
>>>>>>>>          Filter: (centroid && $0)
>>>>>>>>  Total runtime: 16534.809 ms
>>>>>>>> (8 rows)
>>>>>>>>
>>>>>>>> health_central_22=> \d geography
>>>>>>>>              Table "public.geography"
>>>>>>>>     Column    |          Type          | Modifiers
>>>>>>>> --------------+------------------------+-----------
>>>>>>>>  type         | character(1)           | not null
>>>>>>>>  id           | bigint                 | not null
>>>>>>>>  centroid     | geometry               |
>>>>>>>>  geometry     | geometry               |
>>>>>>>>  name         | character varying(32)  | not null
>>>>>>>>  population   | bigint                 |
>>>>>>>>  abbreviation | character varying(2)   |
>>>>>>>>  po_name      | character varying(100) |
>>>>>>>>  id_geo_state | bigint                 |
>>>>>>>> Indexes:
>>>>>>>>     "geography_pkey" PRIMARY KEY, btree (id)
>>>>>>>>     "idx_geography_2" btree (name)
>>>>>>>>     "idx_geography_3" btree (abbreviation)
>>>>>>>>     "idx_geography_4" btree (id_geo_state)
>>>>>>>>     "idx_geography_centroid" gist (centroid)
>>>>>>>>     "idx_geography_centroid_c" gist (centroid) WHERE type = 'C'::bpchar
>>>>>>>>     "idx_geography_centroid_s" gist (centroid) WHERE type = 'S'::bpchar
>>>>>>>>     "idx_geography_centroid_z" gist (centroid) WHERE type = 'Z'::bpchar
>>>>>>>>     "idx_geography_geometry" gist (geometry) CLUSTER
>>>>>>>>     "idx_geography_geometry_c" gist (geometry) WHERE type = 'C'::bpchar
>>>>>>>>     "idx_geography_geometry_r" gist (geometry) WHERE type = 'R'::bpchar
>>>>>>>>     "idx_geography_geometry_s" gist (geometry) WHERE type = 'S'::bpchar
>>>>>>>>     "idx_geography_geometry_z" gist (geometry) WHERE type = 'Z'::bpchar
>>>>>>>> Foreign-key constraints:
>>>>>>>>     "fk_geography_1" FOREIGN KEY (id_geo_state) REFERENCES geography(id)
>>>>>>>>
>>>>> Adding force_2d speeds this up immensely.  Not sure I understand why
>>>>> this is necessary, but I'm still getting used to PostGIS.
>>>>>
>>>>>
>>>>> health_central_22=> explain analyze select id,name from geography where
>>>>> type='Z' and centroid && (select force_2d(geometry) from geography where
>>>>> id=69495);
>>>>>                                                                QUERY
>>>>> PLAN
>>>>> -
>>>>> ------------------------------------------------------------------------------------------------------------------------------------------
>>>>>  Index Scan using idx_geography_centroid on geography  (cost=8.29..16.57
>>>>> rows=1 width=14) (actual time=0.263..125.338 rows=29687 loops=1)
>>>>>   Index Cond: (centroid && $0)
>>>>>   Filter: ((centroid && $0) AND (type = 'Z'::bpchar))
>>>>>   InitPlan
>>>>>     ->  Index Scan using geography_pkey on geography  (cost=0.00..8.29
>>>>> rows=1 width=4441) (actual time=0.158..0.161 rows=1 loops=1)
>>>>>           Index Cond: (id = 69495)
>>>>>  Total runtime: 171.229 ms
>>>>> (7 rows)
>>>> _______________________________________________
>>>> postgis-users mailing list
>>>> postgis-users at postgis.refractions.net
>>>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>>>>>>
>>>>> _______________________________________________
>>>>> postgis-users mailing list
>>>>> postgis-users at postgis.refractions.net
>>>>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>>>>
>>>> -----BEGIN PGP SIGNATURE-----
>>>> Version: GnuPG v1.4.6 (GNU/Linux)
>>>> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
>>>>
>>>> iD8DBQFISXmUMrBfzfMVwMcRAk0/AJsHZwP3p8AUVgSvWpPHFoGW/KCnTQCfY1QN
>>>> ZpmBnxOzQ3KQX2TG+03yVxM=
>>>> =z/t9
>>>> -----END PGP SIGNATURE-----
>>>> _______________________________________________
>>>> postgis-users mailing list
>>>> postgis-users at postgis.refractions.net
>>>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>>>>
>>>
>>
>



More information about the postgis-users mailing list