[postgis-users] Index Use Craziness

Burgholzer,Robert rwburgholzer at deq.virginia.gov
Thu Apr 2 06:42:34 PDT 2009


Just an update (refresher), and perhaps a thought or two about the mis-use of the index by the query planner, and finally, a question:

1. My query was performing a contains using a complex multipolygon and a set of points (a watershed and summary of rainfall on a given day)
2. There were GIST indices on both polygon and point tables
3. The index that was fooling the planner into thinking that the spatial was more important than the temporal was on the point layer
4. The query bogged down as it employed the spatial index on the point layer of several million points (multiple years worth of daily values) and ignored the index on the date column - a bad deal since I was only asking for one day, which is always less than 19,000 points in this context
5. So, with the above in mind, I dropped the spatial index on the point table and voila, the query speeded up about =/- 100X (10-15 seconds versus 20-30 minutes)

This makes me ask a question: is the GIST index substantially faster or SLOWER on a point layer?  Doesn't a bounding box query entail the comparison of 2 points (two corners) when the point layer is really only about a single point??

Dumb proposition/question perhaps, asked in total ignorance of the behavior of the GIST indexing process.

Assail me (or enlighten me) if you will!

Robert W. Burgholzer
Surface Water Modeler
Office of Water Supply and Planning
Virginia Department of Environmental Quality
rwburgholzer at deq.virginia.gov
804-698-4405
Open Source Modeling Tools:
http://sourceforge.net/projects/npsource/

-----Original Message-----
From: postgis-users-bounces at postgis.refractions.net [mailto:postgis-users-bounces at postgis.refractions.net] On Behalf Of Paul Ramsey
Sent: Thursday, March 19, 2009 12:27 PM
To: PostGIS Users Discussion
Subject: Re: [postgis-users] Index Use Craziness

Mose,

Thanks for that crystal clear analysis and perfect test case (I just
ran it now, I love test cases I can run with copy-paste). Could you
put an issue in the tracker and reference your mail archive link?

http://postgis.refractions.net/pipermail/postgis-devel/2008-September/003547.html

MCA is the selectivity expert, so I'll leave it to him to describe the
difficulty of figuring join selectivity versus constant selectivity.

P.

2009/3/18 Mose <mose.andre at gmail.com>:
> Robert,
>
> I have been haunted by similar issues but I haven't had time to figure out
> some explanation I really believe in.
>
> Does this post with a reproducible case seem similar?  It seems like it to
> me.
>
> http://postgis.refractions.net/pipermail/postgis-devel/2008-September/003547.html
>
> For now when these situations aren't working (the planner favors the spatial
> index due to a weird estimate when it is really going to be lots) I am using
> a constant geometry [e.g. geometryfromwkt(wkt geometry string)] for the join
> condition.  I realize the feasibility of this depends on your application,
> or your desire to write a user-defined function.
>
> Why does the planner get different estimates for the same geometry being
> used in a constraint based on it coming from a table, or from a string in
> the query?  I think the reason this is happening might be apparent in
> lwgeom_estimate.c, but I'm open to being totally wrong :).
>
> -Mose
>
>
>
>
>
> On Wed, Mar 18, 2009 at 11:32 AM, Paragon Corporation <lr at pcorp.us> wrote:
>>
>> Robert,
>>
>>  Which version of PostgreSQL are you running?  I wander if you are being
>> bitten by one of these bugs
>>
>> --the description doesn't quite match any of these 2 - but it could be
>> another side effect of these
>> http://www.postgresql.org/docs/8.3/static/release-8-3-7.html
>>
>> Fix planner problem with sub-SELECT in the output list of a larger
>> subquery
>> (Tom)
>>
>> Or
>> http://www.postgresql.org/docs/8.3/static/release-8-3-6.html
>> Fix planner misestimation of selectivity when transitive equality is
>> applied
>> to an outer-join clause (Tom)
>>
>> This could result in bad plans for queries like ... from a left join b on
>> a.a1 = b.b1 where a.a1 = 42 ...
>>
>>
>>
>> Hope that helps,
>> Regina
>>
>> -----Original Message-----
>> From: postgis-users-bounces at postgis.refractions.net
>> [mailto:postgis-users-bounces at postgis.refractions.net] On Behalf Of
>> Burgholzer,Robert
>> Sent: Wednesday, March 18, 2009 9:05 AM
>> To: PostGIS Users Discussion
>> Subject: [postgis-users] Index Use Craziness
>>
>> I am having much trouble getting postgres to use the proper index.
>>
>> I have a table with about 20 million daily rainfall observation points,
>> with
>> between 0 and 19,000 points on any given day.  In order to speed up
>> querying, I have created a GIST index on the geometry column, and an index
>> on the date column.
>>
>> What I want to do is to summarize the observed rainfall in another shape
>> table on any given day.  If I just want to query points by date, this uses
>> the date index, and performs very quickly (see "CASE 1" below).
>>
>> When I try to introduce a geometry operator, the GIST index takes over,
>> and
>> the index on the "thisdate" column is ignored.  The weird thing is, I can
>> hear it churning the heck out of my hard drive checking on all the points.
>> (See "CASE 2" below).
>>
>> I even tried to sub-query the points for the given date, but the index did
>> not seem to be fooled ("CASE 3").  It gave me the exact same query plan!!
>>  I
>> am unsure if I really understand the way a sub-query functions given this
>> result.
>>
>> Any help would be appreciated!
>>
>> Robert
>>
>> CASE 1:
>> wsp=# explain select count(*) from precip_gridded where thisdate =
>> '2009-01-12';
>>
>>                              QUERY PLAN
>> ------------------------------------------------------------------------
>> ----
>>  Aggregate  (cost=479.43..479.44 rows=1 width=0)
>>   ->  Index Scan using pg_tdix on precip_gridded  (cost=0.00..451.92
>> rows=11004
>>  width=0)
>>         Index Cond: (thisdate = '2009-01-12 00:00:00'::timestamp without
>> time zone)
>> (3 rows)
>>
>>
>> CASE 2:
>> explain select count(*) from precip_gridded where thisdate = '2009-01-12'
>> and the_geom && setsrid((select extent(the_geom) from proj_seggroups where
>> gid = 448),4326);
>>
>>                              QUERY PLAN
>> ------------------------------------------------------------------------
>> ----
>>  Aggregate  (cost=100.23..100.24 rows=1 width=0)
>>   InitPlan
>>     ->  Aggregate  (cost=5.87..5.88 rows=1 width=2807)
>>           ->  Index Scan using psg_giix on proj_seggroups
>> (cost=0.00..5.87 rows=1 width=2807)
>>                 Index Cond: (gid = 448)
>>   ->  Index Scan using pg_gix on precip_gridded  (cost=0.00..94.35
>> rows=1 width=0)
>>         Index Cond: (the_geom && setsrid(($0)::geometry, 4326))
>>         Filter: ((thisdate = '2009-01-12 00:00:00'::timestamp without time
>> zone) AND (the_geom && setsrid(($0)::geometry, 4326)))
>> (8 rows)
>>
>>
>> CASE 3: Sub-query by date to try and force the use of date index wsp=#
>> explain select count(a.*) from (select * from precip_gridded where
>> thisdate
>> = '2009-01-12') as a where a.the_geom && setsrid((select
>> extent(the_geom) from proj_seggroups where gid = 448),4326);
>>
>>                              QUERY PLAN
>>
>> ------------------------------------------------------------------------
>> ----
>>  Aggregate  (cost=100.23..100.24 rows=1 width=81)
>>   InitPlan
>>     ->  Aggregate  (cost=5.87..5.88 rows=1 width=2807)
>>           ->  Index Scan using psg_giix on proj_seggroups
>> (cost=0.00..5.87 rows=1 width=2807)
>>                 Index Cond: (gid = 448)
>>   ->  Index Scan using pg_gix on precip_gridded  (cost=0.00..94.35
>> rows=1 width=81)
>>         Index Cond: (the_geom && setsrid(($0)::geometry, 4326))
>>         Filter: ((thisdate = '2009-01-12 00:00:00'::timestamp without time
>> zone) AND (the_geom && setsrid(($0)::geometry, 4326)))
>> (8 rows)
>>
>>
>> Robert W. Burgholzer
>> Surface Water Modeler
>> Office of Water Supply and Planning
>> Virginia Department of Environmental Quality rwburgholzer at deq.virginia.gov
>> 804-698-4405
>> Open Source Modeling Tools:
>> http://sourceforge.net/projects/npsource/
>>
>>
>> _______________________________________________
>> postgis-users mailing list
>> postgis-users at postgis.refractions.net
>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>>
>>
>>
>> _______________________________________________
>> postgis-users mailing list
>> postgis-users at postgis.refractions.net
>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>
>
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users
>
>
_______________________________________________
postgis-users mailing list
postgis-users at postgis.refractions.net
http://postgis.refractions.net/mailman/listinfo/postgis-users




More information about the postgis-users mailing list