[postgis-users] Index Use Craziness
Burgholzer,Robert
rwburgholzer at deq.virginia.gov
Wed Mar 18 11:30:41 PDT 2009
Bruce,
Thanks, this is certainly relevant to my case -- I tried to CLUSTER the
database on date to achieve some economics, but this seems to be an even
more explicit way to achieve this. I will experiment with this for
certain.
Still, though, it seems as if the index is not functioning correctly, or
at least not functioning the way that I want it to. Depending on the
size of the partitions, say if I did it monthly as you suggest, I still
will end up scanning many more records than I need to.
For a test, I did my query as written before (just looking at a bounding
box containment), and it took 15 minutes (900,000 ms). This is with
spatial indices on the rainfall point table. Now, I dumped the records
on the day of interest into a temp table (took only 156 ms), and did a
full-blown Within() query on the temp records, and it only took 2.5
seconds (2,579 ms).
My impression is that if the indexing were working as desired, it should
only take my query less than 3 seconds to complete.
Does anyone know if I am missing something with regards to my
understanding of index behavior?
Robert W. Burgholzer
Surface Water Modeler
Office of Water Supply and Planning
Virginia Department of Environmental Quality
rwburgholzer at deq.virginia.gov
804-698-4405
Open Source Modeling Tools:
http://sourceforge.net/projects/npsource/
-----Original Message-----
From: postgis-users-bounces at postgis.refractions.net
[mailto:postgis-users-bounces at postgis.refractions.net] On Behalf Of
Bruce Rindahl
Sent: Wednesday, March 18, 2009 11:22 AM
To: PostGIS Users Discussion
Subject: Re: [postgis-users] Index Use Craziness
Robert
I don't know how far back your data goes but you might look at this:
http://www.postgresql.org/docs/8.2/interactive/ddl-partitioning.html
Another application similar to yours used this inherited tables idea to
partition by months and a query for rainfall 15 years ago is now the
same as within the last month. Before it was several orders of
magnitude slower.
Bruce
Burgholzer,Robert wrote:
> I am having much trouble getting postgres to use the proper index.
>
> I have a table with about 20 million daily rainfall observation
points,
> with between 0 and 19,000 points on any given day. In order to speed
up
> querying, I have created a GIST index on the geometry column, and an
> index on the date column.
>
> What I want to do is to summarize the observed rainfall in another
shape
> table on any given day. If I just want to query points by date, this
> uses the date index, and performs very quickly (see "CASE 1" below).
>
> When I try to introduce a geometry operator, the GIST index takes
over,
> and the index on the "thisdate" column is ignored. The weird thing
is,
> I can hear it churning the heck out of my hard drive checking on all
the
> points. (See "CASE 2" below).
>
> I even tried to sub-query the points for the given date, but the index
> did not seem to be fooled ("CASE 3"). It gave me the exact same query
> plan!! I am unsure if I really understand the way a sub-query
functions
> given this result.
>
> Any help would be appreciated!
>
> Robert
>
> CASE 1:
> wsp=# explain select count(*) from precip_gridded where thisdate =
> '2009-01-12';
>
> QUERY PLAN
>
------------------------------------------------------------------------
> ----
> Aggregate (cost=479.43..479.44 rows=1 width=0)
> -> Index Scan using pg_tdix on precip_gridded (cost=0.00..451.92
> rows=11004
> width=0)
> Index Cond: (thisdate = '2009-01-12 00:00:00'::timestamp
> without time zone)
> (3 rows)
>
>
> CASE 2:
> explain select count(*) from precip_gridded where thisdate =
> '2009-01-12' and the_geom && setsrid((select extent(the_geom) from
> proj_seggroups where gid = 448),4326);
>
> QUERY PLAN
>
------------------------------------------------------------------------
> ----
> Aggregate (cost=100.23..100.24 rows=1 width=0)
> InitPlan
> -> Aggregate (cost=5.87..5.88 rows=1 width=2807)
> -> Index Scan using psg_giix on proj_seggroups
> (cost=0.00..5.87 rows=1 width=2807)
> Index Cond: (gid = 448)
> -> Index Scan using pg_gix on precip_gridded (cost=0.00..94.35
> rows=1 width=0)
> Index Cond: (the_geom && setsrid(($0)::geometry, 4326))
> Filter: ((thisdate = '2009-01-12 00:00:00'::timestamp without
> time zone) AND (the_geom && setsrid(($0)::geometry, 4326)))
> (8 rows)
>
>
> CASE 3: Sub-query by date to try and force the use of date index
> wsp=# explain select count(a.*) from (select * from precip_gridded
where
> thisdate = '2009-01-12') as a where a.the_geom && setsrid((select
> extent(the_geom) from proj_seggroups where gid = 448),4326);
>
> QUERY PLAN
>
>
------------------------------------------------------------------------
> ----
> Aggregate (cost=100.23..100.24 rows=1 width=81)
> InitPlan
> -> Aggregate (cost=5.87..5.88 rows=1 width=2807)
> -> Index Scan using psg_giix on proj_seggroups
> (cost=0.00..5.87 rows=1 width=2807)
> Index Cond: (gid = 448)
> -> Index Scan using pg_gix on precip_gridded (cost=0.00..94.35
> rows=1 width=81)
> Index Cond: (the_geom && setsrid(($0)::geometry, 4326))
> Filter: ((thisdate = '2009-01-12 00:00:00'::timestamp without
> time zone) AND (the_geom && setsrid(($0)::geometry, 4326)))
> (8 rows)
>
>
> Robert W. Burgholzer
> Surface Water Modeler
> Office of Water Supply and Planning
> Virginia Department of Environmental Quality
> rwburgholzer at deq.virginia.gov
> 804-698-4405
> Open Source Modeling Tools:
> http://sourceforge.net/projects/npsource/
>
>
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users
>
>
>
_______________________________________________
postgis-users mailing list
postgis-users at postgis.refractions.net
http://postgis.refractions.net/mailman/listinfo/postgis-users
More information about the postgis-users
mailing list