[postgis-users] PostGIS INDEX Performance Intermittent

Mon Nov 19 09:39:18 PST 2012

Dear All,

I'm hoping someone can help me with some performance issues I'm having
related to the use of an INDEX on my geometry column.

I'm using PostgreSQL 9.1.4 on x86_64-unknown-linux-gnu, compiled by
gcc-4.4.real (Ubuntu 4.4.3-4ubuntu5.1) 4.4.3, 64-bit and PostGIS 1.5

My query is joining the areas table ( ~1000 polygons ) in this query
looking for data points contained within 35 of them to the data table ( ~55
million data points in the table ) with this example query only returning
767 of them.

Query and its explain result working well

select * FROM data, area WHERE st_intersects ( data.point, area.polygon)
AND
(readingdatetime BETWEEN '2009-01-01' AND '2012-01-19') AND datasetid
IN(1,3) AND
 "polysetID" = 1 AND
area.idIN(28,29,30,31,32,33,25,26,27,18,19,20,21,12,13,14,15,16,17,34,35,1,2,3,4,5,6,22,23,24,7,8,9,10,11)

Nested Loop  (cost=0.00..791.38 rows=20 width=2244) (actual
time=2149.383..26059.562 rows=767 loops=1)
  Join Filter: _st_intersects(data.point, area.polygon)
  ->  Index Scan using "area_polysetID_index" on area  (cost=0.00..16.56
rows=1 width=1947) (actual time=0.017..0.193 rows=35 loops=1)
        Index Cond: ("polysetID" = 1)
        Filter: (id = ANY
('{28,29,30,31,32,33,25,26,27,18,19,20,21,12,13,14,15,16,17,34,35,1,2,3,4,5,6,22,23,24,7,8,9,10,11}'::integer[]))
  ->  Index Scan using data_point_index on data  (cost=0.00..774.57 rows=1
width=297) (actual time=588.910..744.472 rows=33 loops=35)
        Index Cond: (point && area.polygon)
        Filter: ((readingdatetime >= '2009-01-01 00:00:00'::timestamp
without time zone) AND (readingdatetime <= '2012-01-19 00:00:00'::timestamp
without time zone) AND (datasetid = ANY ('{1,3}'::integer[])))
Total runtime: 26059.677 ms

as you can see takes about 26 seconds

Sometimes the query will take a lot longer and get a result like this.
Normally at this point I have to VACCUM ANALYSE the table before it will
behave properly again

select * FROM data, area WHERE st_intersects ( data.point, area.polygon)
AND
(readingdatetime BETWEEN '2009-01-01' AND '2012-01-19') AND datasetid
IN(1,3) AND
 "polysetID" = 1 AND
area.idIN(28,29,30,31,32,33,25,26,27,18,19,20,21,12,13,14,15,16,17,34,35,1,2,3,4,5,6,22,23,24,7,8,9,10,11)

Nested Loop  (cost=0.00..1153.60 rows=35 width=2246) (actual
time=86835.883..803363.979 rows=767 loops=1)
  Join Filter: _st_intersects(data.point, area.polygon)
  ->  Index Scan using "area_polysetID_index" on area  (cost=0.00..20.04
rows=1 width=1949) (actual time=0.021..16.287 rows=35 loops=1)
       Index Cond: ("polysetID" = 1)
        Filter: (id = ANY
('{28,29,30,31,32,33,25,26,27,18,19,20,21,12,13,14,15,16,17,34,35,1,2,3,4,5,6,22,23,24,7,8,9,10,11}'::integer[]))
  ->  Index Scan using data_point_index on data  (cost=0.00..1133.30 rows=1
width=297) (actual time=17202.126..22952.706 rows=33 loops=35)
        Index Cond: (point && area.polygon)
        Filter: ((readingdatetime >= '2009-01-01 00:00:00'::timestamp
without time zone) AND (readingdatetime <= '2012-01-19 00:00:00'::timestamp
without time zone) AND (datasetid = ANY ('{1,3}'::integer[])))
Total runtime: 803364.120 ms

Very very slow as you can see taking nearly 14 minutes

Exact copy of the table with the point INDEX remove.

select * FROM data_new2, area WHERE st_intersects ( data_new2.point,
area.polygon) AND
(readingdatetime BETWEEN '2009-01-01' AND '2012-01-19') AND datasetid
IN(1,3) AND
 "polysetID" = 1 AND
area.idIN(28,29,30,31,32,33,25,26,27,18,19,20,21,12,13,14,15,16,17,34,35,1,2,3,4,5,6,22,23,24,7,8,9,10,11)

Nested Loop  (cost=2576.91..284972.54 rows=34 width=2246) (actual
time=126.977..181.832 rows=767 loops=1)
  Join Filter: ((data_new2.point && area.polygon) AND
_st_intersects(data_new2.point, area.polygon))
  ->  Index Scan using "area_polysetID_index" on area  (cost=0.00..20.04
rows=1 width=1949) (actual time=0.059..0.111 rows=35 loops=1)
        Index Cond: ("polysetID" = 1)
        Filter: (id = ANY
('{28,29,30,31,32,33,25,26,27,18,19,20,21,12,13,14,15,16,17,34,35,1,2,3,4,5,6,22,23,24,7,8,9,10,11}'::integer[]))
 ->  Bitmap Heap Scan on data_new2  (cost=2576.91..261072.36 rows=90972
width=297) (actual time=2.531..4.057 rows=2247 loops=35)
        Recheck Cond: ((datasetid = ANY ('{1,3}'::integer[])) AND
(readingdatetime >= '2009-01-01 00:00:00'::timestamp without time zone) AND
(readingdatetime <= '2012-01-19 00:00:00'::timestamp without time zone))
        ->  Bitmap Index Scan on
"data_new2_datasetid_readingDatetime_index"  (cost=0.00..2554.16 rows=90972
width=0) (actual time=2.089..2.089 rows=2247 loops=35)
              Index Cond: ((datasetid = ANY ('{1,3}'::integer[])) AND
(readingdatetime >= '2009-01-01 00:00:00'::timestamp without time zone) AND
(readingdatetime <= '2012-01-19 00:00:00'::timestamp without time zone))
Total runtime: 181.930 ms

Really fast in this case under a second.

As you can see removing the INDEX seems to make it a lot faster but my
question is why? Also why does it sometimes slow down till I do a VACCUM
I've not changed the database in anyway its only been READ from.

Would be really grateful for any input if any more information is required
about the data or database structure please let me know.

Thanks,

Mark Davidson
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20121119/43917be8/attachment.html>