[postgis-users] PostGIS INDEX Performance Intermittent

Paul Ramsey pramsey at opengeo.org
Mon Nov 19 14:04:28 PST 2012


Hack solution (maybe) here

http://gis.stackexchange.com/questions/41199/postgis-intermittent-index-performance
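I can't say whether that's exactly what the answer there does, but the
usual hack in this family is to nudge the planner for just the session
running the query. A sketch only, reusing the query from below
(enable_indexscan turns off plain index scans but leaves bitmap scans
available, so it may steer the planner toward the bitmap plan seen on
the copy without the spatial index; the small area table can fall back
to a seq or bitmap scan cheaply):

BEGIN;
-- Planner hint for this transaction only: no plain index scans, but
-- bitmap index/heap scans (e.g. on the datasetid/readingdatetime
-- btree) are still allowed.
SET LOCAL enable_indexscan = off;

SELECT *
  FROM data, area
 WHERE ST_Intersects(data.point, area.polygon)
   AND readingdatetime BETWEEN '2009-01-01' AND '2012-01-19'
   AND datasetid IN (1, 3)
   AND "polysetID" = 1
   AND area.id IN (28,29,30,31,32,33,25,26,27,18,19,20,21,12,13,14,15,
                   16,17,34,35,1,2,3,4,5,6,22,23,24,7,8,9,10,11);

COMMIT;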

P

On Mon, Nov 19, 2012 at 9:58 AM, Paul Ramsey <pramsey at opengeo.org> wrote:
> The first two queries have the same plan, so I can't explain why they
> would have such radically different timings; perhaps there's something
> else going on that the plan printout is not expressing. The *stats* in
> the plans *do* differ slightly, so the "reason" for the flip-over in
> behaviour would be that, as the autovacuumer slowly increases the
> number of samples in the stats over time, an inflection point is
> eventually reached and the behaviour changes radically. I would have
> expected that to show up in the plan, though.
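>
> If you want to see when the autovacuumer last refreshed those stats
> (and so roughly when a flip-over could have happened), something like
> this will show it for the two tables in the join:
>
> -- Last manual and automatic VACUUM/ANALYZE times, per table.
> SELECT relname, last_analyze, last_autoanalyze,
>        last_vacuum, last_autovacuum
>   FROM pg_stat_user_tables
>  WHERE relname IN ('data', 'area');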
>
> There's nothing we can do about this, it's in the guts of PgSQL somewhere.
>
> The final query shows the best performance, and it's the one where
> you've removed the spatial index. The problem is clearly that our join
> stats aren't giving a good enough answer, so you're getting the bad
> plans in the previous two examples. That's something we can do
> something about, but unfortunately it's not trivial work, so finding
> someone with the time to actually do it is problematic, absent some
> funder pushing for the work.
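>
> One cheap thing to try in the meantime is to give ANALYZE a bigger
> sample on the geometry column and re-analyze. No guarantee it changes
> the plan, but it costs little:
>
> -- 100 is the default statistics target; 1000 is just an example.
> ALTER TABLE data ALTER COLUMN point SET STATISTICS 1000;
> ANALYZE data (point);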
>
> We have a few other open tickets on stats and selectivity that may or
> may not be similar in nature (they too deal with point tables).
>
> P.
>
> On Mon, Nov 19, 2012 at 9:39 AM, Mark Davidson <mark at 4each.co.uk> wrote:
>> Dear All,
>>
>> I'm hoping someone can help me with some performance issues I'm having
>> related to the use of an INDEX on my geometry column.
>>
>> I'm using PostgreSQL 9.1.4 on x86_64-unknown-linux-gnu, compiled by
>> gcc-4.4.real (Ubuntu 4.4.3-4ubuntu5.1) 4.4.3, 64-bit, with PostGIS 1.5.
>>
>> My query joins the area table (~1000 polygons) to the data table (~55
>> million data points), looking for data points contained within 35 of
>> the polygons; this example query returns only 767 of them.
>>
>> The query and its explain result when it's working well:
>>
>> select * FROM data, area WHERE st_intersects ( data.point, area.polygon) AND
>> (readingdatetime BETWEEN '2009-01-01' AND '2012-01-19') AND datasetid
>> IN(1,3) AND
>>  "polysetID" = 1 AND area.id
>> IN(28,29,30,31,32,33,25,26,27,18,19,20,21,12,13,14,15,16,17,34,35,1,2,3,4,5,6,22,23,24,7,8,9,10,11)
>>
>> Nested Loop  (cost=0.00..791.38 rows=20 width=2244) (actual
>> time=2149.383..26059.562 rows=767 loops=1)
>>   Join Filter: _st_intersects(data.point, area.polygon)
>>   ->  Index Scan using "area_polysetID_index" on area  (cost=0.00..16.56
>> rows=1 width=1947) (actual time=0.017..0.193 rows=35 loops=1)
>>         Index Cond: ("polysetID" = 1)
>>         Filter: (id = ANY
>> ('{28,29,30,31,32,33,25,26,27,18,19,20,21,12,13,14,15,16,17,34,35,1,2,3,4,5,6,22,23,24,7,8,9,10,11}'::integer[]))
>>   ->  Index Scan using data_point_index on data  (cost=0.00..774.57 rows=1
>> width=297) (actual time=588.910..744.472 rows=33 loops=35)
>>         Index Cond: (point && area.polygon)
>>         Filter: ((readingdatetime >= '2009-01-01 00:00:00'::timestamp
>> without time zone) AND (readingdatetime <= '2012-01-19 00:00:00'::timestamp
>> without time zone) AND (datasetid = ANY ('{1,3}'::integer[])))
>> Total runtime: 26059.677 ms
>>
>> As you can see, this takes about 26 seconds.
>>
>> Sometimes the query takes a lot longer and I get a result like the one
>> below. Normally at that point I have to VACUUM ANALYSE the table before
>> it will behave properly again.
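>>
>> (Concretely, the maintenance step is along the lines of:
>>
>> VACUUM ANALYSE data;  -- the big ~55-million-row table
>>
>> after which it behaves again.) Here is one of the slow runs: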
>>
>> select * FROM data, area WHERE st_intersects ( data.point, area.polygon) AND
>> (readingdatetime BETWEEN '2009-01-01' AND '2012-01-19') AND datasetid
>> IN(1,3) AND
>>  "polysetID" = 1 AND area.id
>> IN(28,29,30,31,32,33,25,26,27,18,19,20,21,12,13,14,15,16,17,34,35,1,2,3,4,5,6,22,23,24,7,8,9,10,11)
>>
>> Nested Loop  (cost=0.00..1153.60 rows=35 width=2246) (actual
>> time=86835.883..803363.979 rows=767 loops=1)
>>   Join Filter: _st_intersects(data.point, area.polygon)
>>   ->  Index Scan using "area_polysetID_index" on area  (cost=0.00..20.04
>> rows=1 width=1949) (actual time=0.021..16.287 rows=35 loops=1)
>>        Index Cond: ("polysetID" = 1)
>>         Filter: (id = ANY
>> ('{28,29,30,31,32,33,25,26,27,18,19,20,21,12,13,14,15,16,17,34,35,1,2,3,4,5,6,22,23,24,7,8,9,10,11}'::integer[]))
>>   ->  Index Scan using data_point_index on data  (cost=0.00..1133.30 rows=1
>> width=297) (actual time=17202.126..22952.706 rows=33 loops=35)
>>         Index Cond: (point && area.polygon)
>>         Filter: ((readingdatetime >= '2009-01-01 00:00:00'::timestamp
>> without time zone) AND (readingdatetime <= '2012-01-19 00:00:00'::timestamp
>> without time zone) AND (datasetid = ANY ('{1,3}'::integer[])))
>> Total runtime: 803364.120 ms
>>
>> Very, very slow as you can see, taking nearly 14 minutes.
>>
>> Exact copy of the table with the point INDEX removed.
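>>
>> (For clarity, the copy was set up along these lines; the exact
>> commands don't matter, only that data_new2 keeps the btree on
>> datasetid/readingdatetime and has no GiST index on point:
>>
>> -- Straight copy of the ~55-million-row table.
>> CREATE TABLE data_new2 AS SELECT * FROM data;
>> -- Recreate only the non-spatial index; name and column order are
>> -- taken from the plan below. Deliberately no GiST index on point.
>> CREATE INDEX "data_new2_datasetid_readingDatetime_index"
>>     ON data_new2 (datasetid, readingdatetime);
>> ANALYZE data_new2;
>>
>> and then the same query run against the copy.)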
>>
>> select * FROM data_new2, area WHERE st_intersects ( data_new2.point,
>> area.polygon) AND
>> (readingdatetime BETWEEN '2009-01-01' AND '2012-01-19') AND datasetid
>> IN(1,3) AND
>>  "polysetID" = 1 AND area.id
>> IN(28,29,30,31,32,33,25,26,27,18,19,20,21,12,13,14,15,16,17,34,35,1,2,3,4,5,6,22,23,24,7,8,9,10,11)
>>
>> Nested Loop  (cost=2576.91..284972.54 rows=34 width=2246) (actual
>> time=126.977..181.832 rows=767 loops=1)
>>   Join Filter: ((data_new2.point && area.polygon) AND
>> _st_intersects(data_new2.point, area.polygon))
>>   ->  Index Scan using "area_polysetID_index" on area  (cost=0.00..20.04
>> rows=1 width=1949) (actual time=0.059..0.111 rows=35 loops=1)
>>         Index Cond: ("polysetID" = 1)
>>         Filter: (id = ANY
>> ('{28,29,30,31,32,33,25,26,27,18,19,20,21,12,13,14,15,16,17,34,35,1,2,3,4,5,6,22,23,24,7,8,9,10,11}'::integer[]))
>>  ->  Bitmap Heap Scan on data_new2  (cost=2576.91..261072.36 rows=90972
>> width=297) (actual time=2.531..4.057 rows=2247 loops=35)
>>         Recheck Cond: ((datasetid = ANY ('{1,3}'::integer[])) AND
>> (readingdatetime >= '2009-01-01 00:00:00'::timestamp without time zone) AND
>> (readingdatetime <= '2012-01-19 00:00:00'::timestamp without time zone))
>>         ->  Bitmap Index Scan on "data_new2_datasetid_readingDatetime_index"
>> (cost=0.00..2554.16 rows=90972 width=0) (actual time=2.089..2.089 rows=2247
>> loops=35)
>>               Index Cond: ((datasetid = ANY ('{1,3}'::integer[])) AND
>> (readingdatetime >= '2009-01-01 00:00:00'::timestamp without time zone) AND
>> (readingdatetime <= '2012-01-19 00:00:00'::timestamp without time zone))
>> Total runtime: 181.930 ms
>>
>> Really fast in this case: well under a second.
>>
>> As you can see, removing the INDEX seems to make it a lot faster, but
>> my question is why? Also, why does it sometimes slow down until I do a
>> VACUUM ANALYSE? I've not changed the database in any way; it has only
>> been READ from.
>>
>> I'd be really grateful for any input. If any more information is
>> required about the data or the database structure, please let me know.
>>
>> Thanks,
>>
>> Mark Davidson
>>
>> _______________________________________________
>> postgis-users mailing list
>> postgis-users at lists.osgeo.org
>> http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users
>>

