[postgis-devel] [postgis-users] PostGIS INDEX Performance Intermittent

Mon Nov 19 09:58:27 PST 2012

The first two queries have the same plan, so I can't explain why they
would have such radically different timings, perhaps there's something
else going on that the plan printout is not expressing. The *stats* in
the plans *do* differ slightly, so the "reason" for the flip-over in
behaviour would be that as the autovacuumer slowly increases the
number of samples in the stats over time, at some point an inflection
point is reached, and the behavior radically changes: I would have
expected that to show up in the plan though.

There's nothing we can do about this, it's in the guts of PgSQL somewhere.

The final query is showing the best performance, and it's where you've
removed the spatial index. The problem is clearly that our join stats
aren't giving a good enough answer, so you're getting the bad plans in
the previous two examples. That's something we can do something about,
but it's not trivial work, unfortunately, so having someone with the
time to actually do it is problematical, absent some funder pushing
for the work.

We have a few other open tickets on stats and selectivity that may or
may not be similar in nature (they too deal with point tables)

P.

On Mon, Nov 19, 2012 at 9:39 AM, Mark Davidson <mark at 4each.co.uk> wrote:
> Dear All,
>
> I'm hoping someone can help me with some performance issues I'm having
> related to the use of an INDEX on my geometry column.
>
> I'm using PostgreSQL 9.1.4 on x86_64-unknown-linux-gnu, compiled by
> gcc-4.4.real (Ubuntu 4.4.3-4ubuntu5.1) 4.4.3, 64-bit and PostGIS 1.5
>
> My query is joining the areas table ( ~1000 polygons ) in this query looking
> for data points contained within 35 of them to the data table ( ~55 million
> data points in the table ) with this example query only returning 767 of
> them.
>
> Query and its explain result working well
>
> select * FROM data, area WHERE st_intersects ( data.point, area.polygon) AND
> (readingdatetime BETWEEN '2009-01-01' AND '2012-01-19') AND datasetid
> IN(1,3) AND
>  "polysetID" = 1 AND area.id
> IN(28,29,30,31,32,33,25,26,27,18,19,20,21,12,13,14,15,16,17,34,35,1,2,3,4,5,6,22,23,24,7,8,9,10,11)
>
> Nested Loop  (cost=0.00..791.38 rows=20 width=2244) (actual
> time=2149.383..26059.562 rows=767 loops=1)
>   Join Filter: _st_intersects(data.point, area.polygon)
>   ->  Index Scan using "area_polysetID_index" on area  (cost=0.00..16.56
> rows=1 width=1947) (actual time=0.017..0.193 rows=35 loops=1)
>         Index Cond: ("polysetID" = 1)
>         Filter: (id = ANY
> ('{28,29,30,31,32,33,25,26,27,18,19,20,21,12,13,14,15,16,17,34,35,1,2,3,4,5,6,22,23,24,7,8,9,10,11}'::integer[]))
>   ->  Index Scan using data_point_index on data  (cost=0.00..774.57 rows=1
> width=297) (actual time=588.910..744.472 rows=33 loops=35)
>         Index Cond: (point && area.polygon)
>         Filter: ((readingdatetime >= '2009-01-01 00:00:00'::timestamp
> without time zone) AND (readingdatetime <= '2012-01-19 00:00:00'::timestamp
> without time zone) AND (datasetid = ANY ('{1,3}'::integer[])))
> Total runtime: 26059.677 ms
>
> as you can see takes about 26 seconds
>
> Sometimes the query will take a lot longer and get a result like this.
> Normally at this point I have to VACCUM ANALYSE the table before it will
> behave properly again
>
> select * FROM data, area WHERE st_intersects ( data.point, area.polygon) AND
> (readingdatetime BETWEEN '2009-01-01' AND '2012-01-19') AND datasetid
> IN(1,3) AND
>  "polysetID" = 1 AND area.id
> IN(28,29,30,31,32,33,25,26,27,18,19,20,21,12,13,14,15,16,17,34,35,1,2,3,4,5,6,22,23,24,7,8,9,10,11)
>
> Nested Loop  (cost=0.00..1153.60 rows=35 width=2246) (actual
> time=86835.883..803363.979 rows=767 loops=1)
>   Join Filter: _st_intersects(data.point, area.polygon)
>   ->  Index Scan using "area_polysetID_index" on area  (cost=0.00..20.04
> rows=1 width=1949) (actual time=0.021..16.287 rows=35 loops=1)
>        Index Cond: ("polysetID" = 1)
>         Filter: (id = ANY
> ('{28,29,30,31,32,33,25,26,27,18,19,20,21,12,13,14,15,16,17,34,35,1,2,3,4,5,6,22,23,24,7,8,9,10,11}'::integer[]))
>   ->  Index Scan using data_point_index on data  (cost=0.00..1133.30 rows=1
> width=297) (actual time=17202.126..22952.706 rows=33 loops=35)
>         Index Cond: (point && area.polygon)
>         Filter: ((readingdatetime >= '2009-01-01 00:00:00'::timestamp
> without time zone) AND (readingdatetime <= '2012-01-19 00:00:00'::timestamp
> without time zone) AND (datasetid = ANY ('{1,3}'::integer[])))
> Total runtime: 803364.120 ms
>
> Very very slow as you can see taking nearly 14 minutes
>
> Exact copy of the table with the point INDEX remove.
>
> select * FROM data_new2, area WHERE st_intersects ( data_new2.point,
> area.polygon) AND
> (readingdatetime BETWEEN '2009-01-01' AND '2012-01-19') AND datasetid
> IN(1,3) AND
>  "polysetID" = 1 AND area.id
> IN(28,29,30,31,32,33,25,26,27,18,19,20,21,12,13,14,15,16,17,34,35,1,2,3,4,5,6,22,23,24,7,8,9,10,11)
>
> Nested Loop  (cost=2576.91..284972.54 rows=34 width=2246) (actual
> time=126.977..181.832 rows=767 loops=1)
>   Join Filter: ((data_new2.point && area.polygon) AND
> _st_intersects(data_new2.point, area.polygon))
>   ->  Index Scan using "area_polysetID_index" on area  (cost=0.00..20.04
> rows=1 width=1949) (actual time=0.059..0.111 rows=35 loops=1)
>         Index Cond: ("polysetID" = 1)
>         Filter: (id = ANY
> ('{28,29,30,31,32,33,25,26,27,18,19,20,21,12,13,14,15,16,17,34,35,1,2,3,4,5,6,22,23,24,7,8,9,10,11}'::integer[]))
>  ->  Bitmap Heap Scan on data_new2  (cost=2576.91..261072.36 rows=90972
> width=297) (actual time=2.531..4.057 rows=2247 loops=35)
>         Recheck Cond: ((datasetid = ANY ('{1,3}'::integer[])) AND
> (readingdatetime >= '2009-01-01 00:00:00'::timestamp without time zone) AND
> (readingdatetime <= '2012-01-19 00:00:00'::timestamp without time zone))
>         ->  Bitmap Index Scan on "data_new2_datasetid_readingDatetime_index"
> (cost=0.00..2554.16 rows=90972 width=0) (actual time=2.089..2.089 rows=2247
> loops=35)
>               Index Cond: ((datasetid = ANY ('{1,3}'::integer[])) AND
> (readingdatetime >= '2009-01-01 00:00:00'::timestamp without time zone) AND
> (readingdatetime <= '2012-01-19 00:00:00'::timestamp without time zone))
> Total runtime: 181.930 ms
>
> Really fast in this case under a second.
>
> As you can see removing the INDEX seems to make it a lot faster but my
> question is why? Also why does it sometimes slow down till I do a VACCUM
> I've not changed the database in anyway its only been READ from.
>
> Would be really grateful for any input if any more information is required
> about the data or database structure please let me know.
>
> Thanks,
>
> Mark Davidson
>
> _______________________________________________
> postgis-users mailing list
> postgis-users at lists.osgeo.org
> http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users
>