[postgis-users] Bad estimate from index on points
BladeOfLight16
bladeoflight16 at gmail.com
Wed Jun 12 02:29:38 PDT 2013
Forgive me. I forgot to VACUUM FULL ANALYZE;. Please add that command to
the end of the script. It doesn't change the outcome.
Thank you.
On Wed, Jun 12, 2013 at 4:23 AM, BladeOfLight16 <bladeoflight16 at gmail.com>wrote:
> I have a table that contains many point rows. The points all fall within a
> polygon in another table. There are roughly 5 to 20 thousand points per
> polygon. The polygons will frequently overlap. (The data is annual.) I have
> to calculate aggregate values on the point data, and the query planner is
> getting horrible estimates for the number of point rows returned by the
> spatial index. I have create a psql script that will generate a database
> with similar data. (My apologies for the length.)
>
>
> DROP DATABASE IF EXISTS test_db;
> CREATE DATABASE test_db;
>
> \c test_db
>
> CREATE EXTENSION postgis;
>
> -- SRID is UTM 15 with NAD 83 datum
>
> CREATE TABLE polygon
> (
> polygon_id SERIAL PRIMARY KEY,
> geom GEOMETRY(MULTIPOLYGON, 26915)
> );
>
> CREATE TABLE point
> (
> point_id SERIAL PRIMARY KEY,
> some_value DECIMAL(10,5),
> polygon_id INTEGER NOT NULL REFERENCES polygon (polygon_id),
> geom GEOMETRY(POINT, 26915)
> );
>
> -- Randomly generate 100 rectangles
> INSERT INTO polygon (geom)
> SELECT ST_Multi(ST_SetSRID(ST_Envelope(('LINESTRING('||x||'
> '||y||','||(x+side)||' '||(y+side)||')')::GEOMETRY), 26915))
> FROM (SELECT random()*(823749.9411-176250.0589) + 176250.0589 AS x
> , random()*(9106037.1690-1577463.0797) + 1577463.0797 AS y
> , random()*(5000-1000) + 1000 AS side
> FROM generate_series(1, 100)
> ) box_data
> ;
>
> -- Insert a lot of points contained by the rectangles
> INSERT INTO point (some_value, polygon_id, geom)
> SELECT random()*100 + 20
> , polygon_id
> , ST_SetSRID(('POINT('||x||' '||y||')')::GEOMETRY, 26915)
> FROM (SELECT polygon_id
> , random()*(ST_XMax(geom) - ST_XMin(geom)) + ST_XMin(geom) AS x
> , random()*(ST_YMax(geom) - ST_YMin(geom)) + ST_YMin(geom) AS y
> , generate_series(1,(random()*10000+5000)::INTEGER)
> FROM polygon) num_points
> ;
>
> CREATE INDEX polygon_index ON polygon USING GIST (geom);
> CREATE INDEX point_index ON point USING GIST (geom);
> CREATE INDEX fki_point_polygon_id_fkey ON point USING btree (polygon_id);
>
>
> Now connect to the database and run this query:
>
> EXPLAIN ANALYZE
> SELECT SUM(some_value)
> FROM point
> JOIN polygon ON ST_Contains(polygon.geom, point.geom)
> WHERE polygon.polygon_id = 50;
>
> Here is an example result:
>
> Aggregate (cost=45.91..45.92 rows=1 width=8) (actual
> time=303.788..303.793 rows=1 loops=1)
> -> Nested Loop (cost=4.50..29.77 rows=6457 width=8) (actual
> time=77.441..247.409 rows=10180 loops=1)
> -> Seq Scan on polygon (cost=0.00..4.25 rows=1 width=128)
> (actual time=0.024..0.039 rows=1 loops=1)
> Filter: (polygon_id = 50)
> Rows Removed by Filter: 99
> -> Bitmap Heap Scan on point (cost=4.50..25.50 rows=2 width=56)
> (actual time=77.382..151.578 rows=10180 loops=1)
> Recheck Cond: (polygon.geom && geom)
> Filter: _st_contains(polygon.geom, geom)
> -> Bitmap Index Scan on point_index (cost=0.00..4.50
> rows=5 width=0) (actual time=1.869..1.869 rows=10180 loops=1)
> Index Cond: (polygon.geom && geom)
> Total runtime: 304.766 ms
>
> Note the "Bitmap Index Scan on point_index" line. The query planner
> estimates 5 rows will come back. In reality, over 10000 (a 2000 times
> increase) are returned. Is this a bug? Is there anything I can do to
> improve the estimated? I've been having performance issues on some very
> complex queries with my real data, and while this might not be the cause of
> them, it certainly doesn't help the query planner find the best plan. On my
> real data, increasing statistics did not help; I haven't tested increasing
> statistics on this sample.
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20130612/43bf010d/attachment.html>
More information about the postgis-users
mailing list