[postgis-users] Bad estimate from index on points

Paul Ramsey pramsey at boundlessgeo.com
Sun Nov 3 19:12:27 PST 2013


Coming back to this a bit late (hah!) ... what PostGIS version is this on?

P

On Wed, Jun 12, 2013 at 1:23 AM, BladeOfLight16
<bladeoflight16 at gmail.com> wrote:
> I have a table that contains many point rows. The points all fall within a
> polygon in another table. There are roughly 5 to 20 thousand points per
> polygon. The polygons will frequently overlap. (The data is annual.) I have
> to calculate aggregate values on the point data, and the query planner is
> getting horrible estimates for the number of point rows returned by the
> spatial index. I have create a psql script that will generate a database
> with similar data. (My apologies for the length.)
>
>
> DROP DATABASE IF EXISTS test_db;
> CREATE DATABASE test_db;
>
> \c test_db
>
> CREATE EXTENSION postgis;
>
> -- SRID is UTM 15 with NAD 83 datum
>
> CREATE TABLE polygon
> (
>     polygon_id SERIAL PRIMARY KEY,
>     geom GEOMETRY(MULTIPOLYGON, 26915)
> );
>
> CREATE TABLE point
> (
>     point_id SERIAL PRIMARY KEY,
>     some_value DECIMAL(10,5),
>     polygon_id INTEGER NOT NULL REFERENCES polygon (polygon_id),
>     geom GEOMETRY(POINT, 26915)
> );
>
> -- Randomly generate 100 rectangles
> INSERT INTO polygon (geom)
> SELECT ST_Multi(ST_SetSRID(ST_Envelope(('LINESTRING('||x||'
> '||y||','||(x+side)||' '||(y+side)||')')::GEOMETRY), 26915))
> FROM (SELECT random()*(823749.9411-176250.0589) + 176250.0589 AS x
>            , random()*(9106037.1690-1577463.0797) + 1577463.0797 AS y
>            , random()*(5000-1000) + 1000 AS side
>       FROM generate_series(1, 100)
>      ) box_data
> ;
>
> -- Insert a lot of points contained by the rectangles
> INSERT INTO point (some_value, polygon_id, geom)
> SELECT random()*100 + 20
>      , polygon_id
>      , ST_SetSRID(('POINT('||x||' '||y||')')::GEOMETRY, 26915)
> FROM (SELECT polygon_id
>            , random()*(ST_XMax(geom) - ST_XMin(geom)) + ST_XMin(geom) AS x
>            , random()*(ST_YMax(geom) - ST_YMin(geom)) + ST_YMin(geom) AS y
>            , generate_series(1,(random()*10000+5000)::INTEGER)
>       FROM polygon) num_points
> ;
>
> CREATE INDEX polygon_index ON polygon USING GIST (geom);
> CREATE INDEX point_index ON point USING GIST (geom);
> CREATE INDEX fki_point_polygon_id_fkey ON point USING btree (polygon_id);
>
>
> Now connect to the database and run this query:
>
> EXPLAIN ANALYZE
> SELECT SUM(some_value)
> FROM point
> JOIN polygon ON ST_Contains(polygon.geom, point.geom)
> WHERE polygon.polygon_id = 50;
>
> Here is an example result:
>
> Aggregate  (cost=45.91..45.92 rows=1 width=8) (actual time=303.788..303.793
> rows=1 loops=1)
>   ->  Nested Loop  (cost=4.50..29.77 rows=6457 width=8) (actual
> time=77.441..247.409 rows=10180 loops=1)
>         ->  Seq Scan on polygon  (cost=0.00..4.25 rows=1 width=128) (actual
> time=0.024..0.039 rows=1 loops=1)
>               Filter: (polygon_id = 50)
>               Rows Removed by Filter: 99
>         ->  Bitmap Heap Scan on point  (cost=4.50..25.50 rows=2 width=56)
> (actual time=77.382..151.578 rows=10180 loops=1)
>               Recheck Cond: (polygon.geom && geom)
>               Filter: _st_contains(polygon.geom, geom)
>               ->  Bitmap Index Scan on point_index  (cost=0.00..4.50 rows=5
> width=0) (actual time=1.869..1.869 rows=10180 loops=1)
>                     Index Cond: (polygon.geom && geom)
> Total runtime: 304.766 ms
>
> Note the "Bitmap Index Scan on point_index" line. The query planner
> estimates 5 rows will come back. In reality, over 10000 (a 2000 times
> increase) are returned. Is this a bug? Is there anything I can do to improve
> the estimated? I've been having performance issues on some very complex
> queries with my real data, and while this might not be the cause of them, it
> certainly doesn't help the query planner find the best plan. On my real
> data, increasing statistics did not help; I haven't tested increasing
> statistics on this sample.
>
> _______________________________________________
> postgis-users mailing list
> postgis-users at lists.osgeo.org
> http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users
>


More information about the postgis-users mailing list