[postgis-users] Bad estimate from index on points

Sandro Santilli strk at keybit.net
Wed Jun 12 02:32:31 PDT 2013

On Wed, Jun 12, 2013 at 04:23:18AM -0400, BladeOfLight16 wrote:

> the query planner is
> getting horrible estimates for the number of point rows returned by the
> spatial index. 


> -- Insert a lot of points contained by the rectangles
> INSERT INTO point (some_value, polygon_id, geom)
> SELECT random()*100 + 20
>      , polygon_id
>      , ST_SetSRID(('POINT('||x||' '||y||')')::GEOMETRY, 26915)
> FROM (SELECT polygon_id
>            , random()*(ST_XMax(geom) - ST_XMin(geom)) + ST_XMin(geom) AS x
>            , random()*(ST_YMax(geom) - ST_YMin(geom)) + ST_YMin(geom) AS y
>            , generate_series(1,(random()*10000+5000)::INTEGER)
>       FROM polygon) num_points
> ;
> CREATE INDEX polygon_index ON polygon USING GIST (geom);
> CREATE INDEX point_index ON point USING GIST (geom);


> SELECT SUM(some_value)
> FROM point
> JOIN polygon ON ST_Contains(polygon.geom, point.geom)
> WHERE polygon.polygon_id = 50;


>               ->  Bitmap Index Scan on point_index  (cost=0.00..4.50 rows=5
> width=0) (actual time=1.869..1.869 rows=10180 loops=1)
>                     Index Cond: (polygon.geom && geom)


> Note the "Bitmap Index Scan on point_index" line. The query planner
> estimates 5 rows will come back. In reality, over 10000 (a 2000 times
> increase) are returned. Is this a bug? Is there anything I can do to
> improve the estimated? 

I didn't see an ANALYZE run between the INSERT and the EXPLAIN,
does analyzing both the polygon table and the point table help
in any way ? What version of PostGIS are you running ?

Note that the JOIN selectivity estimator can't be that good (doesn't
have enough information about which polygon you're going to pick from
the polygons set, but makes a guess based on the whole table instead)
so if you can turn that single polygon into a constant it should help
the estimator.


More information about the postgis-users mailing list