[postgis-users] Bad estimate from index on points
Sandro Santilli
strk at keybit.net
Wed Jun 12 02:32:31 PDT 2013
On Wed, Jun 12, 2013 at 04:23:18AM -0400, BladeOfLight16 wrote:
> the query planner is
> getting horrible estimates for the number of point rows returned by the
> spatial index.
[...]
> -- Insert a lot of points contained by the rectangles
> INSERT INTO point (some_value, polygon_id, geom)
> SELECT random()*100 + 20
> , polygon_id
> , ST_SetSRID(('POINT('||x||' '||y||')')::GEOMETRY, 26915)
> FROM (SELECT polygon_id
> , random()*(ST_XMax(geom) - ST_XMin(geom)) + ST_XMin(geom) AS x
> , random()*(ST_YMax(geom) - ST_YMin(geom)) + ST_YMin(geom) AS y
> , generate_series(1,(random()*10000+5000)::INTEGER)
> FROM polygon) num_points
> ;
>
> CREATE INDEX polygon_index ON polygon USING GIST (geom);
> CREATE INDEX point_index ON point USING GIST (geom);
[...]
> EXPLAIN ANALYZE
> SELECT SUM(some_value)
> FROM point
> JOIN polygon ON ST_Contains(polygon.geom, point.geom)
> WHERE polygon.polygon_id = 50;
[...]
> -> Bitmap Index Scan on point_index (cost=0.00..4.50 rows=5
> width=0) (actual time=1.869..1.869 rows=10180 loops=1)
> Index Cond: (polygon.geom && geom)
[...]
> Note the "Bitmap Index Scan on point_index" line. The query planner
> estimates 5 rows will come back. In reality, over 10000 (a 2000 times
> increase) are returned. Is this a bug? Is there anything I can do to
> improve the estimated?
I didn't see an ANALYZE run between the INSERT and the EXPLAIN,
does analyzing both the polygon table and the point table help
in any way ? What version of PostGIS are you running ?
Note that the JOIN selectivity estimator can't be that good (doesn't
have enough information about which polygon you're going to pick from
the polygons set, but makes a guess based on the whole table instead)
so if you can turn that single polygon into a constant it should help
the estimator.
--strk;
More information about the postgis-users
mailing list