[postgis-users] Bad estimate from index on points

Sandro Santilli strk at keybit.net
Wed Jun 12 23:00:05 PDT 2013


On Wed, Jun 12, 2013 at 02:29:05PM -0400, BladeOfLight16 wrote:

> Re: JOIN selectivity
> 
> Not sure that's an option. In my real data, I'm joining to several
> polygons. Many of my queries go off to different tables that have polygons
> subdividing the main polygon table. So are you basically saying that JOIN
> and spatial data don't mesh well? Is there anything that can be done about
> that fact? Hm. My data is pretty regular; all the points in the point table
> fall within a polygon in the polygon table. If there were a way for the
> system to figure out some kind of average points per polygon or something
> even close to that, that would probably be within a factor of 10 or so. Of
> course, indexes don't usually track cross table relationships (ever, that
> I'm aware of).

I don't understand the factor of 10 here. Your case is really returning
10000 rows within a single polygon, right ? What's 10 here ?
I'm not even sure the join selectivity code is being engaged, consider
rebuilding postgis with --enable-debug to get traces of what the estimator
thinks about that. Sharing the dataset could be also interesting.

> I suppose what I really need is some kind of topology. Would the topology
> libraries allow me to inform the system about the relationship between the
> points and polygons and allow the query planner to take advantage of that
> information? Even if so, topology functionality confuses me. Do you know of
> a good introductory tutorial? I'm sure I won't have a problem with the
> concepts; I just need some direction on how the tools in PostGIS work.

Of course if you put all your points and your polygons into a topology
those relationships will be explicit so things would be faster, but building
the topology and maintaining it if changes occur will be more expensive.
The postgis manual contains all the information we have about it, the
closer to a tutorial is likely my slides, linked from the manual. 

--strk;

> 
> Thanks for the help.
> 
> 
> On Wed, Jun 12, 2013 at 5:32 AM, Sandro Santilli <strk at keybit.net> wrote:
> 
> > On Wed, Jun 12, 2013 at 04:23:18AM -0400, BladeOfLight16 wrote:
> >
> > > the query planner is
> > > getting horrible estimates for the number of point rows returned by the
> > > spatial index.
> >
> > [...]
> >
> > > -- Insert a lot of points contained by the rectangles
> > > INSERT INTO point (some_value, polygon_id, geom)
> > > SELECT random()*100 + 20
> > >      , polygon_id
> > >      , ST_SetSRID(('POINT('||x||' '||y||')')::GEOMETRY, 26915)
> > > FROM (SELECT polygon_id
> > >            , random()*(ST_XMax(geom) - ST_XMin(geom)) + ST_XMin(geom) AS
> > x
> > >            , random()*(ST_YMax(geom) - ST_YMin(geom)) + ST_YMin(geom) AS
> > y
> > >            , generate_series(1,(random()*10000+5000)::INTEGER)
> > >       FROM polygon) num_points
> > > ;
> > >
> > > CREATE INDEX polygon_index ON polygon USING GIST (geom);
> > > CREATE INDEX point_index ON point USING GIST (geom);
> >
> > [...]
> >
> > > EXPLAIN ANALYZE
> > > SELECT SUM(some_value)
> > > FROM point
> > > JOIN polygon ON ST_Contains(polygon.geom, point.geom)
> > > WHERE polygon.polygon_id = 50;
> >
> > [...]
> >
> > >               ->  Bitmap Index Scan on point_index  (cost=0.00..4.50
> > rows=5
> > > width=0) (actual time=1.869..1.869 rows=10180 loops=1)
> > >                     Index Cond: (polygon.geom && geom)
> >
> > [...]
> >
> > > Note the "Bitmap Index Scan on point_index" line. The query planner
> > > estimates 5 rows will come back. In reality, over 10000 (a 2000 times
> > > increase) are returned. Is this a bug? Is there anything I can do to
> > > improve the estimated?
> >
> > I didn't see an ANALYZE run between the INSERT and the EXPLAIN,
> > does analyzing both the polygon table and the point table help
> > in any way ? What version of PostGIS are you running ?
> >
> > Note that the JOIN selectivity estimator can't be that good (doesn't
> > have enough information about which polygon you're going to pick from
> > the polygons set, but makes a guess based on the whole table instead)
> > so if you can turn that single polygon into a constant it should help
> > the estimator.
> >
> > --strk;


More information about the postgis-users mailing list