Re: The factor of 10<br>There wasn't really anything substantive there. I was just musing about how the query planner could get a better estimate, and why that wouldn't work. The number of points per polygon ranges from about 5,000 to 20,000. The average would probably be somewhere around 10-15 thousand, so every per-polygon count would fall well within average/10 to average*10. I recall reading somewhere that estimates off by a single-digit factor aren't a cause for concern when looking at query plans. So I was saying the average number of points per polygon would probably be a good enough estimate for the query planner to make intelligent decisions, but since that's a multi-table relationship, the database doesn't track that information and can't use it for query planning. That's all.<br>
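<br>To make that concrete, here is a rough sketch of the number I have in mind, using the polygon_id column from the sample point table quoted below. It only illustrates the statistic; as far as I know the planner has no way to consume it. The aliases (per_polygon, num_points, avg_points, and so on) are just names made up for this query.<br>

```sql
-- Per-polygon point counts, summarized across all polygons.
-- Assumes the sample schema from this thread: point(some_value, polygon_id, geom).
SELECT AVG(num_points) AS avg_points
     , MIN(num_points) AS min_points
     , MAX(num_points) AS max_points
FROM (SELECT polygon_id
           , COUNT(*) AS num_points
      FROM point
      GROUP BY polygon_id) per_polygon
;
```

If max_points / avg_points and avg_points / min_points both come out well under 10, then the average is within a factor of 10 of the true count for every polygon.<br>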


<br><div class="gmail_quote">On Thu, Jun 13, 2013 at 2:00 AM, Sandro Santilli <span dir="ltr"><<a href="mailto:strk@keybit.net" target="_blank">strk@keybit.net</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


<div class="im">On Wed, Jun 12, 2013 at 02:29:05PM -0400, BladeOfLight16 wrote:<br>

<br>

> Re: JOIN selectivity<br>

><br>

> Not sure that's an option. In my real data, I'm joining to several<br>

> polygons. Many of my queries go off to different tables that have polygons<br>

> subdividing the main polygon table. So are you basically saying that JOIN<br>

> and spatial data don't mesh well? Is there anything that can be done about<br>

> that fact? Hm. My data is pretty regular; all the points in the point table<br>

> fall within a polygon in the polygon table. If there were a way for the<br>

> system to figure out some kind of average points per polygon or something<br>

> even close to that, that would probably be within a factor of 10 or so. Of<br>

> course, indexes don't usually track cross table relationships (ever, that<br>

> I'm aware of).<br>

<br>

</div>I don't understand the factor of 10 here. Your case is really returning<br>

10000 rows within a single polygon, right? What's 10 here?<br>

I'm not even sure the join selectivity code is being engaged; consider<br>

rebuilding postgis with --enable-debug to get traces of what the estimator<br>

thinks about that. Sharing the dataset could also be interesting.<br>

<div class="im"><br>

> I suppose what I really need is some kind of topology. Would the topology<br>

> libraries allow me to inform the system about the relationship between the<br>

> points and polygons and allow the query planner to take advantage of that<br>

> information? Even if so, topology functionality confuses me. Do you know of<br>

> a good introductory tutorial? I'm sure I won't have a problem with the<br>

> concepts; I just need some direction on how the tools in PostGIS work.<br>

<br>

</div>Of course if you put all your points and your polygons into a topology<br>

those relationships will be explicit so things would be faster, but building<br>

the topology and maintaining it if changes occur will be more expensive.<br>

The PostGIS manual contains all the information we have about it; the<br>

closest thing to a tutorial is likely my slides, linked from the manual.<br>

<br>

--strk;<br>

<div class="HOEnZb"><div class="h5"><br>

><br>

> Thanks for the help.<br>

><br>

><br>

> On Wed, Jun 12, 2013 at 5:32 AM, Sandro Santilli <<a href="mailto:strk@keybit.net">strk@keybit.net</a>> wrote:<br>

><br>

> > On Wed, Jun 12, 2013 at 04:23:18AM -0400, BladeOfLight16 wrote:<br>

> ><br>

> > > the query planner is<br>

> > > getting horrible estimates for the number of point rows returned by the<br>

> > > spatial index.<br>

> ><br>

> > [...]<br>

> ><br>

> > > -- Insert a lot of points contained by the rectangles<br>

> > > INSERT INTO point (some_value, polygon_id, geom)<br>

> > > SELECT random()*100 + 20<br>

> > >      , polygon_id<br>

> > >      , ST_SetSRID(('POINT('||x||' '||y||')')::GEOMETRY, 26915)<br>

> > > FROM (SELECT polygon_id<br>

> > >            , random()*(ST_XMax(geom) - ST_XMin(geom)) + ST_XMin(geom) AS x<br>

> > >            , random()*(ST_YMax(geom) - ST_YMin(geom)) + ST_YMin(geom) AS y<br>

> > >            , generate_series(1,(random()*10000+5000)::INTEGER)<br>

> > >       FROM polygon) num_points<br>

> > > ;<br>

> > ><br>

> > > CREATE INDEX polygon_index ON polygon USING GIST (geom);<br>

> > > CREATE INDEX point_index ON point USING GIST (geom);<br>

> ><br>

> > [...]<br>

> ><br>

> > > EXPLAIN ANALYZE<br>

> > > SELECT SUM(some_value)<br>

> > > FROM point<br>

> > > JOIN polygon ON ST_Contains(polygon.geom, point.geom)<br>

> > > WHERE polygon.polygon_id = 50;<br>

> ><br>

> > [...]<br>

> ><br>

> > >               ->  Bitmap Index Scan on point_index  (cost=0.00..4.50 rows=5 width=0) (actual time=1.869..1.869 rows=10180 loops=1)<br>

> > >                     Index Cond: (polygon.geom && geom)<br>

> ><br>

> > [...]<br>

> ><br>

> > > Note the "Bitmap Index Scan on point_index" line. The query planner<br>

> > > estimates 5 rows will come back. In reality, over 10000 (a 2000 times<br>

> > > increase) are returned. Is this a bug? Is there anything I can do to<br>

> > > improve the estimate?<br>

> ><br>

> > I didn't see an ANALYZE run between the INSERT and the EXPLAIN,<br>

> > does analyzing both the polygon table and the point table help<br>

> > in any way? What version of PostGIS are you running?<br>

> ><br>

> > Note that the JOIN selectivity estimator can't be that good (doesn't<br>

> > have enough information about which polygon you're going to pick from<br>

> > the polygons set, but makes a guess based on the whole table instead)<br>

> > so if you can turn that single polygon into a constant it should help<br>

> > the estimator.<br>

> ><br>

> > --strk;<br>


</div></div></blockquote></div><br>