[postgis-devel] GiST JoinSel Bogus?

Mon Nov 19 16:55:40 PST 2012

I've been looking over the selectivity code, getting ready to attach
selectivity to &&&, and also to try and figure out some of our knotty
tickets on selectivity, and the GiST JoinSel code seems bogus to me.
Anyone care to look?

http://trac.osgeo.org/postgis/browser/trunk/postgis/geometry_estimate.c#L170

Basically, the core assumption is that, within the area that the
tables overlap, every feature in one table will find two features in
the other table it intersects with. Kind of an arbitrary scaling. The
right way to do it, IMO, is to do the table-vs-table equivalent of the
table-vs-extent calculation done in the plain GiST Sel function: add
up the cell intersections individually, and adjust for proportion of
feature overlapping. It's way more complex, but far more likely to
reflect Truth.
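
To make the idea concrete, here is a rough sketch in Python of what a
cell-by-cell join estimate could look like. It is not the code in
geometry_estimate.c: the Hist2D layout, the field names, and the
assumption that feature centres are uniformly spread within a cell are
all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Hist2D:
    """Hypothetical 2D histogram: a regular grid of feature counts."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float
    nx: int
    ny: int
    counts: list   # row-major, length nx * ny
    avg_w: float   # average feature bbox width (assumed statistic)
    avg_h: float   # average feature bbox height (assumed statistic)

    def cells(self):
        """Yield ((x0, y0, x1, y1), count) for every grid cell."""
        cw = (self.xmax - self.xmin) / self.nx
        ch = (self.ymax - self.ymin) / self.ny
        for j in range(self.ny):
            for i in range(self.nx):
                x0 = self.xmin + i * cw
                y0 = self.ymin + j * ch
                yield (x0, y0, x0 + cw, y0 + ch), self.counts[j * self.nx + i]


def hit_probability(reach, span):
    """P(|a - b| < reach) for a, b uniform on an interval of length span."""
    r = min(1.0, reach / span)
    return 1.0 - (1.0 - r) ** 2


def join_selectivity(ha, hb):
    """Estimate the fraction of (a, b) pairs whose bboxes intersect."""
    total_a, total_b = sum(ha.counts), sum(hb.counts)
    if total_a == 0 or total_b == 0:
        return 0.0
    pairs = 0.0
    for (ax0, ay0, ax1, ay1), n_a in ha.cells():
        if n_a == 0:
            continue
        area_a = (ax1 - ax0) * (ay1 - ay0)
        for (bx0, by0, bx1, by1), n_b in hb.cells():
            ow = min(ax1, bx1) - max(ax0, bx0)
            oh = min(ay1, by1) - max(ay0, by0)
            if n_b == 0 or ow <= 0 or oh <= 0:
                continue  # cells do not overlap, no interactions counted
            area_b = (bx1 - bx0) * (by1 - by0)
            # Share of each cell's features that falls in the overlap.
            in_a = n_a * (ow * oh) / area_a
            in_b = n_b * (ow * oh) / area_b
            # Two boxes intersect when their centres are closer than half
            # the summed widths in x and half the summed heights in y.
            p = (hit_probability((ha.avg_w + hb.avg_w) / 2.0, ow)
                 * hit_probability((ha.avg_h + hb.avg_h) / 2.0, oh))
            pairs += in_a * in_b * p
    return min(1.0, pairs / (total_a * total_b))
```

The point of the sketch is only that the estimate comes from summing
per-cell interactions rather than from a fixed "two matches per
feature" factor. It ignores features that straddle cell boundaries, so
a real implementation would need more care there.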

Since the current logic undercounts potential interactions quite
dramatically, it might explain the bad plans reported out of here:
http://gis.stackexchange.com/questions/41199/postgis-intermittent-index-performance,
which look to be caused by a very aggressive selectivity number being
fed to the planner by the && cost function.

Commentary?

P.