[postgis-devel] Point Selectivity
strk at keybit.net
Wed Nov 28 00:15:43 PST 2012
On Tue, Nov 27, 2012 at 05:27:15PM -0600, Paul Ramsey wrote:
> Could you confirm for me:
> When a search box is a point, the search box size is very small, so
> when the box is compared to the histogram it finds one cell, and then
> pro-rates it by the overlap of the search box on the cell, which is
> very very small. This is used to pro-rate the value count, which
> causes the selectivity number to be a very very small portion of the
> cell count. It seems like this will generate pretty small selectivity
> estimates, smaller than are warranted.
> AOI = intersect_x*intersect_y;
> gain = AOI/cell_area;
> The comments say
> * If the search_box is a point, it will
> * overlap a single cell and thus get
> * it's value, which is the fraction of
> * samples (we can presume of row set also)
> * which bumped to that cell.
> but I think, contra the comment, the search box does not get the value
> of a cell, but a very small fraction of that cell value.
> If you can confirm, that would help me move forward without making
> some subtle error in interpreting your prior work.
Yes, I confirm it doesn't take the whole cell value, but only a tiny
fraction of it (point_bbox_area / the_cell_area factor).
The comment should be updated accordingly.
The rationale for that gain is: when the box is smaller than the cell,
how many of the cell-overlapping features would really be hit by that
box? I reckon we could tweak that gain to be non-linear.
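
To make that first gain concrete, here is a minimal C sketch of the area
pro-rating, built around the two quoted lines (AOI and gain). The box2d
struct and the helper names overlap_len and cell_gain are made up for
illustration; they are not the actual PostGIS types or functions.

```c
#include <assert.h>

/* Hypothetical axis-aligned box, for illustration only. */
typedef struct { double xmin, ymin, xmax, ymax; } box2d;

/* Length of the overlap of [amin,amax] and [bmin,bmax], or 0 if disjoint. */
static double overlap_len(double amin, double amax, double bmin, double bmax)
{
    double lo = amin > bmin ? amin : bmin;
    double hi = amax < bmax ? amax : bmax;
    return hi > lo ? hi - lo : 0.0;
}

/* Fraction of a histogram cell's value credited to the search box. */
static double cell_gain(box2d search, box2d cell)
{
    double cell_area = (cell.xmax - cell.xmin) * (cell.ymax - cell.ymin);
    assert(cell_area > 0.0);

    double intersect_x = overlap_len(search.xmin, search.xmax, cell.xmin, cell.xmax);
    double intersect_y = overlap_len(search.ymin, search.ymax, cell.ymin, cell.ymax);

    double AOI = intersect_x * intersect_y;  /* area of intersection */
    return AOI / cell_area;                  /* linear gain */
}
```

With a 10x10 cell, a 1x1 search box gets a gain of about 0.01, and a truly
degenerate point box (zero width and height) gets exactly 0 in this sketch.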
Note that this number is later further modified by another gain,
which is ( 1 / <average_number_of_cells_per_feature> ).
But I do see that <average_number_of_cells_per_feature> would never
be < 1 in normal conditions, so it's indeed an uneven treatment.
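
A rough C sketch of how the two gains stack up, assuming they are simply
multiplied into the cell value (the function and parameter names here are
hypothetical, not taken from the PostGIS source):

```c
#include <assert.h>

/*
 * cell_value:            fraction of samples that fell in the cell
 * area_gain:             AOI / cell_area, in [0, 1]
 * avg_cells_per_feature: average number of cells a feature overlaps,
 *                        expected to be >= 1 in normal conditions
 */
static double prorated_cell_value(double cell_value, double area_gain,
                                  double avg_cells_per_feature)
{
    assert(avg_cells_per_feature > 0.0);
    /* Second gain: can only shrink the value when the average is >= 1. */
    return cell_value * area_gain * (1.0 / avg_cells_per_feature);
}
```

Since the average is not expected to drop below 1, the second factor never
raises the estimate, so a point-sized box is scaled down twice.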
How's the testsuite for estimates coming along?