[postgis-devel] Re: geometry stats
strk
strk at keybit.net
Thu Feb 26 08:48:21 PST 2004
m.cave-ayland wrote:
> Hi strk,
>
> I've just managed to get my first successful query working using PG7.5
> CVS and your new ANALYZE code for PostGIS :)
>
>
> Along the way I found a bug in line 1187 of postgis_estimate.c:
>
> overlapping_cells = (x_idx_max-x_idx_min+1) *
> (y_idx_max-x_idx_min+1);
>
> ^^^
>
> should actually read:
>
> overlapping_cells = (x_idx_max-x_idx_min+1) *
> (y_idx_max-y_idx_min+1);
>
Fixed.
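For reference, the corrected count can be written as a small standalone function. This is only a sketch: the function name and its parameters are hypothetical and do not appear in postgis_estimate.c.

```c
/* Hypothetical helper (not from postgis_estimate.c): number of histogram
 * cells covered by the inclusive index ranges [x_idx_min, x_idx_max] and
 * [y_idx_min, y_idx_max]. */
int overlapping_cell_count(int x_idx_min, int x_idx_max,
                           int y_idx_min, int y_idx_max)
{
    /* Each factor uses the min and max of the same axis. */
    return (x_idx_max - x_idx_min + 1) * (y_idx_max - y_idx_min + 1);
}
```

With x indexes 2..4 and y indexes 10..11 this gives 3 * 2 = 6 cells, whereas mixing in x_idx_min on the y axis would have given 3 * 10 = 30.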
In the histogram builder the code reads (line 1355):
x_idx_min = (box->low.x-geomstats->xmin) / geow * bps;
if (x_idx_min <0) x_idx_min = 0;
if (x_idx_min >= bps) x_idx_min = bps-1;
We are at the second scan of the rows here. The first scan computed
geomstats->xmin, geomstats->xmax, geomstats->ymin and geomstats->ymax by
looking at each sample bounding box. Doesn't this mean there is no need to
check for boundary overflow? I'd remove the checks.
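One case may be worth checking before dropping the upper clamp. Assuming geow is xmax - xmin (an assumption, it is not shown in this mail), a coordinate exactly equal to xmax maps to bps, which is one past the last column. The sketch below uses a hypothetical helper name and plain doubles instead of the geomstats struct.

```c
/* Hypothetical helper (not from postgis_estimate.c): map coordinate v to a
 * histogram column index in [0, bps-1], given the sampled extent
 * [vmin, vmax]. Assumes the cell width is (vmax - vmin) / bps. */
int cell_index(double v, double vmin, double vmax, int bps)
{
    double width = vmax - vmin;   /* plays the role of geow */
    double f;

    if (width <= 0.0)
        return 0;                 /* degenerate extent: use the first column */

    f = (v - vmin) / width * bps;

    /* Clamp before converting to int. Even when v lies inside the extent,
     * v == vmax yields f == bps, so the upper clamp still matters. */
    if (f < 0.0)
        return 0;
    if (f >= (double) bps)
        return bps - 1;
    return (int) f;
}
```

With an extent of 0..10 and bps = 40, the value 10.0 lands in column 39 only because of the upper clamp. The lower clamp never triggers for values inside the extent.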
In the histogram evaluator (line 1094):
x_idx_max = (box->high.x-geomstats->xmin) / geow * bps;
if (x_idx_max <0)
{
// should increment the value somehow
x_idx_max = 0;
}
if (x_idx_max >= bps )
{
// should increment the value somehow
x_idx_max = bps-1;
}
Do you think it's worth actually incrementing the value?
What kind of curve should we apply in that case?
What we are detecting is that the query box extends outside the
histogram extent; should we approximate values for the cells beyond it?
The worst that can happen is that the estimator returns too small an
estimate, possibly 0.0. Would such an estimate
just push the planner to use the index, or would it make
it think there are no rows in the result?
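To make the two situations concrete, here is a sketch that separates a query interval lying entirely outside the histogram extent from one that only sticks out of it. It is not the code in postgis_estimate.c: the function name, the return convention and the parameters are all hypothetical, and it handles one axis only.

```c
/* Hypothetical helper (not from postgis_estimate.c): compute the clamped
 * column range [*imin, *imax] covered by the query interval [qlow, qhigh].
 * Returns 0 when the interval misses the extent [hmin, hmax] entirely
 * (outputs untouched), 1 otherwise. */
int query_cell_range(double qlow, double qhigh,
                     double hmin, double hmax, int bps,
                     int *imin, int *imax)
{
    double width = hmax - hmin;
    double lo, hi;

    if (width <= 0.0 || qhigh < hmin || qlow > hmax)
        return 0;                       /* no overlap with the histogram */

    lo = (qlow - hmin) / width * bps;
    hi = (qhigh - hmin) / width * bps;

    /* Clamp the part of the query that sticks out of the extent. */
    if (lo < 0.0) lo = 0.0;
    if (lo >= (double) bps) lo = bps - 1;
    if (hi < 0.0) hi = 0.0;
    if (hi >= (double) bps) hi = bps - 1;

    *imin = (int) lo;
    *imax = (int) hi;
    return 1;
}
```

A caller could treat the 0 return as "no sampled rows in that region" and still apply whatever minimum selectivity it considers safe. Whether a 0.0 estimate is acceptable to the planner is the open question above.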
In the histogram evaluator:
I've re-introduced dividing each histogram cell value
by the fraction of the cell that really overlaps the query box.
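A minimal sketch of the overlap fraction follows. It assumes the intent is to scale each cell's stored value by the share of the cell's area that the query box covers. That reading, the function names and the parameter layout are assumptions, not the code in postgis_estimate.c.

```c
/* Small local helpers so the sketch needs no headers. */
double min_d(double a, double b) { return a < b ? a : b; }
double max_d(double a, double b) { return a > b ? a : b; }

/* Hypothetical helper: fraction, in [0, 1], of the cell
 * [cx0, cx1] x [cy0, cy1] covered by the query box
 * [qx0, qx1] x [qy0, qy1]. */
double overlap_fraction(double cx0, double cy0, double cx1, double cy1,
                        double qx0, double qy0, double qx1, double qy1)
{
    double ix = min_d(cx1, qx1) - max_d(cx0, qx0);   /* overlap width  */
    double iy = min_d(cy1, qy1) - max_d(cy0, qy0);   /* overlap height */
    double cell_area = (cx1 - cx0) * (cy1 - cy0);

    if (ix <= 0.0 || iy <= 0.0 || cell_area <= 0.0)
        return 0.0;                                  /* no overlap */
    return (ix * iy) / cell_area;
}

/* Contribution of one cell: its stored value scaled by the overlap. */
double weighted_cell_value(double cell_value, double fraction)
{
    return cell_value * fraction;
}
```

For a cell spanning 0..2 on both axes and a query box spanning 1..3, the overlap is a 1 x 1 square out of an area of 4, so a quarter of the cell's value is counted.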
Keep up with the good support ! :)
--strk;