[postgis-devel] geometry stats
strk
strk at keybit.net
Mon Feb 23 04:58:48 PST 2004
Hi Dave,
I've prepared and committed a skeleton for
PG75 integrated stats support.
What is done is:
1) histogram creation based on 300*attstatstarget
sample rows (if available).
2) null fraction computation
3) average width of column values:
SUM(samplegeom->size) / not_null samples
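
As a rough illustration of points 2 and 3, the null fraction and average
width could be computed in one pass over the sample. This is only a sketch:
sample_t, col_stats_t and compute_col_stats are hypothetical names invented
here, not the structures or functions in the committed skeleton.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sample record: is_null marks a NULL column value,
 * size stands in for samplegeom->size (bytes). */
typedef struct {
    int    is_null;
    size_t size;
} sample_t;

typedef struct {
    double null_frac;  /* NULL samples / all samples */
    double avg_width;  /* SUM(size) / not-null samples */
} col_stats_t;

col_stats_t
compute_col_stats(const sample_t *samples, size_t nsamples)
{
    col_stats_t out = {0.0, 0.0};
    size_t notnull = 0;
    size_t total = 0;
    size_t i;

    assert(samples != NULL || nsamples == 0);

    for (i = 0; i < nsamples; i++) {
        if (samples[i].is_null)
            continue;
        notnull++;
        total += samples[i].size;
    }

    /* Guard both divisions so an empty or all-NULL sample yields zeros. */
    if (nsamples > 0)
        out.null_frac = (double)(nsamples - notnull) / (double)nsamples;
    if (notnull > 0)
        out.avg_width = (double)total / (double)notnull;

    return out;
}
```

With four samples of which one is NULL and the others are 100, 200 and 300
bytes, this gives a null fraction of 0.25 and an average width of 200.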
What remains to be done is:
1) fill the histogram values (float4).
2) estimate the histogram.
I've seen that the current code is pretty fuzzy, and in my experience
this kind of algorithm requires many iterations of fine-tuning.
I'd like to move the 'tunable' parts into the estimator, keeping the
builder as strict as possible.
Since we will use floats instead of integers, we could use a number
in the range 0-1 to express the fraction of overlap between a
sample feature's box and a histogram cell. Currently 1 is added to the cell
value if at least 5% of a feature overlaps it (correct me if I'm wrong).
Finally we should 'normalize' the histogram by dividing the value of each
cell by the number of not-null (or total) samples handled. This should
give a tune-free histogram. What do you think? Mark?
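
To make the proposal concrete, here is a minimal sketch of such a builder.
It assumes the overlap factor is (intersection area) / (feature box area),
that all boxes passed in are not-null samples, and that zero-area boxes are
simply skipped. box2d_t, overlap_area and build_histogram are hypothetical
names for this example, not existing PostGIS code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 2D box type for this sketch. */
typedef struct { double xmin, ymin, xmax, ymax; } box2d_t;

/* Area of the intersection of two boxes, 0 if they do not overlap. */
double
overlap_area(const box2d_t *a, const box2d_t *b)
{
    double w = (a->xmax < b->xmax ? a->xmax : b->xmax)
             - (a->xmin > b->xmin ? a->xmin : b->xmin);
    double h = (a->ymax < b->ymax ? a->ymax : b->ymax)
             - (a->ymin > b->ymin ? a->ymin : b->ymin);

    if (w <= 0.0 || h <= 0.0)
        return 0.0;
    return w * h;
}

/* Fill a cols x rows histogram (row-major float4 cells) over extent.
 * Each sample adds its 0-1 overlap factor to every cell it touches,
 * then every cell is divided by the number of samples handled. */
void
build_histogram(const box2d_t *extent, int cols, int rows,
                const box2d_t *boxes, size_t nboxes, float *cells)
{
    double cw, ch;
    size_t i;
    int c, r;

    assert(cols > 0 && rows > 0 && nboxes > 0);
    cw = (extent->xmax - extent->xmin) / cols;
    ch = (extent->ymax - extent->ymin) / rows;

    for (i = 0; i < (size_t)cols * (size_t)rows; i++)
        cells[i] = 0.0f;

    for (i = 0; i < nboxes; i++) {
        double area = (boxes[i].xmax - boxes[i].xmin)
                    * (boxes[i].ymax - boxes[i].ymin);

        if (area <= 0.0)
            continue;  /* assumption: degenerate boxes are ignored here */

        for (r = 0; r < rows; r++) {
            for (c = 0; c < cols; c++) {
                box2d_t cell;

                cell.xmin = extent->xmin + c * cw;
                cell.xmax = cell.xmin + cw;
                cell.ymin = extent->ymin + r * ch;
                cell.ymax = cell.ymin + ch;
                cells[r * cols + c] +=
                    (float)(overlap_area(&boxes[i], &cell) / area);
            }
        }
    }

    /* Normalize by the number of samples handled. */
    for (i = 0; i < (size_t)cols * (size_t)rows; i++)
        cells[i] /= (float)nboxes;
}
```

With this definition each sample that lies inside the extent contributes a
total of 1 across the cells it touches, so the normalized cells sum to 1 and
no threshold like the current 5% is needed in the builder.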
Then the estimator will need a change too... but I'd like to discuss
this later.
--strk;