[postgis-devel] geometry stats

strk strk at keybit.net
Mon Feb 23 04:58:48 PST 2004


Hi Dave,
I've prepared and committed a skeleton for
PG75 integrated stats support.

What is done so far:

	1) histogram creation based on 300*attstattarget
	   sample rows (if available).

	2) null fraction computation

	3) average width of column values:
	   SUM(samplegeom->size) / not_null samples 
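
A minimal sketch of the three items above, in Python rather than the C
of the actual code. The function names and the convention of passing
None for a NULL sample are assumptions made for illustration only; they
are not taken from the committed skeleton.

```python
from typing import Optional, Sequence, Tuple

ROWS_PER_TARGET = 300  # multiplier quoted in item 1 above


def sample_rows_wanted(stats_target: int) -> int:
    """Number of sample rows requested for a given statistics target."""
    return ROWS_PER_TARGET * stats_target


def column_stats(sample_sizes: Sequence[Optional[int]]) -> Tuple[float, float]:
    """Return (null_fraction, average_width) for a sample.

    sample_sizes holds the serialized size of each sampled geometry,
    or None where the sampled value was NULL (hypothetical convention).
    """
    total = len(sample_sizes)
    not_null = [s for s in sample_sizes if s is not None]

    # Item 2: fraction of sampled rows that were NULL.
    null_frac = (total - len(not_null)) / total if total else 0.0

    # Item 3: SUM(size) / number of not-null samples.
    avg_width = sum(not_null) / len(not_null) if not_null else 0.0

    return null_frac, avg_width
```

For example, a sample of four rows with sizes 100, NULL, 300 and 200
gives a null fraction of 0.25 and an average width of 200.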

What remains to be done:

	1) fill the histogram values (float4).

	2) use the histogram in the estimator.

I see that the current code is pretty fuzzy, and in my experience
this kind of algorithm requires many iterations of fine-tuning.

I'd like to move the 'tunable' parts into the estimator, keeping the
builder as strict as possible.

Since we will use floats instead of integers, we could use a number
in the range 0-1 to express the fraction of overlap between a
sample feature's box and a histogram cell. Currently 1 is added to the cell
value if at least 5% of a feature overlaps it (correct me if I'm wrong).
Finally, we should 'normalize' the histogram by dividing the value of each
cell by the number of not-null (or total) samples handled.  This should
give a tuning-free histogram. What do you think? Mark?
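
The proposal above could be sketched roughly as follows, in Python
rather than the C of the actual code. Several things here are
assumptions for illustration: the regular 2D grid over a fixed extent,
the function names, and the reading of "fraction of overlap" as the
share of the feature box's area that falls inside the cell. Boxes with
zero area (points, horizontal or vertical lines) are simply skipped and
would need separate handling.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def overlap_len(a0: float, a1: float, b0: float, b1: float) -> float:
    """Length of the overlap between intervals [a0, a1] and [b0, b1]."""
    return max(0.0, min(a1, b1) - max(a0, b0))


def build_histogram(boxes: List[Box], extent: Box,
                    cols: int, rows: int) -> List[List[float]]:
    """Build a normalized histogram of fractional box/cell overlaps."""
    xmin, ymin, xmax, ymax = extent
    cell_w = (xmax - xmin) / cols
    cell_h = (ymax - ymin) / rows
    hist = [[0.0] * cols for _ in range(rows)]
    handled = 0

    for bx0, by0, bx1, by1 in boxes:
        area = (bx1 - bx0) * (by1 - by0)
        if area <= 0.0:
            continue  # degenerate box: not covered by this sketch
        handled += 1
        for r in range(rows):
            cy0 = ymin + r * cell_h
            h = overlap_len(by0, by1, cy0, cy0 + cell_h)
            if h == 0.0:
                continue
            for c in range(cols):
                cx0 = xmin + c * cell_w
                w = overlap_len(bx0, bx1, cx0, cx0 + cell_w)
                # Add a value in 0-1 instead of a thresholded 1.
                hist[r][c] += (w * h) / area

    # Normalize by the number of samples actually handled.
    if handled:
        for row in hist:
            for c in range(cols):
                row[c] /= handled
    return hist
```

With this definition a box lying entirely inside the extent contributes
a total of 1 spread across the cells it touches, so the normalized
cells sum to 1 when every sample is inside the extent.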

Then the estimator will need a change too... but I'd like to discuss
this later.

--strk;



