[postgis-devel] RE: standard deviation based histogram extent reduction
'strk'
strk at keybit.net
Fri Jun 11 04:32:24 PDT 2004
On Fri, Jun 11, 2004 at 08:14:59AM +0100, Mark Cave-Ayland wrote:
> Hi strk,
>
> > -----Original Message-----
> > From: 'strk' [mailto:strk at keybit.net]
> > Sent: 10 June 2004 20:01
> > To: Mark Cave-Ayland
> > Cc: postgis-devel at postgis.refractions.net
> > Subject: Re: [postgis-devel] RE: standard deviation based
> > histogram extent reduction
> >
> >
> > I've added an irregularly sized histogram grid (cells are always
> > nearly square, keeping the total cell count close to the requested precision).
> >
> > I'd like to see some test results before working on further
> > refinements, just to make sure we are not introducing any bugs.
> >
> > Thanks for your attention.
> >
> > --strk;
>
> The irregular sized histogram code looks good to me.
Actually, I found a bug in it. It should be fixed now.
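For a concrete picture of an irregular grid whose cells stay nearly square while the total cell count stays close to the requested number, here is a minimal Python sketch. The helper name `grid_dimensions`, the treatment of degenerate extents and the rounding choices are hypothetical; this is not the committed PostGIS code.

```python
import math
from typing import Tuple


def grid_dimensions(width: float, height: float, target_cells: int) -> Tuple[int, int]:
    """Pick (cols, rows) so cells are close to square and cols * rows is
    close to target_cells. Hypothetical helper, not the PostGIS code."""
    if target_cells < 1:
        raise ValueError("target_cells must be at least 1")
    # Degenerate extents: a point gets one cell, a line gets a single strip.
    if width <= 0 and height <= 0:
        return 1, 1
    if height <= 0:
        return target_cells, 1
    if width <= 0:
        return 1, target_cells
    # Side of a square cell that would tile the extent with target_cells cells.
    side = math.sqrt(width * height / target_cells)
    # Columns follow the width; clamp so very wide extents don't overshoot.
    cols = min(target_cells, max(1, round(width / side)))
    # Rows take whatever keeps the total near the target.
    rows = max(1, round(target_cells / cols))
    return cols, rows


print(grid_dimensions(400.0, 100.0, 16))  # (8, 2): 50 x 50 cells
```

Deriving the column count from a square cell side, rather than always using a sqrt(N) x sqrt(N) grid, is what keeps cells roughly square on elongated extents.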
>
> The only improvement I was suggesting was that instead of considering
> the cutoff rectangle as being the overall histogram extent, we should
> recalculate the histogram extent ignoring everything outside of this
> rectangle. In most cases this would pull the histogram extent much
> "tighter" around the dataset and hence increase the accuracy; other
> than changing the histogram extents, it won't change any of the
> existing code or methodology.
>
> Cheers,
>
> Mark.
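Mark's suggestion (recompute the extent from the samples that survive the cutoff instead of using the cutoff rectangle itself) can be sketched as follows. Assumptions, not taken from the source: boxes are `(xmin, ymin, xmax, ymax)` tuples, a sample survives if its box intersects the cutoff rectangle, and the union of survivors is clipped back to the cutoff. Function names are hypothetical.

```python
from typing import Iterable, Optional, Tuple

# Hypothetical box representation: (xmin, ymin, xmax, ymax).
Box = Tuple[float, float, float, float]


def intersects(a: Box, b: Box) -> bool:
    """True if the two boxes overlap or touch."""
    return not (a[0] > b[2] or a[2] < b[0] or a[1] > b[3] or a[3] < b[1])


def tightened_extent(boxes: Iterable[Box], cutoff: Box) -> Optional[Box]:
    """Recompute the histogram extent from samples that survive the cutoff.

    Assumption: a sample survives if its box intersects the cutoff
    rectangle; the union of survivors is then clipped to the cutoff.
    """
    kept = [b for b in boxes if intersects(b, cutoff)]
    if not kept:
        return None  # every sample was an outlier
    # Union of the surviving boxes.
    xmin = min(b[0] for b in kept)
    ymin = min(b[1] for b in kept)
    xmax = max(b[2] for b in kept)
    ymax = max(b[3] for b in kept)
    # Clip to the cutoff so a straddling box cannot widen the extent again.
    return (max(xmin, cutoff[0]), max(ymin, cutoff[1]),
            min(xmax, cutoff[2]), min(ymax, cutoff[3]))


samples = [(0, 0, 1, 1), (2, 2, 3, 3), (100, 100, 101, 101)]
print(tightened_extent(samples, (-10, -10, 10, 10)))  # (0, 0, 3, 3)
```

In the usage line the cutoff rectangle is much looser than the surviving data, and the recomputed extent shrinks to the data actually inside it, which is the "tighter" effect described above.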
I've committed the "improvement", together with handling of
infinite geometries.
Debugging output will report the three steps of histogram extent
definition: the sample extent (sample_extent), the standard deviation
based reduced extent (sd_histbox), and the new histogram extent after
the outliers cut (histobox).
The number of examined features will also be reported, so you can check
how many samples were cut off (this is actually outliers+nulls+infinite,
but it is easy to check; if you want a finer report, set
DEBUG_GEOMETRY_STATS to 2).
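A rough sketch of the first two extent steps and of the examined count, for readers who want to reproduce the numbers by hand. Assumptions, none of them taken from the committed code: nulls are `None`, "infinite" means any non-finite coordinate, sd_histbox is the mean minus/plus `sd_factor` standard deviations of the boxes' low/high corners clipped to the sample extent, and an outlier is a box that does not intersect sd_histbox. The function name and `sd_factor` are hypothetical; the factor actually used is not stated here.

```python
import math
from statistics import fmean, pstdev
from typing import Iterable, Optional, Tuple

# Hypothetical box representation: (xmin, ymin, xmax, ymax).
Box = Tuple[float, float, float, float]


def sd_reduced_extent(samples: Iterable[Optional[Box]], sd_factor: float):
    """Return (examined, sample_extent, sd_histbox) for a list of sample boxes.

    Assumptions (not taken from the committed code): nulls are None,
    "infinite" means any non-finite coordinate, and sd_histbox is
    mean -/+ sd_factor * stddev of the low/high corners, clipped to the
    sample extent.
    """
    # Drop null and infinite geometries before computing anything.
    boxes = [b for b in samples
             if b is not None and all(math.isfinite(c) for c in b)]
    if not boxes:
        return 0, None, None

    # Step 1: plain extent of the usable samples (sample_extent).
    sample_extent = (min(b[0] for b in boxes), min(b[1] for b in boxes),
                     max(b[2] for b in boxes), max(b[3] for b in boxes))

    # Step 2: standard deviation based reduced extent (sd_histbox).
    lows_x, lows_y = [b[0] for b in boxes], [b[1] for b in boxes]
    highs_x, highs_y = [b[2] for b in boxes], [b[3] for b in boxes]
    sd_histbox = (
        max(fmean(lows_x) - sd_factor * pstdev(lows_x), sample_extent[0]),
        max(fmean(lows_y) - sd_factor * pstdev(lows_y), sample_extent[1]),
        min(fmean(highs_x) + sd_factor * pstdev(highs_x), sample_extent[2]),
        min(fmean(highs_y) + sd_factor * pstdev(highs_y), sample_extent[3]),
    )

    # Examined = samples left after nulls, infinites and outliers are cut;
    # here an outlier is a box that does not intersect sd_histbox.
    examined = sum(
        1 for b in boxes
        if not (b[0] > sd_histbox[2] or b[2] < sd_histbox[0]
                or b[1] > sd_histbox[3] or b[3] < sd_histbox[1]))
    return examined, sample_extent, sd_histbox


samples = [(0, 0, 1, 1), (1, 1, 2, 2), (2, 2, 3, 3), None,
           (0, 0, math.inf, 1), (1000, 1000, 1001, 1001)]
examined, extent, sd_box = sd_reduced_extent(samples, 1.0)
print(f"examined: {examined}/{len(samples)}")  # examined: 3/6
```

Here the null and the infinite box are dropped first and the far-away box is cut as an outlier, so the reported count lumps all three kinds of removal together, just as the debugging output does.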
Here are a couple of tests with the default statistics target.
---
--- 20610 Multipolygons
---
$ grep best mpoly-NOsd # examined: 3000/3000
2 (best/worst/avg) 1.32 -2.68 +-1.97
4 (best/worst/avg) 0 -5.64 +-0.8
8 (best/worst/avg) 0 -5.09 +-0.29
16 (best/worst/avg) 0 -4.75 +-0.1
32 (best/worst/avg) 0 -4.08 +-0.04
$ grep best mpoly-sd # examined: 2759/3000
2 (best/worst/avg) 0.2 3.6 +-2.22
4 (best/worst/avg) 0 2.96 +-1.11
8 (best/worst/avg) 0 -2.79 +-0.41
16 (best/worst/avg) 0 -3 +-0.12
32 (best/worst/avg) 0 -3.21 +-0.04
---
--- 2125 Multilinestrings (too few to tell..)
---
$ grep best mline-NOsd # examined: 2125/2125
2 (best/worst/avg) -0.37 -2.72 +-1.41
4 (best/worst/avg) 0 -2.77 +-0.58
8 (best/worst/avg) 0 -2.72 +-0.17
16 (best/worst/avg) 0 -3.29 +-0.1
32 (best/worst/avg) 0 -4.94 +-0.07
$ grep best mline-sd # examined: 1913/2125
2 (best/worst/avg) 0.51 -5.97 +-2.84
4 (best/worst/avg) 0.04 -2.54 +-0.89
8 (best/worst/avg) 0 -2.44 +-0.35
16 (best/worst/avg) 0 -2.44 +-0.12
32 (best/worst/avg) 0 -2.72 +-0.06
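The "examined" counts in the two "-sd" runs translate into cut-off fractions (outliers+nulls+infinite combined) as computed below; the helper name `cut_off` is hypothetical and only the counts come from the runs.

```python
from typing import Tuple


def cut_off(examined: int, sampled: int) -> Tuple[int, float]:
    """Number and percentage of samples dropped (outliers + nulls + infinite)."""
    cut = sampled - examined
    return cut, 100.0 * cut / sampled


# Examined counts reported in the two "-sd" runs above.
for name, examined, sampled in [("mpoly-sd", 2759, 3000), ("mline-sd", 1913, 2125)]:
    cut, pct = cut_off(examined, sampled)
    print(f"{name}: {cut} of {sampled} cut off ({pct:.1f}%)")
# mpoly-sd: 241 of 3000 cut off (8.0%)
# mline-sd: 212 of 2125 cut off (10.0%)
```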
--strk;