[postgis-devel] RE: standard deviation based histogram extent reduction

'strk' strk at keybit.net
Fri Jun 11 04:32:24 PDT 2004


On Fri, Jun 11, 2004 at 08:14:59AM +0100, Mark Cave-Ayland wrote:
> Hi strk,
> 
> > -----Original Message-----
> > From: 'strk' [mailto:strk at keybit.net] 
> > Sent: 10 June 2004 20:01
> > To: Mark Cave-Ayland
> > Cc: postgis-devel at postgis.refractions.net
> > Subject: Re: [postgis-devel] RE: standard deviation based 
> > histogram extent reduction
> > 
> > 
> > I've added irregular sized histogram grid (cell aspect is 
> > always nearly square keeping total cells near to requested precision).
> > 
> > I'd like to have some test results before working on other 
> > refinements. Just to make sure we are not introducing any bug.
> > 
> > Thanks for your attention.
> > 
> > --strk;
> 
> The irregular sized histogram code looks good to me.

Actually I've found a bug in it. It should be fixed now.
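
For anyone testing, the idea behind the irregular grid (cells nearly
square, total cell count near the requested precision) can be sketched
roughly as below. This is a minimal Python illustration, not the
committed code; grid_dims and its parameter names are made up for the
example, and the rounding choices are an assumption.

```python
import math

def grid_dims(width, height, target_cells):
    """Pick (cols, rows) so that cells are roughly square and
    cols * rows stays close to target_cells (hypothetical helper)."""
    if width <= 0 or height <= 0 or target_cells < 1:
        raise ValueError("extent must have positive size and target_cells >= 1")
    # For square cells, cols / rows should match width / height,
    # and cols * rows should be about target_cells.
    cols = max(1, round(math.sqrt(target_cells * width / height)))
    rows = max(1, round(target_cells / cols))
    return cols, rows

if __name__ == "__main__":
    # A 200 x 50 extent with about 100 cells gives a 20 x 5 grid of 10 x 10 cells.
    print(grid_dims(200, 50, 100))
```

With a very elongated extent the result degrades to a single row or
column, so the total can drift away from the requested count.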

> 
> The only improvement I was suggesting was that instead of considering
> the cutoff rectangle as being the overall histogram extent, we should
> recalculate the histogram extent ignoring everything outside of this
> rectangle. This would have the result in most cases of bringing in the
> histogram extent much "tighter" around the dataset and hence increase
> the accuracy - other than changing the histogram extents, it won't
> change any of the existing code or methodology.
> 
> Cheers,
> 
> Mark.

I've committed the "improvement", together with handling of
infinite geometries.

Debugging output will report the three steps of histogram extent
definition: sample extent (sample_extent), standard deviation based
reduced extent (sd_histbox), and the new histogram extent after the
outliers are cut (histobox).
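
The three steps can be sketched as follows. This is a minimal Python
illustration under stated assumptions, not the committed code: boxes
are (xmin, ymin, xmax, ymax) tuples, the deviation multiplier k and its
default are hypothetical, and a box counts as an outlier only when it
does not intersect sd_histbox at all. The function and helper names are
invented for the example; only sample_extent, sd_histbox and histobox
come from the debugging output.

```python
import math
from statistics import mean, pstdev

def union_extent(boxes):
    """Smallest (xmin, ymin, xmax, ymax) covering every box."""
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))

def intersects(a, b):
    """True when boxes a and b share at least one point."""
    return a[0] <= b[2] and a[2] >= b[0] and a[1] <= b[3] and a[3] >= b[1]

def histogram_extent(boxes, k=2.0):
    """Return (sample_extent, sd_histbox, histobox, examined).
    Needs at least one box with finite coordinates."""
    # Skip infinite (or NaN) boxes before computing anything.
    finite = [b for b in boxes if all(math.isfinite(c) for c in b)]

    # Step 1: extent of the whole (finite) sample.
    sample_extent = union_extent(finite)

    # Step 2: mean +/- k standard deviations per coordinate,
    # clamped so it never grows beyond the sample extent.
    avg = [mean([b[i] for b in finite]) for i in range(4)]
    dev = [pstdev([b[i] for b in finite]) for i in range(4)]
    sd_histbox = (max(avg[0] - k * dev[0], sample_extent[0]),
                  max(avg[1] - k * dev[1], sample_extent[1]),
                  min(avg[2] + k * dev[2], sample_extent[2]),
                  min(avg[3] + k * dev[3], sample_extent[3]))

    # Step 3: drop outliers, then recompute the extent from what is left.
    kept = [b for b in finite if intersects(b, sd_histbox)]
    histobox = union_extent(kept) if kept else sd_histbox
    return sample_extent, sd_histbox, histobox, len(kept)
```

Comparing the returned count with the sample size gives the same
"examined" ratio shown in the test output.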

The number of examined features will also be reported, so you can check
how many samples were cut off (the cut-off count is actually
outliers + nulls + infinite geometries, but it is easy to check; if you
want a finer report, set DEBUG_GEOMETRY_STATS to 2).

Here are a couple of tests with the default statistics target.

  ---
  --- 20610 Multipolygons 
  ---
  
  $ grep best mpoly-NOsd # examined: 3000/3000
      2   (best/worst/avg)        1.32    -2.68   +-1.97
      4   (best/worst/avg)        0       -5.64   +-0.8
      8   (best/worst/avg)        0       -5.09   +-0.29
      16  (best/worst/avg)        0       -4.75   +-0.1
      32  (best/worst/avg)        0       -4.08   +-0.04
  
  $ grep best mpoly-sd  # examined: 2759/3000
      2   (best/worst/avg)        0.2     3.6     +-2.22
      4   (best/worst/avg)        0       2.96    +-1.11
      8   (best/worst/avg)        0       -2.79   +-0.41
      16  (best/worst/avg)        0       -3      +-0.12
      32  (best/worst/avg)        0       -3.21   +-0.04

  --- 
  --- 2125 Multilinestrings (too few to tell..)
  --- 
  
  $ grep best mline-NOsd # examined: 2125/2125
      2   (best/worst/avg)        -0.37   -2.72   +-1.41
      4   (best/worst/avg)        0       -2.77   +-0.58
      8   (best/worst/avg)        0       -2.72   +-0.17
      16  (best/worst/avg)        0       -3.29   +-0.1
      32  (best/worst/avg)        0       -4.94   +-0.07
  
  $ grep best mline-sd # examined: 1913/2125
      2   (best/worst/avg)        0.51    -5.97   +-2.84
      4   (best/worst/avg)        0.04    -2.54   +-0.89
      8   (best/worst/avg)        0       -2.44   +-0.35
      16  (best/worst/avg)        0       -2.44   +-0.12
      32  (best/worst/avg)        0       -2.72   +-0.06

 

--strk;


