[postgis-devel] RE: standard deviation based histogram extent reduction

'strk' strk at keybit.net
Thu Jun 24 12:51:32 PDT 2004


Welcome aboard!

--strk;

On Thu, Jun 24, 2004 at 11:28:43AM +0100, Mark Cave-Ayland wrote:
> Hi strk,
> 
> Cheers for this! The experiments last week were leading me towards making
> the filter less strong by moving to a value around 3.25, which still
> cuts out extreme outliers. Unfortunately I'm working on another project
> so I don't get too much time to play with PostGIS at the moment...
> however, I've got a CVS account so I can commit the change myself when I
> get time to save you bandwidth. I'll let you know when I can start
> playing again :)
> 
> 
> Kind regards,
> 
> Mark.
> 
> ---
> 
> Mark Cave-Ayland
> Webbased Ltd.
> Tamar Science Park
> Derriford
> Plymouth
> PL6 8BX
> England
> 
> Tel: +44 (0)1752 764445
> Fax: +44 (0)1752 764446
> 
> 
> This email and any attachments are confidential to the intended
> recipient and may also be privileged. If you are not the intended
> recipient please delete it from your system and notify the sender. You
> should not copy it or use it for any purpose nor disclose or distribute
> its contents to any other person.
> 
> > -----Original Message-----
> > From: 'strk' [mailto:strk at keybit.net] 
> > Sent: 22 June 2004 17:53
> > To: Mark Cave-Ayland
> > Cc: postgis-devel at postgis.refractions.net
> > Subject: Re: [postgis-devel] RE: standard deviation based 
> > histogram extent reduction
> > 
> > 
> > I've made the standard deviation multiplication factor a
> > compile-time define. You can play with SDFACTOR right below
> > USE_STANDARD_DEVIATION. Let me know...
> > 
> > --strk;
> > 
> > On Fri, Jun 18, 2004 at 11:31:18AM +0100, Mark Cave-Ayland wrote:
> > > Hi strk,
> > > 
> > > I'm currently working on other projects so I haven't had a chance to
> > > play with the standard deviation stats code as much as I would
> > > like... however, it works really well at filtering erroneous values out
> > > of the histogram :)
> > > 
> > > I loaded in a couple of Tiger shapefiles of about 10,000 geometries,
> > > analyzed, tried some queries, added some extra features at a distance
> > > from the main data set and repeated. The extra features had no effect
> > > on this histogram and everything continued to work as it should -
> > > except the test_estimation.pl script which returned strange results
> > > because it did a SELECT extent(the_geom).... and therefore included
> > > the extra features when dividing the extent into n by n boxes....
> > > 
> > > The only thing I did find was that the current error was a little too
> > > high for my liking; I found better results by increasing the threshold
> > > to 3 standard deviations from the mean. This should put the error
> > > somewhere around 0.3% and had the effect of reducing the difference
> > > between sample_extent and sd_histbox for me, especially on smaller
> > > datasets. I would be interested to compare results with other users
> > > who have experimented with PG 7.5 and CVS to see whether they agree
> > > with these results.
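
As a rough check on the "around 0.3%" figure: assuming the sampled coordinates were normally distributed (an assumption, real datasets rarely are), the share of values falling outside k standard deviations of the mean is the two-sided tail of the normal distribution. The helper name fraction_outside below is hypothetical, and 2.0 is included only for comparison.

```python
import math

def fraction_outside(k):
    """Two-sided tail mass of a normal distribution beyond k standard deviations."""
    return math.erfc(k / math.sqrt(2))

# 3 and 3.25 are the factors mentioned in this thread; 2.0 is just for contrast.
for k in (2.0, 3.0, 3.25):
    print(f"{k:>5} sd: {100 * fraction_outside(k):.2f}% of values fall outside")
```

For k = 3 this comes out at about 0.27%, in line with the estimate above; this is per coordinate, so the share of features cut in two dimensions would be somewhat larger.
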
> > > 
> > > 
> > > Cheers,
> > > 
> > > Mark.
> > > 
> > > 
> > > > -----Original Message-----
> > > > From: 'strk' [mailto:strk at keybit.net]
> > > > Sent: 11 June 2004 12:32
> > > > To: Mark Cave-Ayland
> > > > Cc: postgis-devel at postgis.refractions.net
> > > > Subject: Re: [postgis-devel] RE: standard deviation based 
> > > > histogram extent reduction
> > > > 
> > > > 
> > > > On Fri, Jun 11, 2004 at 08:14:59AM +0100, Mark Cave-Ayland wrote:
> > > > > Hi strk,
> > > > > 
> > > > > > -----Original Message-----
> > > > > > From: 'strk' [mailto:strk at keybit.net]
> > > > > > Sent: 10 June 2004 20:01
> > > > > > To: Mark Cave-Ayland
> > > > > > Cc: postgis-devel at postgis.refractions.net
> > > > > > Subject: Re: [postgis-devel] RE: standard deviation based
> > > > > > histogram extent reduction
> > > > > > 
> > > > > > 
> > > > > > I've added irregular sized histogram grid (cell aspect is always
> > > > > > nearly square keeping total cells near to requested precision).
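
One way to get nearly square cells while staying close to a requested cell count can be sketched as below. This is only an illustration of the idea, not the committed code: the function name grid_dimensions and its parameters are hypothetical, and it assumes a non-degenerate extent (width and height both greater than zero).

```python
import math

def grid_dimensions(width, height, target_cells):
    """Pick (cols, rows) so cells are roughly square and cols * rows is near target_cells.

    Assumes width > 0 and height > 0.
    """
    # For square cells, cols / rows should match width / height,
    # so cols is about sqrt(target_cells * width / height).
    cols = max(1, round(math.sqrt(target_cells * width / height)))
    rows = max(1, round(target_cells / cols))
    return cols, rows

cols, rows = grid_dimensions(400.0, 100.0, 1600)
print(cols, rows, 400.0 / cols, 100.0 / rows)  # columns, rows, cell width, cell height
```

With an extent four times wider than tall and 1600 requested cells, this yields an 80 by 20 grid of 5 by 5 cells, rather than a 40 by 40 grid of stretched ones.
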
> > > > > > 
> > > > > > I'd like to have some test results before working on other 
> > > > > > refinements. Just to make sure we are not introducing any bug.
> > > > > > 
> > > > > > Thanks for your attention.
> > > > > > 
> > > > > > --strk;
> > > > > 
> > > > > The irregular sized histogram code looks good to me.
> > > > 
> > > > Actually I've found a bug in it. It should be fixed now.
> > > > 
> > > > > 
> > > > > The only improvement I was suggesting was that instead of considering
> > > > > the cutoff rectangle as being the overall histogram extent, we should
> > > > > recalculate the histogram extent ignoring everything outside of this
> > > > > rectangle. This would have the result in most cases of bringing in the
> > > > > histogram extent much "tighter" around the dataset and hence increase
> > > > > the accuracy - other than changing the histogram extents, it won't
> > > > > change any of the existing code or methodology.
> > > > > 
> > > > > Cheers,
> > > > > 
> > > > > Mark.
> > > > 
> > > > I've committed the "improvement", together with handling of
> > > > infinite geometries.
> > > > 
> > > > Debugging output will report the three steps of histogram extent
> > > > definition: sample extent (sample_extent), standard deviation
> > > > based reduced extent (sd_histbox), new histogram extent after 
> > > > outliers cut (histobox).
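
The three steps can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the committed code: it works on plain (x, y) points rather than geometry bounding boxes, takes the mean and standard deviation per axis, and uses an assumed SDFACTOR of 3.0. The names sample_extent, sd_histbox, histobox and SDFACTOR come from this thread; extent and reduced_extent are hypothetical helpers.

```python
import statistics

SDFACTOR = 3.0  # assumed value; the thread discusses 3 and 3.25

def extent(points):
    """Bounding box (xmin, ymin, xmax, ymax) of a non-empty list of (x, y) points."""
    xs, ys = zip(*points)
    return (min(xs), min(ys), max(xs), max(ys))

def reduced_extent(points, factor=SDFACTOR):
    xs, ys = zip(*points)
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sd_x, sd_y = statistics.pstdev(xs), statistics.pstdev(ys)

    # Step 1: extent of every sampled feature.
    sample_extent = extent(points)
    # Step 2: box spanning mean +/- factor * sd on each axis.
    sd_histbox = (mean_x - factor * sd_x, mean_y - factor * sd_y,
                  mean_x + factor * sd_x, mean_y + factor * sd_y)
    # Step 3: recompute the extent from the features inside sd_histbox only.
    kept = [(x, y) for x, y in points
            if sd_histbox[0] <= x <= sd_histbox[2]
            and sd_histbox[1] <= y <= sd_histbox[3]]
    histobox = extent(kept)
    return sample_extent, sd_histbox, histobox, len(kept)

# A 10 x 10 block of features plus one far-away feature.
points = [(x, y) for x in range(10) for y in range(10)] + [(1000, 1000)]
sample_extent, sd_histbox, histobox, examined = reduced_extent(points)
print(sample_extent, histobox, f"examined: {examined}/{len(points)}")
```

In this toy case the far-away feature stretches sample_extent to (0, 0, 1000, 1000), while histobox shrinks back to (0, 0, 9, 9) with 100 of the 101 features kept.
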
> > > > 
> > > > The number of examined features will also be reported, to check
> > > > how many samples were cut off (this is actually
> > > > outliers+nulls+infinite, but it is easy to check - if you
> > > > want a finer report, set DEBUG_GEOMETRY_STATS to 2).
> > > > 
> > > > Here are a couple of tests with default stat target.
> > > > 
> > > >   ---
> > > >   --- 20610 Multipolygons 
> > > >   ---
> > > >   
> > > >   $ grep best mpoly-NOsd # examined: 3000/3000
> > > >       2   (best/worst/avg)        1.32    -2.68   +-1.97
> > > >       4   (best/worst/avg)        0       -5.64   +-0.8
> > > >       8   (best/worst/avg)        0       -5.09   +-0.29
> > > >       16  (best/worst/avg)        0       -4.75   +-0.1
> > > >       32  (best/worst/avg)        0       -4.08   +-0.04
> > > >   
> > > >   $ grep best mpoly-sd  # examined: 2759/3000
> > > >       2   (best/worst/avg)        0.2     3.6     +-2.22
> > > >       4   (best/worst/avg)        0       2.96    +-1.11
> > > >       8   (best/worst/avg)        0       -2.79   +-0.41
> > > >       16  (best/worst/avg)        0       -3      +-0.12
> > > >       32  (best/worst/avg)        0       -3.21   +-0.04
> > > > 
> > > >   --- 
> > > >   --- 2125 Multilinestrings (too few to tell..)
> > > >   ---
> > > >   
> > > >   $ grep best mline-NOsd # examined: 2125/2125
> > > >       2   (best/worst/avg)        -0.37   -2.72   +-1.41
> > > >       4   (best/worst/avg)        0       -2.77   +-0.58
> > > >       8   (best/worst/avg)        0       -2.72   +-0.17
> > > >       16  (best/worst/avg)        0       -3.29   +-0.1
> > > >       32  (best/worst/avg)        0       -4.94   +-0.07
> > > >   
> > > >   $ grep best mline-sd # examined: 1913/2125
> > > >       2   (best/worst/avg)        0.51    -5.97   +-2.84
> > > >       4   (best/worst/avg)        0.04    -2.54   +-0.89
> > > >       8   (best/worst/avg)        0       -2.44   +-0.35
> > > >       16  (best/worst/avg)        0       -2.44   +-0.12
> > > >       32  (best/worst/avg)        0       -2.72   +-0.06
> > > > 
> > > >  
> > > > 
> > > > --strk;
> > > > 
> > > 
> > 
> 


