[Geoserver-devel] [postgis-users] Re: Postgis estimated_extent completely off the mark?

Paul Ramsey pramsey at refractions.net
Wed Mar 21 13:10:38 PDT 2007


On 21-Mar-07, at 1:03 PM, Mark Cave-Ayland wrote:

> On Tue, 2007-03-20 at 16:20 +0100, Andrea Aime wrote:
>> Paul Ramsey wrote:
>>> Right, sampling.
>>> Small enough that a random sample has a chance of missing them.
>>> Northern islands in Alaska, Hawaii, etc.
>>
>> So this is a feature, not a bug, apparently.
>> Heh, then the docs should say that estimated_extent is 5% off the
>> proper bounds if features are uniformly distributed in the actual
>> bounds :-)
>> If you have data with strange distribution patterns (such as USA
>> states), better not rely on it.
>>
>> Cheers
>> Andrea
>
> Yes, it's due to the way in which the sampling works. Note that you can
> increase the number of sampled rows using ALTER TABLE x ALTER COLUMN y
> SET STATISTICS z and then re-running ANALYZE (the default value is 10,
> so perhaps a value of 100 would provide better results). I'm not sure
> where the figure of 5% off the proper bounds comes from, though; I would
> have imagined it depends on the sample size relative to the population
> size, but then I haven't studied statistics properly for several years
> now :(
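To put rough numbers on the sampling argument, here is a minimal Python sketch (not PostGIS code). It computes the chance that a simple random sample misses every one of a handful of outlying rows, such as the small northern islands. It assumes that ANALYZE draws about 300 rows per unit of statistics target, as PostgreSQL's standard analyzer does, which gives 3000 rows at a target of 10 and 30000 at 100. The PostGIS geometry analyzer's real sample size, the 1,000,000-row table and the 5 outlying rows are all hypothetical values chosen for illustration.

```python
def miss_probability(population: int, outliers: int, sample: int) -> float:
    """Chance that a simple random sample drawn without replacement
    contains none of the `outliers` rows (hypergeometric P(X = 0))."""
    if not 0 <= outliers <= population:
        raise ValueError("need 0 <= outliers <= population")
    # ANALYZE reads every row when the table is smaller than the target
    # sample, so clamp the sample to the table size.
    sample = min(sample, population)
    p = 1.0
    for i in range(sample):
        remaining = population - i  # rows not yet drawn
        if remaining == outliers:
            return 0.0  # only outlier rows are left, so the next draw hits one
        # Probability that draw i also avoids all the outlier rows.
        p *= (remaining - outliers) / remaining
    return p


# Hypothetical table: 1,000,000 rows, 5 of them small outlying islands.
for target in (10, 100):
    rows_sampled = 300 * target  # assumed 300 rows per statistics unit
    chance = miss_probability(1_000_000, 5, rows_sampled)
    print(f"target {target}: sample {rows_sampled}, miss all outliers {chance:.3f}")
```

Under these assumptions, raising the target from 10 to 100 lowers the chance of missing all five outliers only from roughly 98% to roughly 86%. A higher statistics target therefore helps, but it does not guarantee that isolated features end up in the sample.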

Counter-intuitively, it is more tightly linked to the absolute size of
the sample. A 1000-person sample of a population of 10000 is not
markedly more accurate than a 1000-person sample of a population of
1000000.
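The claim about absolute sample size matches the textbook finite population correction. In a simple random sample of n rows out of N, the standard error of an estimated proportion p is sqrt(p(1 - p)/n) * sqrt((N - n)/(N - 1)). The correction factor stays close to 1 unless n is a sizeable fraction of N. The minimal Python sketch below evaluates that standard sampling formula. It is not anything PostGIS computes, and the worst-case p = 0.5 is an assumption chosen for illustration.

```python
from math import sqrt


def proportion_standard_error(sample: int, population: int, p: float = 0.5) -> float:
    """Standard error of a sample proportion for simple random sampling
    without replacement, including the finite population correction."""
    if not 0 < sample <= population:
        raise ValueError("need 0 < sample <= population")
    if population == 1:
        return 0.0  # the single row was sampled, so there is no error
    fpc = sqrt((population - sample) / (population - 1))  # finite population correction
    return sqrt(p * (1 - p) / sample) * fpc


# The two populations from the example, each with a 1000-row sample.
for population in (10_000, 1_000_000):
    se = proportion_standard_error(1_000, population)
    print(f"N = {population:>9,}: standard error {se:.4f}")
```

The two standard errors come out at about 0.0150 and 0.0158. A hundredfold larger population leaves the 1000-row sample's standard error only about 5% larger.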

P


