[Geoserver-devel] [postgis-users] Re: Postgis estimated_extent completely off the mark?
Paul Ramsey
pramsey at refractions.net
Wed Mar 21 13:10:38 PDT 2007
On 21-Mar-07, at 1:03 PM, Mark Cave-Ayland wrote:
> On Tue, 2007-03-20 at 16:20 +0100, Andrea Aime wrote:
>> Paul Ramsey wrote:
>>> Right, sampling.
>>> Small enough that a random sample has a chance of missing them.
>>> Northern islands in Alaska, Hawaii, etc.
>>
>> So this is a feature, not a bug, apparently.
>> Heh, then the docs should say that estimated_extent is 5% off the
>> proper bounds if features are uniformly distributed in the actual
>> bounds :-)
>> If you have data with strange distribution patterns (such as USA
>> states), you had better not rely on it.
>>
>> Cheers
>> Andrea
>
> Yes, it's due to the way in which the sampling works. Note that you
> can increase the number of sampled rows using ALTER TABLE x ALTER
> COLUMN y SET STATISTICS z and then re-ANALYZING (the default value is
> 10, so perhaps a value of 100 would provide better results). I'm not
> sure where the figure of 5% from proper bounds comes from though - I
> would have imagined it depends on the sample size relative to the
> population size, but then I haven't studied statistics properly for
> several years now :(
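
To put rough numbers on why a larger statistics target helps, here is a minimal Python sketch of the chance that a simple random sample contains none of a handful of outlying rows. The table size (100000 rows), the 5 outlying rows, and the sample sizes are hypothetical. The 300 rows per unit of statistics target is what stock PostgreSQL ANALYZE uses; that the PostGIS geometry analyzer samples the same way is an assumption here.

```python
def miss_probability(total_rows, outlier_rows, sample_rows):
    """Chance that a simple random sample (without replacement) of
    sample_rows rows contains none of the outlier_rows outlying rows."""
    p = 1.0
    for i in range(outlier_rows):
        remaining = total_rows - sample_rows - i
        if remaining <= 0:
            # The sample is so large it cannot avoid every outlier.
            return 0.0
        p *= remaining / (total_rows - i)
    return p

# Hypothetical table: 100000 rows, 5 of them small far-away islands.
# Assumption: 300 sampled rows per unit of statistics target.
p_default = miss_probability(100_000, 5, 300 * 10)    # target 10
p_raised = miss_probability(100_000, 5, 300 * 100)    # target 100
print(f"target 10:  {p_default:.3f}")
print(f"target 100: {p_raised:.3f}")
```

Under those assumptions the sample misses all five rows roughly 86% of the time at target 10 and roughly 17% of the time at target 100, so raising the target helps but does not make the estimated extent exact.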
Counter-intuitively, it is more tightly linked to the absolute size of
the sample. A 1000-person sample of a population of 10000 is not
markedly more accurate than a 1000-person sample of a population of
1000000.
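
A quick way to check this is the textbook standard error of a sampled proportion with the finite population correction. This is only a sketch of the polling analogy, not of the extent estimator itself, and the worst-case proportion of 0.5 is an assumption.

```python
from math import sqrt

def standard_error(sample_size, population_size, proportion=0.5):
    """Standard error of a sampled proportion, drawn without replacement,
    including the finite population correction (FPC)."""
    srs = sqrt(proportion * (1 - proportion) / sample_size)
    fpc = sqrt((population_size - sample_size) / (population_size - 1))
    return srs * fpc

se_small_pop = standard_error(1000, 10_000)       # 1000 out of 10000
se_large_pop = standard_error(1000, 1_000_000)    # 1000 out of 1000000
se_small_sample = standard_error(100, 1_000_000)  # 100 out of 1000000
print(f"n=1000, N=10000:   {se_small_pop:.4f}")
print(f"n=1000, N=1000000: {se_large_pop:.4f}")
print(f"n=100,  N=1000000: {se_small_sample:.4f}")
```

The two 1000-person samples come out at about 0.0150 and 0.0158, while cutting the sample to 100 roughly triples the error: the sample size dominates, and the population size barely matters.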
P