[Geoserver-devel] [postgis-users] Re: Postgis estimated_extent completely off the mark?

Thu Mar 22 07:23:22 PDT 2007

Mark Cave-Ayland ha scritto:

> Yes, it's due to the way in which the sampling works. Note that you can
> increase the number of sampled rows using ALTER TABLE x ALTER COLUMN y
> SET STATISTICS z and then re-ANALYZING (the default value is 10, so
> perhaps a value of 100 would provide better results). I'm not sure where
> the figure of 5% from proper bounds comes from though - I would have
> imagined it depends on the sample size relative to the population size,
> but then I haven't studied statistics properly for several years now :(

Nice tip. I tried it out:
alter table major_roads alter column gen_full set statistics 1000;
vacuum analyze major_roads;

tiger2005fe=# select estimated_extent('major_roads','gen_full');
                               estimated_extent
----------------------------------------------------------------------------
  BOX(-136.358154296875 17.6865196228027,-64.5742340087891 59.6282997131348)

Sigh... the result is the same as before...
I tried setting a value bigger than 1000, the result is:

tiger2005fe=# alter table major_roads alter column gen_full set 
statistics 10000;
WARNING:  lowering statistics target to 1000
ALTER TABLE

It seems there is no way to make it close to being correct...
If any developer is curious and wants the data, I do have a compressed 
postgres dump of this table, it's around 125MB.
Cheers
Andrea