[postgis-devel] Re: estimates problems and 1.0.0 delay
strk at refractions.net
Mon Apr 18 04:59:47 PDT 2005
On Mon, Apr 18, 2005 at 12:46:11PM +0100, Mark Cave-Ayland wrote:
> Hi strk/Ron,
>
> > -----Original Message-----
> > From: strk at refractions.net [mailto:strk at refractions.net]
> > Sent: 18 April 2005 12:08
> > To: rm_postgis at cheapcomplexdevices.com; Mark Cave-Ayland
> > (External); postgis-devel at postgis.refractions.net
> > Subject: estimates problems and 1.0.0 delay
> >
> >
> > I've committed Ron's patch, but while running some tests I
> > discovered some corner cases that are still unhandled.
> >
> > The problem is a collapsed histogram extent, mostly due to the
> > standard-deviation-based cut-off of hard deviants: after the
> > cut-off we end up with a 0-size dimension on Y or X.
> >
> > This is surely NOT something that happens in normal
> > database usage, but I think we should inspect it further and
> > find a workaround. It sounds like a can of worms, so I wouldn't open
> > it before 1.0.0, which was planned for today, but I'd delay the
> > release until tomorrow evening (CET) to allow a few more tests
> > to be performed by Ron and possibly Mark (and myself, of course).
>
> I agree, we could do with having a workaround on this. For example, what
> happens if you load a single point into a geometry table and then ANALYZE?
> Will that also produce a 0 sized dimension in Y and X?
Yes, but the can of worms is bigger... a truncated table causes
invalid stats to be produced (stats_valid set to false), while stats
queries still see the older statistics (should we define an invalid
format for statistics?). You can test the invalid stats by inserting
a single point, analyzing, removing the point, analyzing again and
calling estimated_extent().
> My current thinking would be to enforce a minimum (non-zero) histogram size,
> much along the lines that new tables default to returning 1000 rows until
> statistics information becomes available. Then if we find X or Y collapses
> to a zero dimension then we add some default (small) offsets and use this as
> the size of the X or Y histogram instead. This should only be an issue on
> small or artificial datasets, so if the estimates come out slightly less
> accurate for just these cases then we shouldn't have too much of a problem.
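A minimal sketch of the padding idea quoted above, not the PostGIS code: if the histogram extent has zero width or height, widen it by a small default offset on each side. The function name, the argument order and the 0.5 default are all assumptions made for illustration.

```python
# Hypothetical helper: the name, signature and default pad are illustrative only.
def pad_collapsed_extent(xmin, ymin, xmax, ymax, pad=0.5):
    """Return an extent whose width and height are both non-zero."""
    if xmax - xmin <= 0.0:
        # X collapsed: widen symmetrically around the single X value.
        xmin, xmax = xmin - pad, xmax + pad
    if ymax - ymin <= 0.0:
        # Y collapsed: widen symmetrically around the single Y value.
        ymin, ymax = ymin - pad, ymax + pad
    return xmin, ymin, xmax, ymax


# A table holding a single point collapses on both axes.
single_point = pad_collapsed_extent(10.0, 20.0, 10.0, 20.0)

# A horizontal line collapses on Y only; X is left untouched.
horizontal_line = pad_collapsed_extent(0.0, 5.0, 100.0, 5.0)
```

The right offset depends on the units of the data, so a fixed default like the one above would only be a stopgap for small or artificial datasets.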
If the histogram box collapses because hard deviants were removed, maybe we
should put them back in. On the other hand, if it is due to inputs that really
are aligned, we should be able to handle it in the estimate_selectivity code,
so handling it there should be done first.
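
To make the two cases concrete, here is a rough sketch on plain points rather than on the sampled boxes the analyzer really works with. It trims points lying more than k standard deviations from the mean on either axis, and falls back to the untrimmed extent when the trimmed one collapses. The function names and the cut-off factor k=3.25 are assumptions for illustration, not taken from the PostGIS source.

```python
from statistics import mean, pstdev


def extent(points):
    """Bounding box (xmin, ymin, xmax, ymax) of a list of (x, y) points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


def trim_hard_deviants(points, k=3.25):
    """Drop points more than k standard deviations from the mean on either axis."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    return [(x, y) for x, y in points
            if abs(x - mx) <= k * sx and abs(y - my) <= k * sy]


def histogram_extent(points, k=3.25):
    """Extent after the cut-off, putting the deviants back in if it collapses."""
    trimmed = trim_hard_deviants(points, k)
    if trimmed:
        xmin, ymin, xmax, ymax = extent(trimmed)
        if xmax > xmin and ymax > ymin:
            return xmin, ymin, xmax, ymax
    # Collapsed (or empty) after the cut-off: use all the points again.
    return extent(points)


# Nineteen points on the line y = 0 plus one far-away point: the cut-off
# removes the far point, which leaves a 0-size Y dimension.
points = [(float(i), 0.0) for i in range(19)] + [(9.0, 100.0)]
collapsed = extent(trim_hard_deviants(points))
recovered = histogram_extent(points)
```

With inputs that really do all lie on y = 0, the fallback still returns a 0-height box, which is why the selectivity code has to cope with that case regardless.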
--strk;
>
>
> Kind regards,
>
> Mark.
>
> ------------------------
> WebBased Ltd
> South West Technology Centre
> Tamar Science Park
> Plymouth
> PL6 8BT
>
> T: +44 (0)1752 791021
> F: +44 (0)1752 791023
> W: http://www.webbased.co.uk
>