[postgis-devel] Re: estimates problems and 1.0.0 delay

strk at refractions.net strk at refractions.net
Mon Apr 18 04:59:47 PDT 2005


On Mon, Apr 18, 2005 at 12:46:11PM +0100, Mark Cave-Ayland wrote:
> Hi strk/Ron,
> 
> > -----Original Message-----
> > From: strk at refractions.net [mailto:strk at refractions.net] 
> > Sent: 18 April 2005 12:08
> > To: rm_postgis at cheapcomplexdevices.com; Mark Cave-Ayland 
> > (External); postgis-devel at postgis.refractions.net
> > Subject: estimates problems and 1.0.0 delay
> > 
> > 
> > I've committed Ron's patch, but while running some tests I 
> > discovered some corner cases that are still unhandled.
> > 
> > The problem is a collapsed histogram extent, mostly due to the 
> > standard-deviation-based cut-off of hard deviants. After the 
> > cut-off we end up with a 0-size dimension on Y or X.
> > 
> > This is surely NOT something that happens in normal 
> > database usage, but I think we should inspect it further and 
> > find a workaround. It sounds like a can of worms, so I wouldn't open 
> > it before 1.0.0, which was planned for today, but I'd delay 
> > the release until tomorrow evening
> > (CET) to allow a few more tests to be performed by Ron and 
> > possibly Mark (and myself, of course).
> 
> I agree, we could do with having a workaround on this. For example, what
> happens if you load a single point into a geometry table and then ANALYZE?
> Will that also produce a 0 sized dimension in Y and X? 

Yes, but the can of worms is bigger... a truncated table results in
invalid stats being produced (stats_valid set to false), while stats
queries still see the older statistics (should we define an invalid
format for statistics?). You can test the invalid stats by inserting
a single point, analyzing, removing the point, analyzing again and
calling estimated_extent().

> My current thinking would be to enforce a minimum (non-zero) histogram size,
> much along the lines that new tables default to returning 1000 rows until
> statistics information becomes available. Then if we find X or Y collapses
> to a zero dimension then we add some default (small) offsets and use this as
> the size of the X or Y histogram instead. This should only be an issue on
> small or artificial datasets, so if the estimates come out slightly less
> accurate for just these cases then we shouldn't have too much of a problem.

If the histogram box collapses because the hard deviants were removed, maybe
we should put them back in. On the other hand, if it collapses because the
inputs really are lined up (all sharing the same X or Y), we should be able
to handle it in the estimate_selectivity code, so that handling should be
done first.
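
To make the two options concrete, here is a rough C sketch for a single
axis. It is not the actual PostGIS analyze code: the names axis_extent,
pad_range and MIN_HISTO_SIDE, and the value 1e-6, are made up for
illustration. It trims values lying more than nsigma standard deviations
from the mean, puts the deviants back if trimming alone collapsed the
range, and pads the range to a small minimum width if the input really is
degenerate.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed minimum width of one histogram dimension (hypothetical value). */
#define MIN_HISTO_SIDE 1e-6

/* Widen a collapsed [lo, hi] range symmetrically to MIN_HISTO_SIDE. */
void pad_range(double *lo, double *hi)
{
    if (*hi - *lo < MIN_HISTO_SIDE) {
        double mid = (*lo + *hi) / 2.0;
        *lo = mid - MIN_HISTO_SIDE / 2.0;
        *hi = mid + MIN_HISTO_SIDE / 2.0;
    }
}

/*
 * Compute the histogram range for one axis from n sample values.
 * Values further than nsigma standard deviations from the mean are
 * treated as hard deviants and ignored, unless ignoring them is what
 * collapses the range.
 */
void axis_extent(const double *v, size_t n, double nsigma,
                 double *lo, double *hi)
{
    assert(n > 0);

    double sum = 0.0;
    double full_lo = v[0], full_hi = v[0];
    for (size_t i = 0; i < n; i++) {
        sum += v[i];
        if (v[i] < full_lo) full_lo = v[i];
        if (v[i] > full_hi) full_hi = v[i];
    }
    double mean = sum / (double)n;

    double sumsq = 0.0;
    for (size_t i = 0; i < n; i++)
        sumsq += (v[i] - mean) * (v[i] - mean);
    double var = sumsq / (double)n;

    /* Compare squared distances so no sqrt() is needed. */
    double limit = nsigma * nsigma * var;
    int kept = 0;
    double trim_lo = 0.0, trim_hi = 0.0;
    for (size_t i = 0; i < n; i++) {
        double d = v[i] - mean;
        if (d * d > limit)
            continue;                       /* hard deviant: cut off */
        if (!kept || v[i] < trim_lo) trim_lo = v[i];
        if (!kept || v[i] > trim_hi) trim_hi = v[i];
        kept = 1;
    }

    /* Cut-off collapsed a range that was not degenerate: put deviants back. */
    if (!kept || (trim_hi - trim_lo <= 0.0 && full_hi > full_lo)) {
        trim_lo = full_lo;
        trim_hi = full_hi;
    }

    /* Input really is lined up on this axis: enforce a minimum size. */
    pad_range(&trim_lo, &trim_hi);

    *lo = trim_lo;
    *hi = trim_hi;
}
```

With nine values at 5 and one at 100 and nsigma = 1, trimming leaves only
the 5s, so the sketch falls back to the full range. A single point ends up
with a MIN_HISTO_SIDE-wide range around it. An absolute pad like this is
too small to register for very large coordinates, so a real fix would
probably need to scale it.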

--strk;

> 
> 
> Kind regards,
> 
> Mark.
> 
> ------------------------
> WebBased Ltd
> South West Technology Centre
> Tamar Science Park
> Plymouth
> PL6 8BT 
> 
> T: +44 (0)1752 791021
> F: +44 (0)1752 791023
> W: http://www.webbased.co.uk
> 
