[postgis-devel] RE: [HACKERS] join selectivity
Mark Cave-Ayland
m.cave-ayland at webbased.co.uk
Thu Dec 16 10:51:14 PST 2004
Hi Tom,
> -----Original Message-----
> From: Tom Lane [mailto:tgl at sss.pgh.pa.us]
> Sent: 16 December 2004 17:56
> To: Mark Cave-Ayland
> Cc: strk at refractions.net; pgsql-hackers at postgresql.org;
> postgis-devel at postgis.refractions.net
> Subject: Re: [HACKERS] join selectivity
>
>
> "Mark Cave-Ayland" <m.cave-ayland at webbased.co.uk> writes:
> > OK I think I've misunderstood something more fundamental
> than that; I
> > understood from what you said that the RESTRICT clause is used to
> > evaluate the cost of table1.geom && table2.geom against
> table2.geom &&
> > table1.geom (i.e. it is used to help decide which one should be seq
> > scanned and which should be index scanned in a nested loop
> node). So
> > is the trick here for a commutative operator to simply
> return the same
> > value for both cases, as other factors such as index size costs are
> > considered elsewhere?
>
> If the operator is commutative then the result should be too.
> Really you should not be thinking about costs at all when
> coding a selectivity
> estimator: its charter is to estimate how many rows will
> match the condition, not to estimate costs per se.
>
> Note however that these aren't really the "same case", as
> you'd be referencing two different columns with presumably
> different statistics.
Well at the moment PostGIS has a RESTRICT function that takes an expression
of the form <column> <op> <constant> where column is a column consisting of
geometries and constant is a bounding box. This is based upon histogram
statistics and works well.
The surprise came when writing the JOIN function and finding that the
RESTRICT clause was being called. Now I understand that this is part of the
nested loop and not the JOIN so that helps. But in the case of <column> <op>
<unknown constant>, if we're estimating the number of rows to return then
that becomes harder - I'm thinking pick a rectangle half the area of the
statistical rectangle for the column and return the number of rows within
that area.
> You should probably read the existing selectivity estimators
> in utils/adt/selfuncs.c. There's a fair amount of
> infrastructure code in that file that you could borrow.
> (It's not currently exported because it tends to change from
> version to version, but maybe we could think about making
> some of the routines global.)
OK will try and find some inspiration within.
Many thanks,
Mark.
------------------------
WebBased Ltd
South West Technology Centre
Tamar Science Park
Plymouth
PL6 8BT
T: +44 (0)1752 791021
F: +44 (0)1752 791023
W: http://www.webbased.co.uk
More information about the postgis-devel
mailing list