[postgis-users] Re: Re bug with GIST indexes

David Blasby dblasby at refractions.net
Wed Mar 12 11:13:50 PST 2003


>
>
>I'm not convinced that users should be expected to always add another
>clause to a select statement to return a result - the operator should
>return the correct result according to its definition instead of having
>to be filtered again!
>  
>
I agree with you 100% - but also remember that the operators (like "&&") 
work on bounding boxes *not* the actual geometries.  Pretty soon (as 
soon as GEOS gets integrated), we'll see routines like:

SELECT * FROM data WHERE data.mygeom && <geom> and intersects(data.geom, 
<geom>)

Meaning, use the index for a parital (BOX) solution, then the actual 
function to get the correct solution (GEOMETRY).

I've heard that they will be query re-write rules in postgresql 7.4 - 
have you looked into any of them?  I'm hoping that a user query like:

SELECT * FROM data WHERE intersects(data.geom, <geom>)
can get auto-magically turned into an indexible statement like:
SELECT * FROM data WHERE data.mygeom && <geom> and intersects(data.geom, 
<geom>)

>Yes, I agree with you on this one. I have a feeling that Oleg and
>Teodor's Rtree implementation is more textbook than practical. Each
>strategy should call a routine that tests consistency and returns the
>correct result. I would have no worries about replacing these with
>routines that we can guarantee will always produce the results we want
>for a given operator, and I wouldn't be surprised if it didn't work
>exactly as intended.
>  
>
Yes - we should test their implentation and send them feedback. 
 Unfortunately, there isnt
all that much documentation on the GiST index, and the rtree_gist 
example is the best...

>1. Alter the BOX3D type so that it contains a SRID field (and basically
>make users have to specify a SRID for the BOX3D when they create it)
>
>2. Alter the operator class so a BOX3D is returned instead of a box for
>the GiST entries (I'm guessing the implicit cast in the SQL file is used
>to cast geometries to box3ds for queries).
>  
>
I dont think the GiST index should be using 3D geometries - thats just 
adding 50% more information
to the index that will never be used.


How about something like:
typedef stuct
{
    int32         SRID;            // -1 if unknown
    double      xmin, ymin;
    double      xmax, ymax;
} BOX2D;

Its dead easy to create one of these from a GEOMETRY (they have a SRID 
and bounding volume), and
you can convert a BOX2D to GEOMETRY by creating one as "BBOXONLYTYPE".

>3. Keep the same algorithms for the GiST indexes but convert them to
>BOX3D (this ensures that any changes to the PostgreSQL BOX type won't
>break things!). Abstract the PostgreSQL interface from the GiST routines
>so then the plan is that only the interface routines would need to be
>rewritten between PostgreSQL revisions (i.e. the algorithms themselves
>would not have to be altered making them more robust). Perhaps set a
>compile time choice for either quadratic pick/split or linear pick/split
>to show how this adds more flexibility.
>  
>
Good idea.

>4. Rewrite the spatial operators so they use a common routine based on
>BOX3Ds regardless of whether the operator is indexed or not.
>  
>
This is another great idea, but its more subtle that you'd think.  
When you do something like:

SELECT * FROM table WHERE geo && 'BOX3D(...)';

Whats actually happening is this:
1. looks for GEOMETRY && BOX3D  operator (there isnt one)
2. converts BOX3D to a GEOEMTRY -"BBOXONLYTYPE"- and finds the GEOMETRY 
&& GEOMETRY operator
3. looks to see if GEOMETRY && GEOMETRY is indexable (it is) and 
determines if it should seqscan or indexscan

So, in the end you are not doing  BOX3D <op> BOX3D, you are actually 
doing a GEOMETRY <op> GEOMETRY

In the sequence scan case, it calls the GEOMETRY <op> GEOMETRY function 
directly.
In the index scan case, it "compresses" the GEOMETRY into a BOX then 
searches the index using BOX <op> BOX function (or BOX2D <op> BOX2D).

SELECT * FROM table WHERE geo::box3d && 'BOX3D(...)';

This will use the BOX3D <op> BOX3D routines (not indexed).

The postgis-for-7.1 version does ensure that SRIDs are respected (cf. 
GEOMETRYKEY - note that this is a variable length structure, as you were 
supposed to use under the old GiST API).

Another idea would be to completely remove the BOX3D type and just have 
GEOMETRY.  Suck all the functionality of  BOX3D into geometry, and make 
sure "BBOXONLYTYPE" is a first class citizen of GEOMETRY (currently its 
not).  You could then put these "BBOXONLYTYPE" geometries into the index 
(compress would convert a GEOMETRY into a "BBOXONLYTYPE" ) and use the 
same functions for the index and seq scan.  
Unfortunately, this is quite a bit of work and you're storing a bunch of 
extra junk in the index, but it does make things more consistent.

>I'm willing to put my money where my mouth is and invest some time and
>effort into this as I think it will make maintaining the codebase much
>easier, especially in terms of integrating the GEOS library. Would you
>guys be willing to accept a set of patches from me to implement these
>changes? 
>  
>
If you send patches, I'll apply them.

dave




More information about the postgis-users mailing list