[postgis-users] Re: Re bug with GIST indexes

David Blasby dblasby at refractions.net
Mon Mar 10 10:48:19 PST 2003


Mark,

Thanks for taking a look at our indexing functions.

I've always tried to keep the PostGIS implementation of GiST as close to 
Oleg and Teodor's (the GiST maintainers) sample implementation for the 
native-postgresql-polygon (contrib/rtree_gist).  This is mostly because 
they keep radically changing what GiST looks like every version of 
Postgresql - causing a rewrite of the PostGIS GiST support code.  Plus, 
if we make a bug report we can easily give the report in terms of their 
code as apposed to PostGIS code.

The original PostGIS-gist-for-7.0 and 7.1 indexes used:

typedef struct geomkey {
        int32 size; /* size in varlena terms */
        BOX     key;
        int32   SRID; //spatial reference system identifier
} GEOMETRYKEY;

for the acutal indexing stucture.  This structure would throw an error 
for miss-matched SRIDs (cf. postgis_gist_71.c).  I belive the original 
postigs-for-7.2 GiST index used this structure.  When postgis-for-7.3 
was made, the 7.2 codebase was superceeded.  
There were a few problems with GiST during postgresql 7.2  and 7.3 (and 
there are currently discussion of possible bugs in 7.4), plus they 
radically changed implementation details.  

This, of course, caused us no end of troubles!  Thats why we just gave 
up trying to do anything special (like SRID-in-the-index) and opt for 
the most-like-the-sample-code-as-humanly-possible approach.  As you've 
noted, we convert the postgis geometry to the 
old-school-native-postgresql BOX structure and use this in the index.

It is possible that the rtree_gist sample implementation is incorrect 
(as you've noted) - the "consistent routine" has always looked fishy to 
me.  
I believe that its just returning an appoximation to the "@" or "~" 
solution.  I bet you're supposed to do:

SELECT * FROM table WHERE a @ b AND geometry_contains(a,b);

the "@" actives the index and returns a partial solution and the 
geometry_contains() provides the full solution.  Theoretically, it 
should be returning the correct solution, but I think the code in 
rtree_internal_consistent() might be incorrect - but there is not enough 
documentation to determine that.  All the sample GiST implemenation seem 
to make the same mistake.

I must be completely honest and say that I've never used the "@" or "~" 
operators.  The only index operator I use is the "&&" (overlap) 
operator.  Its been well tested and there are no known problems with it.  

I've always found the consistency function to be fishy because it doesnt 
seem to be calling the correct functions -

  switch(strategy) {
    case RTLeftStrategyNumber:
    case RTOverLeftStrategyNumber:
      retval = **  box_overleft
      break;
    case RTOverlapStrategyNumber:
      retval = ** box_overlap
      break;
    case RTOverRightStrategyNumber:
    case RTRightStrategyNumber:
      retval =  ** box_right
      break;
    case RTSameStrategyNumber:
    case RTContainsStrategyNumber:
      retval = ** box_contain
      break;
    case RTContainedByStrategyNumber:
      retval =** box_overlap
      break;
    }

For example, for strategy  "RTLeftStrategyNumber" you'd think it would 
be calling box_left() NOT box_overleft().  Strategy 
"RTContainedByStrategyNumber" calls box_overlap() [box_overlap() returns 
true if the boxes overlap at all] instead of the more logical 
box_contain();

TO DO:
1. create table with native postgresql polygons in it and the 
contrib/rtree_gist index.  Ensure the same problem with "@" and "~" 
operators occur with it.
2. contact teodor and oleg and ask them to explain the 
rtree_internal_consistent() function and ensure its doing what it's 
supposed to.  Send them sample SQL to show problems in #1.  They'll 
probably modify the "consistent" function and we'll wrap the change into 
postgis.

I am in agreement with you to use all postgis code for the indexing, 
both so we dont have to worry about postgresql changes biting us and 
having extras like SRID aware index operatations.  But, I am a bit 
hesitant to go back to indexing with a GEOMETRYKEY-type structure 
instead of BOX because I'm sure we'll need to do another re-write for 
7.4 and 7.5 (and probably 7.6).

I'm quite busy right now, so if someone could do #1, I'll contact teodor 
and oleg and wrap their changes into PostGIS.

dave

NB:  IF YOU'RE JUST USING THE && OPERATOR, EVERYTHING IS WORKING FINE 
FOR YOU - DONT PANIC.   THIS ONLY EFFECTS THE VERY FEW PEOPLE USING THE 
"@" AND "~" OPERATORS.






More information about the postgis-users mailing list