[postgis-users] Re: Re bug with GIST indexes

Wed Mar 12 03:17:03 PST 2003

> -----Original Message-----
> From: David Blasby [mailto:dblasby at refractions.net]
> Sent: 10 March 2003 18:48
> To: Mark Cave-Ayland; postgis-users at postgis.refractions.net
> Subject: Re: [postgis-users] Re: Re bug with GIST indexes
> 
> 
> Mark,
> 
> Thanks for taking a look at our indexing functions.

No problem. It turned out to be an interesting exercise to reassure
myself that the spatial indexing wasn't broken!

> I've always tried to keep the PostGIS implementation of GiST
> as close to 
> Oleg and Teodor's (the GiST maintainers) sample 
> implementation for the 
> native-postgresql-polygon (contrib/rtree_gist).  This is 
> mostly because 
> they keep radically changing what GiST looks like every version of 
> Postgresql - causing a rewrite of the PostGIS GiST support 
> code.  Plus, 
> if we make a bug report we can easily give the report in 
> terms of their 
> code as apposed to PostGIS code.
> 
> The original PostGIS-gist-for-7.0 and 7.1 indexes used:
> 
> typedef struct geomkey {
>         int32 size; /* size in varlena terms */
>         BOX     key;
>         int32   SRID; //spatial reference system identifier
> } GEOMETRYKEY;
> 
> for the acutal indexing stucture.  This structure would throw
> an error 
> for miss-matched SRIDs (cf. postgis_gist_71.c).  I belive the 
> original 
> postigs-for-7.2 GiST index used this structure.  When postgis-for-7.3 
> was made, the 7.2 codebase was superceeded.  
> There were a few problems with GiST during postgresql 7.2  
> and 7.3 (and 
> there are currently discussion of possible bugs in 7.4), plus they 
> radically changed implementation details.  
> 
> This, of course, caused us no end of troubles!  Thats why we
> just gave 
> up trying to do anything special (like SRID-in-the-index) and opt for 
> the most-like-the-sample-code-as-humanly-possible approach.  
> As you've 
> noted, we convert the postgis geometry to the 
> old-school-native-postgresql BOX structure and use this in the index.
> 
> It is possible that the rtree_gist sample implementation is incorrect
> (as you've noted) - the "consistent routine" has always 
> looked fishy to 
> me.  
> I believe that its just returning an appoximation to the "@" or "~" 
> solution.  I bet you're supposed to do:
> 
> SELECT * FROM table WHERE a @ b AND geometry_contains(a,b);
> 
> the "@" actives the index and returns a partial solution and the
> geometry_contains() provides the full solution.  Theoretically, it 
> should be returning the correct solution, but I think the code in 
> rtree_internal_consistent() might be incorrect - but there is 
> not enough 
> documentation to determine that.  All the sample GiST 
> implemenation seem 
> to make the same mistake.

I'm not convinced that users should be expected to always add another
clause to a select statement to return a result - the operator should
return the correct result according to its definition instead of having
to be filtered again!

> I must be completely honest and say that I've never used the
> "@" or "~" 
> operators.  The only index operator I use is the "&&" (overlap) 
> operator.  Its been well tested and there are no known 
> problems with it.  
> 
> I've always found the consistency function to be fishy
> because it doesnt 
> seem to be calling the correct functions -
> 
>   switch(strategy) {
>     case RTLeftStrategyNumber:
>     case RTOverLeftStrategyNumber:
>       retval = **  box_overleft
>       break;
>     case RTOverlapStrategyNumber:
>       retval = ** box_overlap
>       break;
>     case RTOverRightStrategyNumber:
>     case RTRightStrategyNumber:
>       retval =  ** box_right
>       break;
>     case RTSameStrategyNumber:
>     case RTContainsStrategyNumber:
>       retval = ** box_contain
>       break;
>     case RTContainedByStrategyNumber:
>       retval =** box_overlap
>       break;
>     }
> 
> For example, for strategy  "RTLeftStrategyNumber" you'd think
> it would 
> be calling box_left() NOT box_overleft().  Strategy 
> "RTContainedByStrategyNumber" calls box_overlap() 
> [box_overlap() returns 
> true if the boxes overlap at all] instead of the more logical 
> box_contain();

Yes, I agree with you on this one. I have a feeling that Oleg and
Teodor's Rtree implementation is more textbook than practical. Each
strategy should call a routine that tests consistency and returns the
correct result. I would have no worries about replacing these with
routines that we can guarantee will always produce the results we want
for a given operator, and I wouldn't be surprised if it didn't work
exactly as intended.

> TO DO:
> 1. create table with native postgresql polygons in it and the
> contrib/rtree_gist index.  Ensure the same problem with "@" and "~" 
> operators occur with it.
> 2. contact teodor and oleg and ask them to explain the 
> rtree_internal_consistent() function and ensure its doing what it's 
> supposed to.  Send them sample SQL to show problems in #1.  They'll 
> probably modify the "consistent" function and we'll wrap the 
> change into 
> postgis.
> 
> I am in agreement with you to use all postgis code for the indexing,
> both so we dont have to worry about postgresql changes biting us and 
> having extras like SRID aware index operatations.  But, I am a bit 
> hesitant to go back to indexing with a GEOMETRYKEY-type structure 
> instead of BOX because I'm sure we'll need to do another re-write for 
> 7.4 and 7.5 (and probably 7.6).

I see what you mean but as PostgreSQL develops then there will be API
changes that come in that will require rewriting code. The key, of
course, is to minimise the amount of work you have to do to keep PostGIS
working!

After a day or so of thinking, this is what I thought would be a good
plan of action to minimise the work required in maintaining this part of
PostGIS:

1. Alter the BOX3D type so that it contains a SRID field (and basically
make users have to specify a SRID for the BOX3D when they create it)

2. Alter the operator class so a BOX3D is returned instead of a box for
the GiST entries (I'm guessing the implicit cast in the SQL file is used
to cast geometries to box3ds for queries).

3. Keep the same algorithms for the GiST indexes but convert them to
BOX3D (this ensures that any changes to the PostgreSQL BOX type won't
break things!). Abstract the PostgreSQL interface from the GiST routines
so then the plan is that only the interface routines would need to be
rewritten between PostgreSQL revisions (i.e. the algorithms themselves
would not have to be altered making them more robust). Perhaps set a
compile time choice for either quadratic pick/split or linear pick/split
to show how this adds more flexibility.

4. Rewrite the spatial operators so they use a common routine based on
BOX3Ds regardless of whether the operator is indexed or not.

I'm willing to put my money where my mouth is and invest some time and
effort into this as I think it will make maintaining the codebase much
easier, especially in terms of integrating the GEOS library. Would you
guys be willing to accept a set of patches from me to implement these
changes? 

Cheers,

Mark.

---

Mark Cave-Ayland
Webbased Ltd.
Tamar Science Park
Derriford
Plymouth
PL6 8BX
England

Tel: +44 (0)1752 764445
Fax: +44 (0)1752 764446

This email and any attachments are confidential to the intended
recipient and may also be privileged. If you are not the intended
recipient please delete it from your system and notify the sender. You
should not copy it or use it for any purpose nor disclose or distribute
its contents to any other person.