[postgis-users] Re: Re bug with GIST indexes
David Blasby
dblasby at refractions.net
Mon Mar 10 10:48:19 PST 2003
Mark,
Thanks for taking a look at our indexing functions.
I've always tried to keep the PostGIS implementation of GiST as close to
Oleg and Teodor's (the GiST maintainers) sample implementation for the
native-postgresql-polygon (contrib/rtree_gist). This is mostly because
they keep radically changing what GiST looks like every version of
Postgresql - causing a rewrite of the PostGIS GiST support code. Plus,
if we make a bug report we can easily give the report in terms of their
code as apposed to PostGIS code.
The original PostGIS-gist-for-7.0 and 7.1 indexes used:
typedef struct geomkey {
int32 size; /* size in varlena terms */
BOX key;
int32 SRID; //spatial reference system identifier
} GEOMETRYKEY;
for the acutal indexing stucture. This structure would throw an error
for miss-matched SRIDs (cf. postgis_gist_71.c). I belive the original
postigs-for-7.2 GiST index used this structure. When postgis-for-7.3
was made, the 7.2 codebase was superceeded.
There were a few problems with GiST during postgresql 7.2 and 7.3 (and
there are currently discussion of possible bugs in 7.4), plus they
radically changed implementation details.
This, of course, caused us no end of troubles! Thats why we just gave
up trying to do anything special (like SRID-in-the-index) and opt for
the most-like-the-sample-code-as-humanly-possible approach. As you've
noted, we convert the postgis geometry to the
old-school-native-postgresql BOX structure and use this in the index.
It is possible that the rtree_gist sample implementation is incorrect
(as you've noted) - the "consistent routine" has always looked fishy to
me.
I believe that its just returning an appoximation to the "@" or "~"
solution. I bet you're supposed to do:
SELECT * FROM table WHERE a @ b AND geometry_contains(a,b);
the "@" actives the index and returns a partial solution and the
geometry_contains() provides the full solution. Theoretically, it
should be returning the correct solution, but I think the code in
rtree_internal_consistent() might be incorrect - but there is not enough
documentation to determine that. All the sample GiST implemenation seem
to make the same mistake.
I must be completely honest and say that I've never used the "@" or "~"
operators. The only index operator I use is the "&&" (overlap)
operator. Its been well tested and there are no known problems with it.
I've always found the consistency function to be fishy because it doesnt
seem to be calling the correct functions -
switch(strategy) {
case RTLeftStrategyNumber:
case RTOverLeftStrategyNumber:
retval = ** box_overleft
break;
case RTOverlapStrategyNumber:
retval = ** box_overlap
break;
case RTOverRightStrategyNumber:
case RTRightStrategyNumber:
retval = ** box_right
break;
case RTSameStrategyNumber:
case RTContainsStrategyNumber:
retval = ** box_contain
break;
case RTContainedByStrategyNumber:
retval =** box_overlap
break;
}
For example, for strategy "RTLeftStrategyNumber" you'd think it would
be calling box_left() NOT box_overleft(). Strategy
"RTContainedByStrategyNumber" calls box_overlap() [box_overlap() returns
true if the boxes overlap at all] instead of the more logical
box_contain();
TO DO:
1. create table with native postgresql polygons in it and the
contrib/rtree_gist index. Ensure the same problem with "@" and "~"
operators occur with it.
2. contact teodor and oleg and ask them to explain the
rtree_internal_consistent() function and ensure its doing what it's
supposed to. Send them sample SQL to show problems in #1. They'll
probably modify the "consistent" function and we'll wrap the change into
postgis.
I am in agreement with you to use all postgis code for the indexing,
both so we dont have to worry about postgresql changes biting us and
having extras like SRID aware index operatations. But, I am a bit
hesitant to go back to indexing with a GEOMETRYKEY-type structure
instead of BOX because I'm sure we'll need to do another re-write for
7.4 and 7.5 (and probably 7.6).
I'm quite busy right now, so if someone could do #1, I'll contact teodor
and oleg and wrap their changes into PostGIS.
dave
NB: IF YOU'RE JUST USING THE && OPERATOR, EVERYTHING IS WORKING FINE
FOR YOU - DONT PANIC. THIS ONLY EFFECTS THE VERY FEW PEOPLE USING THE
"@" AND "~" OPERATORS.
More information about the postgis-users
mailing list