[postgis-devel] Re: geometry stats

David Blasby dblasby at refractions.net
Mon Mar 1 10:34:04 PST 2004


There's been a lot of discussion on Indexing, I thought I'd put my 2c in.

1. I think changing the actual indexed type (currently BOX) to something 
else (ie. BOX3D or BOX_WITH_SRID) isnt worth it.  We're going to be 
moving from BOX to BOX2DFLOAT4 when we move to LWGEOM.  The entire 
GEOMETRY type will be going into various states of deprication as we 
"pump-up" LWGEOM.
Although a BOX_WITH_SRID seems like a good idea (PostGIS for 7.1 used 
this), it actually a big waste of time and space.  Putting the RECHECK 
operator in catch SRID foul-up should be "good enough".

2. Its interesting that oleg and teodor use the RECHECK operator to 
catch the problems with the 'contains' operator.  This leads me to 
believe that the GiST rtree implementation cannot really handle this 
type of query correctly.  I referenced the rtree_internal_consistent() 
function (the one with the CASE RTOverLeftStrategyNumber etc...) - this 
is where I think its screwing up.  I'll see if I can look at this more 
closely today.  Has anyone checked their implementation of GiST Rtree to 
see if it works correctly for contains?  If it doesnt, we should report 
it back to them.  The PostGIS rtree is as-close-as-humanly-possible to 
the GiST Rtree code so we can easily move any patches they make over to 
postgis.

3. I believe the RECHECK operator is basically doing this:

SELECT * FROM <table> WHERE the_geom @~ <box> AND 
geometry_contains(<box>, the_geom);

I must admit, I have *never* used any operator other than '&&'.  As I 
said before, I'll check to make sure our GiST code is correct, but using 
the RECHECK operator isnt a bad option.  If there is an actual problem 
with some of the functions, we can implement our own in PostGIS.

4.  I'll also add some testing code (probably a little java program) to 
test the index out.
     I.e. create two identical tables, one with index, one with out.

    (SELECT id FROM <table_no_index> WHERE the_geom && <box> )
     EXCEPT
    (SELECT id FROM <table_with_index> WHERE the_geom && <box> )

    Should always give you a zero-length result.  This is a test of 
completeness (all actually overlapping rows get returned).  If this 
every returns a non-zero rows result, then we know the indexing schema 
is screwed.  The other side of the test would be to check to see if 
*only* actual overlapping rows get returned.

  (SELECT id FROM <table_with_index> WHERE the_geom && <box> )
     EXCEPT
  (SELECT id FROM <table_no_index> WHERE the_geom && <box> )

   With the currect indexing, this should also give you nothing.  When 
we  move to the BOX2DFLOAT4-based index, there maybe a few extra rows 
returned because the FLOAT4 (vs FLOAT8) box will be slightly bigger.
NOTE: with RECHECK on, the above query will always return zero rows.

dave







More information about the postgis-devel mailing list