[postgis-devel] Re: geometry stats
David Blasby
dblasby at refractions.net
Mon Mar 1 10:34:04 PST 2004
There's been a lot of discussion on Indexing, I thought I'd put my 2c in.
1. I think changing the actual indexed type (currently BOX) to something
else (ie. BOX3D or BOX_WITH_SRID) isnt worth it. We're going to be
moving from BOX to BOX2DFLOAT4 when we move to LWGEOM. The entire
GEOMETRY type will be going into various states of deprication as we
"pump-up" LWGEOM.
Although a BOX_WITH_SRID seems like a good idea (PostGIS for 7.1 used
this), it actually a big waste of time and space. Putting the RECHECK
operator in catch SRID foul-up should be "good enough".
2. Its interesting that oleg and teodor use the RECHECK operator to
catch the problems with the 'contains' operator. This leads me to
believe that the GiST rtree implementation cannot really handle this
type of query correctly. I referenced the rtree_internal_consistent()
function (the one with the CASE RTOverLeftStrategyNumber etc...) - this
is where I think its screwing up. I'll see if I can look at this more
closely today. Has anyone checked their implementation of GiST Rtree to
see if it works correctly for contains? If it doesnt, we should report
it back to them. The PostGIS rtree is as-close-as-humanly-possible to
the GiST Rtree code so we can easily move any patches they make over to
postgis.
3. I believe the RECHECK operator is basically doing this:
SELECT * FROM <table> WHERE the_geom @~ <box> AND
geometry_contains(<box>, the_geom);
I must admit, I have *never* used any operator other than '&&'. As I
said before, I'll check to make sure our GiST code is correct, but using
the RECHECK operator isnt a bad option. If there is an actual problem
with some of the functions, we can implement our own in PostGIS.
4. I'll also add some testing code (probably a little java program) to
test the index out.
I.e. create two identical tables, one with index, one with out.
(SELECT id FROM <table_no_index> WHERE the_geom && <box> )
EXCEPT
(SELECT id FROM <table_with_index> WHERE the_geom && <box> )
Should always give you a zero-length result. This is a test of
completeness (all actually overlapping rows get returned). If this
every returns a non-zero rows result, then we know the indexing schema
is screwed. The other side of the test would be to check to see if
*only* actual overlapping rows get returned.
(SELECT id FROM <table_with_index> WHERE the_geom && <box> )
EXCEPT
(SELECT id FROM <table_no_index> WHERE the_geom && <box> )
With the currect indexing, this should also give you nothing. When
we move to the BOX2DFLOAT4-based index, there maybe a few extra rows
returned because the FLOAT4 (vs FLOAT8) box will be slightly bigger.
NOTE: with RECHECK on, the above query will always return zero rows.
dave
More information about the postgis-devel
mailing list