[postgis-devel] Re: geometry stats

Mark Cave-Ayland m.cave-ayland at webbased.co.uk
Mon Mar 1 06:00:44 PST 2004


Hi strk/Dave/Listers,

> -----Original Message-----
> From: postgis-devel-bounces at postgis.refractions.net 
> [mailto:postgis-devel-bounces at postgis.refractions.net] On 
> Behalf Of strk
> Sent: 27 February 2004 18:10
> To: David Blasby
> Cc: PostGIS Development Discussion
> Subject: Re: [postgis-devel] Re: geometry stats
> 
> 
> dblasby wrote:
> > I don't think there's ever been a report of '&&' (overlap) being
> > broken.
> 
> I've seen with my own eyes different result sets when using
> a sequential scan vs an index scan. The results became the same
> when replacing the call to box_overlap with a call to
> pgbox_overlap (made by copying from geometry_overlap).

I think it would have been quite hard for anyone to notice this, IMHO.
All the initial testing I did using && was done on small datasets
(and hence with a sequential scan), while the larger index scans produced
so many screenfuls of text that it would be difficult to tell if the index
were returning extra entries. I also imagine most people are using
MapServer, so extra geometries may be rendered outside the visible screen
area, which we would be unaware of anyway....
 
> > I don't think the original PostgreSQL rtree implementations
> > (rtree_box_overlap()) are incorrect (the PostGIS ones are based on
> > them and I checked them for mistakes).
> > It's more likely to be in the rtree_internal_consistent() function.
> 
> rtree_internal_consistent() calls pgbox_overlap (previously
> box_overlap) for both RTContainedByStrategyNumber and
> RTOverlapStrategyNumber. Could this be the problem?

I can't see how a box_overlap() function can be used for overlap AND
contained_by() since they are different predicates.... let me check
contrib/rtree_gist......

Interesting.... if you look at the code, they do use the box_*()
functions for the index scans - BUT only for points! It looks like
polygons have a separate operator class which uses the RECHECK clause
(see rtree_gist.sql).

So perhaps what is happening is that the plain box_*() functions only
work as described for points; otherwise they return extra data which is
filtered out by the RECHECK in Oleg/Teodor's implementation (sigh).
This would mean that as long as box_overlap() returns at least the same
results as a contained_by() query, we never see the difference: the
database simply returns extra rows that are discarded as soon as they are
retrieved from the index!

> > We should
> > test Oleg and Teodor's polygon GiST rtree implementation to see if
> > it correctly handles index searches like "@@" (etc). Then we can
> > have them check it out and patch our implementation, since it's
> > pretty much *exactly* like theirs.
> 
> rtree_internal_consistent() *actually* calls theirs (apart
> from pgbox_overlap, just introduced).
> 
> > I'd like to see this fixed, but it's too rarely used a feature to
> > put much work into, when users can write equivalent queries like:
> > 
> > SELECT * FROM <table> WHERE the_geom && <box> AND
> > geometry_contains(<box>, the_geom) AND contains(<box>, the_geom);
> 
> Again, the bug I've seen was with the_geom && <box> alone.
> I'm not aware of other bugs... but I suppose there may be
> other cases. We can just wait for bug reports and take no
> further steps now.

In summary, I believe that the PostGIS index operators may be broken for
non-point geometries. Solving this would not take much developer time
at all and would nail these types of problems once and for all. The
plan should be something like:


1) Write our own equivalent of all the Rtree strategy bounding box
   checking routines, similar to what strk and I have done.

2) Call the same function from the sequential operator and the index
   operator so they both return the same geometries. This will prevent
   PostgreSQL from returning extra geometries that get discarded straight
   away by the next phase - no point in making the DB do more work than
   it has to.

3) Add the RECHECK clause to the operator class, as per a discussion a
   while back between myself and Dave. This means that once the index
   tuples have been fetched, they will also be passed through the
   sequential operator. As a result, indexed queries should now throw an
   error if the query rectangle and the geometry have different SRIDs,
   instead of the current behaviour, which is undefined.


Comments?

Mark.

---

Mark Cave-Ayland
Webbased Ltd.
Tamar Science Park
Derriford
Plymouth
PL6 8BX
England

Tel: +44 (0)1752 764445
Fax: +44 (0)1752 764446






