[postgis-users] Re: Re bug with GIST indexes

Mon Mar 17 07:24:42 PST 2003

> -----Original Message-----
> From: David Blasby [mailto:dblasby at refractions.net] 
> Sent: 12 March 2003 19:14
> To: Mark Cave-Ayland; postgis-users at postgis.refractions.net
> Subject: Re: [postgis-users] Re: Re bug with GIST indexes
> 
> 
> >
> >
> >I'm not convinced that users should be expected to always 
> add another 
> >clause to a select statement to return a result - the 
> operator should 
> >return the correct result according to its definition 
> instead of having 
> >to be filtered again!
> >  
> >
> I agree with you 100% - but also remember that the operators 
> (like "&&") 
> work on bounding boxes *not* the actual geometries.  Pretty soon (as 
> soon as GEOS gets integrated), we'll see routines like:
> 
> SELECT * FROM data WHERE data.mygeom && <geom> and 
> intersects(data.geom, 
> <geom>)
> 
> Meaning, use the index for a parital (BOX) solution, then the actual 
> function to get the correct solution (GEOMETRY).

I don't disagree with the format of the query - for the example above I
can see why that would be completely valid. But in the case of the @
operator it's verbal definition would have to be written as 'roughly
indicates whether A contains B' and then having to verify with another
function just seems plain wrong...

> I've heard that they will be query re-write rules in postgresql 7.4 - 
> have you looked into any of them?  I'm hoping that a user query like:
> 
> SELECT * FROM data WHERE intersects(data.geom, <geom>)
> can get auto-magically turned into an indexible statement 
> like: SELECT * FROM data WHERE data.mygeom && <geom> and 
> intersects(data.geom, 
> <geom>)

No, although that does look interesting, espectially if it could be
controlled by entries that could be inserted by PostGIS into the system
tables.

> >1. Alter the BOX3D type so that it contains a SRID field 
> (and basically 
> >make users have to specify a SRID for the BOX3D when they create it)
> >
> >2. Alter the operator class so a BOX3D is returned instead 
> of a box for 
> >the GiST entries (I'm guessing the implicit cast in the SQL file is 
> >used to cast geometries to box3ds for queries).
> >  
> >
> I dont think the GiST index should be using 3D geometries - 
> thats just 
> adding 50% more information
> to the index that will never be used.
> 
> 
> How about something like:
> typedef stuct
> {
>     int32         SRID;            // -1 if unknown
>     double      xmin, ymin;
>     double      xmax, ymax;
> } BOX2D;
> 
> Its dead easy to create one of these from a GEOMETRY (they 
> have a SRID 
> and bounding volume), and
> you can convert a BOX2D to GEOMETRY by creating one as "BBOXONLYTYPE".

Yes, this would make sense. However, what is the support for PostGIS 3D
spaces like? Should the BOX3D be deprecated for BOX2D? It doesn't look
difficult to extend the R-Tree code into 3 dimensions if people would
want to use spatially indexed 3D queries.... But looks like JTS doesn't
do 3D :(

> >3. Keep the same algorithms for the GiST indexes but convert them to 
> >BOX3D (this ensures that any changes to the PostgreSQL BOX 
> type won't 
> >break things!). Abstract the PostgreSQL interface from the GiST 
> >routines so then the plan is that only the interface routines would 
> >need to be rewritten between PostgreSQL revisions (i.e. the 
> algorithms 
> >themselves would not have to be altered making them more robust). 
> >Perhaps set a compile time choice for either quadratic pick/split or 
> >linear pick/split to show how this adds more flexibility.
>
> Good idea.

Yes, I hope to make this fundamental to the re-write. Less effort to
upgrade for newer versions is good! :)

> >4. Rewrite the spatial operators so they use a common 
> routine based on 
> >BOX3Ds regardless of whether the operator is indexed or not.
> >  
> >
> This is another great idea, but its more subtle that you'd think.  
> When you do something like:
> 
> SELECT * FROM table WHERE geo && 'BOX3D(...)';
> 
> Whats actually happening is this:
> 1. looks for GEOMETRY && BOX3D  operator (there isnt one)
> 2. converts BOX3D to a GEOEMTRY -"BBOXONLYTYPE"- and finds 
> the GEOMETRY 
> && GEOMETRY operator
> 3. looks to see if GEOMETRY && GEOMETRY is indexable (it is) and 
> determines if it should seqscan or indexscan
> 
> So, in the end you are not doing  BOX3D <op> BOX3D, you are actually 
> doing a GEOMETRY <op> GEOMETRY
> 
> In the sequence scan case, it calls the GEOMETRY <op> 
> GEOMETRY function 
> directly.
> In the index scan case, it "compresses" the GEOMETRY into a BOX then 
> searches the index using BOX <op> BOX function (or BOX2D <op> BOX2D).
> 
> SELECT * FROM table WHERE geo::box3d && 'BOX3D(...)';
> 
> This will use the BOX3D <op> BOX3D routines (not indexed).
> 
> The postgis-for-7.1 version does ensure that SRIDs are respected (cf. 
> GEOMETRYKEY - note that this is a variable length structure, 
> as you were 
> supposed to use under the old GiST API).
> 
> Another idea would be to completely remove the BOX3D type and 
> just have 
> GEOMETRY.  Suck all the functionality of  BOX3D into 
> geometry, and make 
> sure "BBOXONLYTYPE" is a first class citizen of GEOMETRY 
> (currently its 
> not).  You could then put these "BBOXONLYTYPE" geometries 
> into the index 
> (compress would convert a GEOMETRY into a "BBOXONLYTYPE" ) 
> and use the 
> same functions for the index and seq scan.  
> Unfortunately, this is quite a bit of work and you're storing 
> a bunch of 
> extra junk in the index, but it does make things more consistent.

Hmmm, I see what you mean.... But this is actually the beginning of a
very cunning plan. How about modifying Compress() to convert geometries
to BOX2Ds by taking their bounding box, and modifying Decompress() to
convert BOX2Ds back into BBOXONLY geometries?

This then would mean that the index entries would be stored on the disk
as BOX2Ds taking minimal space, while it would still be possible for the
Consistent() routines to call functions that take two geometries as a
parameter, plus it gets rid of the problem above where you would be
storing excess information if you were directly storing the GEOMETRY
type. The only thing I can see here is that every BOX2D would still need
to have a constant SRID stored on disk which seems wrong given that the
value will be constant for every index... Is there no way we could
extract this information from the geometry_columns table given the
information in the GiST entry structure? But I guess this would slow
things down a bit?

> If you send patches, I'll apply them.

Thanks Dave!

Cheers,

Mark.

---

Mark Cave-Ayland
Webbased Ltd.
Tamar Science Park
Derriford
Plymouth
PL6 8BX
England

Tel: +44 (0)1752 764445
Fax: +44 (0)1752 764446

This email and any attachments are confidential to the intended
recipient and may also be privileged. If you are not the intended
recipient please delete it from your system and notify the sender. You
should not copy it or use it for any purpose nor disclose or distribute
its contents to any other person.