[postgis-devel] Cube Indexes

Mon Aug 14 04:12:50 PDT 2006

On Fri, 2006-08-11 at 11:38 -0700, Paul Ramsey wrote:
> What would it take to enable multi-dimensional indexes as an option  
> on postgis geometries?
> contrib/cube already does it, can we?
> P

Hi Paul,

As far as I can tell looking at the contrib/cube code, the way it works
is that instead of having a fixed size structure like BOX2DFLOAT4, they
have introduced an equivalent variable length structure called NDBOX
(N-dimensional box) to store information in the index. While there are
fewer operators in the contrib/cube opclass, operators such as overlap
(&&) have been extended to use N dimensions with N-dimensional distance
being calculated using standard Pythagorean distances.

To add this functionality into PostGIS you'd have to do something along
the following lines:

	* Create a new N-dimensional structure (BOXNDFLOAT4) to replace
	BOX2DFLOAT4

	* Alter existing operators to use N-dimensional maths; also
	think about routines such as Distance() etc.

	* Consider adding indexable operators for the 3rd dimension
	operators of in front / behind. I'm not sure about
	continually adding operators for more dimensions as the meaning
	tends to be lost as the number of dimensions increases, however
	contrib/cube supports the &&, @ and ~ operators for N 
	dimensions.

	* Change the GiST index functions to use N-dimensional routines,
	in particular for union/split; consider the B-Tree index class
	for sorting order etc.

	* Alter the statistics collector to collect information about
	N-dimensions and the join/selectivity estimators to use this
	new information

Using contrib/cube as a starting point would help, but it's still a lot
of work since just enhancing BOX2DFLOAT4 to larger dimensions would
touch most places in the source tree. Also moving to a variable length
structure will add an additional 8 bytes (two integers) to each index
entry so the size of a standard 2D index entry would increase by 50%.

In short: it could be done, but it would affect a large part of the code
tree and standard 2D indices would become 50% larger. As a consequence
of changing a large part of the code tree, the testing during coding is
likely to take up a large proportion of the time, although the new
regression tests should help with this somewhat.

Kind regards,

Mark.