[postgis-devel] GeomUnion Performance Info

Wed Jun 22 04:35:17 PDT 2005

Hi Bill/strk,

> -----Original Message-----
> From: postgis-devel-bounces at postgis.refractions.net 
> [mailto:postgis-devel-bounces at postgis.refractions.net] On 
> Behalf Of strk at refractions.net
> Sent: 22 June 2005 10:36
> To: PostGIS Development Discussion
> Subject: Re: [postgis-devel] GeomUnion Performance Info

(cut)

> Index is not required, the ORDER BY will use operators 
> defined by the btree operator class.

Correct. I'm not exactly sure what the ordering is for Btree indices. I
thought that it was the area of the bounding box of the geometry, but
looking at PostGIS 1.0.1 source it appears the ordering criteria is based
upon the outermost edge of each geom. Can anyone clarify this criterion as
there are no comments in the code as to how this works?

> I belive speedup is due to GEOS algorithms taking less time 
> when coupled geometries have a certain spatial relation, I 
> don't know much about GiST cluster order, but it is not the 
> same of btree clusters, knowing which performs better would 
> be a step forward.

Yeah, I reckon that if you can create an ordering that can allow GEOS to
discard distinct spatial entities more quickly then you could get quite a
considerable performance improvement.

> > BTW: I tried to create a btree index on my parcels, but got this:
> > 
> > gis=# create index parcel_shape_btree ON parcels (parcel_shape);
> > ERROR:  index row requires 8452 bytes, maximum size is 8191
> > 
> > go figure.
> 
> Ach! I think we should visually see that btree order!

Hmmm, perhaps the btree index is trying to stuff the whole geometry into the
index key instead of just the bounding box? I'm not sure if there is
anything we can do about this...

> Actually... first thing would be evaluating difference
> between the two (GiST and btree) in terms of performance.
> 
> You should try to obtain two table w/out indexes
> One ordered by GiST, one by btree.

We must be very, very careful here. GiST indices are *not* ordered which
means once you get to a particular tree node, the keys are not ordered by
bounding box. Hence I would expect that if you CLUSTER a table whilst
feeding the input geometries in two different orders then it is likely the
resulting indices will *not* be identical since the page splits will occur
in different places causing the spatial regions to change.

Curiosity got the better of me and so I decided to have a quick look at the
source to see how the GiST is traversed during an index scan. From what I
can tell, it is a depth-first tree search. So for our Rtree structure we end
up with something like this (where non-leaf nodes are the minimum bounding
box containing all the geometries below):

					root
					 |
				----------------
				|              |
			   -------        -------
			   |     |        |     |
			  ---   ---      ---   ---
			  | |   | |      | |   | |
			  A B   C D      E F   G H

So in a depth first tree traversal, the order will be A B C D E F G H where
the pairs A,B C,D E,F G,H are spatially close. On a macroscopic scale you
would therefore expect the geometries to be divided roughly into a set of
boxes where the contents of each box are drawn at the same time, which seems
to agree quite nicely with Bill's GiST cluster layer on his website. But as
mentioned before, I would expect changing the order of the input geometries
would cause these regions to drift due to differences in page splitting.

Kind regards,

Mark.

------------------------
WebBased Ltd
17 Research Way
Tamar Science Park
Plymouth
PL6 8BT 

T: +44 (0)1752 797131
F: +44 (0)1752 791023
W: http://www.webbased.co.uk