[postgis-devel] GeomUnion Performance Info

strk at refractions.net strk at refractions.net
Wed Jun 22 02:35:45 PDT 2005


On Tue, Jun 21, 2005 at 10:59:43PM -0400, bill wrote:
> strk at refractions.net wrote:

...

> >Watch out: ORDER BY will use btree index, not the GiST one.
> 
> Actually, I do not have a btree index on parcel_shape.  Does having a 
> GiST index implicitely create one? 

Index is not required, the ORDER BY will use operators defined by
the btree operator class.

> I can't imagine that a manual sort 
> of the stored strings (without an index) would speed things up.

I belive speedup is due to GEOS algorithms taking less time
when coupled geometries have a certain spatial relation,
I don't know much about GiST cluster order, but it is not
the same of btree clusters, knowing which performs better
would be a step forward.

> >Again will use btree index, not GiST.
> >It would be interesting to understand WHAT makes this difference exactly.
> 
> When I manually do a 'cluster on', it does order by the GiST, correct?

cluster on btree_index <-- order by btree
cluster on gist_index <-- order by GiST

You might try both cases.

> >Maybe the ordered case would rather ensure there would be always work
> >for the unioner as consecutively ordered geometries would probably
> >intersect.

This is my guess as well, anyway the kind of intersection also
makes the difference. I think geoms that intersect in  many places
take more time.
Imagine a geometry containing all the others (geom A):
	- output would be geom A
	- union of geom A with any geometry would not take much
	  (this is a guess, no intersections are found but only
	   full containment)
	- other geometries might have a lot of intersection which
	  would take more time to compute (again, a guess)
	- The sooner geom A is considered, the better the query goes.

You could make tests for this as well:

	- define a geometry being
	  the envelope() of full table extent and assign it a
	  very high gid (geom A)
	- geomunion the table so that geom A comes last
	- geomunion the table so that geom A comes first

...
> One thought is that using ANY index eliminates the need to load the rest 
> of the columns (or is only the bbox stored in the index?).  I'm not sure 
> about that on TOAST tables, but it seems to make sense on non-spatial 
> data, perhaps it translates.

I think we should not mix access costs and union costs.
We are analyzing union costs related to order of feed.

> >	- adding short-cuts in postgis union aggregate
> >	  bypassing GEOS in the disjoint-bboxes case

> Just on a lark, I've colorized two maps based on the order they were 
> being sorted (black are drawn first, white last): the first is after a 
> cluster on parcel_shape_idx (at GiST index), and the second is on the 
> random order.  You can see that the GiST clustered order is clearly 
> geographically clustered.  You can access it at my map testing page at 
> http://www.mapshine.com/mapsurfer/  You can use the username/password 
> mapdev/GDAL if you'd like to play around with it.  Just turn on the 
> layers "GIST Cluster" and "Random Cluster" at the left.
> 
> BTW: I tried to create a btree index on my parcels, but got this:
> 
> gis=# create index parcel_shape_btree ON parcels (parcel_shape);
> ERROR:  index row requires 8452 bytes, maximum size is 8191
> 
> go figure.

Ach! I think we should visually see that btree order!
Can you avoid clustering and just use ORDER BY ?

Actually... first thing would be evaluating difference
between the two (GiST and btree) in terms of performance.

You should try to obtain two table w/out indexes
One ordered by GiST, one by btree.
Yes, tables are NOT ordered (teoretically) but I'm sure
you can obtain that... SELECT into btree_ordered .. ORDER BY ...
cluster; select into gist_ordered ...
something like that, compare geomunion performances and show
them visually.

> Do I need to update GEOS or Postgis (or both) from CVS to get the 
> bounding box shortcuts?

GEOS from 2-1 branch (if you are in 2.x version already you need no
rebuilt of postgis)

> 
> Thanks
> Bill

Thank you!

--strk;



More information about the postgis-devel mailing list