[postgis-devel] GeomUnion Performance Info
strk at refractions.net
strk at refractions.net
Wed Jun 22 02:35:45 PDT 2005
On Tue, Jun 21, 2005 at 10:59:43PM -0400, bill wrote:
> strk at refractions.net wrote:
...
> >Watch out: ORDER BY will use btree index, not the GiST one.
>
> Actually, I do not have a btree index on parcel_shape. Does having a
> GiST index implicitely create one?
Index is not required, the ORDER BY will use operators defined by
the btree operator class.
> I can't imagine that a manual sort
> of the stored strings (without an index) would speed things up.
I belive speedup is due to GEOS algorithms taking less time
when coupled geometries have a certain spatial relation,
I don't know much about GiST cluster order, but it is not
the same of btree clusters, knowing which performs better
would be a step forward.
> >Again will use btree index, not GiST.
> >It would be interesting to understand WHAT makes this difference exactly.
>
> When I manually do a 'cluster on', it does order by the GiST, correct?
cluster on btree_index <-- order by btree
cluster on gist_index <-- order by GiST
You might try both cases.
> >Maybe the ordered case would rather ensure there would be always work
> >for the unioner as consecutively ordered geometries would probably
> >intersect.
This is my guess as well, anyway the kind of intersection also
makes the difference. I think geoms that intersect in many places
take more time.
Imagine a geometry containing all the others (geom A):
- output would be geom A
- union of geom A with any geometry would not take much
(this is a guess, no intersections are found but only
full containment)
- other geometries might have a lot of intersection which
would take more time to compute (again, a guess)
- The sooner geom A is considered, the better the query goes.
You could make tests for this as well:
- define a geometry being
the envelope() of full table extent and assign it a
very high gid (geom A)
- geomunion the table so that geom A comes last
- geomunion the table so that geom A comes first
...
> One thought is that using ANY index eliminates the need to load the rest
> of the columns (or is only the bbox stored in the index?). I'm not sure
> about that on TOAST tables, but it seems to make sense on non-spatial
> data, perhaps it translates.
I think we should not mix access costs and union costs.
We are analyzing union costs related to order of feed.
> > - adding short-cuts in postgis union aggregate
> > bypassing GEOS in the disjoint-bboxes case
> Just on a lark, I've colorized two maps based on the order they were
> being sorted (black are drawn first, white last): the first is after a
> cluster on parcel_shape_idx (at GiST index), and the second is on the
> random order. You can see that the GiST clustered order is clearly
> geographically clustered. You can access it at my map testing page at
> http://www.mapshine.com/mapsurfer/ You can use the username/password
> mapdev/GDAL if you'd like to play around with it. Just turn on the
> layers "GIST Cluster" and "Random Cluster" at the left.
>
> BTW: I tried to create a btree index on my parcels, but got this:
>
> gis=# create index parcel_shape_btree ON parcels (parcel_shape);
> ERROR: index row requires 8452 bytes, maximum size is 8191
>
> go figure.
Ach! I think we should visually see that btree order!
Can you avoid clustering and just use ORDER BY ?
Actually... first thing would be evaluating difference
between the two (GiST and btree) in terms of performance.
You should try to obtain two table w/out indexes
One ordered by GiST, one by btree.
Yes, tables are NOT ordered (teoretically) but I'm sure
you can obtain that... SELECT into btree_ordered .. ORDER BY ...
cluster; select into gist_ordered ...
something like that, compare geomunion performances and show
them visually.
> Do I need to update GEOS or Postgis (or both) from CVS to get the
> bounding box shortcuts?
GEOS from 2-1 branch (if you are in 2.x version already you need no
rebuilt of postgis)
>
> Thanks
> Bill
Thank you!
--strk;
More information about the postgis-devel
mailing list