[postgis-devel] GeomUnion Performance Info
strk at refractions.net
strk at refractions.net
Tue Jun 21 14:37:39 PDT 2005
On Sun, Jun 19, 2005 at 04:46:48PM -0400, bill wrote:
...
> First, the reason I had seen a performance increase was that the
> GeomUnion implementation is dramatically influenced by the _order_ the
> shapes are fed to it. The query I had used had a where clause that used
> the index, so the shapes had returned "ordered by" the spatial index --
> that is, clustered spatially. I realised this when I looked at the
> source, and confirmed it by unioning the same shapes in the following ways:
>
> 1) As I received them from the county (which, as you will see has some
> order to it)
> 2) Ordered by the spatial index (order by parcel_shape)
Watch out: ORDER BY will use btree index, not the GiST one.
> I realized that I was adding overhead by making the sort happen on each
> iteration, so I removed the "order by" and did a "cluster on" to
> actually reorder the rows on disk, and added these tests:
>
> 3) Ordered by a random column that I added, indexed, and clustered on
> 4) Ordered by the spatial index, but again, clustered on disk
>
> (I also tried the "MemGeomUnion" which doesn't try to load all of the
> shapes into memory, but the impact of that change was trivial -- and
> negative)
>
> I've put up the results of the test (just an Excel export) on
> http://www.mapshine.net/PerfData/
>
> I'd be interested in any feedback. The basic results can be summarized as:
>
> 1) For todays PostGIS 1.x, the best GeomUnion results will on a table
> clustered on a GIST Spatial Index. that is:
> pgsql> create index shape_idx on tablename GIST (shape); -- vaccum as needed
> pgsql> cluster shape_idx on tablename; -- wierd syntax (it looks backwards)
>
> If you cannot do that, then creating a subselect that sorts the items
> before performing the shape:
>
> pgsql> select GeomUnion(shape) from (select shape from tablename order
> by shape) as stupidPGalias;
Again will use btree index, not GiST.
It would be interesting to understand WHAT makes this difference exactly.
> I also think that there is room for some pretty dramatic performance
> improvements in the PostGIS GeomUnion code. Even assuming that the GEOS
> Union operation for two shapes is optimal (a big assumption), there are
> far too many Union operations being done that are unnecessary. For
> example, we (at the PostGIS level) generally have BBoxes that can be
> used to cull the number of Unions that are needed. I believe this could
> make a HUGE difference in the random addition case, and a significant
> improvement even in the ordered case.
Maybe the ordered case would rather ensure there would be always work
for the unioner as consecutively ordered geometries would probably
intersect.
> If it doesn't step on anyone's toes, I'll plan on putting together a
> test that shows what I'm thinking.
>
> Hope this helps, and let me know what you think.
Sure it will help, so we have 2 lines:
- understanding what kind of order helps union
- adding short-cuts in postgis union aggregate
bypassing GEOS in the disjoint-bboxes case
--strk;
More information about the postgis-devel
mailing list