[postgis-users] Re: Cascaded Union Aggregate function
Obe, Regina
robe.dnd at cityofboston.gov
Wed Oct 8 06:05:55 PDT 2008
Jose Carlos,
This looks interesting. I just did a quick glance and looks like you are using a couple of the key concepts
Martin had outlined and I tried to implement in the make-shift aggregate function.
1) things whose bbox are disjoint from everything else just collect (admittedly if intersects were faster, I could maybe use that)
2) recursively iteratively union things close together
The above I just determine with a bounding box check. A true intersect check proved to be too slow and also I'm not really using indexes
affectively. As you said a true topology structure would probably help in that regard and with using the resulting geometry, though not sure how much better that would be than the spatial index that exists already.
I'm basically just sorting the geometries by bounding box which admittedly is not that great, but that's because I don't really understand how spatial indexes work at the moment. Sorting by centroid or snaptogrid seemed too costly in my tests. Sometimes the bounding box sort helps and sometimes it makes things slightly worse (I think that accounts for the differences in timings of Dane's tests between my earlier and new).
Also once it gets into the aggregate array, I think I loose all the power of the already existing spatial index :(.
There is also a not so negligable penalty paid with dynamically growing an array in Postgres. In earlier studies -
I determined st_unite_garray(ARRAY(SELECT ....)) or upgis_unitecascade_garray_sort(ARRAY(SELECT ...)) is sometimes much faster for large numbers of geometries than using the aggregate wrapper,
but the aggregate wrapper is so much more flexible to use that I just settle for it. I suspect its because the ARRAY(SELECT ...) construct allocates the size of the array first and then dumps stuff into it vs. the aggregate approach constantly growing the array as new geometries come in. So I suppose if there were a way to somehow figure out size to allocate and allocate (for larger numbers of geometries, it would be more useful).
For small numbers of geometries in each group, this doesn't seem to be too much of a penalty.
Thanks,
Regina
-----Original Message-----
From: postgis-users-bounces at postgis.refractions.net [mailto:postgis-users-bounces at postgis.refractions.net] On Behalf Of Jose Carlos Martinez Llario
Sent: Wednesday, October 08, 2008 5:06 AM
To: postgis-users at postgis.refractions.net
Subject: [postgis-users] Re: Cascaded Union Aggregate function
Hi,
I want to tell you about the work i did a couple of years ago about the
same topics with the hope it can help you.
I tried to improve the union operator to be able to carry out the
'dissolve spatial operation'. Im a cartographer and I noticed that we
can not use PostGIS in a direct way for dissolving a layer with 10.000
geometries or more because it takes a long time and it needs a lot of
resources for working.
I wrote a paper (sorry about the english...) explaining the work i did
(unfortunately i need to write papers because of my job...i hate that of
course).
In the introduction of the article i did some test about cascade union.
I got around 10 times faster results with the cascade union.
In the second part i proposed an algorithm to dissolve just the adjacent
polygons. For example in a cadastral dataset starting from parcels to
get lots boundaries. I got around 100 times faster. In total around 1000
times faster than just using the union aggregate with the whole dataset.
In the article a test removed the shared boundaries in a layer with 500
000 polygons. It took around 1000 secs in a kind of old computer.
Geometries The whole aggregate Cascade (i
called it 'fragmented') Just adjacent polygons
Real case
10 000 1 600
s around 100
s < 15 s
30 000 i couldnt test
it around 1000 s
around 50 s
Cadastral simulation (lots with 10 parcels)
80 000 parcels i couldnt test it
around 5000 s around 100 s
500 000 parcels i couldnt test it
i couldnt test it around
1000 s
Look in the article because the datasets have special geometry
configuration.
I think the union operation can be really fast in a system with
persistent topology. Im looking forward to try this with the current
implementation of topology in PostGIS. I would like to know if someone
is checking or implementing these kind of operations in PostGIs using
topology.
I hope it can be useful for the PostGIS community,
The link of the article is:
http://cartosig.upv.es/es/system/files/jomarlla/article-martinez-llario-2.pdf
Cheers,
José Carlos
_______________________________________________
postgis-users mailing list
postgis-users at postgis.refractions.net
http://postgis.refractions.net/mailman/listinfo/postgis-users
-----------------------------------------
The substance of this message, including any attachments, may be
confidential, legally privileged and/or exempt from disclosure
pursuant to Massachusetts law. It is intended
solely for the addressee. If you received this in error, please
contact the sender and delete the material from any computer.
More information about the postgis-users
mailing list