[postgis-users] Re: Cascaded Union Aggregate function

Obe, Regina robe.dnd at cityofboston.gov
Wed Oct 8 06:05:55 PDT 2008


Jose Carlos,

This looks interesting.  I just did a quick glance and looks like you are using a couple of the key concepts
Martin had outlined and I tried to implement in the make-shift aggregate function.

1) things whose bbox are disjoint from everything else just collect (admittedly if intersects were faster, I could maybe use that)
2) recursively iteratively union things close together

The above I just determine with a bounding box check.  A true intersect check proved to be too slow and also I'm not really using indexes
affectively. As you said a true topology structure would probably help in that regard and with using the resulting geometry, though not sure how much better that would be than the spatial index that exists already.  

I'm basically just sorting the geometries by bounding box which admittedly is not that great, but that's because I don't really understand how spatial indexes work at the moment.  Sorting by centroid or snaptogrid seemed too costly in my tests. Sometimes the bounding box sort helps and sometimes it makes things slightly worse (I think that accounts for the differences in timings of Dane's tests between my earlier and new).

Also once it gets into the aggregate array, I think I loose all the power of the already existing spatial index :(.

There is also a not so negligable penalty paid with dynamically growing an array in Postgres. In earlier studies - 
I determined st_unite_garray(ARRAY(SELECT ....)) or upgis_unitecascade_garray_sort(ARRAY(SELECT ...)) is sometimes much faster for large numbers of geometries than using the aggregate wrapper, 
but the aggregate wrapper is so much more flexible to use that I just settle for it.  I suspect its because the ARRAY(SELECT ...) construct allocates the size of the array first and then dumps stuff into it vs. the aggregate approach constantly growing the array as new geometries come in.  So I suppose if there were a way to somehow figure out size to allocate and allocate (for larger numbers of geometries, it would be more useful).

For small numbers of geometries in each group, this doesn't seem to be too much of a penalty.

Thanks,
Regina



-----Original Message-----
From: postgis-users-bounces at postgis.refractions.net [mailto:postgis-users-bounces at postgis.refractions.net] On Behalf Of Jose Carlos Martinez Llario
Sent: Wednesday, October 08, 2008 5:06 AM
To: postgis-users at postgis.refractions.net
Subject: [postgis-users] Re: Cascaded Union Aggregate function

Hi,

I want to tell you about the work i did a couple of years ago about the 
same topics with the hope it can help you.
I tried to improve the union operator to be able to carry out the 
'dissolve spatial operation'. Im a cartographer and I noticed that we 
can not use PostGIS in a direct way for dissolving a layer with 10.000 
geometries or more because it takes a long time and it needs a lot of 
resources for working.

I wrote a paper (sorry about the english...) explaining the work i did 
(unfortunately i need to write papers because of my job...i hate that of 
course).
In the introduction of the article  i did some test about cascade union. 
I got around 10 times faster results with the cascade union.
In the second part i proposed an algorithm to dissolve just the adjacent 
polygons. For example in a cadastral dataset starting from parcels to 
get lots boundaries. I got around 100 times faster. In total around 1000 
times faster than just using the union aggregate with the whole dataset.

In the article a test removed the shared boundaries in a layer with 500 
000 polygons. It took around 1000 secs in a kind of old computer.

  Geometries                        The whole aggregate    Cascade (i 
called it 'fragmented')      Just adjacent polygons
 Real case
   10 000                                    1 600 
s                                 around 100 
s                                     < 15 s
   30 000                                 i couldnt test 
it                         around 1000 s                             
around  50 s

Cadastral simulation (lots with 10 parcels)
  80 000 parcels                             i couldnt test it           
                   around 5000 s                             around 100 s
500 000 parcels                             i couldnt test it           
                  i couldnt test it                              around 
1000 s


Look in the article because the datasets have special geometry 
configuration.

I think the union operation can be really fast in a system with 
persistent topology. Im looking forward to try this with the current 
implementation of topology in PostGIS. I would like to know if someone 
is checking or implementing these kind of operations in PostGIs using 
topology.

I hope it can be useful for the PostGIS community,
The link of the article is:
http://cartosig.upv.es/es/system/files/jomarlla/article-martinez-llario-2.pdf

Cheers,
José Carlos


_______________________________________________
postgis-users mailing list
postgis-users at postgis.refractions.net
http://postgis.refractions.net/mailman/listinfo/postgis-users
-----------------------------------------
The substance of this message, including any attachments, may be
confidential, legally privileged and/or exempt from disclosure
pursuant to Massachusetts law. It is intended
solely for the addressee. If you received this in error, please
contact the sender and delete the material from any computer.



More information about the postgis-users mailing list