[postgis-users] geomunion revisited....

Wed Nov 14 10:08:36 PST 2007

On Nov 14, 2007 9:48 AM, Martin Davis <mbdavis at refractions.net> wrote:
> Lee,
>
> Having done some more investigation, I can say that your dataset is a
> poster child for the use of Cascaded Union.  This is because it contains
> relatively simple geometries with a very high degree of overlap.
> Cascaded Union has the effect of quickly merging and discarding linework
> which doesn't appear in the final result.  This makes each individual
> union operation fairly performant.
>
> In JTS-land my results were:
>
> Cascaded Union: 20 sec
> Iterated Union: 3 h 40 min !  (This is equivalent to the PostGIS
> geomunion aggregate)
>
> I realize this doesn't help directly with your work, but it is a good
> example of why it would be nice to have CascadedUnion functionality in
> PostGIS.
>
> I can also see why buffer was struggling with this data.  It has to deal
> with all the linework in the dataset at once, and with that many
> overlapping lines it is simply overwhelmed.   Using buffer() is a hack
> to get around the lack of true CascadedUnion.  It works in many cases,
> but does have its limits.
>
> Martin
>
*Apologies for the cross-posting*

This is highly relevant to some of the work that I do on a regular
basis with soil survey data. Dealing with thousands of polygons which
share a common attribute is usually a nice time to GeomUnion them
together. I have only been successful at working with rather small
geometry collections, and something like CascadedUnion sounds
fantastic. What would it take to get this implemented in PostGIS ? It
might also be worth having in GRASS as well.

Cheers,

Dylan

>
> Martin Davis wrote:
> > As I mentioned in my previous post, buffer has to pull all geometries
> > into memory at once and process them all together.  It creates a lot
> > of internal data structures in the course of processing.  I'm not that
> > suprised that it doesn't work on 30K geometries. Or, there may well be
> > a memory leak - our test cases don't actually include one with 30K
> > geoms  8^)  (Lee, I'd be interested to see if this works in JTS - can
> > you send me your dataset as a shapefile?)
> >
> > I can think of a couple of things to do:
> > - Get the new version of JTS (from me) and use the CascadedUnion class
> > - Use a simple grid index on your data to partition it, union each
> > partition separately, and then union the results.  This process would
> > look something like:    - pick an appropriate grid size (say 10x10 ?)
> >    - compute the interior point of each geometry, determine which grid
> > cell it lies in, and save this in a new column
> >    - union all geoms in each grid cell together
> >    - union the result together
> >
> > Note: I haven't actually implemented this approach, but it's the only
> > way I can see to reduce the amount of geometry you are working with at
> > any given step.
> >
> > Lee Keel wrote:
> >>> -----Original Message-----
> >>> From: postgis-users-bounces at postgis.refractions.net
> >>> [mailto:postgis-users-
> >>> bounces at postgis.refractions.net] On Behalf Of Kevin Neufeld
> >>> Sent: Tuesday, November 13, 2007 12:40 AM
> >>> To: PostGIS Users Discussion
> >>> Subject: Re: [postgis-users] geomunion revisited....
> >>>
> >>> Also, I think you may be after the SQL syntax UNION ALL, not UNION, if
> >>> you are simply after concatenating the resultset from table 2 to
> >>> table 1.
> >>>
> >>> select simplify(...
> >>> UNION ALL
> >>> select simplify(...
> >>>
> >>> Also, I too would be curious how buffer(collect(...), 0) fares for you
> >>> instead of geomunion() for your final 5 hour query.
> >>> Since you're doing this all at once anyway, it may work.
> >>> Geomunion() is
> >>> an aggregate for the two geometry method.  Which means that it will
> >>> slowly add one geometry at a time to an every growing resulting
> >>> geometry.  Buffer(collect(...), 0) will decontruct the collection once,
> >>> and rebuild it once to a union-ed geometry.
> >>> -- Kevin
> >>>
> >>> Josh Livni wrote:
> >>>
> >>>> I think running buffer(bunch_of_geoms,0) rather than
> >>>> geomunion(bunch_of_geoms) might be faster.
> >>>>
> >>>> That said, still seems it takes a bit long.  If it were me I'd try
> >>>> playing around by first creating a table with your results, eg:
> >>>>
> >>>> '''create table simple_buffers as (
> >>>> select simplify(buffer(simplify(the_geom, 5), 400), 150) as the_geom
> >>>> from table1
> >>>> union
> >>>> select simplify(buffer(simplify(the_geom, 5), 400), 150) as the_geom
> >>>> from table2); '''
> >>>>
> >>>> then you could create a GiST index on the_geom of your new table, run
> >>>> vacuum analyze on it, and then try the geomunion and buffer to test.
> >>>> With 32k+ rows, depending on how much they overlap, that might help a
> >>>> little, or at least help potentially isolate the issue if it does take
> >>>> as long.
> >>>> Also I assume you've modified your postgres config away from the
> >>>> defaults to take advantage of more memory on your machine, etc.
> >>>>
> >>>>
> >>>>
> >>>>  -Josh
> >>>>
> >> [Lee Keel]
> >> Well, I have played with this some more and here is what I have found.
> >>
> >> I tried using the buffer(collect(the_geom), 0) on the largest table
> >> (about
> >> 30K rows) and I get the following error:
> >>
> >> NOTICE:  St9bad_alloc
> >>
> >> ERROR: GEOS buffer() threw an error!
> >> SQL state: XX000
> >>
> >>
> >> This entire table is only 5872KB but when I monitor this buffer
> >> process the
> >> memory gets over 1.7GB before it finally crashes out.  Even if you
> >> were to
> >> triple the memory I can't see where it would ever get to 1.7GB.
> >> NOTE: When
> >> monitoring the process of doing just the collect(the_geom) the memory
> >> footprint gets up to about 26-28MB and takes about 2.5 minutes.  This
> >> leads
> >> me to think there might be a memory leak in the buffer().
> >>
> >>
> >> I tried using a test table.  This new table is 3704KB.  But I
> >> realized that
> >> a gist index is not going to buy me anything because it is still
> >> going to
> >> pull the entire table.
> >>
> >>
> >> I have listed my config settings below.  I am not sure how much more
> >> I can
> >> tweak them since I am running on 64bit Vista.  I have had a few posts
> >> in the
> >> last week that have basically said that the windows version of this
> >> can't be
> >> tweaked much more due to constraints by windows.
> >>
> >> shared_buffers = 256MB
> >> temp_buffers = 32MB
> >> max_prepared_transactions = 100
> >> work_mem = 16MB
> >> maintenance_work_mem = 256MB
> >> max_stack_depth = 3MB
> >>
> >> Thanks for everyone's help and attention to this.
> >> Lee
> >>
> >> This email and any files transmitted with it are confidential and
> >> intended solely for the use of the individual or entity to whom they
> >> are addressed. If you have received this email in error please notify
> >> the sender. This message contains confidential information and is
> >> intended only for the individual named. If you are not the named
> >> addressee you should not disseminate, distribute or copy this e-mail.
> >> _______________________________________________
> >> postgis-users mailing list
> >> postgis-users at postgis.refractions.net
> >> http://postgis.refractions.net/mailman/listinfo/postgis-users
> >>
> >>
> >
>
> --
> Martin Davis
> Senior Technical Architect
> Refractions Research, Inc.
> (250) 383-3022
>
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users
>