[postgis-users] geomunion HOWTO?

Kevin Neufeld kneufeld at refractions.net
Fri Mar 14 09:06:02 PDT 2008


Hi Roger,

You may very well be pushing the bounds of what geomunion can do 
efficiently.  Back in Nov, as Martin pointed out, Lee Keel had a similar 
problem.  He was trying to dissolve a set of 32000 polygons, and even 
that was a long time (about the size of one of your 8 classes you are 
grouping on).  As you can see from Martin's link reference, however, we 
were able to bring this time down to 30 secs.

The problem here is two fold:
1. You input geometry may not be spatially close to eachother.  As Paul 
Ramsey mentioned, "The goal is to cause each individual geometry + 
geometry union to *reduce* the amount of aggregate linework. When the 
g+g ops have no locality, each addition *adds* to the amount of 
linework, making successive ops slower and slower and slower."
2. Through testing, I found that union does not perform well with large 
collections.  This is may be due to the size of your work_mem and 
shared_buffer settings and the size of the ever growing collected 
geometry, I'm not sure.  In any case, I found that a collected size of 
about 10 geometries performs well.

My solution to you would be:
1. Do as Paul Ramsey suggested and sort your input geometries so they 
are spatially close to each other before dissolving them together.  Paul 
did this by ordering by the x + y of the extents of the geometry.  This 
may work well, test it out.  In my solution earlier, I sorted by the 
centroid of the geometry, snapped to some grid.  But then Lee's data was 
fairly evenly scattered across his dataset extents - I'm not sure about 
yours.
2. Perform your dissolve on smaller groups (191000 records grouped into 
8 classes is far too large, leaving each group to be ~24000).  I would 
recommend you group on your class and a unique id, rounded to the 
nearest 10th digit.  This will ensure you union will not aggregate more 
than 10 geometries at a time, which I found is the entire key to this 
problem.

ie. (I haven't tested this, but you could try something like this).

CREATE TEMP TABLE tmp
  (id serial, class char(8), the_geom geometry);

INSERT INTO tmp (class, the_geom)
  SELECT class, the_geom
  FROM test_suit_h_crop3_class
  ORDER BY class, X(Extent(the_geom)) + Y(Extent(the_geom));
  -- OR
  ORDER BY class, the_geom;
  -- OR
  ORDER BY class, SnapToGrid(X(Extent(the_geom)), <some resonable 
precision>);
-- I'd be curious to see which is fastest

CREATE TABLE union_result AS
SELECT class, ST_Union(the_geom) AS the_geom
FROM ( 
  SELECT min(id) AS id, class, ST_Union(the_geom) AS the_geom
  FROM (
    SELECT min(id) AS id, class, ST_Union(the_geom) AS the_geom
    FROM (
      SELECT min(id) AS id, class, ST_Union(the_geom) AS the_geom
      FROM (
        SELECT min(id) AS id, class, ST_Union(the_geom) AS the_geom
        FROM tmp
        GROUP BY class, round(id/10)
        ORDER BY class, id) AS tmp1
      GROUP BY class, round(id/100)
      ORDER BY class, id) AS tmp2
    GROUP BY class, round(id/1000)
    ORDER BY class, id) AS tmp3
  GROUP BY class, round(id/10000)
  ORDER BY class, id) AS tmp4
GROUP BY class, round(id/100000)
ORDER BY class, id;

Hope this clarifies things.
-- Kevin

Martin Davis wrote:
> Yep, that's the kind of number I was asking for.  Not surprising that 
> it takes a long time!
>
> I think you might be pushing the bounds of what can be handled by 
> PostGIS/GEOS at the moment...
>
> Roger André wrote:
>> Hi, and thanks for the feedback.  I have 8 classes, spread across 
>> 191,317 total records.  Not sure is that was what you meant by number 
>> of geometries.
>> -- 
>>
>> On Thu, Mar 13, 2008 at 3:56 PM, Paragon Corporation <lr at pcorp.us 
>> <mailto:lr at pcorp.us>> wrote:
>>
>>      Slight correction
>>
>>      SELECT st_multi(st_union(the_geom)) AS the_geom, class FROM
>>     "test_suit_h_crop3_class" GROUP BY class;
>>
>>     or
>>
>>      SELECT st_multi(st_collect(the_geom)) AS the_geom, class FROM
>>     "test_suit_h_crop3_class" GROUP BY class;
>>
>>
>>     -----Original Message-----
>>     From: Paragon Corporation [mailto:lr at pcorp.us <mailto:lr at pcorp.us>]
>>     Sent: Thursday, March 13, 2008 6:54 PM
>>     To: 'PostGIS Users Discussion'
>>     Subject: RE: [postgis-users] geomunion HOWTO?
>>
>>     Also scrap the AsText call you have.   I'm guessing its slowing
>>     things down
>>     a bit, although probably not much, but its totally unnecessary at
>>     anyrate.
>>     Should just be
>>
>>      SELECT st_multi(st_geomunion(the_geom)) AS the_geom, class FROM
>>     "test_suit_h_crop3_class" GROUP BY class;
>>
>>     You may also want to consider using ST_Collect instead of
>>     ST_GeomUnion,
>>     although for large files may not help much.
>>
>>
>>     Hope that helps,
>>     Regina
>>
>>
>>     -----Original Message-----
>>     From: postgis-users-bounces at postgis.refractions.net
>>     <mailto:postgis-users-bounces at postgis.refractions.net>
>>     [mailto:postgis-users-bounces at postgis.refractions.net
>>     <mailto:postgis-users-bounces at postgis.refractions.net>] On Behalf
>>     Of Paul
>>     Ramsey
>>     Sent: Thursday, March 13, 2008 6:39 PM
>>     To: PostGIS Users Discussion
>>     Subject: Re: [postgis-users] geomunion HOWTO?
>>
>>     No, you are probably just exercising the geometric operators a
>>     lot. It is
>>     possible a cascaded union would do better, but we don't have that
>>     programmed
>>     right now.  You could try and make it mildly faster by forcing the
>>     union to
>>     happen in a minimally more efficient order, by sorting when you
>>     create your
>>     first table, see below...
>>
>>     No guarantees this makes anything better, just a random guess at a
>>     hack.
>>
>>     On 3/13/08, Roger André <randre at gmail.com
>>     <mailto:randre at gmail.com>> wrote:
>>     > I'm trying to find a way to generate "dissolved" geometries 
>> without
>>     > exporting shapefiles from PostGIS and performing the operating in
>>     > ArcGIS.  I found some instructions online at
>>     >
>>     
>> http://www.paolocorti.net/public/wordpress/index.php/2007/03/30/union-of-two 
>>
>>     -geometries-in-postgis/
>>     
>> <http://www.paolocorti.net/public/wordpress/index.php/2007/03/30/union-of-two-geometries-in-postgis/>. 
>>
>>     >  These work fine on their example, but the opeartion when 
>> applied to
>>     > my data set never completes.  I realize my data set is pretty 
>> large
>>     > (), but the same dissolve operation when done via ArcGIS on a
>>     > shapefile exported by pgsql2shp takes around 5 minutes to 
>> complete.
>>     > This leads me to believe I'm doing something completely wrong, 
>> and I
>>     > would love to get some feedback from those of you with
>>     experience doing
>>     this.  Below are the steps I've done.
>>     >
>>     > Step 1 - create a "crop_3" table that contains only crop3
>>     values, and
>>     > a class.  This completes within 30 secs:
>>     >
>>     > begin;
>>     > create table "test_suit_h_crop3_class" ( "alloc_id" char(8) 
>> PRIMARY
>>     > KEY,  "crop3" numeric, "class" char(8) ); select
>>     >
>>     
>> AddGeometryColumn('','test_suit_h_crop3_class','the_geom','-1','MULTIP
>>     > OLYGON',2); insert into "test_suit_h_crop3_class" ("alloc_id",
>>     > "crop3", "class",
>>     >  "the_geom")
>>     > select vw_suit_area_h.alloc_id, vw_suit_area_h.crop3, case when
>>     crop3
>>     > < 1 then 'class_0'
>>     > when crop3 >= 1 and crop3 < 860 then 'class_1'
>>     > when crop3 >= 860 and crop3 < 1720 then 'class_2'
>>     >  when crop3 >= 1720 and crop3 < 3440 then 'class_3'
>>     > when crop3 >= 3440 and crop3 < 5160 then 'class_4'
>>     > when crop3 >= 5160 and crop3 < 6880 then 'class_5'
>>     > when crop3 >= 6880 and crop3 < 7740 then 'class_6'
>>     >  when crop3 >= 7740 then 'class_7'
>>     > ELSE 'other'
>>     > end AS class,
>>     > vw_suit_area_h.the_geom
>>     > FROM vw_suit_area_h
>>
>>     ORDER BY X(Extent(the_geom)) + Y(Extent(the_geom))
>>
>>     > end;
>>
>>     More ideally, we would bit-interleave the X and Y values, to 
>> force the
>>     ordering of the inputs to be very well localized, and even more
>>     ideally do
>>     an actual cascaded union.
>>
>>     The goal is to cause each individual geometry + geometry union to
>>     *reduce* the amount of aggregate linework. When the g+g ops have no
>>     locality, each addition *adds* to the amount of linework, making
>>     successive
>>     ops slower and slower and slower.
>>
>>     > Step 2 - create a temp "dissolve" table to store the results of a
>>     > geometric union run of the above table, grouped by "class".  I
>>     run out
>>     > of patience before this ever completes (I've let it run for 
>> hours.)
>>     >
>>     > begin;
>>     > CREATE TABLE "test_suit_area_h_crop3_diss" (  gid serial PRIMARY
>>     KEY,
>>     > "class" char(8) ); select
>>     >
>>     
>> AddGeometryColumn('','test_suit_area_h_crop3_diss','the_geom','-1','MU
>>     > LTIPOLYGON',2); INSERT INTO "test_suit_area_h_crop3_diss"
>>     > (the_geom,class)  SELECT astext(multi(geomunion(the_geom))) AS
>>     > the_geom, class FROM "test_suit_h_crop3_class" GROUP BY class; 
>> end;
>>     >
>>     > Thanks,
>>     >
>>     > Roger
>>     >
>>     > _______________________________________________
>>     >  postgis-users mailing list
>>     >  postgis-users at postgis.refractions.net
>>     <mailto:postgis-users at postgis.refractions.net>
>>     > http://postgis.refractions.net/mailman/listinfo/postgis-users
>>     >
>>     >
>>     _______________________________________________
>>     postgis-users mailing list
>>     postgis-users at postgis.refractions.net
>>     <mailto:postgis-users at postgis.refractions.net>
>>     http://postgis.refractions.net/mailman/listinfo/postgis-users
>>
>>
>>
>>     _______________________________________________
>>     postgis-users mailing list
>>     postgis-users at postgis.refractions.net
>>     <mailto:postgis-users at postgis.refractions.net>
>>     http://postgis.refractions.net/mailman/listinfo/postgis-users
>>
>>
>> ------------------------------------------------------------------------
>>
>> _______________________________________________
>> postgis-users mailing list
>> postgis-users at postgis.refractions.net
>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>>   
>



More information about the postgis-users mailing list