[postgis-devel] Lame attempt at cascade union in sql

Obe, Regina robe.dnd at cityofboston.gov
Thu Aug 7 18:53:15 PDT 2008


Martin,

Unfortunately my test servers are not as fast as yours and I probably have less patience than you do.  I didn't have the patience to run thru 30,000 polys so I just ran thru 5000.  My new iterative recursive approach seems to come much closer to Kevin's now although Kevin's still beats it.

Below are the new functions and the results.  Of course the raw ST_Union is so slow I didn't bother testing it.
I also clustered by the_geom since I assumed that is more standard for people.

CREATE OR REPLACE FUNCTION st_unitecascade_garray(geometry[])
  RETURNS geometry AS
$$
SELECT CASE WHEN array_upper($1,1) < 51 THEN st_unite_garray($1) ELSE
	st_unitecascade_garray(ARRAY(SELECT st_unite_garray($1[i:least(i + 49,array_upper($1,1))]) As geom
FROM generate_series(1, array_upper($1,1), 50) As i)) END

$$
  LANGUAGE 'sql' IMMUTABLE;

CREATE OR REPLACE FUNCTION st_unitecascade_garray_sort(geometry[])
  RETURNS geometry AS
$$
	SELECT CASE WHEN array_upper($1,1) < 51  THEN
st_unite_garray($1) ELSE
		st_unitecascade_garray(ARRAY(SELECT ($1)[s] FROM
generate_series(1, array_upper($1, 1)) AS s 
			ORDER BY ST_SnaptoGrid(ST_Centroid(($1)[s]),0.1) ) ) END;
$$
  LANGUAGE 'sql' IMMUTABLE;


  
--DROP AGGREGATE st_cascadeunion(geometry);

CREATE AGGREGATE st_cascadeunion(geometry) (
  SFUNC=st_geom_accum,
  STYPE=geometry[],
  FINALFUNC=st_unitecascade_garray_sort
);

--BEGIN TESTS --
CREATE INDEX idx_sample_poly_the_geom
  ON sample_poly
  USING gist
  (the_geom);
ALTER TABLE sample_poly CLUSTER ON idx_sample_poly_the_geom;
CLUSTER sample_poly;


--3515 ms, 3453 ms, 3579 ms
SELECT ST_CascadeUnion(the_geom)
FROM (SELECT * FROM sample_poly limit 2000) As foo;

--2375 ms, 2375 ms, 2390 ms
SELECT ST_Union(the_geom) AS the_geom
FROM (
   SELECT min(id) AS id, ST_Union(the_geom) AS the_geom
   FROM (
     SELECT min(id) AS id, ST_Union(the_geom) AS the_geom
     FROM (
       SELECT min(id) AS id, ST_Union(the_geom) AS the_geom
       FROM (
         SELECT min(id) AS id, ST_Union(the_geom) AS the_geom
         FROM (SELECT * FROM sample_poly limit 2000) As foo
         GROUP BY round(id/10)
         ORDER BY id) AS tmp1
       GROUP BY round(id/100)
       ORDER BY id) AS tmp2
     GROUP BY round(id/1000)
     ORDER BY id) AS tmp3
   GROUP BY round(id/10000)
   ORDER BY id) AS tmp4
GROUP BY round(id/100000);

--11,437 ms, 11687 ms, 11281 ms
SELECT ST_CascadeUnion(the_geom)
FROM (SELECT * FROM sample_poly limit 5000) As foo;

--7609 ms, 7609 ms, 7609 ms 
SELECT ST_Union(the_geom) AS the_geom
FROM (
   SELECT min(id) AS id, ST_Union(the_geom) AS the_geom
   FROM (
     SELECT min(id) AS id, ST_Union(the_geom) AS the_geom
     FROM (
       SELECT min(id) AS id, ST_Union(the_geom) AS the_geom
       FROM (
         SELECT min(id) AS id, ST_Union(the_geom) AS the_geom
         FROM (SELECT * FROM sample_poly limit 5000) As foo
         GROUP BY round(id/10)
         ORDER BY id) AS tmp1
       GROUP BY round(id/100)
       ORDER BY id) AS tmp2
     GROUP BY round(id/1000)
     ORDER BY id) AS tmp3
   GROUP BY round(id/10000)
   ORDER BY id) AS tmp4
GROUP BY round(id/100000);





-----Original Message-----
From: postgis-devel-bounces at postgis.refractions.net on behalf of Martin Davis
Sent: Thu 8/7/2008 5:23 PM
To: PostGIS Development Discussion
Subject: Re: [postgis-devel] Lame attempt at cascade union in sql
 
Nice work...

Just for comparison purposes, running the same dataset through the JTS 
Cascaded Union ran in 20 sec (on a reasonably fast workstation - but 
aren't all CPUs the same speed now, more or less? 8^)

Running on the data pulled from a PostGIS instance added 1 sec in 
overhead...

Kevin Neufeld wrote:
> Hi Regina,
>
> So, I ran a few tests on a table with ~30000 overlapping polygons.
>
> Test1:
> SELECT ST_Union(the_geom) FROM sample_poly;
> --  Total runtime: + 2 hours before I cancelled it.
>
>
> Test2:
> SELECT ST_CascadeUnion(the_geom) FROM sample_poly;
> --  Total runtime: 19 minutes.
>
>
> Test3:
> SELECT ST_Union(the_geom) AS the_geom
> FROM (
>   SELECT min(id) AS id, ST_Union(the_geom) AS the_geom
>   FROM (
>     SELECT min(id) AS id, ST_Union(the_geom) AS the_geom
>     FROM (
>       SELECT min(id) AS id, ST_Union(the_geom) AS the_geom
>       FROM (
>         SELECT min(id) AS id, ST_Union(the_geom) AS the_geom
>         FROM sample_poly
>         GROUP BY round(id/10)
>         ORDER BY id) AS tmp1
>       GROUP BY round(id/100)
>       ORDER BY id) AS tmp2
>     GROUP BY round(id/1000)
>     ORDER BY id) AS tmp3
>   GROUP BY round(id/10000)
>   ORDER BY id) AS tmp4
> GROUP BY round(id/100000);
> --  Total runtime: 1.5 minutes
>
>
> Test4:
> Reordered the tuples in sample_poly by "ORDER BY 
> ST_SnapToGrid(ST_Centroid(the_geom), 0.01);"
> Reran query in Test3:
> --  Total runtime: 50 secs
>
>
> So, your function is a definite improvement over ST_Union, but the 
> timings could still be better.
>
> Cheers,
> Kevin
>
> Obe, Regina wrote:
>> Kevin,
>>
>> Can you try the cascade approach I sent in my last email with your large
>> dataset.
>>
>> I tried it with my real data and here are the results
>> --time 3328 ms, 3203 ms, 3188 ms
>> SELECT ST_CascadeUnion(the_geom)
>> FROM (SELECT * FROM landparcels
>> WHERE pid lIKE '01%' and the_geom is not null
>> LIMIT 1000) p
>>
>> --time 23375 ms, 23594 ms, 24109 ms
>> SELECT ST_Union(the_geom)
>> FROM (SELECT * FROM landparcels
>> WHERE pid lIKE '01%' and the_geom is not null
>> LIMIT 1000) p
>> -----------------------------------------
>> The substance of this message, including any attachments, may be
>> confidential, legally privileged and/or exempt from disclosure
>> pursuant to Massachusetts law. It is intended
>> solely for the addressee. If you received this in error, please
>> contact the sender and delete the material from any computer.
>>
>> _______________________________________________
>> postgis-devel mailing list
>> postgis-devel at postgis.refractions.net
>> http://postgis.refractions.net/mailman/listinfo/postgis-devel
> _______________________________________________
> postgis-devel mailing list
> postgis-devel at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-devel
>

-- 
Martin Davis
Senior Technical Architect
Refractions Research, Inc.
(250) 383-3022

_______________________________________________
postgis-devel mailing list
postgis-devel at postgis.refractions.net
http://postgis.refractions.net/mailman/listinfo/postgis-devel

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-devel/attachments/20080807/6e93e858/attachment.html>


More information about the postgis-devel mailing list