<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">

<HTML>

<HEAD>

<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">

<META NAME="Generator" CONTENT="MS Exchange Server version 6.5.7652.24">

<TITLE>RE: [postgis-devel] Lame attempt at cascade union in sql</TITLE>

</HEAD>

<BODY>

<!-- Converted from text/plain format -->


<P><FONT SIZE=2>Curiosity got the better of me this time so I ran thru the full set once and got this.  So still better, but I guess there is still room for improvement and my cascading seems to not be growing linearly as Kevin's is.<BR>

<BR>

--277,656 ms = 4.6276 minutes<BR>

SELECT ST_CascadeUnion(the_geom)<BR>

FROM (SELECT * FROM sample_poly) As foo;<BR>

<BR>

--69,953 ms = 1.16 minutes<BR>

SELECT ST_Union(the_geom) AS the_geom<BR>

FROM (<BR>

   SELECT min(id) AS id, ST_Union(the_geom) AS the_geom<BR>

   FROM (<BR>

     SELECT min(id) AS id, ST_Union(the_geom) AS the_geom<BR>

     FROM (<BR>

       SELECT min(id) AS id, ST_Union(the_geom) AS the_geom<BR>

       FROM (<BR>

         SELECT min(id) AS id, ST_Union(the_geom) AS the_geom<BR>

         FROM (SELECT * FROM sample_poly) As foo<BR>

         GROUP BY round(id/10)<BR>

         ORDER BY id) AS tmp1<BR>

       GROUP BY round(id/100)<BR>

       ORDER BY id) AS tmp2<BR>

     GROUP BY round(id/1000)<BR>

     ORDER BY id) AS tmp3<BR>

   GROUP BY round(id/10000)<BR>

   ORDER BY id) AS tmp4<BR>

GROUP BY round(id/100000);<BR>

<BR>

<BR>

<BR>

<BR>

-----Original Message-----<BR>

From: postgis-devel-bounces@postgis.refractions.net on behalf of Obe, Regina<BR>

Sent: Thu 8/7/2008 9:53 PM<BR>

To: PostGIS Development Discussion<BR>

Subject: RE: [postgis-devel] Lame attempt at cascade union in sql<BR>

<BR>

Martin,<BR>

<BR>

Unfortunately my test servers are not as fast as yours and I probably have less patience than you do.  I didn't have the patience to run thru 30,000 polys so I just ran thru 5000.  My new iterative recursive approach seems to come much closer to Kevin's now although Kevin's still beats it.<BR>

<BR>

Below are the new functions and the results.  Of course the raw ST_Union is so slow I didn't bother testing it.<BR>

I also clustered by the_geom since I assumed that is more standard for people.<BR>

<BR>

CREATE OR REPLACE FUNCTION st_unitecascade_garray(geometry[])<BR>

  RETURNS geometry AS<BR>

$$<BR>

SELECT CASE WHEN array_upper($1,1) < 51 THEN st_unite_garray($1) ELSE<BR>

        st_unitecascade_garray(ARRAY(SELECT st_unite_garray($1[i:least(i + 49,array_upper($1,1))]) As geom<BR>

FROM generate_series(1, array_upper($1,1), 50) As i)) END<BR>

<BR>

$$<BR>

  LANGUAGE 'sql' IMMUTABLE;<BR>

<BR>

CREATE OR REPLACE FUNCTION st_unitecascade_garray_sort(geometry[])<BR>

  RETURNS geometry AS<BR>

$$<BR>

        SELECT CASE WHEN array_upper($1,1) < 51  THEN<BR>

st_unite_garray($1) ELSE<BR>

                st_unitecascade_garray(ARRAY(SELECT ($1)[s] FROM<BR>

generate_series(1, array_upper($1, 1)) AS s<BR>

                        ORDER BY ST_SnaptoGrid(ST_Centroid(($1)[s]),0.1) ) ) END;<BR>

$$<BR>

  LANGUAGE 'sql' IMMUTABLE;<BR>

<BR>

<BR>

 <BR>

--DROP AGGREGATE st_cascadeunion(geometry);<BR>

<BR>

CREATE AGGREGATE st_cascadeunion(geometry) (<BR>

  SFUNC=st_geom_accum,<BR>

  STYPE=geometry[],<BR>

  FINALFUNC=st_unitecascade_garray_sort<BR>

);<BR>

<BR>

--BEGIN TESTS --<BR>

CREATE INDEX idx_sample_poly_the_geom<BR>

  ON sample_poly<BR>

  USING gist<BR>

  (the_geom);<BR>

ALTER TABLE sample_poly CLUSTER ON idx_sample_poly_the_geom;<BR>

CLUSTER sample_poly;<BR>

<BR>

<BR>

--3515 ms, 3453 ms, 3579 ms<BR>

SELECT ST_CascadeUnion(the_geom)<BR>

FROM (SELECT * FROM sample_poly limit 2000) As foo;<BR>

<BR>

--2375 ms, 2375 ms, 2390 ms<BR>

SELECT ST_Union(the_geom) AS the_geom<BR>

FROM (<BR>

   SELECT min(id) AS id, ST_Union(the_geom) AS the_geom<BR>

   FROM (<BR>

     SELECT min(id) AS id, ST_Union(the_geom) AS the_geom<BR>

     FROM (<BR>

       SELECT min(id) AS id, ST_Union(the_geom) AS the_geom<BR>

       FROM (<BR>

         SELECT min(id) AS id, ST_Union(the_geom) AS the_geom<BR>

         FROM (SELECT * FROM sample_poly limit 2000) As foo<BR>

         GROUP BY round(id/10)<BR>

         ORDER BY id) AS tmp1<BR>

       GROUP BY round(id/100)<BR>

       ORDER BY id) AS tmp2<BR>

     GROUP BY round(id/1000)<BR>

     ORDER BY id) AS tmp3<BR>

   GROUP BY round(id/10000)<BR>

   ORDER BY id) AS tmp4<BR>

GROUP BY round(id/100000);<BR>

<BR>

--11,437 ms, 11687 ms, 11281 ms<BR>

SELECT ST_CascadeUnion(the_geom)<BR>

FROM (SELECT * FROM sample_poly limit 5000) As foo;<BR>

<BR>

--7609 ms, 7609 ms, 7609 ms<BR>

SELECT ST_Union(the_geom) AS the_geom<BR>

FROM (<BR>

   SELECT min(id) AS id, ST_Union(the_geom) AS the_geom<BR>

   FROM (<BR>

     SELECT min(id) AS id, ST_Union(the_geom) AS the_geom<BR>

     FROM (<BR>

       SELECT min(id) AS id, ST_Union(the_geom) AS the_geom<BR>

       FROM (<BR>

         SELECT min(id) AS id, ST_Union(the_geom) AS the_geom<BR>

         FROM (SELECT * FROM sample_poly limit 5000) As foo<BR>

         GROUP BY round(id/10)<BR>

         ORDER BY id) AS tmp1<BR>

       GROUP BY round(id/100)<BR>

       ORDER BY id) AS tmp2<BR>

     GROUP BY round(id/1000)<BR>

     ORDER BY id) AS tmp3<BR>

   GROUP BY round(id/10000)<BR>

   ORDER BY id) AS tmp4<BR>

GROUP BY round(id/100000);<BR>

<BR>

<BR>

<BR>

<BR>

<BR>

-----Original Message-----<BR>

From: postgis-devel-bounces@postgis.refractions.net on behalf of Martin Davis<BR>

Sent: Thu 8/7/2008 5:23 PM<BR>

To: PostGIS Development Discussion<BR>

Subject: Re: [postgis-devel] Lame attempt at cascade union in sql<BR>

<BR>

Nice work...<BR>

<BR>

Just for comparison purposes, running the same dataset through the JTS<BR>

Cascaded Union ran in 20 sec (on a reasonably fast workstation - but<BR>

aren't all CPUs the same speed now, more or less? 8^)<BR>

<BR>

Running on the data pulled from a PostGIS instance added 1 sec in<BR>

overhead...<BR>

<BR>

Kevin Neufeld wrote:<BR>

> Hi Regina,<BR>

><BR>

> So, I ran a few tests on a table with ~30000 overlapping polygons.<BR>

><BR>

> Test1:<BR>

> SELECT ST_Union(the_geom) FROM sample_poly;<BR>

> --  Total runtime: + 2 hours before I cancelled it.<BR>

><BR>

><BR>

> Test2:<BR>

> SELECT ST_CascadeUnion(the_geom) FROM sample_poly;<BR>

> --  Total runtime: 19 minutes.<BR>

><BR>

><BR>

> Test3:<BR>

> SELECT ST_Union(the_geom) AS the_geom<BR>

> FROM (<BR>

>   SELECT min(id) AS id, ST_Union(the_geom) AS the_geom<BR>

>   FROM (<BR>

>     SELECT min(id) AS id, ST_Union(the_geom) AS the_geom<BR>

>     FROM (<BR>

>       SELECT min(id) AS id, ST_Union(the_geom) AS the_geom<BR>

>       FROM (<BR>

>         SELECT min(id) AS id, ST_Union(the_geom) AS the_geom<BR>

>         FROM sample_poly<BR>

>         GROUP BY round(id/10)<BR>

>         ORDER BY id) AS tmp1<BR>

>       GROUP BY round(id/100)<BR>

>       ORDER BY id) AS tmp2<BR>

>     GROUP BY round(id/1000)<BR>

>     ORDER BY id) AS tmp3<BR>

>   GROUP BY round(id/10000)<BR>

>   ORDER BY id) AS tmp4<BR>

> GROUP BY round(id/100000);<BR>

> --  Total runtime: 1.5 minutes<BR>

><BR>

><BR>

> Test4:<BR>

> Reordered the tuples in sample_poly by "ORDER BY<BR>

> ST_SnapToGrid(ST_Centroid(the_geom), 0.01);"<BR>

> Reran query in Test3:<BR>

> --  Total runtime: 50 secs<BR>

><BR>

><BR>

> So, your function is a definite improvement over ST_Union, but the<BR>

> timings could still be better.<BR>

><BR>

> Cheers,<BR>

> Kevin<BR>

><BR>

> Obe, Regina wrote:<BR>

>> Kevin,<BR>

>><BR>

>> Can you try the cascade approach I sent in my last email with your large<BR>

>> dataset.<BR>

>><BR>

>> I tried it with my real data and here are the results<BR>

>> --time 3328 ms, 3203 ms, 3188 ms<BR>

>> SELECT ST_CascadeUnion(the_geom)<BR>

>> FROM (SELECT * FROM landparcels<BR>

>> WHERE pid lIKE '01%' and the_geom is not null<BR>

>> LIMIT 1000) p<BR>

>><BR>

>> --time 23375 ms, 23594 ms, 24109 ms<BR>

>> SELECT ST_Union(the_geom)<BR>

>> FROM (SELECT * FROM landparcels<BR>

>> WHERE pid lIKE '01%' and the_geom is not null<BR>

>> LIMIT 1000) p<BR>

>> -----------------------------------------<BR>

>> The substance of this message, including any attachments, may be<BR>

>> confidential, legally privileged and/or exempt from disclosure<BR>

>> pursuant to Massachusetts law. It is intended<BR>

>> solely for the addressee. If you received this in error, please<BR>

>> contact the sender and delete the material from any computer.<BR>

>><BR>

>> _______________________________________________<BR>

>> postgis-devel mailing list<BR>

>> postgis-devel@postgis.refractions.net<BR>

>> <A HREF="http://postgis.refractions.net/mailman/listinfo/postgis-devel">http://postgis.refractions.net/mailman/listinfo/postgis-devel</A><BR>

> _______________________________________________<BR>

> postgis-devel mailing list<BR>

> postgis-devel@postgis.refractions.net<BR>

> <A HREF="http://postgis.refractions.net/mailman/listinfo/postgis-devel">http://postgis.refractions.net/mailman/listinfo/postgis-devel</A><BR>

><BR>

<BR>

--<BR>

Martin Davis<BR>

Senior Technical Architect<BR>

Refractions Research, Inc.<BR>

(250) 383-3022<BR>

<BR>

_______________________________________________<BR>

postgis-devel mailing list<BR>

postgis-devel@postgis.refractions.net<BR>

<A HREF="http://postgis.refractions.net/mailman/listinfo/postgis-devel">http://postgis.refractions.net/mailman/listinfo/postgis-devel</A><BR>

<BR>

<BR>

</FONT>

</P>


</BODY>

</HTML>