[postgis-devel] More work on Cascade Union
Martin Davis
mbdavis at refractions.net
Wed Aug 20 09:56:07 PDT 2008
Obe, Regina wrote:
> Martin,
>
> Evidently I didn't look at enough of the code in the JTS code base. I
> was focusing on CascadedUnion class rather than the UnaryUnionOp class.
>
> On closer inspection I see the CascadedUnion class is only called for
> Polygons and not Points and Lines and for points and lines it does
> presumably what it has always done - some sort of Overlay operation.
>
Yes. CascadedUnion really only helps if you are reducing the number of
points in the geometries as you union them. With points and lines, it's
only in very rare situations that the number of vertices would actually
decrease. (Also, with points the existing union algorithm is about as
performant as possible). So you're better off simply unioning all lines
together at once.
> So in theory if unite_garray is doing what JTS is doing, performance on
> points and lines shouldn't be any crummier than my cascaded union
> approach.
>
> But look at these stats, - doing the same in JTS is faster than both
> standard Postgis union and cascade union thingy and too fast for me to
> tell, but there is a point at which I can successfully crash JTS with
> this exercise :) - just feed it the whole tmp_points and watch it
> quickly crash in about 11 seconds. I didn't wait for Postgis cascade
> union to finish though so that wasn't a fair test.
>
Hmm... what exactly is the data that causes this crash? I'd like to
dig into this and find out what's going on in JTS.
> One thing I will say is the ST_Union will internally order the
> geometries where as the cascade approach I'm using will only internally
> order if given a preordered set. I presume having an internally sorted
> geometry is a desirable thing.
>
Not sure what you mean by "internally sorted geometry"? If you talking
about a linear sort of the geometries (by x or possibly x,y), I don't
think this is going to be as good as doing a true spatial sort (which is
what the Rtree index in CascadedUnion does, and what Kevin's grid
approach does)
> I was going to try to add some sort of bubblesort algorithm as the items
> are coming into the array in the aggregation process. Trying to do the
> sort after the fact with a generate_series hack seems to sometimes add
> overhead although a lot of the time with the revised algorithm using
> envelop check the overhead added by the pre sorting is significantly
> offset by the gain in iterating over an ordered set so most of the times
> I am seeing speed gains. So short-end of the stick - I can definitely
> get better timings than the last algorithm set I submitted.
>
> So I'll play around with array bubble sort vs. generate series ordered
> by hack to see which performs better.
>
> --Here are stats with just simple point geometries
>
> CREATE TABLE tmp_points(the_geom geometry);
> INSERT INTO tmp_points
> SELECT ST_MakePoint(x,y) As the_geom FROM generate_series(1,1000) x,
> generate_series(1,1000) y
> order by random();
>
>
> --734 ms (without order by hack)
> 281 ms (with generate series ordered by hack in
> st_unitecascade_garray_sort it performs even better - by a factor of 3)
>
> SELECT ST_CascadeUnion(the_geom)
> FROM (SELECT * FROM tmp_points limit 1000) as foo
>
> -- 2594 ms, (with order hack 1359 ms) -
> SELECT ST_CascadeUnion(the_geom)
> FROM (SELECT * FROM tmp_points limit 2000) as foo
>
> --9610 ms
> SELECT ST_Union(the_geom)
> FROM (SELECT * FROM tmp_points limit 1000) as foo
>
> --41625 ms
> SELECT ST_Union(the_geom)
> FROM (SELECT * FROM tmp_points limit 2000) as foo
>
> Any rate I'm still trying to figure out what the Geos unite_garray is
> doing. As far as I can tell its doing a combination of iterated unions
> and collect buffers. But that's just doing a very cursory glance at the
> 3.0, 3.1 code base.
>
> Thanks,
> Regina
>
> -----Original Message-----
> From: postgis-devel-bounces at postgis.refractions.net
> [mailto:postgis-devel-bounces at postgis.refractions.net] On Behalf Of
> Martin Davis
> Sent: Tuesday, August 19, 2008 8:13 PM
> To: PostGIS Development Discussion
> Subject: Re: [postgis-devel] More work on Cascade Union
>
> It looks like OJ *is* using the new JTS UnaryUnionOp (aka
> CascadedUnion). This was a change made on July 26 by Michael Michaud.
>
> I'll save looking at your pgsql code for a slow afternoon sometime - I'm
>
> sure it will be educational. In the meantime, good work on
> porting/modifying/enhancing the CascadedUnion concept for PostGIS!
>
> Martin
>
> Obe, Regina wrote:
>
>> Martin,
>>
>> Do you know if OpenJump finally fully integrated the cascade union? I
>>
>
>
>> downloaded the openjump nightly build. I did timings and it was able
>> to complete usstatebounds in about 12 secs (whereas before remember it
>>
>
>
>> gave a memory overflow) and the original samply_poly I think it did in
>>
>
>
>> about 22 seconds. I wasn't sure if this is because I am running on a
>> Vista pc (and probably more beefed up one than I ran last time). I
>> also don't see the count down with the union thing anymore.
>>
>>
>> Attached are my latest efforts. This includes the envelop check. The
>> bool_and aggregate was disappointing so I fell on good old EXISTS (in
>> this scenario especially when you can't really rely on indexes, exists
>>
>
>
>> performs better than a LEFT JOIN).
>>
>> Unfortunately in PostgreSQL since the geometries get swept into an
>> array, I can't really count on indexes to help out so I limited the
>> max check to be 1000 records and also do iterated checks. Which I
>> think is fairly different from the JTS approach.
>>
>>
>> This time I changed the core of the cascade union function into a
>> plpgsql function since I figured people would have an easier time
>> reading it. Not that I couldn't do it all with sql functions, but sql
>>
>
>
>> functions had the disadvantage of not allowing declaration of
>> variables or using pretty names for input variables.
>>
>> I have the st99_d00 set down to below 8 secs - but still sucks
>> compared to JTS cascade union (3-4 secs), but still much better than
>> the 48-55 second timings I get with the postgis union function.
>>
>> I fear I can't do much better than this without delving into GEOS
>> land, which I am not quite ready for.
>>
>> One thing I did discover is that the dump collect method is a very
>> important thing to have and would make the whole ST_Collect thing much
>>
>
>
>> more useful for people if it actually existed since I think that is
>> what people actually expect from that function. I think part of the
>> frustration that people have is that st_collect returns geometry
>> collections more often than not. I created a helper function that
>> does it since I had to use that technique so many times in the code,
>> but I suspect this would be done much more efficiently in the
>> lwpostgis code since it would then require only one trip instead of
>> many. I haven't calculated to see how much of the time is wasted in
>> there. I know that most of the lost time is still in the
>> ST_Unite_Garray function and the non-intersect check did not help
>> quite as much as I had hoped it would.
>>
>>
>>
>>
>> --7855 ms, 7874 ms
>> SELECT ST_CascadeUnion(the_geom) FROM st99_d00;
>>
>> This syntax which escapes the aggregate penalty gives me these timings
>> --7195 ms, 6804 ms, 6996 ms
>> SELECT st_unitecascade_garray_sort(ARRAY(SELECT the_geom FROM
>>
> st99_d00));
>
>> --114282 ms -- SELECT 114282/1000.0/60 -- 1.90 minutes (I didn't
>> bother running this with ST_Union since I would be waiting till the
>> cows came home)
>> SELECT st_unitecascade_garray_sort(ARRAY(SELECT the_geom FROM
>> usstatebounds));
>>
>> --152438 ms -- SELECT 152438/1000.0/60 -- 2.55 minutes
>> SELECT ST_CascadeUnion(the_geom) FROM usstatebounds;
>>
>> My cursory glance of the stats looks like I am getting the right
>> results although the internal sorting may not be optimal.
>>
>> Any feedback would be appreciated.
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
> ------------------------------------------------------------------------
>
>> *The substance of this message, including any attachments, may be
>> confidential, legally privileged and/or exempt from disclosure
>> pursuant to Massachusetts law. It is intended solely for the
>> addressee. If you received this in error, please contact the sender
>> and delete the material from any computer. *
>>
>>
>>
> ------------------------------------------------------------------------
>
>> * Help make the earth a greener place. If at all possible resist
>> printing this email and join us in saving paper. *
>>
>> * *
>>
>> * *
>>
>>
>>
> ------------------------------------------------------------------------
>
>> _______________________________________________
>> postgis-devel mailing list
>> postgis-devel at postgis.refractions.net
>> http://postgis.refractions.net/mailman/listinfo/postgis-devel
>>
>>
>
>
--
Martin Davis
Senior Technical Architect
Refractions Research, Inc.
(250) 383-3022
More information about the postgis-devel
mailing list