<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
<META NAME="Generator" CONTENT="MS Exchange Server version 6.5.7652.24">
<TITLE>RE: [postgis-devel] More work on Cascade Union</TITLE>
</HEAD>
<BODY>
<!-- Converted from text/plain format -->
<P><FONT SIZE=2>Martin,<BR>
On further inspection into the depths of GEOS code. I see you are right it is doing a Binary ops thing. I'm not sure you assumption about effectiveness of binary ops is wrong for points since OpenJump works pretty fast.<BR>
<BR>
So maybe its an issue with the unite_garray function in PostGIS lwgeom code that calls GEOS -. The penalty incurred for each and every POSTGIS2GEOS(geom) it to a GEOS one and then unioning it with the result that I am seeing and not necessarily an issue with GEOS itself.<BR>
<BR>
What confused me is that there are 2 versions of unite_garray - one that uses buffer collect and one that uses an iterate GEOS union approach.<BR>
<BR>
The version of the trunk code I'm looking at has this remarked out<BR>
/* #define UNITE_USING_BUFFER 1 */<BR>
But I was mistakingly reading the UNITE USING BUFFER version which apparently is not in use.<BR>
<BR>
<BR>
Thanks,<BR>
Regina<BR>
<BR>
-----Original Message-----<BR>
From: postgis-devel-bounces@postgis.refractions.net on behalf of Martin Davis<BR>
Sent: Wed 8/20/2008 2:52 PM<BR>
To: PostGIS Development Discussion<BR>
Subject: Re: [postgis-devel] More work on Cascade Union<BR>
<BR>
<BR>
<BR>
Obe, Regina wrote:<BR>
> So why does GEOS perform so poorly then? I suspect the cascade<BR>
> algorithm I am using is gaining by the fact that it realizes fairly<BR>
> quickly it can just collect all the points except the rare cases where<BR>
> the points are duplicated then it unions.<BR>
><BR>
><BR>
> I'll check on unioning tiger lines to see what kind of metrics I<BR>
> get. Looking at the GEOS code - I don't get a sense its doing the<BR>
> same thing as JTS, but I haven't really delved to see what the Overlay<BR>
> classes are actually doing in JTS or done more than a scant glance at<BR>
> the GEOS functions so I could very well be wrong in my reading.<BR>
><BR>
The GEOS overlay algorithms are basically identical to JTS.<BR>
<BR>
So you're saying that GEOS is slow for unioning lots of points? Not<BR>
sure why this would be. It's possible that it's just the GEOS mem<BR>
management which is slowing things down. Or, perhaps the Point union<BR>
algorithm is not as performant as I thought it was. When I get time<BR>
(after my 3 wk vacation...) I'll try and have a look at this.<BR>
><BR>
> <BR>
> > Hmm... what exactly is the data that causes this crash? I'd like to<BR>
> > dig into this and find out what's going on in JTS.<BR>
><BR>
> Simple - Create a table like this. Granted very unrealistic and then<BR>
> try to union all the points in JTS (well in my case OpenJump latest<BR>
> snapshot - reaches memory heap fault in 11 seconds) -<BR>
> 1,000,000 points in one geom is hard to swallow anyway so I wasn't too<BR>
> concerned.<BR>
><BR>
> CREATE TABLE tmp_points(the_geom geometry);<BR>
> INSERT INTO tmp_points<BR>
> SELECT ST_MakePoint(x,y) As the_geom FROM generate_series(1,1000) x,<BR>
> generate_series(1,1000) y<BR>
> order by random();<BR>
><BR>
Sounds like the JVM memory limit is set too low to run a million<BR>
points. Crank it up to 1 GB and try again.<BR>
><BR>
><BR>
><BR>
> > Not sure what you mean by "internally sorted geometry"? If you talking<BR>
> > about a linear sort of the geometries (by x or possibly x,y), I don't<BR>
> > think this is going to be as good as doing a true spatial sort (which is<BR>
> > what the Rtree index in CascadedUnion does, and what Kevin's grid<BR>
> > approach does)<BR>
> Doing a true spatial sort - basically order by the_geom which I assume<BR>
> would use the Rtree operators for sorting or at least seems to.<BR>
><BR>
Hmm... AFAIK sorting geometry just does a conventional "x then y"<BR>
ordering of the geometry. This is better than nothing, but isn't as<BR>
effective as a true spatial index ordering. It's worth noting that<BR>
using a spatial index actually doesn't give you a linear order of the<BR>
geometries - the key is to walk the tree of objects in depth-first<BR>
order, performing unions progressively up the tree. I don't think a<BR>
linear order provides the same kind of effect.<BR>
><BR>
><BR>
><BR>
><BR>
><BR>
><BR>
><BR>
> ------------------------------------------------------------------------<BR>
><BR>
> *The substance of this message, including any attachments, may be<BR>
> confidential, legally privileged and/or exempt from disclosure<BR>
> pursuant to Massachusetts law. It is intended solely for the<BR>
> addressee. If you received this in error, please contact the sender<BR>
> and delete the material from any computer. *<BR>
><BR>
> ------------------------------------------------------------------------<BR>
><BR>
> * Help make the earth a greener place. If at all possible resist<BR>
> printing this email and join us in saving paper. *<BR>
><BR>
> * *<BR>
><BR>
> * *<BR>
><BR>
> ------------------------------------------------------------------------<BR>
><BR>
> _______________________________________________<BR>
> postgis-devel mailing list<BR>
> postgis-devel@postgis.refractions.net<BR>
> <A HREF="http://postgis.refractions.net/mailman/listinfo/postgis-devel">http://postgis.refractions.net/mailman/listinfo/postgis-devel</A><BR>
> <BR>
<BR>
--<BR>
Martin Davis<BR>
Senior Technical Architect<BR>
Refractions Research, Inc.<BR>
(250) 383-3022<BR>
<BR>
_______________________________________________<BR>
postgis-devel mailing list<BR>
postgis-devel@postgis.refractions.net<BR>
<A HREF="http://postgis.refractions.net/mailman/listinfo/postgis-devel">http://postgis.refractions.net/mailman/listinfo/postgis-devel</A><BR>
<BR>
<BR>
<BR>
</FONT>
</P>
</BODY>
</HTML>