[postgis-devel] More work on Cascade Union
Martin Davis
mbdavis at refractions.net
Wed Aug 20 11:52:57 PDT 2008
Obe, Regina wrote:
> So why does GEOS perform so poorly then? I suspect the cascade
> algorithm I am using is gaining by the fact that it realizes fairly
> quickly it can just collect all the points except the rare cases where
> the points are duplicated then it unions.
>
>
> I'll check on unioning tiger lines to see what kind of metrics I
> get. Looking at the GEOS code - I don't get a sense its doing the
> same thing as JTS, but I haven't really delved to see what the Overlay
> classes are actually doing in JTS or done more than a scant glance at
> the GEOS functions so I could very well be wrong in my reading.
>
The GEOS overlay algorithms are basically identical to JTS.
So you're saying that GEOS is slow for unioning lots of points? Not
sure why this would be. It's possible that it's just the GEOS mem
management which is slowing things down. Or, perhaps the Point union
algorithm is not as performant as I thought it was. When I get time
(after my 3 wk vacation...) I'll try and have a look at this.
>
>
> > Hmm... what exactly is the data that causes this crash? I'd like to
> > dig into this and find out what's going on in JTS.
>
> Simple - Create a table like this. Granted very unrealistic and then
> try to union all the points in JTS (well in my case OpenJump latest
> snapshot - reaches memory heap fault in 11 seconds) -
> 1,000,000 points in one geom is hard to swallow anyway so I wasn't too
> concerned.
>
> CREATE TABLE tmp_points(the_geom geometry);
> INSERT INTO tmp_points
> SELECT ST_MakePoint(x,y) As the_geom FROM generate_series(1,1000) x,
> generate_series(1,1000) y
> order by random();
>
Sounds like the JVM memory limit is set too low to run a million
points. Crank it up to 1 GB and try again.
>
>
>
> > Not sure what you mean by "internally sorted geometry"? If you talking
> > about a linear sort of the geometries (by x or possibly x,y), I don't
> > think this is going to be as good as doing a true spatial sort (which is
> > what the Rtree index in CascadedUnion does, and what Kevin's grid
> > approach does)
> Doing a true spatial sort - basically order by the_geom which I assume
> would use the Rtree operators for sorting or at least seems to.
>
Hmm... AFAIK sorting geometry just does a conventional "x then y"
ordering of the geometry. This is better than nothing, but isn't as
effective as a true spatial index ordering. It's worth noting that
using a spatial index actually doesn't give you a linear order of the
geometries - the key is to walk the tree of objects in depth-first
order, performing unions progressively up the tree. I don't think a
linear order provides the same kind of effect.
>
>
>
>
>
>
>
> ------------------------------------------------------------------------
>
> *The substance of this message, including any attachments, may be
> confidential, legally privileged and/or exempt from disclosure
> pursuant to Massachusetts law. It is intended solely for the
> addressee. If you received this in error, please contact the sender
> and delete the material from any computer. *
>
> ------------------------------------------------------------------------
>
> * Help make the earth a greener place. If at all possible resist
> printing this email and join us in saving paper. *
>
> * *
>
> * *
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> postgis-devel mailing list
> postgis-devel at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-devel
>
--
Martin Davis
Senior Technical Architect
Refractions Research, Inc.
(250) 383-3022
More information about the postgis-devel
mailing list