[postgis-devel] More work on Cascade Union
Martin Davis
mbdavis at refractions.net
Wed Aug 20 13:39:15 PDT 2008
Obe, Regina wrote:
>
> Martin,
> On further inspection into the depths of GEOS code. I see you are
> right it is doing a Binary ops thing. I'm not sure you assumption
> about effectiveness of binary ops is wrong for points since OpenJump
> works pretty fast.
>
That's good to hear.
>
> So maybe its an issue with the unite_garray function in PostGIS lwgeom
> code that calls GEOS -. The penalty incurred for each and every
> POSTGIS2GEOS(geom) it to a GEOS one and then unioning it with the
> result that I am seeing and not necessarily an issue with GEOS itself.
>
> What confused me is that there are 2 versions of unite_garray - one
> that uses buffer collect and one that uses an iterate GEOS union approach.
>
> The version of the trunk code I'm looking at has this remarked out
> /* #define UNITE_USING_BUFFER 1 */
> But I was mistakingly reading the UNITE USING BUFFER version which
> apparently is not in use.
>
It's annoying that there's so much overhead in all the marshalling that
goes on between Postgres, PostGIS, and GEOS. Points are the worst case
for this, since they have so much overhead relative to their size. I'm
not sure what the best approach is to mitigate this. One option might
be to do more point operations directly in PostGIS, since they tend to
be fairly simple compared to more complex geoms. But this starts
getting into a lot of code to be developed...
>
>
>
> Thanks,
> Regina
>
> -----Original Message-----
> From: postgis-devel-bounces at postgis.refractions.net on behalf of
> Martin Davis
> Sent: Wed 8/20/2008 2:52 PM
> To: PostGIS Development Discussion
> Subject: Re: [postgis-devel] More work on Cascade Union
>
>
>
> Obe, Regina wrote:
> > So why does GEOS perform so poorly then? I suspect the cascade
> > algorithm I am using is gaining by the fact that it realizes fairly
> > quickly it can just collect all the points except the rare cases where
> > the points are duplicated then it unions.
> >
> >
> > I'll check on unioning tiger lines to see what kind of metrics I
> > get. Looking at the GEOS code - I don't get a sense its doing the
> > same thing as JTS, but I haven't really delved to see what the Overlay
> > classes are actually doing in JTS or done more than a scant glance at
> > the GEOS functions so I could very well be wrong in my reading.
> >
> The GEOS overlay algorithms are basically identical to JTS.
>
> So you're saying that GEOS is slow for unioning lots of points? Not
> sure why this would be. It's possible that it's just the GEOS mem
> management which is slowing things down. Or, perhaps the Point union
> algorithm is not as performant as I thought it was. When I get time
> (after my 3 wk vacation...) I'll try and have a look at this.
> >
> >
> > > Hmm... what exactly is the data that causes this crash? I'd like to
> > > dig into this and find out what's going on in JTS.
> >
> > Simple - Create a table like this. Granted very unrealistic and then
> > try to union all the points in JTS (well in my case OpenJump latest
> > snapshot - reaches memory heap fault in 11 seconds) -
> > 1,000,000 points in one geom is hard to swallow anyway so I wasn't too
> > concerned.
> >
> > CREATE TABLE tmp_points(the_geom geometry);
> > INSERT INTO tmp_points
> > SELECT ST_MakePoint(x,y) As the_geom FROM generate_series(1,1000) x,
> > generate_series(1,1000) y
> > order by random();
> >
> Sounds like the JVM memory limit is set too low to run a million
> points. Crank it up to 1 GB and try again.
> >
> >
> >
> > > Not sure what you mean by "internally sorted geometry"? If you
> talking
> > > about a linear sort of the geometries (by x or possibly x,y), I don't
> > > think this is going to be as good as doing a true spatial sort
> (which is
> > > what the Rtree index in CascadedUnion does, and what Kevin's grid
> > > approach does)
> > Doing a true spatial sort - basically order by the_geom which I assume
> > would use the Rtree operators for sorting or at least seems to.
> >
> Hmm... AFAIK sorting geometry just does a conventional "x then y"
> ordering of the geometry. This is better than nothing, but isn't as
> effective as a true spatial index ordering. It's worth noting that
> using a spatial index actually doesn't give you a linear order of the
> geometries - the key is to walk the tree of objects in depth-first
> order, performing unions progressively up the tree. I don't think a
> linear order provides the same kind of effect.
> >
> >
> >
> >
> >
> >
> >
> > ------------------------------------------------------------------------
> >
> > *The substance of this message, including any attachments, may be
> > confidential, legally privileged and/or exempt from disclosure
> > pursuant to Massachusetts law. It is intended solely for the
> > addressee. If you received this in error, please contact the sender
> > and delete the material from any computer. *
> >
> > ------------------------------------------------------------------------
> >
> > * Help make the earth a greener place. If at all possible resist
> > printing this email and join us in saving paper. *
> >
> > * *
> >
> > * *
> >
> > ------------------------------------------------------------------------
> >
> > _______________________________________________
> > postgis-devel mailing list
> > postgis-devel at postgis.refractions.net
> > http://postgis.refractions.net/mailman/listinfo/postgis-devel
> >
>
> --
> Martin Davis
> Senior Technical Architect
> Refractions Research, Inc.
> (250) 383-3022
>
> _______________________________________________
> postgis-devel mailing list
> postgis-devel at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-devel
>
>
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> postgis-devel mailing list
> postgis-devel at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-devel
>
--
Martin Davis
Senior Technical Architect
Refractions Research, Inc.
(250) 383-3022
More information about the postgis-devel
mailing list