[postgis-devel] More Cascade Union Adventures
Martin Davis
mbdavis at refractions.net
Wed Aug 13 13:49:42 PDT 2008
Yep, I don't Hausdorff distance is available in very many places. Pity,
because it is a very useful metric for comparing geometry. The full
Hausdorff distance is quite challenging to implement, but I have made a
VertexHausdorffDistance approximation which is just as useful in most
situations, and is a lot simpler to implement (and faster to run).
By the way, are you sure that OpenJUMP is using CascadedUnion? AFAIK it
didn't in the past... Michael Michaud has just released (today!) an
extension which I think does use CascadedUnion - so you might want to
try that.
http://geo.michaelm.free.fr/OpenJUMP/resources/aggregation-0.1.jar
When I did the original testing with the 30K polygon dataset that you're
using, I was getting times of around 20 sec using CascadedUnion...
Obe, Regina wrote:
>
> Martin,
> Pardon my ignorance.
>
> I don't see a Hausdorff distance in OpenJump or maybe it goes by a
> more verbose name and the descriptions I read about Hausdorff
> distances are greek to me.
>
> Well the Mass town test seems to pass my trivial test exercises. It
> looks like massachusetts with no visually apparent gaps, has the same
> number points in all cases, similar area
>
> both num points - 476026
> both num geometries - 694
> area (ST_CascadeUnion - 2.094208570266725E10)
> area (JTS - 2.0942085702666965E10)
>
>
> -----Original Message-----
> From: postgis-devel-bounces at postgis.refractions.net on behalf of
> Martin Davis
> Sent: Wed 8/13/2008 11:51 AM
> To: PostGIS Development Discussion
> Subject: Re: [postgis-devel] More Cascade Union Adventures
>
> I wouldn't expect to have the results be exactly equal - the union code
> is likely to be input-order dependent.
>
> And the difference in areas doesn't seem surprising either - it's way
> down in the small decimal places, which would occur even with slight
> differences in the geometry.
>
> A more revealing test would be to compute the Hausdorff distance between
> the union boundaries - that would show if they differed by very much,
> and where. PostGIS doesn't have this - I can't remember whether
> OpenJUMP does or not. JEQL has this operation, too.
>
> Obe, Regina wrote:
> > Now testing all including JTS 1.9.0 (OpenJump) on a Win XP runing
> > PostgreSQL 8.3.1, PostGIS 1.3.3, Geos 3.0.0. It appears using array
> > trumps all, Cascade aggregate union is a vast improvement over ST_Union
> > (ST_Union I didn't bother testing of course because it owuld never
> > finish on this test), but evidentally the array accum calls give a major
> > penalty.
> >
> > I found some things I found a bit possibly disturbing. Maybe its just
> > the nature of unioning in different orders. I compared all 3 outputs
> > and none of them are binary equal or even ST_Equals for that matter.
> >
> > However all 3 give same basic stats using OpenJump - e.g
> > All result in = 32972 pts
> > components = 10
> > lengths and areas are off by a bit
> > length = agg cascade union = 17.262684721407624, k nested union =
> > 17.262684721407688,
> > jts = 17.262684721406348
> >
> > Should I be bothered by any of these or are they just rounding errors?
> >
> > --Other odd thing is that the array approach is not as good as it was on
> > my other machine bu the aggregate union performs better. I'll just
> > chuck this off to different postgresql version, and memory settings.
> >
> > Thanks,
> > Regina
> >
> >
> > -- 209,563 | 198,391 ms - SELECT 198391/1000.00/60 = 3.31 minutes
> >
> > SELECT ST_CascadeUnion(the_geom)
> > FROM (SELECT the_geom FROM sample_poly) As foo;
> >
> > -- 48,594 ms | 47,688 ms = 48 secs
> > SELECT st_unitecascade_garray_sort(ARRAY(SELECT the_geom FROM
> > sample_poly));
> >
> > -- 74,515 ms | 74,922 ms = SELECT 74515/1000.00/60 = 1.24 minutes
> > SELECT ST_Union(the_geom) AS the_geom, 'nested union'
> > FROM (
> > SELECT min(id) AS id, ST_Union(the_geom) AS the_geom
> > FROM (
> > SELECT min(id) AS id, ST_Union(the_geom) AS the_geom
> > FROM (
> > SELECT min(id) AS id, ST_Union(the_geom) AS the_geom
> > FROM (
> > SELECT min(id) AS id, ST_Union(the_geom) AS the_geom
> > FROM (SELECT the_geom, id FROM sample_poly) As foo
> > GROUP BY round(id/10)
> > ORDER BY id) AS tmp1
> > GROUP BY round(id/100)
> > ORDER BY id) AS tmp2
> > GROUP BY round(id/1000)
> > ORDER BY id) AS tmp3
> > GROUP BY round(id/10000)
> > ORDER BY id) AS tmp4
> > GROUP BY round(id/100000);
> >
> >
> > -- Open Jump JTS 1.9.0 Win XP
> > --After loading running union across the whole set
> > --1.14 minutes
> > --Loading database query
> > SELECT ST_AsBinary(the_geom)
> > FROM sample_poly;
> >
> > -----------------------------------------
> > The substance of this message, including any attachments, may be
> > confidential, legally privileged and/or exempt from disclosure
> > pursuant to Massachusetts law. It is intended
> > solely for the addressee. If you received this in error, please
> > contact the sender and delete the material from any computer.
> >
> > _______________________________________________
> > postgis-devel mailing list
> > postgis-devel at postgis.refractions.net
> > http://postgis.refractions.net/mailman/listinfo/postgis-devel
> >
> >
>
> --
> Martin Davis
> Senior Technical Architect
> Refractions Research, Inc.
> (250) 383-3022
>
> _______________________________________________
> postgis-devel mailing list
> postgis-devel at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-devel
>
>
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> postgis-devel mailing list
> postgis-devel at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-devel
>
--
Martin Davis
Senior Technical Architect
Refractions Research, Inc.
(250) 383-3022
More information about the postgis-devel
mailing list