[geos-devel] JTS/GEOS performance

strk at refractions.net strk at refractions.net
Mon Jan 31 05:42:37 EST 2005


On Fri, Jan 28, 2005 at 11:50:47AM -0700, Chapman, Martin wrote:
> Martin,
> 
> First I want to say that you guys are showing a lot of professionalism
> by how gracefully you guys accept constructive criticism from your
> users.  Not many developers can take such comments with open minds.  

I think developers with non-open/free-minds will hardly be
comfortable with open/free-software developments models.

> Secondly, I agree with your comments below.  Porting from one language
> to the other can be tricky when it comes to memory management, and in
> some cases it will be tough to optimize just because of the nature of
> the operation.  
> 
> That said, from my observations I don't think optimizing GEOS would be
> as tough as it may appear from the surface.  From what I've seen it's
> the nicest implementation of the OGC SFS I've seen yet, and the object
> model is very logical and complete.  I think the optimization strategy
> should be an incremental one, and a few small changes may solve 80% of
> the issues for performance and memory usage.  I have a bunch of ideas,
> but need some time to sit down and get back into the code before I offer
> any concrete suggestions.  I will take some time this weekend to see if
> I can come up with some easy changes.

What I've been thinking as a possible (harmless) step is reducing
memory fragmentation by reducing heap-allocated object elements.

As for the Coordinate copies I've made some tests.
The following table shows number of created Coordinate and
CoordinateSequence objects in relation to the number of output
and input points to the Buffer function. Input geom is the
one contained in the .xml file attached in my previous post.

         Input points : 3629

              buffer | out    | cted    | cted      |
              units  | points | points  | coord seq |
            ---------+--------+---------+-----------+
                 100 |   5640 |   14720 |      2787 |
                 200 |   5035 |   32206 |     12058 |
                 500 |   3941 |  120217 |     61498 |
                1000 |   3060 |  317219 |    169320 |

It would be useful to try out a CoordinateSequence implementation
using a linked list of Coordinate pointers. It has to be inspected
wheter this would cleanly work with current code (ie, wheter
implementing the class and make it the DefaultCoordinateSequence
will suffice to force it's use in all inner code).

All the Coordinates in that case would need to be immutable and
probably a Coordinate pool should be maintained to take care of
their mantagement (when to delete them? ref count? never? on demand?).
To reflect the Java way Coordinates (as all objects actually)
should be reference counted pointers. Maybe we can do it defining
a smart pointer as inherithing from Coordinate.

Do you C++ guru see these as possible ways ?

--strk;

> 
> As a quick background, I write user interfaces for remote sensing
> archive systems using MFC for windows desktops and Java/C++ for web
> based applications on Linux/Windows.  We manage/provision massive
> vector/raster datasets for many government agencies and the
> intelligence/military community.  Therefore, my team and I are
> constantly seeking out the fastest and most efficient technology for
> managing geo-spatial datasets.  We have really enjoyed using PostGIS,
> and GDAL/OGR/GEOS.  
> 
> Keep up the great work and I'll send my suggestions to you this weekend.
> 
> Martin
> 
> -----Original Message-----
> From: Martin Davis [mailto:mbdavis at VividSolutions.com] 
> Sent: Friday, January 28, 2005 11:12 AM
> To: GEOS Development List
> Subject: RE: [geos-devel] JTS/GEOS performance
> 
> 
> Thanks for the detailed reply, Martin.
> 
> Your comments about how objects are handled probably reflects a
> too-close adherence to the Java origins of the code.  Or, they could
> just be plain failure to take advantage of efficiencies possible with C.
> In either case, it seems like a substantial revision of the code base
> will be required.
> 
> Regarding your comment about the inefficiency of computing intersects as
> ! Disjoint, I'm aware of this issue and am starting to take steps to
> address it in JTS.  I agree that for intersects at least a faster
> general implementation is possible.  I'd be interested in hearing some
> details of your implementation - for instance, how do you make the
> intersection detection efficient?
> 
> As for the other named spatial predicates, I'm not sure I see as much
> opportunity for optimization.  This is due to the fact that most of the
> others require precise knowledge about how the boundary of one geometry
> interacts with the exterior or interior of another geometry.  This in
> turn requires knowledge of *all* intersections, not just whether an
> intersection exists.  This might be optimizable for particular cases of
> geometry types, but you then have the situation of needing to specify
> numerous different algorithms for different pairs of geometry types.
> This is why the initial JTS work focussed on coming up with a general
> algorithm for computing all predicates in the same way.  
> 
> But it would be great if you see any opportunities for faster
> implementations in particular cases, and if you could post details so
> all can benefit.
> 
> 
> Martin Davis, Senior Technical Architect
> Vivid Solutions Inc.      www.vividsolutions.com
> Suite #1A-2328 Government Street Victoria, B.C. V8T 5G5
> Phone: (250) 385 6040 - Local 308 Fax: (250) 385 6046
> 
> 
> > -----Original Message-----
> > From: Chapman, Martin [mailto:MChapman at sanz.com]
> > Sent: January 28, 2005 9:47 AM
> > To: GEOS Development List
> > Subject: RE: [geos-devel] JTS/GEOS performance
> > 
> > 
> > I studied the GEOS code and here is my opinion:
> > 
> > 1.  GEOS makes multiple copies of the coordinates during many
> > operations rather than storing the points once and then 
> > passing pointers. 
> > 
> > 2.  GEOS has a complicated object model that makes objects
> > for everything.  To calculate most spatial operations and 
> > relationships requires the construction and destruction of 
> > many many objects.
> > 
> > 3.  The DE9IM implementation makes mistakes like intersects =
> > !disjoint. That requires a full scan of all points in a poly, 
> > whereas a straight intersect algorithm will return faster 
> > most of the time because it can exit after it finds the first 
> > intersecting point.  In general, I think there should be 
> > optimized algorithms for each spatial operation/relationship 
> > rather than trying to calculate the full de9-im matrix.  You 
> > can keep the matrix, just change the way it gets populated 
> > behind the scenes.
> > 
> > 4.  When doing operations on point arrays, make functions
> > that takes pointers to arrays and just pass the pointers in 
> > from the geometry classes.  Currently, when you call a 
> > function like intersects GEOS makes copies of points and a 
> > slew of objects for each geometry operation.  
> > 
> > For a comparison, we tried to use geos with postgresql and
> > used the POSTGIS intersect method.  We found it very slow so 
> > we wrote our own user-defined intersect function.  Ours is 
> > 109 times faster on a million plus geometries.  That means a 
> > one second query for us takes GEOS 109 seconds.
> > 
> > Martin
> > 
> > 
> > 
> > -----Original Message-----
> > From: Martin Davis [mailto:mbdavis at VividSolutions.com]
> > Sent: Friday, January 28, 2005 10:12 AM
> > To: GEOS Development List
> > Subject: RE: [geos-devel] JTS/GEOS performance
> > 
> > 
> > Poor alloc performance would be my first guess for the
> > performance difference.  I would think Java is pretty 
> > optimized for memory allocation, since it relies on it so heavily.  
> > 
> > I think one standard approach to handling slow malloc is to
> > build a sub-allocation layer.  This would be made a bit 
> > easier in GEOS, since almost the allocation done inside 
> > geometry methods is released at the end of the method.  You 
> > could probably even dispense with the free calls as long as 
> > you were operating in the memory pool.  (One messiness that 
> > would need to be thought out is how to support exposing the 
> > various components such as tree indexes and noding while 
> > still allowing a custom allocator to be supplied.  Or you 
> > could just forget about this and just support access through 
> > geometry methods.  This is getting pretty far from the 
> > philosophy of JTS though...).
> > 
> > I'm not enough of a C expert to know if this is the best
> > route to pursue
> > - anyone have any other ideas?
> > 
> > Martin Davis, Senior Technical Architect
> > Vivid Solutions Inc.      www.vividsolutions.com
> > Suite #1A-2328 Government Street Victoria, B.C. V8T 5G5
> > Phone: (250) 385 6040 - Local 308 Fax: (250) 385 6046
> > 
> > 
> > > -----Original Message-----
> > > From: strk at refractions.net [mailto:strk at refractions.net]
> > > Sent: January 28, 2005 4:20 AM
> > > To: geos-devel at geos.refractions.net
> > > Subject: [geos-devel] JTS/GEOS performance
> > > 
> > > 
> > > Some time ago I've been researching about GEOS performance
> > problems as
> > > related to JTS. Attached is a shapefile and an .xml test
> > you can use
> > > to compare the two.
> > > 
> > > JTS does not support buffers tests, so you'll need to use another
> > > method for that. I used JUMP, which reports computation time.
> > > 
> > > Well. The operation is a buffer(polygon, 2000).
> > > 
> > >  JTS:  18 seconds
> > > GEOS: 574 seconds (9 minutes, 34 secs)
> > > 
> > > GEOS computation keeps the CPU pretty busy (98.2-99.8%)
> > > and takes up to about 170 MB of ram
> > > 
> > > JTS seems to use 3 threads, the bigger using at most 80%
> > > of CPU, but most of the time far below that point.
> > > JUMP reports 104MB committed, but I'm not sure about the meaning.
> > > 
> > > For GEOS, valgrind reports (with buffer 500):
> > >  malloc/free: 2982697 allocs, 2982697 frees, 924407212 bytes
> > > allocated.
> > > 
> > > How much do you think this wild allocation negatively influence the
> > > poor performance of GEOS ?
> > > 
> > > --strk;
> > > _______________________________________________
> > > geos-devel mailing list
> > > geos-devel at geos.refractions.net
> > > http://geos.refractions.net/mailman/listinfo/geos-devel
> > > 
> > _______________________________________________
> > geos-devel mailing list
> > geos-devel at geos.refractions.net
> > http://geos.refractions.net/mailman/listinfo/geos-devel
> > 
> > _______________________________________________
> > geos-devel mailing list
> > geos-devel at geos.refractions.net
> > http://geos.refractions.net/mailman/listinfo/geos-devel
> > 
> _______________________________________________
> geos-devel mailing list
> geos-devel at geos.refractions.net
> http://geos.refractions.net/mailman/listinfo/geos-devel
> _______________________________________________
> geos-devel mailing list
> geos-devel at geos.refractions.net
> http://geos.refractions.net/mailman/listinfo/geos-devel



More information about the geos-devel mailing list