[geos-devel] JTS/GEOS performance

Martin Davis mbdavis at VividSolutions.com
Mon Jan 31 14:22:15 EST 2005


> The following table shows number of created Coordinate and 
> CoordinateSequence objects in relation to the number of 

Something seems wrong here - are there really that many created
CoordinateSequences?  It's *possible* - there's a lot of noding that
goes on during buffering.  But this should really be looked at, and
possibly compared to JTS.

> It would be useful to try out a CoordinateSequence 
> implementation using a linked list of Coordinate pointers. It 

Hmmm.  This seems fundamentally the wrong approach.  JTS does not need
the flexibility of linked lists inside CoordinateSequences.  It sounds
like you are trying to build a garbage-collected memory management
layer.  This may be the only reasonable solution.  However, rather than
hacking this into the existing code in an obtrusive kind of way, it
would be nicer to look for a clean, layered solution.  Are there any
open source GC packages for C++?


Martin Davis, Senior Technical Architect
Vivid Solutions Inc.      www.vividsolutions.com
Suite #1A-2328 Government Street Victoria, B.C. V8T 5G5
Phone: (250) 385 6040 - Local 308 Fax: (250) 385 6046


> -----Original Message-----
> From: strk at refractions.net [mailto:strk at refractions.net] 
> Sent: January 31, 2005 2:43 AM
> To: GEOS Development List
> Subject: Re: [geos-devel] JTS/GEOS performance
> 
> 
> On Fri, Jan 28, 2005 at 11:50:47AM -0700, Chapman, Martin wrote:
> > Martin,
> > 
> > First I want to say that you guys are showing a lot of 
> professionalism 
> > by how gracefully you guys accept constructive criticism from your 
> > users.  Not many developers can take such comments with open minds.
> 
> I think developers with non-open/free-minds will hardly be 
> comfortable with open/free-software developments models.
> 
> > Secondly, I agree with your comments below.  Porting from 
> one language 
> > to the other can be tricky when it comes to memory 
> management, and in 
> > some cases it will be tough to optimize just because of the 
> nature of 
> > the operation.
> > 
> > That said, from my observations I don't think optimizing 
> GEOS would be 
> > as tough as it may appear from the surface.  From what I've 
> seen it's 
> > the nicest implementation of the OGC SFS I've seen yet, and 
> the object 
> > model is very logical and complete.  I think the 
> optimization strategy 
> > should be an incremental one, and a few small changes may 
> solve 80% of 
> > the issues for performance and memory usage.  I have a 
> bunch of ideas, 
> > but need some time to sit down and get back into the code before I 
> > offer any concrete suggestions.  I will take some time this 
> weekend to 
> > see if I can come up with some easy changes.
> 
> What I've been thinking as a possible (harmless) step is 
> reducing memory fragmentation by reducing heap-allocated 
> object elements.
> 
> As for the Coordinate copies I've made some tests.
> The following table shows number of created Coordinate and 
> CoordinateSequence objects in relation to the number of 
> output and input points to the Buffer function. Input geom is 
> the one contained in the .xml file attached in my previous post.
> 
>          Input points : 3629
> 
>               buffer | out    | cted    | cted      |
>               units  | points | points  | coord seq |
>             ---------+--------+---------+-----------+
>                  100 |   5640 |   14720 |      2787 |
>                  200 |   5035 |   32206 |     12058 |
>                  500 |   3941 |  120217 |     61498 |
>                 1000 |   3060 |  317219 |    169320 |
> 
> It would be useful to try out a CoordinateSequence 
> implementation using a linked list of Coordinate pointers. It 
> has to be inspected wheter this would cleanly work with 
> current code (ie, wheter implementing the class and make it 
> the DefaultCoordinateSequence will suffice to force it's use 
> in all inner code).
> 
> All the Coordinates in that case would need to be immutable 
> and probably a Coordinate pool should be maintained to take 
> care of their mantagement (when to delete them? ref count? 
> never? on demand?). To reflect the Java way Coordinates (as 
> all objects actually) should be reference counted pointers. 
> Maybe we can do it defining a smart pointer as inherithing 
> from Coordinate.
> 
> Do you C++ guru see these as possible ways ?
> 
> --strk;
> 
> > 
> > As a quick background, I write user interfaces for remote sensing 
> > archive systems using MFC for windows desktops and Java/C++ for web 
> > based applications on Linux/Windows.  We manage/provision massive 
> > vector/raster datasets for many government agencies and the 
> > intelligence/military community.  Therefore, my team and I are 
> > constantly seeking out the fastest and most efficient 
> technology for 
> > managing geo-spatial datasets.  We have really enjoyed 
> using PostGIS, 
> > and GDAL/OGR/GEOS.
> > 
> > Keep up the great work and I'll send my suggestions to you this 
> > weekend.
> > 
> > Martin
> > 
> > -----Original Message-----
> > From: Martin Davis [mailto:mbdavis at VividSolutions.com]
> > Sent: Friday, January 28, 2005 11:12 AM
> > To: GEOS Development List
> > Subject: RE: [geos-devel] JTS/GEOS performance
> > 
> > 
> > Thanks for the detailed reply, Martin.
> > 
> > Your comments about how objects are handled probably reflects a 
> > too-close adherence to the Java origins of the code.  Or, 
> they could 
> > just be plain failure to take advantage of efficiencies 
> possible with 
> > C. In either case, it seems like a substantial revision of the code 
> > base will be required.
> > 
> > Regarding your comment about the inefficiency of computing 
> intersects 
> > as ! Disjoint, I'm aware of this issue and am starting to 
> take steps 
> > to address it in JTS.  I agree that for intersects at least 
> a faster 
> > general implementation is possible.  I'd be interested in 
> hearing some 
> > details of your implementation - for instance, how do you make the 
> > intersection detection efficient?
> > 
> > As for the other named spatial predicates, I'm not sure I 
> see as much 
> > opportunity for optimization.  This is due to the fact that most of 
> > the others require precise knowledge about how the boundary of one 
> > geometry interacts with the exterior or interior of another 
> geometry.  
> > This in turn requires knowledge of *all* intersections, not just 
> > whether an intersection exists.  This might be optimizable for 
> > particular cases of geometry types, but you then have the 
> situation of 
> > needing to specify numerous different algorithms for 
> different pairs 
> > of geometry types. This is why the initial JTS work 
> focussed on coming 
> > up with a general algorithm for computing all predicates in 
> the same way.
> > 
> > But it would be great if you see any opportunities for faster 
> > implementations in particular cases, and if you could post 
> details so 
> > all can benefit.
> > 
> > 
> > Martin Davis, Senior Technical Architect
> > Vivid Solutions Inc.      www.vividsolutions.com
> > Suite #1A-2328 Government Street Victoria, B.C. V8T 5G5
> > Phone: (250) 385 6040 - Local 308 Fax: (250) 385 6046
> > 
> > 
> > > -----Original Message-----
> > > From: Chapman, Martin [mailto:MChapman at sanz.com]
> > > Sent: January 28, 2005 9:47 AM
> > > To: GEOS Development List
> > > Subject: RE: [geos-devel] JTS/GEOS performance
> > > 
> > > 
> > > I studied the GEOS code and here is my opinion:
> > > 
> > > 1.  GEOS makes multiple copies of the coordinates during many 
> > > operations rather than storing the points once and then passing 
> > > pointers.
> > > 
> > > 2.  GEOS has a complicated object model that makes objects for 
> > > everything.  To calculate most spatial operations and 
> relationships 
> > > requires the construction and destruction of many many objects.
> > > 
> > > 3.  The DE9IM implementation makes mistakes like intersects = 
> > > !disjoint. That requires a full scan of all points in a poly, 
> > > whereas a straight intersect algorithm will return faster most of 
> > > the time because it can exit after it finds the first 
> intersecting 
> > > point.  In general, I think there should be optimized 
> algorithms for 
> > > each spatial operation/relationship rather than trying to 
> calculate 
> > > the full de9-im matrix.  You can keep the matrix, just change the 
> > > way it gets populated behind the scenes.
> > > 
> > > 4.  When doing operations on point arrays, make functions 
> that takes 
> > > pointers to arrays and just pass the pointers in from the 
> geometry 
> > > classes.  Currently, when you call a function like 
> intersects GEOS 
> > > makes copies of points and a slew of objects for each geometry 
> > > operation.
> > > 
> > > For a comparison, we tried to use geos with postgresql 
> and used the 
> > > POSTGIS intersect method.  We found it very slow so we 
> wrote our own 
> > > user-defined intersect function.  Ours is 109 times faster on a 
> > > million plus geometries.  That means a one second query 
> for us takes 
> > > GEOS 109 seconds.
> > > 
> > > Martin
> > > 
> > > 
> > > 
> > > -----Original Message-----
> > > From: Martin Davis [mailto:mbdavis at VividSolutions.com]
> > > Sent: Friday, January 28, 2005 10:12 AM
> > > To: GEOS Development List
> > > Subject: RE: [geos-devel] JTS/GEOS performance
> > > 
> > > 
> > > Poor alloc performance would be my first guess for the 
> performance 
> > > difference.  I would think Java is pretty optimized for memory 
> > > allocation, since it relies on it so heavily.
> > > 
> > > I think one standard approach to handling slow malloc is 
> to build a 
> > > sub-allocation layer.  This would be made a bit easier in GEOS, 
> > > since almost the allocation done inside geometry methods 
> is released 
> > > at the end of the method.  You could probably even 
> dispense with the 
> > > free calls as long as you were operating in the memory 
> pool.  (One 
> > > messiness that would need to be thought out is how to support 
> > > exposing the various components such as tree indexes and noding 
> > > while still allowing a custom allocator to be supplied.  Or you
> > > could just forget about this and just support access through 
> > > geometry methods.  This is getting pretty far from the 
> > > philosophy of JTS though...).
> > > 
> > > I'm not enough of a C expert to know if this is the best route to 
> > > pursue
> > > - anyone have any other ideas?
> > > 
> > > Martin Davis, Senior Technical Architect
> > > Vivid Solutions Inc.      www.vividsolutions.com
> > > Suite #1A-2328 Government Street Victoria, B.C. V8T 5G5
> > > Phone: (250) 385 6040 - Local 308 Fax: (250) 385 6046
> > > 
> > > 
> > > > -----Original Message-----
> > > > From: strk at refractions.net [mailto:strk at refractions.net]
> > > > Sent: January 28, 2005 4:20 AM
> > > > To: geos-devel at geos.refractions.net
> > > > Subject: [geos-devel] JTS/GEOS performance
> > > > 
> > > > 
> > > > Some time ago I've been researching about GEOS performance
> > > problems as
> > > > related to JTS. Attached is a shapefile and an .xml test
> > > you can use
> > > > to compare the two.
> > > > 
> > > > JTS does not support buffers tests, so you'll need to 
> use another 
> > > > method for that. I used JUMP, which reports computation time.
> > > > 
> > > > Well. The operation is a buffer(polygon, 2000).
> > > > 
> > > >  JTS:  18 seconds
> > > > GEOS: 574 seconds (9 minutes, 34 secs)
> > > > 
> > > > GEOS computation keeps the CPU pretty busy (98.2-99.8%) 
> and takes 
> > > > up to about 170 MB of ram
> > > > 
> > > > JTS seems to use 3 threads, the bigger using at most 
> 80% of CPU, 
> > > > but most of the time far below that point. JUMP reports 104MB 
> > > > committed, but I'm not sure about the meaning.
> > > > 
> > > > For GEOS, valgrind reports (with buffer 500):
> > > >  malloc/free: 2982697 allocs, 2982697 frees, 924407212 bytes 
> > > > allocated.
> > > > 
> > > > How much do you think this wild allocation negatively influence 
> > > > the poor performance of GEOS ?
> > > > 
> > > > --strk;
> > > > _______________________________________________
> > > > geos-devel mailing list
> > > > geos-devel at geos.refractions.net 
> > > > http://geos.refractions.net/mailman/listinfo/geos-devel
> > > > 
> > > _______________________________________________
> > > geos-devel mailing list
> > > geos-devel at geos.refractions.net 
> > > http://geos.refractions.net/mailman/listinfo/geos-devel
> > > 
> > > _______________________________________________
> > > geos-devel mailing list
> > > geos-devel at geos.refractions.net 
> > > http://geos.refractions.net/mailman/listinfo/geos-devel
> > > 
> > _______________________________________________
> > geos-devel mailing list
> > geos-devel at geos.refractions.net 
> > http://geos.refractions.net/mailman/listinfo/geos-devel
> > _______________________________________________
> > geos-devel mailing list
> > geos-devel at geos.refractions.net 
> > http://geos.refractions.net/mailman/listinfo/geos-devel
> _______________________________________________
> geos-devel mailing list
> geos-devel at geos.refractions.net 
> http://geos.refractions.net/mailman/listinfo/geos-devel
> 



More information about the geos-devel mailing list