[geos-devel] JTS/GEOS performance

Mon Jan 31 19:33:35 EST 2005

Thanks for the detailed comments, Martin.  It's certainly useful having
someone review the code at this level, and even more useful for them to
actually provide alternative implementations.

A few comments in response:

1.  The new JTS offers this scheme as well.  In fact, GEOS also provides
the ability to do this.  However, the key to making use of the
efficiency possible with an alternate storage format is to avoid having
to extract large numbers of points into other kinds of data structures
for geometric calculations.  This is fairly straightforward to do for
simple geometric algorithms, but much harder to do for complex
algorithms such as optimized intersection tests.  The new JTS tries to
strike a better balance between these conflicting requirements by
providing some optimized predicate tests (intersects and contains) for
simple cases (rectangles).  This could be extended to handle arbitrary
geometries for intersects, with some work.   With quite a bit more work
the internal intersection subsystem could be optimized as well, I think.

How are you intersection tests in a performant way?  All approached that
I've seen are *much* more complex to implement than the brute force
technique, and require construction of numerous objects to build
internal data structures.

> etc...) I have a generic algorithms class that has basic math 
> functions for doing the operations that take array pointers 
> for X and Y and the algorithms simply operate on the arrays 
> and know nothing about my SFS objects.  This is based off of 

JTS has numerous algorithms base on coordinate lists - it's just that in
the initial implementation they operated on Coordinate[] objects.  Many
of them could be rewritten to use CoordinateSequences.  This exacts a
bit of a time penalty in the Java world, since it requires more function
calls.  C++ compilers could probably optimize this away.

After looking at your LineString implementation, I'm not clear how your
"generic algorithms class" handles the full complexity of the SFS
DE-9IM.   Even the "straightforward" predicates have some complexities
which to my analysis require knowledge of the topology of the containing
geometries.  Does your code implement the SFS predicate semantics?

> I store the data.  In addition to this, each of my SFS 
> geometry implmentations (point, linestring, polygon) uses the 
> algorithms differently in order to optimize the processing 
> for each geomoetry type.  Also, this type of strategy allows 

How do you deal with double-dispatch?  E.g. how does
Point.intersect(Geometry) handle different kinds of geometry arguments?
Do you have some way of dealing with the combinatorial explosion of
[type] x [type] ?

Martin Davis, Senior Technical Architect
Vivid Solutions Inc.      www.vividsolutions.com
Suite #1A-2328 Government Street Victoria, B.C. V8T 5G5
Phone: (250) 385 6040 - Local 308 Fax: (250) 385 6046

> -----Original Message-----
> From: Chapman, Martin [mailto:MChapman at sanz.com] 
> Sent: January 31, 2005 3:40 PM
> To: Martin Davis; strk at refractions.net; pramsey at refractions.net
> Subject: RE: [geos-devel] JTS/GEOS performance
> 
> 
> Strk, Martin, Paul,
> 
> Well, I spent some time looking at the code and here are some 
> initial comments.
> 
> 1.  The CoordinateSequence object and it's sub class(s) may 
> be causing issues reguarding processing speed and memory 
> consumption.  In my model I use two vectors to store the x 
> and y points (on the heap), thus I avoid the overhead of of a 
> class instantiation for each coordinate pair. When dealing 
> with millions of points, that can add up.  Also, I provide a 
> way to set and get the vectors by passing pointers to the the 
> arrays that requires no copying of the points.  I saw several 
> places in geos where the coordinate points are copied.  
> Depending on how those functions are used could result in a 
> lot of copies of the points.
> 
> 2.  When I pass points for processing (like intersect, 
> contains, within,
> etc...) I have a generic algorithms class that has basic math 
> functions for doing the operations that take array pointers 
> for X and Y and the algorithms simply operate on the arrays 
> and know nothing about my SFS objects.  This is based off of 
> the same type of design pattern that STL uses...data 
> containers and algorithms.  That way I can easily write new 
> algorithms, or optimized algorithms without changing the way 
> I store the data.  In addition to this, each of my SFS 
> geometry implmentations (point, linestring, polygon) uses the 
> algorithms differently in order to optimize the processing 
> for each geomoetry type.  Also, this type of strategy allows 
> me to avoid the overhead that geos has when doing an 
> intersection for example, because I don't allocate a bunch of 
> class objects for each intersection test.  I noticed in geos 
> that to do an intersect/contains/within/touches/etc... That 
> you are creating and destroying many objects for each geometry. 
> 
> Comments below...
> 1.  Your comment about the linked lists.  Lists versus arrays 
> is really a matter of how you use your data.  For instance, a 
> list is fast when inserting and removing elements in your 
> container because all that is needed is to reassign some 
> pointers, but is extremely slow for random access because you 
> need to traverse the list for each random access. The random 
> search time for a list will be linear as the size of your 
> list grows.  Vectors (arrays) on the other hand are great for 
> random access because they have the [] function that takes an 
> array index, and the array bytes are stored sequentially.  
> Inserting and deleting in an array on the  hand are very 
> slow, because the array needs to be resized frequently which 
> results in a copy of the data.  So, changing you 
> CoordinateSequence object to use a list will only be helpful 
> if you are inserting/removing frequently...which is probably 
> not the case which vector data...so I would stick with 
> vectors if I was you.  Plus I can use memcpy with vectors and 
> pass them easily into OGR without copying points.
> 
> 2.  Smart pointers.  I agree that you guys need a smart 
> pointer class that will work on all of your classes (not for 
> speed though, just mem management).  That's what I have in my 
> model.  I wrote a smart pointer template class (CVPPtr< 
> ISomeInterface>) for my api that works with all my objects, 
> and all my objects are referenced counted and destroy 
> themselves intellegently when the ref count drops to zero.  
> You can use auto_ptr for non-reference counted objects, but 
> that results in a lot more copies of objects than a 
> ref-counted system.  They also intelligently addref whenever 
> you assign the class to another pointer. This pays off big 
> when I have millions of geomoetries that all have a pointer 
> to the same spatial reference class, but can easily be 
> re-assigned individually or in mass to another SRS. 
> 
> I am still looking at the code to see what else I can come up 
> with, just haven't had a lot of time.  The smart pointer code 
> is attached along with a base class template class IVPUnknown 
> and IVPUnknownImpl (which is essentially a cross-platform COM 
> model and one implementation class CVPEnvelope so you could 
> see an example of my smart pointer/ ref counting 
> functionality.  Here is an example of what would happen when 
> using my smart pointer class:
> 
> // in some function
> CVPPtr< IVPEnvelope> spEnvelope = new 
> CVPEnvelope(envelope.MinX, envelope.MinY, envelope.MaxX, 
> envelope.MaxY);  // refcount == 1 CVPPtr< IVPEnvelope> 
> spEnvelope2 = spEnvelope;  // refcount == 2 
> spEnvelope2.Release();  // refcount == 1 
> spEnvelope2.Attach(spEnvelope); // refcount == 1 
> spEnvelope2.Detach(); // refcount == 1
> IVPEnvelope* pEnvelope = spEnvelope; // refcount == 1
> pEnvelope->AddRef(); // refcount == 2;
> spEnvelope2 = pEnvelope; // refcount == 3
> pEnvelope->Release(); // refcount == 2;
> return spEnvelope.Detach(); // refcount == 2
> } // end of funtion - spEnvelope2 goes out of scope and 
> decrements the refcount to 1 again, but the calling function 
> has an object with only one ref count and can destroy it 
> whenever it goes out of scope.  
> 
> Therefore, if the function crashes, all smart pointers are 
> unwound from the stack and the destructor descrement until 
> the refcount == 0 and then the object automatically deletes 
> itself (thus...no mem leaks).  For this to work, all 
> interfaces that need ref counting need to derive from the 
> IVPUnknown interface and all objects need to derive from 
> IVPUnknownImpl...I use multiple inheritance to do this as you 
> will see in CVPEnvelope that I attached.  This code works on 
> windows and any machine that supports pthreads.  I also 
> attached my LineString class for another example.  
> 
> Hope this is of some value to you guys... 
> 
> Martin
> 
> -----Original Message-----
> From: strk at refractions.net [mailto:strk at refractions.net] 
> Sent: Monday, January 31, 2005 3:43 AM
> To: GEOS Development List
> Subject: Re: [geos-devel] JTS/GEOS performance
> 
> 
> On Fri, Jan 28, 2005 at 11:50:47AM -0700, Chapman, Martin wrote:
> > Martin,
> > 
> > First I want to say that you guys are showing a lot of 
> professionalism
> 
> > by how gracefully you guys accept constructive criticism from your
> > users.  Not many developers can take such comments with open minds.
> 
> I think developers with non-open/free-minds will hardly be 
> comfortable with open/free-software developments models.
> 
> > Secondly, I agree with your comments below.  Porting from 
> one language
> 
> > to the other can be tricky when it comes to memory 
> management, and in
> > some cases it will be tough to optimize just because of the 
> nature of 
> > the operation.
> > 
> > That said, from my observations I don't think optimizing 
> GEOS would be
> 
> > as tough as it may appear from the surface.  From what I've 
> seen it's
> > the nicest implementation of the OGC SFS I've seen yet, and 
> the object
> 
> > model is very logical and complete.  I think the 
> optimization strategy
> 
> > should be an incremental one, and a few small changes may 
> solve 80% of
> 
> > the issues for performance and memory usage.  I have a 
> bunch of ideas,
> 
> > but need some time to sit down and get back into the code before I
> > offer any concrete suggestions.  I will take some time this 
> weekend to
> 
> > see if I can come up with some easy changes.
> 
> What I've been thinking as a possible (harmless) step is 
> reducing memory fragmentation by reducing heap-allocated 
> object elements.
> 
> As for the Coordinate copies I've made some tests.
> The following table shows number of created Coordinate and 
> CoordinateSequence objects in relation to the number of 
> output and input points to the Buffer function. Input geom is 
> the one contained in the .xml file attached in my previous post.
> 
>          Input points : 3629
> 
>               buffer | out    | cted    | cted      |
>               units  | points | points  | coord seq |
>             ---------+--------+---------+-----------+
>                  100 |   5640 |   14720 |      2787 |
>                  200 |   5035 |   32206 |     12058 |
>                  500 |   3941 |  120217 |     61498 |
>                 1000 |   3060 |  317219 |    169320 |
> 
> It would be useful to try out a CoordinateSequence 
> implementation using a linked list of Coordinate pointers. It 
> has to be inspected wheter this would cleanly work with 
> current code (ie, wheter implementing the class and make it 
> the DefaultCoordinateSequence will suffice to force it's use 
> in all inner code).
> 
> All the Coordinates in that case would need to be immutable 
> and probably a Coordinate pool should be maintained to take 
> care of their mantagement (when to delete them? ref count? 
> never? on demand?). To reflect the Java way Coordinates (as 
> all objects actually) should be reference counted pointers. 
> Maybe we can do it defining a smart pointer as inherithing 
> from Coordinate.
> 
> Do you C++ guru see these as possible ways ?
> 
> --strk;
> 
> > 
> > As a quick background, I write user interfaces for remote sensing
> > archive systems using MFC for windows desktops and Java/C++ for web 
> > based applications on Linux/Windows.  We manage/provision massive 
> > vector/raster datasets for many government agencies and the 
> > intelligence/military community.  Therefore, my team and I are 
> > constantly seeking out the fastest and most efficient 
> technology for 
> > managing geo-spatial datasets.  We have really enjoyed 
> using PostGIS, 
> > and GDAL/OGR/GEOS.
> > 
> > Keep up the great work and I'll send my suggestions to you this
> > weekend.
> > 
> > Martin
> > 
> > -----Original Message-----
> > From: Martin Davis [mailto:mbdavis at VividSolutions.com]
> > Sent: Friday, January 28, 2005 11:12 AM
> > To: GEOS Development List
> > Subject: RE: [geos-devel] JTS/GEOS performance
> > 
> > 
> > Thanks for the detailed reply, Martin.
> > 
> > Your comments about how objects are handled probably reflects a
> > too-close adherence to the Java origins of the code.  Or, 
> they could 
> > just be plain failure to take advantage of efficiencies 
> possible with 
> > C. In either case, it seems like a substantial revision of the code 
> > base will be required.
> > 
> > Regarding your comment about the inefficiency of computing 
> intersects
> > as ! Disjoint, I'm aware of this issue and am starting to 
> take steps 
> > to address it in JTS.  I agree that for intersects at least 
> a faster 
> > general implementation is possible.  I'd be interested in 
> hearing some
> 
> > details of your implementation - for instance, how do you make the
> > intersection detection efficient?
> > 
> > As for the other named spatial predicates, I'm not sure I 
> see as much
> > opportunity for optimization.  This is due to the fact that most of 
> > the others require precise knowledge about how the boundary of one 
> > geometry interacts with the exterior or interior of another 
> geometry.
> 
> > This in turn requires knowledge of *all* intersections, not just
> > whether an intersection exists.  This might be optimizable for 
> > particular cases of geometry types, but you then have the 
> situation of
> 
> > needing to specify numerous different algorithms for different pairs
> > of geometry types. This is why the initial JTS work 
> focussed on coming
> 
> > up with a general algorithm for computing all predicates in the same
> way.
> > 
> > But it would be great if you see any opportunities for faster
> > implementations in particular cases, and if you could post 
> details so 
> > all can benefit.
> > 
> > 
> > Martin Davis, Senior Technical Architect
> > Vivid Solutions Inc.      www.vividsolutions.com
> > Suite #1A-2328 Government Street Victoria, B.C. V8T 5G5
> > Phone: (250) 385 6040 - Local 308 Fax: (250) 385 6046
> > 
> > 
> > > -----Original Message-----
> > > From: Chapman, Martin [mailto:MChapman at sanz.com]
> > > Sent: January 28, 2005 9:47 AM
> > > To: GEOS Development List
> > > Subject: RE: [geos-devel] JTS/GEOS performance
> > > 
> > > 
> > > I studied the GEOS code and here is my opinion:
> > > 
> > > 1.  GEOS makes multiple copies of the coordinates during many
> > > operations rather than storing the points once and then passing 
> > > pointers.
> > > 
> > > 2.  GEOS has a complicated object model that makes objects for
> > > everything.  To calculate most spatial operations and 
> relationships 
> > > requires the construction and destruction of many many objects.
> > > 
> > > 3.  The DE9IM implementation makes mistakes like intersects =
> > > !disjoint. That requires a full scan of all points in a poly, 
> > > whereas a straight intersect algorithm will return faster most of 
> > > the time because it can exit after it finds the first 
> intersecting 
> > > point.  In general, I think there should be optimized 
> algorithms for
> 
> > > each spatial operation/relationship rather than trying to 
> calculate
> > > the full de9-im matrix.  You can keep the matrix, just change the 
> > > way it gets populated behind the scenes.
> > > 
> > > 4.  When doing operations on point arrays, make functions 
> that takes
> 
> > > pointers to arrays and just pass the pointers in from the geometry
> > > classes.  Currently, when you call a function like 
> intersects GEOS 
> > > makes copies of points and a slew of objects for each geometry 
> > > operation.
> > > 
> > > For a comparison, we tried to use geos with postgresql 
> and used the
> > > POSTGIS intersect method.  We found it very slow so we 
> wrote our own
> 
> > > user-defined intersect function.  Ours is 109 times faster on a
> > > million plus geometries.  That means a one second query 
> for us takes
> 
> > > GEOS 109 seconds.
> > > 
> > > Martin
> > > 
> > > 
> > > 
> > > -----Original Message-----
> > > From: Martin Davis [mailto:mbdavis at VividSolutions.com]
> > > Sent: Friday, January 28, 2005 10:12 AM
> > > To: GEOS Development List
> > > Subject: RE: [geos-devel] JTS/GEOS performance
> > > 
> > > 
> > > Poor alloc performance would be my first guess for the performance
> > > difference.  I would think Java is pretty optimized for memory 
> > > allocation, since it relies on it so heavily.
> > > 
> > > I think one standard approach to handling slow malloc is 
> to build a
> > > sub-allocation layer.  This would be made a bit easier in GEOS, 
> > > since almost the allocation done inside geometry methods 
> is released
> 
> > > at the end of the method.  You could probably even 
> dispense with the
> 
> > > free calls as long as you were operating in the memory pool.  (One
> > > messiness that would need to be thought out is how to support 
> > > exposing the various components such as tree indexes and noding 
> > > while still allowing a custom allocator to be supplied.  Or you
> > > could just forget about this and just support access through 
> > > geometry methods.  This is getting pretty far from the 
> > > philosophy of JTS though...).
> > > 
> > > I'm not enough of a C expert to know if this is the best route to
> > > pursue
> > > - anyone have any other ideas?
> > > 
> > > Martin Davis, Senior Technical Architect
> > > Vivid Solutions Inc.      www.vividsolutions.com
> > > Suite #1A-2328 Government Street Victoria, B.C. V8T 5G5
> > > Phone: (250) 385 6040 - Local 308 Fax: (250) 385 6046
> > > 
> > > 
> > > > -----Original Message-----
> > > > From: strk at refractions.net [mailto:strk at refractions.net]
> > > > Sent: January 28, 2005 4:20 AM
> > > > To: geos-devel at geos.refractions.net
> > > > Subject: [geos-devel] JTS/GEOS performance
> > > > 
> > > > 
> > > > Some time ago I've been researching about GEOS performance
> > > problems as
> > > > related to JTS. Attached is a shapefile and an .xml test
> > > you can use
> > > > to compare the two.
> > > > 
> > > > JTS does not support buffers tests, so you'll need to 
> use another
> > > > method for that. I used JUMP, which reports computation time.
> > > > 
> > > > Well. The operation is a buffer(polygon, 2000).
> > > > 
> > > >  JTS:  18 seconds
> > > > GEOS: 574 seconds (9 minutes, 34 secs)
> > > > 
> > > > GEOS computation keeps the CPU pretty busy (98.2-99.8%) 
> and takes
> > > > up to about 170 MB of ram
> > > > 
> > > > JTS seems to use 3 threads, the bigger using at most 80% of CPU,
> > > > but most of the time far below that point. JUMP reports 104MB 
> > > > committed, but I'm not sure about the meaning.
> > > > 
> > > > For GEOS, valgrind reports (with buffer 500):
> > > >  malloc/free: 2982697 allocs, 2982697 frees, 924407212 bytes
> > > > allocated.
> > > > 
> > > > How much do you think this wild allocation negatively influence
> > > > the poor performance of GEOS ?
> > > > 
> > > > --strk;
> > > > _______________________________________________
> > > > geos-devel mailing list
> > > > geos-devel at geos.refractions.net
> > > > http://geos.refractions.net/mailman/listinfo/geos-devel
> > > > 
> > > _______________________________________________
> > > geos-devel mailing list
> > > geos-devel at geos.refractions.net
> > > http://geos.refractions.net/mailman/listinfo/geos-devel
> > > 
> > > _______________________________________________
> > > geos-devel mailing list
> > > geos-devel at geos.refractions.net
> > > http://geos.refractions.net/mailman/listinfo/geos-devel
> > > 
> > _______________________________________________
> > geos-devel mailing list
> > geos-devel at geos.refractions.net
> > http://geos.refractions.net/mailman/listinfo/geos-devel
> > _______________________________________________
> > geos-devel mailing list
> > geos-devel at geos.refractions.net 
> > http://geos.refractions.net/mailman/listinfo/geos-devel
> _______________________________________________
> geos-devel mailing list
> geos-devel at geos.refractions.net 
> http://geos.refractions.net/mailman/listinfo/geos-devel
>