[gdal-dev] Re: JAVA API - Performance - Wow!

Simone Giannecchini simone.giannecchini at geo-solutions.it
Fri Nov 20 06:24:34 EST 2009


Ciao Ivan,
wasn't Java supposed to be slow? :-)

Anyway, Even has done a good job lately improving the Java bindings.
I am trying to set aside some of our time to track his work more closely
and give more feedback.

Ciao,
Simone
-------------------------------------------------------
Ing. Simone Giannecchini
GeoSolutions S.A.S.
Founder - Software Engineer
Via Carignoni 51
55041  Camaiore (LU)
Italy

phone: +39 0584983027
fax:      +39 0584983027
mob:    +39 333 8128928


http://www.geo-solutions.it
http://geo-solutions.blogspot.com/
http://simboss.blogspot.com/
http://www.linkedin.com/in/simonegiannecchini

-------------------------------------------------------



On Fri, Nov 20, 2009 at 2:05 AM, Ivan Lucena <ivan.lucena at pmldnet.com> wrote:
> Even,
>
> As I said before, the new API is great but I need to report a correction on my performance analysis.
>
> I found that an unnecessary float-to-string conversion was taking much of the processing time in my code.
>
> After I removed that from the code, the performance of Frank's proximity algorithm in C or in Java is basically *identical*. No kidding!
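
To illustrate the kind of overhead Ivan describes, here is a minimal sketch. The class and method names are hypothetical and this is not Ivan's actual code; it only assumes a per-pixel loop that needlessly round-trips each value through a String, next to the same loop without the conversion.

```java
import java.util.Arrays;

final class ConversionCost {
    // Hypothetical hot loop: every pixel is formatted to a String and parsed back.
    static double sumViaString(float[] pixels) {
        double sum = 0.0;
        for (float p : pixels) {
            sum += Float.parseFloat(Float.toString(p)); // allocates a String per pixel
        }
        return sum;
    }

    // Same computation without the unnecessary conversion.
    static double sumDirect(float[] pixels) {
        double sum = 0.0;
        for (float p : pixels) {
            sum += p;
        }
        return sum;
    }

    public static void main(String[] args) {
        float[] pixels = new float[1024 * 1024]; // same size as the 1024x1024 test image
        Arrays.fill(pixels, 1.5f);
        long t0 = System.nanoTime();
        double a = sumViaString(pixels);
        long t1 = System.nanoTime();
        double b = sumDirect(pixels);
        long t2 = System.nanoTime();
        System.out.printf("via String: %d ms, direct: %d ms, equal: %b%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, a == b);
    }
}
```

Timings will vary by machine and JVM, so treat the printed numbers as indicative only.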
>
> My best regards,
>
> Ivan
>
>
>>  -------Original Message-------
>>  From: Ivan <ivan.lucena at pmldnet.com>
>>  Subject: Re: [gdal-dev] Re: JAVA API - Performance
>>  Sent: Nov 18 '09 16:05
>>
>>  Even,
>>
>>  I just got it to work with the new API today. That is great!
>>
>>  And just like you said, the advantage is in usability not performance.
>>
>>  Thanks,
>>
>>  Ivan
>>
>>  >  -------Original Message-------
>>  >  From: Even Rouault <even.rouault at mines-paris.org>
>>  >  Subject: Re: [gdal-dev] Re: JAVA API - Performance
>>  >  Sent: Nov 14 '09 17:32
>>  >
>>  >  Quoting Ivan <ivan.lucena at pmldnet.com>:
>>  >
>>  >  I've committed a new API that adds ReadRaster() and WriteRaster() methods that use
>>  >  the regular Java arrays (byte[], short[], int[], float[], double[]). See
>>  >  http://gdal.org/java
>>  >
>>  >  On my PC,
>>  >  http://trac.osgeo.org/gdal/browser/trunk/gdal/swig/java/apps/GDALTestIO.java
>>  >  runs in about 20.3 s for the ReadRaster()/WriteRaster() case, and in about 24.7 s
>>  >  for the ReadRaster_Direct()/WriteRaster_Direct() case. Not a big advantage
>>  >  (and it shrinks to no advantage at all when run with the -server flag, as both
>>  >  run in about 21.3 s!), but regular Java arrays are a bit easier to use than
>>  >  ByteBuffer (especially since, with Sun JVM 1.6, the array() method on ByteBuffer
>>  >  is not implemented).
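
The remark about array() can be sketched with plain java.nio, without GDAL. On HotSpot JVMs a direct ByteBuffer reports hasArray() == false and array() throws UnsupportedOperationException, so client code that wants a byte[] has to copy. The helper name toByteArray below is hypothetical and is not part of the GDAL bindings.

```java
import java.nio.ByteBuffer;

final class DirectBufferAccess {
    /** Returns the buffer's remaining bytes as a regular array, copying only when needed. */
    static byte[] toByteArray(ByteBuffer buffer) {
        if (buffer.hasArray()
                && buffer.arrayOffset() == 0
                && buffer.position() == 0
                && buffer.remaining() == buffer.array().length) {
            return buffer.array(); // heap buffer: share the backing array, no copy
        }
        byte[] copy = new byte[buffer.remaining()];
        buffer.duplicate().get(copy); // one bulk copy; the caller's position is untouched
        return copy;
    }

    public static void main(String[] args) {
        ByteBuffer direct = ByteBuffer.allocateDirect(16);
        System.out.println("direct.hasArray() = " + direct.hasArray());
        System.out.println("copied bytes = " + toByteArray(direct).length);
    }
}
```

The single bulk get() is usually much cheaper than a loop of per-element get(i) calls, but the copy is still a cost that the array-based ReadRaster() methods avoid on the Java side.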
>>  >
>>  >  > Even,
>>  >  >
>>  >  > You are right. The point is how to take full advantage of the GDAL Java API
>>  >  > by choosing the right approach to deal with the raster buffer on the client side.
>>  >  >
>>  >  > Best regards,
>>  >  >
>>  >  > Ivan
>>  >  >
>>  >  > Even Rouault wrote:
>>  >  > > Quoting Ivan <ivan.lucena at pmldnet.com>:
>>  >  > >
>>  >  > > Ivan,
>>  >  > >
>>  >  > > I'm not sure what you are really measuring if you compare a C++ code versus
>>  >  > > its translation to Java code. I think it just reflects the known slowdown of
>>  >  > > Java when doing intensive computations in comparison to native code. The 0.2
>>  >  > > second difference between the regular array version and the ByteBuffer one is
>>  >  > > the interesting result, not the 1.2/1.0 second difference between C++ and Java.
>>  >  > >
>>  >  > >> Ciao Simone,
>>  >  > >>
>>  >  > >> I just downloaded imageio-ext to check how it does that, but it looks like I
>>  >  > >> don't need to do that now; I can take your report instead. Thank you very
>>  >  > >> much. I will take a look at array pinning for a start.
>>  >  > >>
>>  >  > >> I translated the GDAL Proximity [1] code to Java and I timed both of them
>>  >  > >> with the same input, a 1024x1024 byte image with just one pixel as a feature
>>  >  > >> at the center of the image.
>>  >  > >>
>>  >  > >> It took 0.3 seconds in C++ and 1.5 seconds in Java!
>>  >  > >>
>>  >  > >> I then translated the buffers to regular arrays and it went down a little
>>  >  > >> bit, to 1.3 seconds.
>>  >  > >>
>>  >  > >> It is still a big disadvantage. I believe that the buffer-to-buffer
>>  >  > >> translation is the guilty time waster in that case.
>>  >  > >>
>>  >  > >> [1] http://trac.osgeo.org/gdal/browser/trunk/gdal/alg/gdalproximity.cpp
>>  >  > >>
>>  >  > >> My best regards,
>>  >  > >>
>>  >  > >> Ivan
>>  >  > >>
>>  >  > >>>  -------Original Message-------
>>  >  > >>>  From: Simone Giannecchini <simone.giannecchini at geo-solutions.it>
>>  >  > >>>  Subject: Re: [gdal-dev] Re: JAVA API - Performance
>>  >  > >>>  Sent: Nov 10 '09 12:36
>>  >  > >>>
>>  >  > >>>  Ciao Even,
>>  >  > >>>  just wanted to add my 2 cents.
>>  >  > >>>
>>  >  > >>>  As you know, for the imageio-ext project we have been using the
>>  >  > >>>  GDAL-JNI bindings (actually a modified version of them) for a while in
>>  >  > >>>  order to allow Java users to leverage GDAL through the ImageIO
>>  >  > >>>  framework, which is standard in Java.
>>  >  > >>>  This way we also enabled GeoTools and GeoServer to use GDAL as a
>>  >  > >>>  datasource.
>>  >  > >>>  In the past I have done quite some performance tests to add some
>>  >  > >>>  new/different methods to them and I can summarise our findings as
>>  >  > >>>  follows:
>>  >  > >>>
>>  >  > >>>  - DirectByteBuffer vs regular arrays -
>>  >  > >>>  DBBs are expensive to allocate but prevent the VM from performing copies
>>  >  > >>>  when moving data between Java and native code, since they
>>  >  > >>>  live in native space, not on the Java heap. On the other hand,
>>  >  > >>>  regular arrays are fast to allocate but they are "usually" copied when
>>  >  > >>>  moved between Java and native code, since the JVM cannot let
>>  >  > >>>  native code mess with the Java heap space: the garbage
>>  >  > >>>  collector would not be very happy about that. I said "usually" since
>>  >  > >>>  there is a technique called array pinning that we can suggest the JVM
>>  >  > >>>  use to avoid the copy of a regular array; however, this mechanism is
>>  >  > >>>  not guaranteed to be implemented and/or to work on each call (same
>>  >  > >>>  reason as above: the GC is not happy about this technique).
>>  >  > >>>
>>  >  > >>>  If you can pool the DBBs and/or use a few large DBBs, where the cost of
>>  >  > >>>  the copy would outweigh the cost of their creation, then DBBs are much
>>  >  > >>>  better than regular arrays. For instance, I noticed that when
>>  >  > >>>  reading striped TIFF files regular arrays were faster, but as the
>>  >  > >>>  tile size increases (and therefore the cost of a copy outweighs the
>>  >  > >>>  cost of a DBB creation) the DBB performs much better.
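
The pooling idea mentioned here can be sketched as follows. This is only an illustration under stated assumptions: DirectBufferPool is a hypothetical name, all buffers share one fixed capacity, and it is not the imageio-ext implementation.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;

final class DirectBufferPool {
    private final ArrayDeque<ByteBuffer> free = new ArrayDeque<>();
    private final int capacity;

    DirectBufferPool(int capacity) {
        this.capacity = capacity;
    }

    /** Hands out a pooled buffer, allocating a new direct buffer only when the pool is empty. */
    synchronized ByteBuffer acquire() {
        ByteBuffer buffer = free.pollFirst();
        if (buffer == null) {
            buffer = ByteBuffer.allocateDirect(capacity); // the expensive step we want to amortize
        }
        buffer.clear(); // reset position and limit; the content is not zeroed
        return buffer;
    }

    /** Returns a buffer to the pool so a later acquire() can reuse it. */
    synchronized void release(ByteBuffer buffer) {
        if (buffer.isDirect() && buffer.capacity() == capacity) {
            free.addFirst(buffer);
        }
    }

    public static void main(String[] args) {
        DirectBufferPool pool = new DirectBufferPool(512 * 512);
        ByteBuffer tile = pool.acquire();
        pool.release(tile);
        System.out.println("reused: " + (pool.acquire() == tile));
    }
}
```

A pool like this bounds how often native memory is allocated, but it also keeps that native memory alive for as long as the pool is reachable.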
>>  >  > >>>
>>  >  > >>>  - DirectByteBuffer and the impact on some JVMs -
>>  >  > >>>  In the past we decided to stick with DBB and give
>>  >  > >>>  GeoServer/GeoTools users the capability to retile data on the fly.
>>  >  > >>>  However, lately, during the WMS performance shootout, we noticed that on
>>  >  > >>>  some Linux machines the JVM crashed hard. Not nice (it means restarting
>>  >  > >>>  GeoServer!!!).
>>  >  > >>>  We investigated in some depth and the problem was that somehow the
>>  >  > >>>  JVM was failing to allocate some internal images during the rendering
>>  >  > >>>  process and then dying with a NullPointerException (apparently the Sun
>>  >  > >>>  Java2D engineers did not check for out-of-memory errors in the
>>  >  > >>>  Java native space). What happens is that if you use too much of
>>  >  > >>>  the Java native space for your own objects, it is likely that the JVM
>>  >  > >>>  itself will start to malfunction (you can find articles on the web on
>>  >  > >>>  the memory model of a Java process; I don't think I am good enough to
>>  >  > >>>  explain it) since it cannot allocate its own objects.
>>  >  > >>>
>>  >  > >>>  In the end we decided to leave DBB and go back to regular arrays with
>>  >  > >>>  array pinning. This gave us robustness and we did not see much
>>  >  > >>>  performance degradation (which means that array pinning in the end
>>  >  > >>>  works). This has been implemented by modifying the SWIG bindings for
>>  >  > >>>  GDAL in order to use a byte array instead of a DBB and then use
>>  >  > >>>  ByteArray utils to convert between different native types (short, int,
>>  >  > >>>  etc.).
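
The byte-array-to-typed-array conversion described above might look roughly like this. The "ByteArray utils" themselves are not shown in this thread, so the sketch swaps in standard java.nio views instead; ByteArrayViews, toShorts and toInts are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

final class ByteArrayViews {
    /** Reinterprets raw bytes as 16-bit samples in the given byte order. */
    static short[] toShorts(byte[] raw, ByteOrder order) {
        short[] out = new short[raw.length / 2];
        ByteBuffer.wrap(raw).order(order).asShortBuffer().get(out);
        return out;
    }

    /** Reinterprets raw bytes as 32-bit samples in the given byte order. */
    static int[] toInts(byte[] raw, ByteOrder order) {
        int[] out = new int[raw.length / 4];
        ByteBuffer.wrap(raw).order(order).asIntBuffer().get(out);
        return out;
    }

    public static void main(String[] args) {
        byte[] raw = {1, 2, 3, 4};
        // prints [513, 1027]
        System.out.println(Arrays.toString(toShorts(raw, ByteOrder.LITTLE_ENDIAN)));
    }
}
```

For data filled in by native code, ByteOrder.nativeOrder() would normally be the order to pass; the explicit parameter here just keeps the example deterministic.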
>>  >  > >>>
>>  >  > >>>  - Conclusion -
>>  >  > >>>  We might want to spend some time in the mid term to contribute some of
>>  >  > >>>  this work back (or probably provide funding), but anyway, it would be
>>  >  > >>>  great to have the capability to switch between DBB and regular arrays
>>  >  > >>>  since both have flaws.
>>  >  > >>>  However atm if I were asked I would say to go with regular arrays as
>>  >  > >>>  we do in the imageio-ext project.
>>  >  > >>>
>>  >  > >>>  Ciao,
>>  >  > >>>  Simone.
>>  >  > >>>
>>  >  > >>>  On Tue, Nov 10, 2009 at 12:00 PM, Even Rouault
>>  >  > >>>  <even.rouault at mines-paris.org> wrote:
>>  >  > >>>  > Quoting Ivan <ivan.lucena at pmldnet.com>:
>>  >  > >>>  >
>>  >  > >>>  > Ivan,
>>  >  > >>>  >
>>  >  > >>>  > thanks for your testing (CC'ing the list as it is of general interest).
>>  >  > >>>  > Actually, I also read on some sites that using ByteBuffer objects versus
>>  >  > >>>  > regular Java arrays is not always a win. Plus the fact that we must use a
>>  >  > >>>  > direct buffer, which has an extra allocation cost according to the Javadoc.
>>  >  > >>>  > So ByteBuffer might be interesting if you just want to pass big arrays
>>  >  > >>>  > between native code, for example if you read an array from a dataset and
>>  >  > >>>  > then write it to another one without accessing it from the Java side. When
>>  >  > >>>  > you mention that accessing through the byte[] array was faster, did you get
>>  >  > >>>  > it with the array() method instead? I'm wondering what the performance
>>  >  > >>>  > overhead of this call is.
>>  >  > >>>  >
>>  >  > >>>  > As ByteBuffer is not at all a requirement for the interface with the native
>>  >  > >>>  > code, it would be technically possible to add an alternative API that would
>>  >  > >>>  > use the regular Java array types.
>>  >  > >>>  >
>>  >  > >>>  > Would you mind opening an enhancement ticket about that ? Thanks
>>  >  > >>>  >
>>  >  > >>>  > Even
>>  >  > >>>  >
>>  >  > >>>  >> Even,
>>  >  > >>>  >>
>>  >  > >>>  >> I did some tests with the GDAL Java API and some simple raster operations
>>  >  > >>>  >> like the GDAL Proximity algorithm and I noticed that the performance while
>>  >  > >>>  >> accessing pixels with <type>Buffer.get(i), <type>Buffer.put(i,value) is not
>>  >  > >>>  >> as good as if you copy them to (or from) a "regular" array, like float[],
>>  >  > >>>  >> double[], int[] and byte[].
>>  >  > >>>  >>
>>  >  > >>>  >> The reason for that is obvious: get() and put() are function calls and
>>  >  > >>>  >> contain a lot of code for range checks.
>>  >  > >>>  >>
>>  >  > >>>  >> If I understand it correctly, ByteBuffer is the ideal or maybe the only
>>  >  > >>>  >> way to get access to buffers from C libraries through a Java wrapper. But
>>  >  > >>>  >> do you think it would be possible to encapsulate the buffer conversion in
>>  >  > >>>  >> the wrapper code so that users would be able to read and write directly to
>>  >  > >>>  >> regular Java arrays?
>>  >  > >>>  >>
>>  >  > >>>  >> Just a suggestion,
>>  >  > >>>  >>
>>  >  > >>>  >> Ivan
>>  >  > >>>  >>
>>  >  > >>>  >
>>  >  > >>>  >
>>  >  > >>>  > _______________________________________________
>>  >  > >>>  > gdal-dev mailing list
>>  >  > >>>  > gdal-dev at lists.osgeo.org
>>  >  > >>>  > http://lists.osgeo.org/mailman/listinfo/gdal-dev
>>  >  > >>>  >