[gdal-dev] Re: JAVA API - Performance

Tue Nov 10 12:36:19 EST 2009

Ciao Even,
just wanted to add my 2 cents.

As you know, for the imageio-ext project we have been using the
GDAL-JNI bindings (actually a modified version of them) for a while, in
order to let Java users leverage GDAL through the ImageIO framework,
which is standard in Java.
This way we also enabled GeoTools and GeoServer to use GDAL as a data source.

In the past I ran quite a few performance tests while adding some
new/different methods to the bindings, and I can summarise our findings
as follows:

- DirectByteBuffer vs regular arrays -
A DBB is expensive to allocate, but it prevents the VM from copying
data when moving it between Java and native code, since a DBB lives
in native memory, not on the Java heap. On the other hand, regular
arrays are fast to allocate but are "usually" copied when they cross
the Java/native boundary: the JVM cannot let native code mess with
the Java heap, since the garbage collector would not be very happy
about that. I said "usually" because there is a technique called
array pinning that we can ask the JVM to use to avoid copying a
regular array; however, this mechanism is not guaranteed to be
implemented and/or to work on every call (same reason as above, the
GC is not happy about this technique).
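
To make the two kinds of buffer concrete, here is a minimal Java sketch
using only java.nio. The class and method names are made up for
illustration and are not part of the GDAL bindings.

```java
import java.nio.ByteBuffer;

// Hypothetical helper, for illustration only.
final class BufferKinds {

    /** Allocated outside the Java heap; native code can address it directly. */
    static ByteBuffer directBuffer(int size) {
        return ByteBuffer.allocateDirect(size);
    }

    /** Allocated on the Java heap and backed by a plain byte[]. */
    static ByteBuffer heapBuffer(int size) {
        return ByteBuffer.allocate(size);
    }

    public static void main(String[] args) {
        ByteBuffer direct = directBuffer(1024);
        ByteBuffer heap = heapBuffer(1024);
        // isDirect() tells the two apart; hasArray() says whether array()
        // can hand back a byte[] without copying.
        System.out.println("direct: isDirect=" + direct.isDirect()
                + " hasArray=" + direct.hasArray());
        System.out.println("heap:   isDirect=" + heap.isDirect()
                + " hasArray=" + heap.hasArray());
    }
}
```

On the JVMs I am aware of, a direct buffer reports no accessible backing
array, so array() is usually only an option for heap buffers.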

If you can pool the DBBs and/or use a few large DBBs, so that the cost
of the copy would exceed the cost of creating the buffer, then DBBs
are much better than regular arrays. For instance, I noticed that when
reading striped TIFF files regular arrays were faster, but as the
tile size increases (and therefore the cost of a copy exceeds the
cost of creating a DBB) the DBB performs much better.
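
As a rough idea of what pooling could look like, here is a minimal
sketch of a fixed-size pool of direct buffers. It is an assumption about
one possible design, not the imageio-ext code; DirectBufferPool and its
methods are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;

// Hypothetical pool, for illustration only: reuses direct buffers of one
// size so that the allocation cost is paid once per buffer, not per read.
final class DirectBufferPool {
    private final ArrayDeque<ByteBuffer> free = new ArrayDeque<>();
    private final int bufferSize;

    DirectBufferPool(int bufferSize) {
        this.bufferSize = bufferSize;
    }

    /** Returns a pooled buffer if one is free, otherwise allocates a new one. */
    synchronized ByteBuffer acquire() {
        ByteBuffer buffer = free.poll();
        if (buffer == null) {
            buffer = ByteBuffer.allocateDirect(bufferSize);
        }
        buffer.clear(); // reset position and limit; contents are not zeroed
        return buffer;
    }

    /** Hands a buffer back; buffers of the wrong kind or size are ignored. */
    synchronized void release(ByteBuffer buffer) {
        if (buffer.isDirect() && buffer.capacity() == bufferSize) {
            free.push(buffer);
        }
    }

    public static void main(String[] args) {
        DirectBufferPool pool = new DirectBufferPool(256 * 256);
        ByteBuffer tile = pool.acquire();
        // ... fill the buffer from native code, consume it ...
        pool.release(tile);
        System.out.println("reused: " + (pool.acquire() == tile));
    }
}
```

A real pool would probably also cap the number of retained buffers, since
the native memory they hold is not returned until they are collected.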

- DirectByteBuffer and the impact on some JVMs -
In the past we decided to stick with DBBs and give
GeoServer/GeoTools users the capability to retile data on the fly.
However, lately, during the WMS performance shootout, we noticed that on
some Linux machines the JVM crashed outright, which is not nice (it
means restarting GeoServer!!!).
We investigated in some depth, and the problem was that the JVM was
somehow failing to allocate some internal images during the rendering
process and then dying with a NullPointerException (apparently the Sun
Java2D engineers did not check for out-of-memory errors in native
memory). What happens is that if you use too much of the process's
native memory for your own objects, the JVM itself is likely to start
malfunctioning, since it can no longer allocate its own objects (you
can find articles on the web about the memory model of a Java process;
I don't think I am good enough to explain it).

In the end we decided to drop DBBs and go back to regular arrays with
array pinning. This gave us robustness, and we did not see much
performance degradation (which means that array pinning works in the
end). This was implemented by modifying the SWIG bindings for
GDAL to use a byte array instead of a DBB, and then using
ByteArray utils to convert between the different native types (short,
int, etc.).
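
To give an idea of the kind of conversion involved, here is a minimal
sketch that reinterprets a raw byte[] as short[] or int[] using plain
java.nio. It is a stand-in for illustration, not the actual utilities we
use; ByteArrayViews and its methods are hypothetical names, and I am
assuming the bytes arrive in a known byte order (typically the native
order of the machine).

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical converter, for illustration only.
final class ByteArrayViews {

    /** Reinterprets pairs of bytes as 16-bit samples in the given order. */
    static short[] toShorts(byte[] raw, ByteOrder order) {
        short[] out = new short[raw.length / 2];
        // wrap() does not copy; the bulk get() copies once into out.
        ByteBuffer.wrap(raw).order(order).asShortBuffer().get(out);
        return out;
    }

    /** Reinterprets groups of four bytes as 32-bit samples in the given order. */
    static int[] toInts(byte[] raw, ByteOrder order) {
        int[] out = new int[raw.length / 4];
        ByteBuffer.wrap(raw).order(order).asIntBuffer().get(out);
        return out;
    }

    public static void main(String[] args) {
        byte[] raw = {1, 0, 2, 0};
        short[] samples = toShorts(raw, ByteOrder.LITTLE_ENDIAN);
        System.out.println(samples.length + " samples decoded");
    }
}
```

The bulk get() moves the whole block in one call, which avoids paying a
method call and a range check per pixel.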

- Conclusion -
We might want to spend some time in the mid term to contribute some of
this work back (or probably provide funding), but anyway, it would be
great to have the capability to switch between DBBs and regular arrays,
since both have flaws.
However, at the moment, if I were asked, I would say go with regular
arrays, as we do in the imageio-ext project.

Ciao,
Simone.
-------------------------------------------------------
Ing. Simone Giannecchini
GeoSolutions S.A.S.
Founder - Software Engineer
Via Carignoni 51
55041  Camaiore (LU)
Italy

phone: +39 0584983027
fax:      +39 0584983027
mob:    +39 333 8128928

http://www.geo-solutions.it
http://geo-solutions.blogspot.com/
http://simboss.blogspot.com/
http://www.linkedin.com/in/simonegiannecchini

-------------------------------------------------------

On Tue, Nov 10, 2009 at 12:00 PM, Even Rouault
<even.rouault at mines-paris.org> wrote:
> Quoting Ivan <ivan.lucena at pmldnet.com>:
>
> Ivan,
>
> thanks for your testing (CC'ing the list as it is of general interest).
> Actually, I also read on some sites that using ByteBuffer object versus regular
> Java arrays is not always a win. Plus the fact that we must use a direct buffer
> that has an extra allocation cost according to the Javadoc. So ByteBuffer might
> be interesting if you just want to pass big arrays between native code, for
> example if you read an array from a dataset and then write it to another one
> without accessing it from the Java side. When you mention that accessing through
> the byte[] array was faster, did you get it with the array() method instead ?
> I'm wondering what the performance overhead of this call is.
>
> As ByteBuffer is not at all a requirement for the interface with the native
> code, it would be technically possible to add an alternative API that would use
> the regular Java array types.
>
> Would you mind opening an enhancement ticket about that ? Thanks
>
> Even
>
>> Even,
>>
>> I did some tests with the GDAL Java API and some simple raster operations
>> like the GDAL Proximity algorithm, and I noticed that the performance while
>> accessing pixels with <type>Buffer.get(i), <type>Buffer.put(i,value) is not
>> as good as if you copy them to (or from) a "regular" array, like float[],
>> double[], int[] and byte[].
>>
>> The reason for that is obvious: get() and put() are function calls and
>> contain a lot of code for range checking.
>>
>> If I understand it correctly, ByteBuffer is the ideal or maybe the only
>> way to get access to buffers from C libraries through a Java wrapper. But
>> do you think it would be possible to encapsulate the buffer conversion in
>> the wrapper code so that users would be able to read and write directly
>> to regular Java arrays?
>>
>> Just a suggestion,
>>
>> Ivan
>>