[gdal-dev] Re: JAVA API - Performance

Fri Nov 20 06:54:48 EST 2009

Ciao Even,
please read below...
-------------------------------------------------------
Ing. Simone Giannecchini
GeoSolutions S.A.S.
Founder - Software Engineer
Via Carignoni 51
55041  Camaiore (LU)
Italy

phone: +39 0584983027
fax:      +39 0584983027
mob:    +39 333 8128928

http://www.geo-solutions.it
http://geo-solutions.blogspot.com/
http://simboss.blogspot.com/
http://www.linkedin.com/in/simonegiannecchini

-------------------------------------------------------

On Wed, Nov 11, 2009 at 12:33 PM, Even Rouault
<even.rouault at mines-paris.org> wrote:
> Selon Simone Giannecchini <simone.giannecchini at geo-solutions.it>:
>
> Simone,
>
> thanks for all this interesting information. I'm doing some experiments with Sun
> JRE 1.6 and I observe that array pinning doesn't really work. I mean that the
> jenv->GetByteArrayElements($input, &isCopy) call always returns isCopy ==
> JNI_TRUE, which seems to be confirmed by
> http://elliotth.blogspot.com/2007/03/optimizing-jni-array-access.html (although
> it is a bit outdated now): "[...] current Sun JVMs never give you a direct
> pointer. They always copy".

I vaguely remember that we tested this in the past, but I am not 100%
sure. What I can tell you is that I did not test this with the
modified GDAL bindings.

>
> I've tried GetPrimitiveArrayCritical() / ReleasePrimitiveArrayCritical() and
> I've observed that they return non-copy pointers, which makes them good
> candidates. Looking at
> http://java.sun.com/j2se/1.3/docs/guide/jni/jni-12.html#GetPrimitiveArrayCritical,
> it appears that there are restrictions on their use. But this is probably OK as
> we don't call back into Java code in the RasterIO operations. But I'm not sure
> whether the native operation can be considered fast enough to use those
> primitives. It depends on the query size and the driver. Perhaps it is not
> worth the risk, and the copy made by the JVM when calling
> GetByteArrayElements() is not that significant for modest-size arrays.
>

I decided not to take the risk. The problem with that call is that it
is expected to block the garbage collector, which is highly
counterproductive in server-side applications (you tend to prefer
shorter pauses more often over one long pause once in a while).

> Another possibility is that in the ReadRaster() case, we don't care about the
> initial values of the array. We just want to update it in fact, so we could
> allocate a working buffer in the JNI code, call the native RasterIO() on it and
> upload its content into the Java array with SetByteArrayRegion(). This would
> save one copy.

For the reasons I gave in my previous email, I would refrain from
allocating raster data inside the native code; the JVM does not seem
to like that too much.
Moreover, the Java flag to control the native memory space is prefixed
with XX, which means that it might stop working without notice in a
newer Java release.
IMHO the best thing is to stick with pinning and hope for the best.
However, it might be interesting to do a testing session with
GeoServer hitting the GDAL code for something like ECW, and see what
happens with the different options in terms of
stability/performance/robustness. What do you think?
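
For reference, the regular-array approach described earlier in this
thread (the native read fills a plain byte[], and the Java side then
converts it to the pixel type) might look roughly like the sketch
below. This is not the imageio-ext code: the class ByteArrayViews and
its methods are hypothetical names, and only standard java.nio calls
are used. For data coming straight from native memory the byte order
would presumably be ByteOrder.nativeOrder().

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper for illustration; not part of GDAL or imageio-ext.
final class ByteArrayViews {

    /** Decodes raw bytes into 16-bit samples using the given byte order. */
    public static short[] toShorts(byte[] raw, ByteOrder order) {
        if (raw.length % 2 != 0) {
            throw new IllegalArgumentException("length must be a multiple of 2");
        }
        short[] out = new short[raw.length / 2];
        // wrap() creates a heap buffer backed by raw: no native memory is used.
        ByteBuffer.wrap(raw).order(order).asShortBuffer().get(out);
        return out;
    }

    /** Decodes raw bytes into 32-bit samples using the given byte order. */
    public static int[] toInts(byte[] raw, ByteOrder order) {
        if (raw.length % 4 != 0) {
            throw new IllegalArgumentException("length must be a multiple of 4");
        }
        int[] out = new int[raw.length / 4];
        ByteBuffer.wrap(raw).order(order).asIntBuffer().get(out);
        return out;
    }

    public static void main(String[] args) {
        // Stand-in for a buffer that a native read would have filled.
        byte[] raw = {0x01, 0x02, 0x03, 0x04};
        short[] le = toShorts(raw, ByteOrder.LITTLE_ENDIAN);
        short[] be = toShorts(raw, ByteOrder.BIG_ENDIAN);
        System.out.println(le[0] + " " + le[1]); // 513 1027
        System.out.println(be[0] + " " + be[1]); // 258 772
    }
}
```

The conversion costs one extra pass over the data on the Java side,
but everything stays on the Java heap.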

Simone.
>
> Have you also observed this? Which JNI calls do you use?
>
> Best regards,
> Even
>
>> Ciao Even,
>> just wanted to add my 2 cents.
>>
>> As you know, for the imageio-ext project we have been using the
>> GDAL-JNI bindings (actually a modified version of them) for a while in
>> order to allow Java users to leverage GDAL through the ImageIO
>> framework, which is standard in Java.
>> This way we also enabled GeoTools and GeoServer to use GDAL as a data source.
>>
>> In the past I ran quite a few performance tests in order to add some
>> new/different methods to them, and I can summarise our findings as
>> follows:
>>
>> - DirectByteBuffer vs regular arrays -
>> A DBB is expensive to allocate but spares the VM from performing copies
>> when moving data between Java and native code, since it lives in
>> native memory and not on the Java heap. Regular arrays, on the other
>> hand, are fast to allocate but are "usually" copied when moved between
>> Java and native code, since the JVM cannot let native code mess with
>> the Java heap: the garbage collector would not be very happy about
>> that. I said "usually" since there is a technique called array pinning
>> that we can ask the JVM to use to avoid copying a regular array;
>> however, this mechanism is not guaranteed to be implemented and/or to
>> work on each call (same reason as above: the GC is not happy about
>> this technique).
>>
>> If you can pool the DBBs and/or use a few large DBBs, where the cost of
>> the copy would exceed the cost of creating the buffer, then DBBs are
>> much better than regular arrays. For instance, I noticed that when
>> reading striped TIFF files regular arrays were faster, but as the
>> tile size increases (and therefore the cost of a copy exceeds the
>> cost of creating a DBB) the DBB performs much better.
>>
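
A minimal sketch of the pooling idea in the paragraph above, on the
Java side only and assuming fixed-size tiles. DirectBufferPool is a
hypothetical class written for illustration, not something that
exists in GDAL or imageio-ext; it relies only on the standard
ByteBuffer.allocateDirect() call, so the allocation cost is paid once
per buffer rather than once per read.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;

// Hypothetical sketch for illustration; not part of GDAL or imageio-ext.
final class DirectBufferPool {
    private final ArrayDeque<ByteBuffer> free = new ArrayDeque<>();
    private final int capacity;

    public DirectBufferPool(int capacity) {
        this.capacity = capacity;
    }

    /** Returns a reset direct buffer, allocating only when the pool is empty. */
    public synchronized ByteBuffer acquire() {
        ByteBuffer buf = free.poll();
        if (buf == null) {
            buf = ByteBuffer.allocateDirect(capacity); // the expensive step
        }
        buf.clear(); // resets position and limit; old contents are not zeroed
        return buf;
    }

    /** Hands a buffer back for reuse; foreign or wrongly sized buffers are dropped. */
    public synchronized void release(ByteBuffer buf) {
        if (buf.isDirect() && buf.capacity() == capacity) {
            free.push(buf);
        }
    }

    public static void main(String[] args) {
        DirectBufferPool pool = new DirectBufferPool(256 * 256);
        ByteBuffer first = pool.acquire();
        pool.release(first);
        ByteBuffer second = pool.acquire();
        System.out.println(first == second);   // true: same buffer reused
        System.out.println(second.isDirect()); // true
    }
}
```

Pooled buffers keep their native memory for as long as the pool
holds them, so the pool size would need to be bounded in a
long-running server.
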
>> - DirectByteBuffer and the impact on some JVMs -
>> In the past we decided to stick with DBBs and give
>> GeoServer/GeoTools users the capability to retile data on the fly.
>> However, lately, during the WMS performance shootout, we noticed that
>> on some Linux machines the JVM crashed, which is not nice (it means
>> restarting GeoServer!!!).
>> We investigated in some depth and the problem was that somehow the
>> JVM was failing to allocate some internal images during the rendering
>> process and then dying with a NullPointerException (apparently the Sun
>> Java2D engineers did not check for out-of-memory errors in the
>> Java native space). What happens is that if you use too much of
>> the Java native space for your own objects, it is likely that the JVM
>> itself will start to malfunction, since it cannot allocate its own
>> objects (you can find articles on the web on the memory model of a
>> Java process; I don't think I am good enough to explain it).
>>
>> In the end we decided to drop DBBs and go back to regular arrays with
>> array pinning. This gave us robustness and we did not see much
>> performance degradation (which means that array pinning works in the
>> end). This has been implemented by modifying the SWIG bindings for
>> GDAL to use a byte array instead of a DBB and then using
>> ByteArray utils to convert between different native types (short, int,
>> etc.).
>>
>> - Conclusion -
>> We might want to spend some time in the mid term to contribute some of
>> this work back (or probably provide funding), but anyway, it would be
>> great to have the capability to switch between DBB and regular arrays
>> since both have flaws.
>> However atm if I were asked I would say to go with regular arrays as
>> we do in the imageio-ext project.
>>
>> Ciao,
>> Simone.
>>
>> On Tue, Nov 10, 2009 at 12:00 PM, Even Rouault
>> <even.rouault at mines-paris.org> wrote:
>> > Selon Ivan <ivan.lucena at pmldnet.com>:
>> >
>> > Ivan,
>> >
>> > thanks for your testing (CC'ing the list as it is of general interest).
>> > Actually, I also read on some sites that using ByteBuffer object versus
>> > regular Java arrays is not always a win. Plus the fact that we must use a
>> > direct buffer that has an extra allocation cost according to the Javadoc. So
>> > ByteBuffer might be interesting if you just want to pass big arrays between
>> > native code, for example if you read an array from a dataset and then write
>> > it to another one without accessing it from the Java side. When you mention
>> > that accessing through the byte[] array was faster, did you get it with the
>> > array() method instead ? I'm wondering what the performance overhead of this
>> > call is.
>> >
>> > As ByteBuffer is not at all a requirement for the interface with the native
>> > code, it would be technically possible to add an alternative API that would
>> > use the regular Java array types.
>> >
>> > Would you mind opening an enhancement ticket about that ? Thanks
>> >
>> > Even
>> >
>> >> Even,
>> >>
>> >> I did some tests with the GDAL Java API and some simple raster operations
>> >> like the GDAL Proximity algorithm and I noticed that the performance while
>> >> accessing pixels with <type>Buffer.get(i), <type>Buffer.put(i,value) is
>> >> not as good as if you copy them to (or from) a "regular" array, like
>> >> float[], double[], int[] and byte[].
>> >>
>> >> The reason for that is obvious: get() and put() are function calls and
>> >> contain a lot of code for range checking.
>> >>
>> >> If I understand it correctly, ByteBuffer is the ideal or maybe the only
>> >> way to get access to buffers from C libraries through a Java wrapper. But
>> >> do you think it would be possible to encapsulate the buffer conversion in
>> >> the wrapper code so that users would be able to read and write directly to
>> >> regular Java arrays?
>> >>
>> >> Just a suggestion,
>> >>
>> >> Ivan
>> >>
>> >
>> >
>> > _______________________________________________
>> > gdal-dev mailing list
>> > gdal-dev at lists.osgeo.org
>> > http://lists.osgeo.org/mailman/listinfo/gdal-dev
>> >
>>
>