[SoC] gdalwarp OpenCL Performance (Week 9)
Seth Price
seth at pricepages.org
Tue Jul 27 04:08:45 EDT 2010
I just finished the first performance tests of my gdalwarp OpenCL
code. It's doing better than I expected. I used this command:
"time gdalwarp -q -r lanczos -t_srs '+proj=merc +a=6378137.0
+b=6378137.0 +nadgrids=@null +wktext +units=m' big_test.tif
big_test.out.tif"
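For anyone who wants to repeat the comparison across resamplers, here is a minimal sketch of a timing harness. It only rebuilds the command quoted above for each `-r` value and measures wall-clock time. The helper names `warp_command` and `time_command` are hypothetical, and it assumes `gdalwarp` is on the PATH and `big_test.tif` is in the working directory.

```python
import shlex
import subprocess
import time

# Projection string copied from the command quoted in the message.
T_SRS = ("+proj=merc +a=6378137.0 +b=6378137.0 "
         "+nadgrids=@null +wktext +units=m")


def warp_command(resampler, src="big_test.tif", dst="big_test.out.tif"):
    """Build the gdalwarp argument list for one resampling method."""
    return ["gdalwarp", "-q", "-r", resampler, "-t_srs", T_SRS, src, dst]


def time_command(args):
    """Run a command to completion and return elapsed wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(args, check=True)
    return time.perf_counter() - start


if __name__ == "__main__":
    for resampler in ("lanczos", "cubicspline", "cubic"):
        cmd = warp_command(resampler)
        print(shlex.join(cmd))
        # Uncomment on a machine with gdalwarp and the test raster available:
        # print(f"{resampler}: {time_command(cmd):.1f} s")
```

Passing the arguments as a list avoids shell quoting problems with the long `-t_srs` string.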
I can compile the OpenCL code two different ways. I can run OpenCL
code on the CPU and distribute it across processors by selecting the
CPU as the device. This compiles a multithreaded version of the code.
By selecting the GPU device, the OpenCL code compiles to run on my Mac
Pro's graphics card, a GeForce GTX 285. To test, I used an 80 MB RGB
raster, with 8 bits per channel.
With the original lanczos resampler code I get 5:31, with OpenCL on my
Mac Pro's 16 cores 0:39, and with OpenCL on my GTX 285 0:10. Going by
those times, that's roughly a 33x speedup on the GPU.
Using cubicspline resampling, the original code takes 0:59, the OpenCL
CPU code takes 0:13, and the OpenCL GPU code takes 0:08. Still a
significant speedup.
And with cubic resampling, the original code takes 0:19, OpenCL CPU
takes 0:09, and OpenCL GPU takes 0:07. Still better than twice as fast.
Basically, the OpenCL GPU code in all cases is I/O bound. The GPU is
laughing and requesting more difficult work.
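The ratios behind these numbers can be checked with a short sketch. It uses only the times quoted above, which are rounded to whole seconds, so the speedups are approximate. The names `seconds`, `TIMINGS` and `speedups` are made up for this example.

```python
def seconds(mmss):
    """Convert an 'M:SS' wall-clock string to seconds."""
    minutes, secs = mmss.split(":")
    return int(minutes) * 60 + int(secs)


# Times as quoted in the message: (original, OpenCL CPU, OpenCL GPU).
TIMINGS = {
    "lanczos": ("5:31", "0:39", "0:10"),
    "cubicspline": ("0:59", "0:13", "0:08"),
    "cubic": ("0:19", "0:09", "0:07"),
}


def speedups(timings):
    """Return {resampler: (cpu_speedup, gpu_speedup)} versus the original code."""
    result = {}
    for name, (orig, cpu, gpu) in timings.items():
        base = seconds(orig)
        result[name] = (base / seconds(cpu), base / seconds(gpu))
    return result


if __name__ == "__main__":
    for name, (cpu_x, gpu_x) in speedups(TIMINGS).items():
        print(f"{name:12s} CPU {cpu_x:4.1f}x  GPU {gpu_x:4.1f}x")
    orig_secs = [seconds(t[0]) for t in TIMINGS.values()]
    gpu_secs = [seconds(t[2]) for t in TIMINGS.values()]
    print(f"original: {min(orig_secs)}-{max(orig_secs)} s, "
          f"GPU: {min(gpu_secs)}-{max(gpu_secs)} s")
```

The original code's time ranges from 19 s to 331 s depending on the resampler, while the GPU time stays between 7 s and 10 s. That flat GPU time is consistent with the run being dominated by something other than the resampling arithmetic, such as I/O.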
I haven't tested every type of data or every warping command yet. If
anyone has sample data and warping commands for testing, now would be
the time to send them to me. I don't know of any GPU bugs in the
current code.
Here is my current code:
http://github.com/mailseth/OpenCL-integration-for-GRASS---GDAL
~Seth