[SoC] gdalwarp OpenCL Performance (Week 9)

Seth Price seth at pricepages.org
Tue Jul 27 04:08:45 EDT 2010


I just finished the first performance tests of my gdalwarp OpenCL  
code. It's doing better than I expected. I used this command:

    time gdalwarp -q -r lanczos \
        -t_srs '+proj=merc +a=6378137.0 +b=6378137.0 +nadgrids=@null +wktext +units=m' \
        big_test.tif big_test.out.tif
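
To repeat the same timing for several resamplers, the command can be
wrapped in a small script. This is only a sketch: the helper names
build_warp_command and time_warp are hypothetical, and it assumes
gdalwarp is on the PATH and that big_test.tif is in the current
directory. The flags are the ones from the command in this message. It
measures wall-clock time, comparable to the "real" figure from time.

```python
import subprocess
import time

# Target SRS string copied from the gdalwarp command in this message.
T_SRS = ("+proj=merc +a=6378137.0 +b=6378137.0 "
         "+nadgrids=@null +wktext +units=m")


def build_warp_command(resampler, src="big_test.tif", dst="big_test.out.tif"):
    """Return the gdalwarp argument list for one resampling method."""
    return ["gdalwarp", "-q", "-r", resampler, "-t_srs", T_SRS, src, dst]


def time_warp(resampler, src="big_test.tif", dst="big_test.out.tif"):
    """Run gdalwarp once and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    # check=True raises CalledProcessError if gdalwarp exits with an error.
    subprocess.run(build_warp_command(resampler, src, dst), check=True)
    return time.perf_counter() - start
```

Calling time_warp("lanczos"), time_warp("cubicspline") and
time_warp("cubic") in turn, each with its own dst file so the runs do
not interfere, would give one timing per resampler.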

I can build the OpenCL code for two kinds of device. Selecting the CPU
as the device compiles a multithreaded version that is distributed
across the processors. Selecting the GPU compiles the code to run on my
Mac Pro's graphics card, a GeForce GTX 285. To test, I used an 80 MB
RGB raster with 8 bits per channel.

With the original lanczos resampler code I get 5:31, with OpenCL on my
Mac Pro's 16 cores 0:39, and with OpenCL on my GTX 285 0:10. Going by
those times (331 s against 10 s), the GPU run is roughly a 33x speedup
over the original code.

Using cubicspline resampling, the original code takes 0:59, the OpenCL
CPU code takes 0:13, and the OpenCL GPU code takes 0:08. That is still
a speedup of about 7x on the GPU.

And with cubic resampling, the original code takes 0:19, OpenCL CPU  
takes 0:09, and OpenCL GPU takes 0:07. Still better than twice as fast.
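
The speedups can be recomputed from the reported times with a few
lines of Python. This is a sketch for checking the arithmetic only; the
names to_seconds, TIMINGS and speedups are made up for the example, and
because the times are given to whole seconds the ratios are
approximate.

```python
def to_seconds(mmss):
    """Convert an 'm:ss' string such as '5:31' to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


# Timings as reported in this message: (original, OpenCL CPU, OpenCL GPU).
TIMINGS = {
    "lanczos": ("5:31", "0:39", "0:10"),
    "cubicspline": ("0:59", "0:13", "0:08"),
    "cubic": ("0:19", "0:09", "0:07"),
}


def speedups(timings):
    """Return {method: (cpu_speedup, gpu_speedup)} relative to the original."""
    result = {}
    for method, (orig, cpu, gpu) in timings.items():
        base = to_seconds(orig)
        result[method] = (base / to_seconds(cpu), base / to_seconds(gpu))
    return result


for method, (cpu_x, gpu_x) in speedups(TIMINGS).items():
    print("%-12s CPU %.1fx  GPU %.1fx" % (method, cpu_x, gpu_x))
```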

Basically, the OpenCL GPU code in all cases is I/O bound. The GPU is  
laughing and requesting more difficult work.
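
One rough way to see this in the numbers: the original code's run time
changes a lot from one resampler to the next, while the GPU times
barely move, which is what a fixed cost such as reading and writing the
raster would produce. The sketch below compares the slowest-to-fastest
ratio for each set of timings. It is an interpretation of the reported
times, not a profile of where the time actually goes, and the names are
made up for the example.

```python
# Times in seconds, taken from the figures reported in this message.
ORIGINAL = {"lanczos": 331, "cubicspline": 59, "cubic": 19}
OPENCL_GPU = {"lanczos": 10, "cubicspline": 8, "cubic": 7}


def spread(times):
    """Ratio of the slowest to the fastest run in a set of timings."""
    return max(times.values()) / min(times.values())


print("original spread: %.1fx" % spread(ORIGINAL))
print("GPU spread:      %.1fx" % spread(OPENCL_GPU))
```

A spread close to 1 for the GPU runs means the choice of resampler has
little effect on the total time.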

I haven't yet tested every type of data or every warping command. If
anyone has samples and warping commands for testing, now would be the
time to send them to me. I don't know of any GPU bugs in the current
code.

Here is my current code:
http://github.com/mailseth/OpenCL-integration-for-GRASS---GDAL

~Seth

