[gdal-dev] Re: OpenCL, GDAL, and You

Mon Dec 6 18:12:03 EST 2010

Hi Seth,

I tried Frank's integration of your work on my ATI Radeon HD 5400. I have 
the ATI SDK 2.2 correctly installed with the latest ATI drivers (10-11). The 
few OpenCL demos from the SDK that I tried run on the GPU device.

(Note: "thanks" to the errors below, I've made a few cleanups in 
http://trac.osgeo.org/gdal/changeset/21205 to improve error reporting and 
cleanup in case of error. I also added the missing initialization of 2 members 
of the warper struct.)

Then I tried a simple warp and got the following error:

"""
0ERROR 1: Error: Failed to build program executable!
Build Log:
Warning: invalid option: -cl-fast-relaxed-math

Warning: invalid option: -Werror

/tmp/OCL3QLDj2.cl(193): warning: double-precision constant is represented as
          single-precision constant because double is not enabled
  return (float2)(-99.0, -99.0);
                   ^

/tmp/OCL3QLDj2.cl(193): warning: double-precision constant is represented as
          single-precision constant because double is not enabled
  return (float2)(-99.0, -99.0);
                          ^

/tmp/OCL3QLDj2.cl(195): error: bad argument type to opencl image op: expected
          sampler_t
  CLK_NORMALIZED_COORDS_TRUE |
  ^

1 error detected in the compilation of "/tmp/OCL3QLDj2.cl".

ERROR 1: Error at file gdalwarpkernel_opencl.c line 2228: CL_BUILD_PROGRAM_FAILURE
"""

Hmm, then I looked at similar code nearby and came up with the following 
change, which fixes the compilation. It is not committed yet; does it look OK 
to you?

Index: alg/gdalwarpkernel_opencl.c
===================================================================

--- alg/gdalwarpkernel_opencl.c	(revision 21205)
+++ alg/gdalwarpkernel_opencl.c	(working copy)
@@ -724,13 +724,13 @@
     // Check & return when the thread group overruns the image size
     "if (nDstX >= iDstWidth || nDstY >= iDstHeight)\n"
         "return (float2)(-99.0, -99.0);\n"
+
+    "const sampler_t samp =  CLK_NORMALIZED_COORDS_TRUE |\n"
+                            "CLK_ADDRESS_CLAMP_TO_EDGE |\n"
+                            "CLK_FILTER_LINEAR;\n"
+
+    "float4  fSrcCoords = read_imagef(srcCoords,samp,fDst);\n"
     
-    "float4  fSrcCoords = read_imagef(srcCoords,\n"
-                                     "CLK_NORMALIZED_COORDS_TRUE |\n"
-                                        "CLK_ADDRESS_CLAMP_TO_EDGE |\n"
-                                        "CLK_FILTER_LINEAR,\n"
-                                     "fDst);\n"
-    
     "return (float2)(fSrcCoords.x, fSrcCoords.y);\n"
 "}\n";

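For reference, the pattern the patch moves to can be shown in isolation. This is only a sketch, not the GDAL source: it keeps a cut-down kernel fragment in a C string, the way gdalwarpkernel_opencl.c does, with the sampler flags first stored in a sampler_t variable and then passed to read_imagef() by name. The names kSamplerFragment and fragment_uses_named_sampler are made up for illustration.

```c
#include <string.h>

/* Hypothetical, cut-down kernel fragment (not the full GDAL kernel).
   The sampler flags are stored in a sampler_t variable first, because
   the ATI compiler rejected the flags passed inline to read_imagef(). */
static const char *kSamplerFragment =
    "const sampler_t samp = CLK_NORMALIZED_COORDS_TRUE |\n"
    "                       CLK_ADDRESS_CLAMP_TO_EDGE |\n"
    "                       CLK_FILTER_LINEAR;\n"
    "float4 fSrcCoords = read_imagef(srcCoords, samp, fDst);\n";

/* Hypothetical check: returns 1 if the fragment declares a sampler
   variable and passes it to read_imagef() by name, 0 otherwise. */
static int fragment_uses_named_sampler(const char *src)
{
    return strstr(src, "const sampler_t samp") != NULL &&
           strstr(src, "read_imagef(srcCoords, samp, fDst)") != NULL;
}
```

The check is only a string search over the kernel text. The real validation still happens when the OpenCL compiler builds the program.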
     
After solving the compilation error, I'm now stuck with:

"""
0ERROR 1: Error at file gdalwarpkernel_opencl.c line 1391: 
CL_INVALID_IMAGE_SIZE
ERROR 1: Error at file gdalwarpkernel_opencl.c line 1391: CL_INVALID_IMAGE_SIZE
ERROR 1: Error at file gdalwarpkernel_opencl.c line 2292: CL_INVALID_IMAGE_SIZE
ERROR 1: Error at file gdalwarpkernel_opencl.c line 1391: CL_INVALID_IMAGE_SIZE
ERROR 1: OpenCL routines reported failure (-40) on line 2570.
"""

The relevant line is:
"""
        //Make a fake image so we don't have a NULL pointer
        (*srcImag) = clCreateImage2D(warper->context, CL_MEM_READ_ONLY, &imgFmt,
                                     1, 1, sz, NULL, &err);
        handleErr(err);
"""

Any ideas? Did you try with ATI or NVIDIA cards?

I've attached the output of CLInfo in case it helps.
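
One thing that may be worth checking (an assumption, not something confirmed in this thread): the call above passes a NULL host pointer together with a row pitch of sz. As far as the OpenCL 1.1 specification for clCreateImage2D() reads, the row pitch must be 0 when host_ptr is NULL, and a violation is reported as CL_INVALID_IMAGE_SIZE. The sketch below encodes those argument rules in plain C so they can be checked without a device. image2d_args_ok is a hypothetical helper, and the 8192 limits used in the note that follows are the GPU values from the attached CLInfo output.

```c
#include <stddef.h>

/* Hypothetical helper, not part of GDAL or OpenCL. It mirrors the argument
   rules given for clCreateImage2D() in the OpenCL 1.1 specification (as
   read here); breaking one of them yields CL_INVALID_IMAGE_SIZE.
   Returns 1 if the arguments look acceptable, 0 otherwise. */
static int image2d_args_ok(size_t width, size_t height, size_t row_pitch,
                           const void *host_ptr, size_t elem_size,
                           size_t max_width, size_t max_height)
{
    if (elem_size == 0)
        return 0;

    /* Dimensions must be non-zero and within the device limits. */
    if (width == 0 || height == 0)
        return 0;
    if (width > max_width || height > max_height)
        return 0;

    /* Without host data the row pitch must be 0. */
    if (host_ptr == NULL)
        return row_pitch == 0;

    /* With host data, 0 means "tightly packed"; otherwise the pitch must
       cover a full row and be a multiple of the element size. */
    if (row_pitch == 0)
        return 1;
    return row_pitch >= width * elem_size && row_pitch % elem_size == 0;
}
```

With these rules, a 1x1 image with a NULL host pointer is rejected for any non-zero pitch, for example image2d_args_ok(1, 1, 4, NULL, 4, 8192, 8192), and accepted once the pitch is 0. The element size of 4 is an assumption, since the contents of imgFmt are not shown here.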

Best regards,

Even

On Monday 06 December 2010 15:50:04, Seth Price wrote:
> Over the summer I rewrote the warper to use OpenCL. There was a 2x to
> 50x speedup. Here is a description of what I did:
> http://osgeo-org.1803224.n2.nabble.com/gdal-dev-gdalwarp-OpenCL-Performance-Week-9-td5341226.html
> 
> ~Seth
> 
> On Dec 6, 2010, at 5:10 AM, Konstantin Baumann wrote:
> > Hi,
> > 
> > what benefit/improvement would the OpenCL integration bring to GDAL?
> > Additional functionality or a speedup of existing functions?
> > Probably only operations on images and/or rasters are supported;
> > reprojection/warping and filtering would be good candidates, right?
> > What concrete operations would be supported?
> > 
> > Kosta
> > 
> > -----Original Message-----
> > From: gdal-dev-bounces at lists.osgeo.org
> > [mailto:gdal-dev-bounces at lists.osgeo.org ] On Behalf Of Frank Warmerdam
> > (External)
> > Sent: Monday, December 06, 2010 1:43 AM
> > To: Seth Price
> > Cc: Philippe Vachon; gdal-dev; Wolf Bergenheim
> > Subject: [gdal-dev] Re: OpenCL, GRASS, GDAL, and You
> > 
> > On 10-09-23 10:09 AM, Seth Price wrote:
> >> Hey all, I was just wondering if there was any progress in integrating
> >> the OpenCL code into trunk in each project? I haven't heard anything,
> >> but it would be a shame to just leave the code sit, or wait until the
> >> code branches have significantly diverged.
> >> ~Seth
> > 
> > Seth,
> > 
> > Last week I went out and bought a new AMD/ATI machine in the hopes (among
> > other goals) that OpenCL would work on it with the ATI OpenCL SDK.
> > Unfortunately I have discovered that the ATI Radeon HD 4200 is not supported
> > for OpenCL stuff.  :-(
> > 
> > Nevertheless, with some persistence I was able to build the ATI SDK, and
> > configure GDAL to build against it.  So I have integrated OpenCL support
> > in trunk.  It is not enabled by default, but you can enable it with the
> > --with-opencl directive.  If the include files and libraries are in a
> > non-standard location you can also use the --with-opencl-include and
> > --with-opencl-lib directives to configure like this:
> > 
> >     --with-opencl \
> >     --with-opencl-include=/home/warmerda/pkg/ati-stream-sdk-v2.2-lnx64/include \
> >     --with-opencl-lib="-L/home/warmerda/pkg/ati-stream-sdk-v2.2-lnx64/lib/x86_64 -lOpenCL" \
> > 
> > I ran into a few issues:
> > 
> > 1) It seems the include file from ATI is <CL/opencl.h> not <OpenCL/OpenCL.h>
> > as it is on the Mac.  I've put a platform dependent ifdef but I don't know
> > what the situation will be on other machines.
> > 
> > 2) In your get_device() function you were passing NULL in for the platform.
> > The online docs indicate this as an option but warn that behavior then is
> > platform dependent.  The ATI SDK just fails with an invalid platform error.
> > So I updated the code to fetch a platform id and use that.
> > 
> > 3) On my system it falls back to using the CPU but it turns out the CPU
> > does not offer "image" support in my case.  I added some extra logic to
> > look for this capability so a better error could be reported.
> > 
> > 4) I restructured things a bit so that the OpenCL warper case can return
> > CE_Warning to indicate to the high level warper that OpenCL should be
> > skipped and other mechanisms used.  That is what it does now if it fails
> > to find a suitable device, or some of the other specific checks.
> > 
> > 5) I made a few changes to use CPLDebug instead of printf for debug output.
> > 
> > I haven't tried this yet on your Mac.  I avoided using the account you
> > kindly offered because I find the Mac is often a perverse build environment
> > and I didn't want to establish the "norm" based on it.  I might try it out
> > tonight though.
> > 
> > I have also not yet tried it on Windows, and likely won't in the near future.
> > Perhaps someone else will pick up the ball there.  The code itself just
> > depends on having HAVE_OPENCL defined at least in the alg directory and
> > of course appropriate include and link options.
> > 
> > (cc:ed to the list so everyone is aware of the availability).
> > 
> > Best regards,
> > 
> > _______________________________________________
> > gdal-dev mailing list
> > gdal-dev at lists.osgeo.org
> > http://lists.osgeo.org/mailman/listinfo/gdal-dev
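
Point 4 in Frank's quoted message (the OpenCL warper returning CE_Warning so that the high level warper skips OpenCL and uses another mechanism) can be sketched roughly as follows. This is an illustration under assumptions, not the code in trunk: SketchErr is a local stand-in for GDAL's CPLErr so the block compiles on its own, and warp_with_opencl, warp_on_cpu and warp are hypothetical names.

```c
/* Local stand-in for GDAL's CPLErr so the sketch compiles on its own;
   real code would include cpl_error.h instead. */
typedef enum { CE_None = 0, CE_Warning = 2, CE_Failure = 3 } SketchErr;

/* Hypothetical OpenCL path: reports CE_Warning when no suitable device
   (for example, none with image support) is available. */
static SketchErr warp_with_opencl(int have_image_capable_device)
{
    if (!have_image_capable_device)
        return CE_Warning;   /* means "skip OpenCL", not a hard error */
    return CE_None;
}

/* Hypothetical stand-in for the regular, non-OpenCL warper. */
static SketchErr warp_on_cpu(void)
{
    return CE_None;
}

/* Hypothetical high-level warper: tries OpenCL first, falls back to the
   CPU path on CE_Warning, and passes any other result through.
   *path_used is set to the name of the path that was chosen. */
static SketchErr warp(int have_image_capable_device, const char **path_used)
{
    SketchErr err = warp_with_opencl(have_image_capable_device);
    if (err == CE_Warning) {
        *path_used = "cpu";
        return warp_on_cpu();
    }
    *path_used = "opencl";
    return err;
}
```

Keeping "no suitable device" separate from a real failure is what lets the caller fall back quietly instead of aborting the warp.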
-------------- next part --------------
Number of platforms:				 1
  Platform Profile:				 FULL_PROFILE
  Platform Version:				 OpenCL 1.1 ATI-Stream-v2.2 (302)
  Platform Name:					 ATI Stream
  Platform Vendor:				 Advanced Micro Devices, Inc.
  Platform Extensions:			 cl_khr_icd cl_amd_event_callback


  Platform Name:					 ATI Stream
Number of devices:				 2
  Device Type:					 CL_DEVICE_TYPE_CPU
  Device ID:					 4098
  Max compute units:				 4
  Max work items dimensions:			 3
    Max work items[0]:				 1024
    Max work items[1]:				 1024
    Max work items[2]:				 1024
  Max work group size:				 1024
  Preferred vector width char:			 16
  Preferred vector width short:			 8
  Preferred vector width int:			 4
  Preferred vector width long:			 2
  Preferred vector width float:			 4
  Preferred vector width double:		 0
  Max clock frequency:				 1200Mhz
  Address bits:					 64
  Max memory allocation:			 1073741824
  Image support:				 No
  Max size of kernel argument:			 4096
  Alignment (bits) of base address:		 1024
  Minimum alignment (bytes) for any datatype:	 128
  Single precision floating point capability
    Denorms:					 Yes
    Quiet NaNs:					 Yes
    Round to nearest even:			 Yes
    Round to zero:				 Yes
    Round to +ve and infinity:			 Yes
    IEEE754-2008 fused multiply-add:		 No
  Cache type:					 Read/Write
  Cache line size:				 64
  Cache size:					 32768
  Global memory size:				 3221225472
  Constant buffer size:				 65536
  Max number of constant args:			 8
  Local memory type:				 Global
  Local memory size:				 32768
  Profiling timer resolution:			 1
  Device endianess:				 Little
  Available:					 Yes
  Compiler available:				 Yes
  Execution capabilities:				 
    Execute OpenCL kernels:			 Yes
    Execute native function:			 Yes
  Queue properties:				 
    Out-of-Order:				 No
    Profiling :					 Yes
  Platform ID:					 0x7f0d9d76db20
  Name:						 Intel(R) Core(TM) i5 CPU         750  @ 2.67GHz
  Vendor:					 GenuineIntel
  Driver version:				 2.0
  Profile:					 FULL_PROFILE
  Version:					 OpenCL 1.1 ATI-Stream-v2.2 (302)
  Extensions:					 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_printf 
  Device Type:					 CL_DEVICE_TYPE_GPU
  Device ID:					 4098
  Max compute units:				 2
  Max work items dimensions:			 3
    Max work items[0]:				 128
    Max work items[1]:				 128
    Max work items[2]:				 128
  Max work group size:				 128
  Preferred vector width char:			 16
  Preferred vector width short:			 8
  Preferred vector width int:			 4
  Preferred vector width long:			 2
  Preferred vector width float:			 4
  Preferred vector width double:		 0
  Max clock frequency:				 0Mhz
  Address bits:					 32
  Max memory allocation:			 134217728
  Image support:				 Yes
  Max number of images read arguments:	 128
  Max number of images write arguments:	 8
  Max image 2D width:			 8192
  Max image 2D height:			 8192
  Max image 3D width:			 2048
  Max image 3D height:	 2048
  Max image 3D depth:			 2048
  Max samplers within kernel:		 16
  Max size of kernel argument:			 1024
  Alignment (bits) of base address:		 32768
  Minimum alignment (bytes) for any datatype:	 128
  Single precision floating point capability
    Denorms:					 No
    Quiet NaNs:					 Yes
    Round to nearest even:			 Yes
    Round to zero:				 Yes
    Round to +ve and infinity:			 Yes
    IEEE754-2008 fused multiply-add:		 Yes
  Cache type:					 None
  Cache line size:				 0
  Cache size:					 0
  Global memory size:				 536870912
  Constant buffer size:				 65536
  Max number of constant args:			 8
  Local memory type:				 Scratchpad
  Local memory size:				 32768
  Profiling timer resolution:			 1
  Device endianess:				 Little
  Available:					 Yes
  Compiler available:				 Yes
  Execution capabilities:				 
    Execute OpenCL kernels:			 Yes
    Execute native function:			 No
  Queue properties:				 
    Out-of-Order:				 No
    Profiling :					 Yes
  Platform ID:					 0x7f0d9d76db20
  Name:						 Cedar
  Vendor:					 Advanced Micro Devices, Inc.
  Driver version:				 CAL 1.4.880
  Profile:					 FULL_PROFILE
  Version:					 OpenCL 1.1 ATI-Stream-v2.2 (302)
  Extensions:					 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_printf cl_amd_media_ops 


Passed!