[gdal-dev] Re: OpenCL, GDAL, and You
Even Rouault
even.rouault at mines-paris.org
Mon Dec 6 18:12:03 EST 2010
Hi Seth,
I tried Frank's integration of your work on my ATI Radeon HD 5400. I have
the ATI SDK 2.2 correctly installed with the latest ATI drivers (10-11). The
few OpenCL demos from the SDK that I tried work on the GPU device.
(Note: "thanks" to the errors below, I've made a few cleanups in
http://trac.osgeo.org/gdal/changeset/21205 to improve error reporting and
clean-up in case of error. I also added missing initialization of 2 members of
the warper struct)
Then I tried a simple warp and got the following error:
"""
0ERROR 1: Error: Failed to build program executable!
Build Log:
Warning: invalid option: -cl-fast-relaxed-math
Warning: invalid option: -Werror
/tmp/OCL3QLDj2.cl(193): warning: double-precision constant is represented as
single-precision constant because double is not enabled
return (float2)(-99.0, -99.0);
^
/tmp/OCL3QLDj2.cl(193): warning: double-precision constant is represented as
single-precision constant because double is not enabled
return (float2)(-99.0, -99.0);
^
/tmp/OCL3QLDj2.cl(195): error: bad argument type to opencl image op: expected
sampler_t
CLK_NORMALIZED_COORDS_TRUE |
^
1 error detected in the compilation of "/tmp/OCL3QLDj2.cl".
ERROR 1: Error at file gdalwarpkernel_opencl.c line 2228:
CL_BUILD_PROGRAM_FAILURE
"""
Hum, then I looked at similar code in the neighbourhood and I came up with the
following change that fixes the compilation. It is not committed yet; does it
look OK to you?
Index: alg/gdalwarpkernel_opencl.c
===================================================================
--- alg/gdalwarpkernel_opencl.c (revision 21205)
+++ alg/gdalwarpkernel_opencl.c (working copy)
@@ -724,13 +724,13 @@
// Check & return when the thread group overruns the image size
"if (nDstX >= iDstWidth || nDstY >= iDstHeight)\n"
"return (float2)(-99.0, -99.0);\n"
+
+ "const sampler_t samp = CLK_NORMALIZED_COORDS_TRUE |\n"
+ "CLK_ADDRESS_CLAMP_TO_EDGE |\n"
+ "CLK_FILTER_LINEAR;\n"
+
+ "float4 fSrcCoords = read_imagef(srcCoords,samp,fDst);\n"
- "float4 fSrcCoords = read_imagef(srcCoords,\n"
- "CLK_NORMALIZED_COORDS_TRUE |\n"
- "CLK_ADDRESS_CLAMP_TO_EDGE |\n"
- "CLK_FILTER_LINEAR,\n"
- "fDst);\n"
-
"return (float2)(fSrcCoords.x, fSrcCoords.y);\n"
"}\n";
After solving the compilation error, I'm stuck with:
"""
0ERROR 1: Error at file gdalwarpkernel_opencl.c line 1391:
CL_INVALID_IMAGE_SIZE
ERROR 1: Error at file gdalwarpkernel_opencl.c line 1391: CL_INVALID_IMAGE_SIZE
ERROR 1: Error at file gdalwarpkernel_opencl.c line 2292: CL_INVALID_IMAGE_SIZE
ERROR 1: Error at file gdalwarpkernel_opencl.c line 1391: CL_INVALID_IMAGE_SIZE
ERROR 1: OpenCL routines reported failure (-40) on line 2570.
"""
The relevant line is:
"""
//Make a fake image so we don't have a NULL pointer
(*srcImag) = clCreateImage2D(warper->context, CL_MEM_READ_ONLY,
&imgFmt,
1, 1, sz, NULL, &err);
handleErr(err);
"""
Any ideas? Did you try with ATI or NVIDIA cards?
I've attached the output of CLInfo if it can help.
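
One guess at the cause, not verified on this machine: in the OpenCL 1.1
specification the sixth argument of clCreateImage2D is image_row_pitch, which
must be 0 when host_ptr is NULL, and a violation of that rule is reported as
CL_INVALID_IMAGE_SIZE. The call above passes sz together with a NULL host
pointer. The plain-C sketch below only mirrors those argument rules so they
can be checked without an OpenCL runtime; image2d_args_ok and its parameter
names are hypothetical, not part of OpenCL or GDAL.

```c
#include <stddef.h>

/* Hypothetical helper: mirrors the OpenCL 1.1 argument rules for
 * clCreateImage2D() in plain C. It makes no OpenCL calls.
 * Returns 1 if the arguments look acceptable, 0 if the specification
 * says the call should fail with CL_INVALID_IMAGE_SIZE. */
int image2d_args_ok(size_t width, size_t height,
                    size_t max_width, size_t max_height,
                    size_t elem_size, size_t row_pitch,
                    const void *host_ptr)
{
    /* An element size of 0 makes no sense and would divide by zero below. */
    if (elem_size == 0)
        return 0;

    /* Dimensions must be non-zero and within the device limits. */
    if (width == 0 || height == 0)
        return 0;
    if (width > max_width || height > max_height)
        return 0;

    /* Without host data the row pitch must be 0. */
    if (host_ptr == NULL)
        return row_pitch == 0;

    /* With host data, 0 means the rows are tightly packed. */
    if (row_pitch == 0)
        return 1;

    /* Otherwise the pitch must cover one row and be a multiple of the
     * element size. */
    return row_pitch >= width * elem_size && row_pitch % elem_size == 0;
}
```

With the 8192 x 8192 limit from the attached CLInfo output, a 1x1 image is
well within range, so under this reading the non-zero pitch is the suspect.
Passing 0 for the pitch, or supplying a small dummy host buffer together with
CL_MEM_COPY_HOST_PTR, would be the things to try.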
Best regards,
Even
On Monday 06 December 2010 15:50:04, Seth Price wrote:
> Over the summer I rewrote the warper to use OpenCL. There was a 2x to
> 50x speedup. Here is a description of what I did:
> http://osgeo-org.1803224.n2.nabble.com/gdal-dev-gdalwarp-OpenCL-Performance-Week-9-td5341226.html
>
> ~Seth
>
> On Dec 6, 2010, at 5:10 AM, Konstantin Baumann wrote:
> > Hi,
> >
> > what benefit/improvement would the OpenCL integration bring to GDAL?
> > Additional functionality or a speedup of existing functions?
> > Probably only operations on images and/or rasters are supported;
> > reprojection/warping and filtering would be good candidates, right?
> > What concrete operations would be supported?
> >
> > Kosta
> >
> > -----Original Message-----
> > From: gdal-dev-bounces at lists.osgeo.org
> > [mailto:gdal-dev-bounces at lists.osgeo.org ] On Behalf Of Frank Warmerdam
> > (External)
> > Sent: Monday, December 06, 2010 1:43 AM
> > To: Seth Price
> > Cc: Philippe Vachon; gdal-dev; Wolf Bergenheim
> > Subject: [gdal-dev] Re: OpenCL, GRASS, GDAL, and You
> >
> > On 10-09-23 10:09 AM, Seth Price wrote:
> >> Hey all, I was just wondering if there was any progress in integrating
> >> the OpenCL code into trunk in each project? I haven't heard anything,
> >> but it would be a shame to just let the code sit, or wait until the
> >> code branches have significantly diverged.
> >> ~Seth
> >
> > Seth,
> >
> > Last week I went out and bought a new AMD/ATI machine in the hopes (among
> > other goals) that OpenCL would work on it with the ATI OpenCL SDK.
> > Unfortunately I have discovered that the ATI Radeon 4200 HD is not
> > supported for OpenCL stuff. :-(
> >
> > Nevertheless, with some persistence I was able to build the ATI SDK, and
> > configure GDAL to build against it. So I have integrated OpenCL support
> > in trunk. It is not enabled by default, but you can enable it with the
> > --with-opencl directive. If the include files and libraries are in a
> > non-standard location you can also use the --with-opencl-include and
> > --with-opencl-lib directives to configure like this:
> >
> >   --with-opencl \
> >   --with-opencl-include=/home/warmerda/pkg/ati-stream-sdk-v2.2-lnx64/include \
> >   --with-opencl-lib="-L/home/warmerda/pkg/ati-stream-sdk-v2.2-lnx64/lib/x86_64 -lOpenCL" \
> >
> > I ran into a few issues:
> >
> > 1) It seems the include file from ATI is <CL/opencl.h> not
> > <OpenCL/OpenCL.h> as it is on the Mac. I've put a platform dependent
> > ifdef but I don't know what the situation will be on other machines.
> >
> > 2) In your get_device() function you were passing NULL in for the
> > platform. The online docs indicate this as an option but warn that
> > behavior then is platform dependent. The ATI SDK just fails with an
> > invalid platform error. So I updated the code to fetch a platform id
> > and use that.
> >
> > 3) On my system it falls back to using the CPU but it turns out the CPU
> > does not offer "image" support in my case. I added some extra logic to
> > look for this capability so a better error could be reported.
> >
> > 4) I restructured things a bit so that the OpenCL warper case can return
> > CE_Warning to indicate to the high level warper that OpenCL should be
> > skipped and other mechanisms used. That is what it does now if it fails
> > to find a suitable device, or fails some of the other specific checks.
> >
> > 5) I made a few changes to use CPLDebug instead of printf for debug
> > output.
> >
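
Points 3 and 4 above could be sketched roughly as follows. This is a plain-C
model, not the code in trunk: DeviceCaps, pick_image_device, warp_opencl_model
and warp_model are hypothetical names, and ModelErr is a local stand-in whose
values follow GDAL's CPLErr. In real code the image flag would come from
clGetDeviceInfo() with CL_DEVICE_IMAGE_SUPPORT.

```c
#include <stddef.h>

/* Local stand-in for GDAL's CPLErr (same numeric values as cpl_error.h). */
typedef enum { CE_None = 0, CE_Debug = 1, CE_Warning = 2,
               CE_Failure = 3, CE_Fatal = 4 } ModelErr;

/* Hypothetical summary of what a device query returned. */
typedef struct {
    const char *name;
    int is_gpu;
    int image_support;   /* CL_DEVICE_IMAGE_SUPPORT in real code */
} DeviceCaps;

/* Prefer the first GPU with image support; otherwise return the first
 * device of any type with image support. NULL means nothing is usable. */
const DeviceCaps *pick_image_device(const DeviceCaps *devs, size_t n)
{
    const DeviceCaps *fallback = NULL;
    for (size_t i = 0; i < n; i++) {
        if (!devs[i].image_support)
            continue;
        if (devs[i].is_gpu)
            return &devs[i];
        if (fallback == NULL)
            fallback = &devs[i];
    }
    return fallback;
}

/* Model of the OpenCL warper entry point: CE_Warning means
 * "OpenCL is not usable here, try something else". */
ModelErr warp_opencl_model(const DeviceCaps *devs, size_t n)
{
    if (pick_image_device(devs, n) == NULL)
        return CE_Warning;
    return CE_None;   /* the real OpenCL warp would run here */
}

/* Model of the high-level warper: *used_opencl reports which path ran. */
ModelErr warp_model(const DeviceCaps *devs, size_t n, int *used_opencl)
{
    ModelErr err = warp_opencl_model(devs, n);
    *used_opencl = (err == CE_None);
    if (err == CE_Warning)
        err = CE_None;   /* the regular CPU warp kernel would run here */
    return err;
}
```

The point of using a warning rather than a failure is that a machine without
a suitable device still completes the warp through the existing code path.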
> > I haven't tried this yet on your Mac. I avoided using the account you
> > kindly offered because I find the Mac is often a perverse build
> > environment and I didn't want to establish the "norm" based on it. I
> > might try it out tonight though.
> >
> > I have also not yet tried it on Windows, and likely won't in the near
> > future. Perhaps someone else will pick up the ball there. The code itself
> > just depends on having HAVE_OPENCL defined at least in the alg directory
> > and of course appropriate include and link options.
> >
> > (cc:ed to the list so everyone is aware of the availability).
> >
> > Best regards,
-------------- next part --------------
Number of platforms: 1
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 1.1 ATI-Stream-v2.2 (302)
Platform Name: ATI Stream
Platform Vendor: Advanced Micro Devices, Inc.
Platform Extensions: cl_khr_icd cl_amd_event_callback
Platform Name: ATI Stream
Number of devices: 2
Device Type: CL_DEVICE_TYPE_CPU
Device ID: 4098
Max compute units: 4
Max work items dimensions: 3
Max work items[0]: 1024
Max work items[1]: 1024
Max work items[2]: 1024
Max work group size: 1024
Preferred vector width char: 16
Preferred vector width short: 8
Preferred vector width int: 4
Preferred vector width long: 2
Preferred vector width float: 4
Preferred vector width double: 0
Max clock frequency: 1200Mhz
Address bits: 64
Max memory allocation: 1073741824
Image support: No
Max size of kernel argument: 4096
Alignment (bits) of base address: 1024
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: Yes
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: Yes
Round to +ve and infinity: Yes
IEEE754-2008 fused multiply-add: No
Cache type: Read/Write
Cache line size: 64
Cache size: 32768
Global memory size: 3221225472
Constant buffer size: 65536
Max number of constant args: 8
Local memory type: Global
Local memory size: 32768
Profiling timer resolution: 1
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: Yes
Queue properties:
Out-of-Order: No
Profiling : Yes
Platform ID: 0x7f0d9d76db20
Name: Intel(R) Core(TM) i5 CPU 750 @ 2.67GHz
Vendor: GenuineIntel
Driver version: 2.0
Profile: FULL_PROFILE
Version: OpenCL 1.1 ATI-Stream-v2.2 (302)
Extensions: cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_device_fission cl_amd_device_attribute_query cl_amd_printf
Device Type: CL_DEVICE_TYPE_GPU
Device ID: 4098
Max compute units: 2
Max work items dimensions: 3
Max work items[0]: 128
Max work items[1]: 128
Max work items[2]: 128
Max work group size: 128
Preferred vector width char: 16
Preferred vector width short: 8
Preferred vector width int: 4
Preferred vector width long: 2
Preferred vector width float: 4
Preferred vector width double: 0
Max clock frequency: 0Mhz
Address bits: 32
Max memory allocation: 134217728
Image support: Yes
Max number of images read arguments: 128
Max number of images write arguments: 8
Max image 2D width: 8192
Max image 2D height: 8192
Max image 3D width: 2048
Max image 3D height: 2048
Max image 3D depth: 2048
Max samplers within kernel: 16
Max size of kernel argument: 1024
Alignment (bits) of base address: 32768
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: No
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: Yes
Round to +ve and infinity: Yes
IEEE754-2008 fused multiply-add: Yes
Cache type: None
Cache line size: 0
Cache size: 0
Global memory size: 536870912
Constant buffer size: 65536
Max number of constant args: 8
Local memory type: Scratchpad
Local memory size: 32768
Profiling timer resolution: 1
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: No
Queue properties:
Out-of-Order: No
Profiling : Yes
Platform ID: 0x7f0d9d76db20
Name: Cedar
Vendor: Advanced Micro Devices, Inc.
Driver version: CAL 1.4.880
Profile: FULL_PROFILE
Version: OpenCL 1.1 ATI-Stream-v2.2 (302)
Extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_printf cl_amd_media_ops
Passed!