[Gdal-dev] [RFC] RasterIO speedup patch

Tim Beckmann tbeckman at usgs.gov
Wed Feb 5 17:31:02 EST 2003


Frank and the rest of the list,

We've been working to incorporate GDAL into our software.  One drawback
we've found so far is that GDAL is extremely slow when it needs to do
data type conversions.  Here is a patch that seems to improve the
performance.  It changes GDALRasterBand::IRasterIO so that, wherever it
can, it converts more than a single pixel per call to GDALCopyWords.  In
the testing I've done it works well, but I'm not sure I've covered all
the possible combinations.  There are still cases where it falls back to
calling GDALCopyWords once for each pixel.
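
To make the idea concrete, here is a minimal standalone sketch of the two calling patterns.  It does not use GDAL: copyFloatToByte is a hypothetical stand-in for a strided convert-and-copy routine (Float32 to Byte only), and its clamp-and-round rule is an assumption rather than GDAL's documented behaviour.  The point is only that one call over a run of pixels gives the same result as one call per pixel, with far less call overhead.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a strided convert-and-copy routine.
// This is NOT GDALCopyWords; it only handles Float32 -> Byte.
// Strides are in bytes. Clamping and rounding are assumptions.
void copyFloatToByte(const unsigned char* src, int srcStride,
                     unsigned char* dst, int dstStride, int wordCount)
{
    for (int i = 0; i < wordCount; ++i)
    {
        float v;
        std::memcpy(&v, src + static_cast<std::ptrdiff_t>(i) * srcStride,
                    sizeof v);
        v = std::min(255.0f, std::max(0.0f, v));   // clamp to Byte range
        dst[static_cast<std::ptrdiff_t>(i) * dstStride] =
            static_cast<unsigned char>(v + 0.5f);   // round to nearest
    }
}

// Pattern of the original loop: one call per pixel, word count 1.
std::vector<unsigned char> convertPerPixel(const std::vector<float>& in)
{
    std::vector<unsigned char> out(in.size());
    const unsigned char* src =
        reinterpret_cast<const unsigned char*>(in.data());
    for (std::size_t i = 0; i < in.size(); ++i)
        copyFloatToByte(src + i * sizeof(float), 0, &out[i], 0, 1);
    return out;
}

// Pattern of the patch: one call for the whole run of pixels.
std::vector<unsigned char> convertPerRun(const std::vector<float>& in)
{
    std::vector<unsigned char> out(in.size());
    if (in.empty())
        return out;
    copyFloatToByte(reinterpret_cast<const unsigned char*>(in.data()),
                    static_cast<int>(sizeof(float)),
                    out.data(), 1, static_cast<int>(in.size()));
    return out;
}
```

Both functions return the same bytes for the same input; only the number of calls differs.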

Some preliminary timings: converting a 100 MB Float32 image to Byte data
took 49 seconds with the original code, but only 27 seconds with the
patch below.  This is using our unreleased LAS driver on an SGI Origin
2000 with 250 MHz processors.

Any comments?

Oh yeah, just for the record - I really dislike the naming convention
used for variables in the routine.  It makes the code much harder to
understand.  Also, some of the variables could use comments to explain
what they do (I'd add them, but I'm not 100% sure).  Example: the
nSrcPixelOffset parameter to GDALCopyWords.  If I understand it correctly,
it isn't an offset, it is the stride used to step through the data.
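
The offset-versus-stride distinction can be illustrated without GDAL.  In the sketch below, gatherStrided is a hypothetical helper (not a GDAL function) that reads 16-bit words starting at a base pointer and advances by a fixed number of bytes between words.  The starting position is chosen by the pointer passed in; the stride only controls the step.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper, not part of GDAL: read `count` 16-bit words,
// starting at `base` and advancing `strideBytes` bytes between words.
std::vector<std::uint16_t> gatherStrided(const unsigned char* base,
                                         int strideBytes, int count)
{
    std::vector<std::uint16_t> out;
    out.reserve(count > 0 ? static_cast<std::size_t>(count) : 0);
    for (int i = 0; i < count; ++i)
    {
        std::uint16_t w;
        std::memcpy(&w, base + static_cast<std::ptrdiff_t>(i) * strideBytes,
                    sizeof w);
        out.push_back(w);
    }
    return out;
}
```

For three pixels of 3-band, pixel-interleaved 16-bit data, reading the second band means starting 2 bytes in and using a stride of 6 bytes.  A stride of 0 re-reads the same word every time, which is harmless only when the word count is 1.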

I also have another set of changes that cut the time nearly in half 
again...  But one step at a time.

Patch (hopefully won't be mangled by my email program):
---------------------------------------------------------------------------------------------------------------
--- ../../../remotesensing_gdal/core/rasterio.cpp       Mon Nov 11 10:02:06 2002
+++ rasterio.cpp        Wed Feb  5 15:58:02 2003
@@ -103,6 +113,8 @@
 
 {
     int         nBandDataSize = GDALGetDataTypeSize( eDataType ) / 8;
+    int bytesPerDataPixel = GDALGetDataTypeSize( eDataType ) / 8;
+    int bytesPerBufPixel = GDALGetDataTypeSize( eBufType ) / 8;
     GByte       *pabySrcBlock = NULL;
     GDALRasterBlock *poBlock;
     int         nLBlockX=-1, nLBlockY=-1, iBufYOff, iBufXOff, iSrcY;
@@ -231,6 +243,26 @@
                     return CE_Failure;
             }
 
+            /* calculate buffer span in the current block and handle as many
+               pixels as possible in one call to the GDALCopyWords routine.
+               Note: this makes sure that if the data is being subsampled at
+               something besides an integer interval that only one pixel at a
+               time is grabbed.  This is very inefficient.  It would likely be
+               better to convert the entire line, then pull out the pixels that
+               are needed for the subsampled version. */
+            int iSrcXInc = (int)dfSrcXInc;
+            int wordCount = 1;
+            if ( dfSrcXInc == iSrcXInc)
+            {
+                /* integer multiple for subsampling, so calculate the number
+                   of words to copy.  Also make sure the end of the requested
+                   bytes isn't exceeded. */
+                int endXBlockIndex = (nLBlockX + 1) * nBlockXSize;
+                wordCount = (endXBlockIndex - iSrcX)/iSrcXInc;
+                if (wordCount > (nBufXSize - iBufXOff))
+                    wordCount = nBufXSize - iBufXOff;
+            }
+
 /* -------------------------------------------------------------------- */
 /*      Copy over this pixel of data. */
 /* -------------------------------------------------------------------- */
@@ -239,29 +271,38 @@
 
             if( eDataType == eBufType )
             {
+                /* FIXME: it might be better to call GDALCopyWords to process
+                   all the pixels from the current buffer */
                 if( eRWFlag == GF_Read )
                     memcpy( ((GByte *) pData) + iBufOffset,
                             pabySrcBlock + iSrcOffset, nBandDataSize );
                 else
                     memcpy( pabySrcBlock + iSrcOffset,
                             ((GByte *) pData) + iBufOffset, nBandDataSize );
+                iBufOffset += nPixelSpace;
             }
             else
             {
-                /* type to type conversion ... ouch, this is expensive way
-                   of handling single words */
-
+                /* copy all the words for the current image block */
                 if( eRWFlag == GF_Read )
-                    GDALCopyWords( pabySrcBlock + iSrcOffset, eDataType, 0,
-                                   ((GByte *) pData) + iBufOffset, eBufType, 0,
-                                   1 );
+                {
+                    GDALCopyWords( pabySrcBlock + iSrcOffset, eDataType,
+                                   bytesPerDataPixel * iSrcXInc,
+                                   ((GByte *) pData) + iBufOffset, eBufType,
+                                   bytesPerBufPixel * iSrcXInc, wordCount );
+                }
                 else
-                    GDALCopyWords( ((GByte *) pData) + iBufOffset, eBufType, 0,
-                                   pabySrcBlock + iSrcOffset, eDataType, 0,
-                                   1 );
+                {
+                    GDALCopyWords( ((GByte *) pData) + iBufOffset, eBufType,
+                                   bytesPerBufPixel * iSrcXInc,
+                                   pabySrcBlock + iSrcOffset, eDataType,
+                                   bytesPerDataPixel * iSrcXInc, wordCount );
+                }
+                /* adjust the loop counter to account for copying multiple
+                   words - ugly I know */
+                iBufXOff += (wordCount - 1);
+                iBufOffset += (nPixelSpace * wordCount);
             }
-
-            iBufOffset += nPixelSpace;
         }
     }
-------------------------------------------------------------------------------------------------------------------------
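
The run-length calculation in the second hunk can be looked at in isolation.  The following is a standalone sketch, not part of the patch: wordsInBlockRun and its parameter names are hypothetical.  It differs from the patch in one respect, which is an assumption about the intended behaviour: it rounds the division up and never returns less than 1.  With the truncating division in the patch, a run that starts closer to the block edge than iSrcXInc pixels would compute a word count of 0, and iBufXOff += (wordCount - 1) would then step the loop counter backwards.

```cpp
#include <algorithm>

// Hypothetical helper, not part of the patch.
//   iSrcX         : source column of the first pixel in the run
//   iSrcXInc      : integer subsampling step in source pixels
//   blockEndX     : first source column past the current block
//   bufPixelsLeft : pixels still to be filled in the output line
// Returns how many pixels can be handled in one strided copy.
int wordsInBlockRun(int iSrcX, int iSrcXInc, int blockEndX,
                    int bufPixelsLeft)
{
    if (iSrcXInc < 1)
        return 1;   // non-integer or zero step: one pixel at a time

    // Sample positions iSrcX, iSrcX + inc, ... that fall before
    // blockEndX. Rounding up counts a final partial step.
    const int inBlock = (blockEndX - iSrcX + iSrcXInc - 1) / iSrcXInc;

    // Never run past the requested output, and never return less than 1.
    return std::max(1, std::min(inBlock, bufPixelsLeft));
}
```

For example, with a block ending at column 256, a run starting at column 255 with a step of 2 yields 1 here, where truncating division would yield 0.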


Interesting timing data for converting a 100 MB Float32 image to Byte
data (using our LAS driver, which hasn't been released yet).  Quick
summary: the original code took 49 seconds; the patched code took 27
seconds.

***** Original timing:
-------------------------------------------------------------------------
Summary of ideal time data (ideal)--
                    9537549636: Total number of instructions executed
                   12281429794: Total computed cycles
                        49.126: Total computed execution time (secs.)
                         1.288: Average cycles / instruction
-------------------------------------------------------------------------
Function list, in descending order by exclusive ideal time
-------------------------------------------------------------------------
 [index]   excl.secs   excl.%     cum.%        cycles  instructions     calls  function  (dso: file, line)

     [1]      25.279    51.5%     51.5%    6319710000    5075915000  25020000  GDALCopyWords (libgdal.1.1.so: rasterio.cpp, 369)
     [2]      18.711    38.1%     89.5%    4677785000    2951650000     10000  GDALRasterBand::IRasterIO(GDALRWFlag,int,int,int,int,void*,int,int,GDALDataType,int,int) (libgdal.1.1.so: rasterio.cpp, 114)
     [3]       4.934    10.0%     99.6%    1233501577    1460605855  50030709  memcpy (libc.so.1: bcopy.s, 329)
     [4]       0.019     0.0%     99.6%       4874107       5695887      5062  memset (libc.so.1: bzero.s, 98)

***** Timing with the patch:
-------------------------------------------------------------------------
Summary of ideal time data (ideal)--
                    5839329727: Total number of instructions executed
                    6783873165: Total computed cycles
                        27.135: Total computed execution time (secs.)
                         1.162: Average cycles / instruction
-------------------------------------------------------------------------
Function list, in descending order by exclusive ideal time
-------------------------------------------------------------------------
 [index]   excl.secs   excl.%     cum.%        cycles  instructions     calls  function  (dso: file, line)

     [1]      21.979    81.0%     81.0%    5494875000    4326065000     25000  GDALCopyWords (libgdal.1.1.so: rasterio.cpp, 400)
     [2]       4.934    18.2%     99.2%    1233501840    1460606163  50030721  memcpy (libc.so.1: bcopy.s, 329)
     [3]       0.019     0.1%     99.3%       4874107       5695887      5062  memset (libc.so.1: bzero.s, 98)
     [4]       0.018     0.1%     99.3%       4515000       2710000     10000  GDALRasterBand::IRasterIO(GDALRWFlag,int,int,int,int,void*,int,int,GDALDataType,int,int) (libgdal.1.1.so: rasterio.cpp, 114)


--------------------------------------------------------------------------
Tim Beckmann               tbeckman at usgs.gov
Software Project Lead
SAIC
EROS Data Center, Sioux Falls, SD 57198
605-594-2521    Phone
605-594-6940    Fax



