[Gdal-dev] [RFC] RasterIO speedup patch
Tim Beckmann
tbeckman at usgs.gov
Wed Feb 5 17:31:02 EST 2003
Frank and the rest of the list,
We've been working to incorporate GDAL into our software. One drawback
we've found so far is that GDAL is extremely slow when it needs to do
data type conversions. Here is a patch that seems to improve the
performance. It changes GDALRasterBand::IRasterIO so that it tries to
convert more than a single pixel per call to GDALCopyWords. In the testing
I've done it seems to work well, though I'm not sure I've covered all the
possible combinations. There are still cases where it falls back to calling
GDALCopyWords once for each pixel.
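The idea can be sketched outside GDAL roughly as follows. This is a minimal, self-contained illustration and not the GDAL code: copyWordsFloatToByte is a hypothetical stand-in for GDALCopyWords restricted to Float32-to-Byte conversion (it clamps and truncates, which is an assumption of the sketch), wordsInRun is a hypothetical helper modelled on the word-count calculation in the patch, and convertRow is a hypothetical caller. Unlike the patch, wordsInRun rounds the division up so that a run always contains at least the first pixel.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for GDALCopyWords, limited to Float32 -> Byte.
// srcStride and dstStride are the byte distances between consecutive words.
void copyWordsFloatToByte(const unsigned char *src, int srcStride,
                          unsigned char *dst, int dstStride, int wordCount)
{
    for (int i = 0; i < wordCount; ++i)
    {
        float value;
        std::memcpy(&value, src + static_cast<std::size_t>(i) * srcStride,
                    sizeof(value));
        // Clamp to the Byte range, then truncate (assumed behaviour here).
        if (value < 0.0f) value = 0.0f;
        if (value > 255.0f) value = 255.0f;
        dst[static_cast<std::size_t>(i) * dstStride] =
            static_cast<unsigned char>(value);
    }
}

// Hypothetical helper: how many words can be copied in one call, starting
// at source column srcX and stepping by srcXInc, without leaving the
// current block or writing past the end of the output buffer line.
int wordsInRun(int blockIndexX, int blockXSize, int srcX, int srcXInc,
               int bufXSize, int bufXOff)
{
    assert(srcXInc >= 1);
    const int endOfBlock = (blockIndexX + 1) * blockXSize;
    // Round up so the pixel at srcX itself is always counted.
    int count = (endOfBlock - srcX + srcXInc - 1) / srcXInc;
    const int remainingInBuffer = bufXSize - bufXOff;
    if (count > remainingInBuffer)
        count = remainingInBuffer;
    return count;
}

// Hypothetical caller: converts a whole row with a single strided call
// instead of one call per pixel.
std::vector<unsigned char> convertRow(const std::vector<float> &row)
{
    std::vector<unsigned char> out(row.size());
    if (!row.empty())
        copyWordsFloatToByte(
            reinterpret_cast<const unsigned char *>(row.data()),
            static_cast<int>(sizeof(float)), out.data(), 1,
            static_cast<int>(row.size()));
    return out;
}
```

Calling copyWordsFloatToByte once per pixel with a word count of 1 gives the same bytes as convertRow; the batched form only saves the per-call overhead.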
Some preliminary timings show that converting a 100 MB Float32 image to
Byte data took 49 seconds with the original code, but only 27 seconds
with the patch below. This is using our unreleased LAS driver on
an SGI Origin 2000 with 250 MHz processors.
Any comments?
Oh yeah, for the record: I really dislike the naming convention used
for variables in the routine. It makes the code much harder to
understand. Also, some of the variables could use comments explaining
what they do (I'd add them, but I'm not 100% sure). Example: the
nSrcPixelOffset parameter to GDALCopyWords. If I understand it correctly,
it isn't an offset; it is the stride used to step through the data.
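To make the offset-versus-stride distinction concrete, here is a small sketch. gatherBytes is a hypothetical helper, not a GDAL function: startOffset says where the first word is, while strideBytes is the distance between consecutive words, which is the role I believe nSrcPixelOffset plays.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of GDAL: collects `count` bytes from `src`,
// starting at byte `startOffset` and advancing `strideBytes` per read.
std::vector<unsigned char> gatherBytes(const std::vector<unsigned char> &src,
                                       std::size_t startOffset,
                                       std::size_t strideBytes,
                                       std::size_t count)
{
    std::vector<unsigned char> out;
    out.reserve(count);
    for (std::size_t i = 0; i < count; ++i)
    {
        const std::size_t pos = startOffset + i * strideBytes;
        assert(pos < src.size());  // stay inside the source buffer
        out.push_back(src[pos]);
    }
    return out;
}
```

For a pixel-interleaved buffer holding three bytes per pixel, gatherBytes(buf, 1, 3, n) pulls out the second component of the first n pixels: the offset is 1 and the stride is 3.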
I also have another set of changes that cut the time nearly in half
again... But one step at a time.
Patch (hopefully won't be mangled by my email program):
---------------------------------------------------------------------------------------------------------------
--- ../../../remotesensing_gdal/core/rasterio.cpp Mon Nov 11 10:02:06 2002
+++ rasterio.cpp Wed Feb 5 15:58:02 2003
@@ -103,6 +113,8 @@
{
int nBandDataSize = GDALGetDataTypeSize( eDataType ) / 8;
+ int bytesPerDataPixel = GDALGetDataTypeSize( eDataType ) / 8;
+ int bytesPerBufPixel = GDALGetDataTypeSize( eBufType ) / 8;
GByte *pabySrcBlock = NULL;
GDALRasterBlock *poBlock;
int nLBlockX=-1, nLBlockY=-1, iBufYOff, iBufXOff, iSrcY;
@@ -231,6 +243,26 @@
return CE_Failure;
}
+ /* calculate buffer span in the current block and handle as many
+ pixels as possible in one call to the GDALCopyWords routine.
+ Note: this makes sure that if the data is being subsampled at
+ something besides an integer interval that only one pixel at a
+ time is grabbed. This is very inefficient. It would likely be
+ better to convert the entire line, then pull out the pixels that
+ are needed for the subsampled version. */
+ int iSrcXInc = (int)dfSrcXInc;
+ int wordCount = 1;
+ if ( dfSrcXInc == iSrcXInc)
+ {
+ /* integer multiple for subsampling, so calculate the number
+ of words to copy. Also make sure the end of the requested
+ bytes isn't exceeded. */
+ int endXBlockIndex = (nLBlockX + 1) * nBlockXSize;
+ wordCount = (endXBlockIndex - iSrcX)/iSrcXInc;
+ if (wordCount > (nBufXSize - iBufXOff))
+ wordCount = nBufXSize - iBufXOff;
+ }
+
/* -------------------------------------------------------------------- */
/* Copy over this pixel of data. */
/* -------------------------------------------------------------------- */
@@ -239,29 +271,38 @@
if( eDataType == eBufType )
{
+ /* FIXME: it might be better to call GDALCopyWords to process
+ all the pixels from the current buffer */
if( eRWFlag == GF_Read )
memcpy( ((GByte *) pData) + iBufOffset,
pabySrcBlock + iSrcOffset, nBandDataSize );
else
memcpy( pabySrcBlock + iSrcOffset,
((GByte *) pData) + iBufOffset, nBandDataSize );
+ iBufOffset += nPixelSpace;
}
else
{
- /* type to type conversion ... ouch, this is expensive way
- of handling single words */
-
+ /* copy all the words for the current image block */
if( eRWFlag == GF_Read )
- GDALCopyWords( pabySrcBlock + iSrcOffset, eDataType, 0,
- ((GByte *) pData) + iBufOffset, eBufType, 0,
- 1 );
+ {
+ GDALCopyWords( pabySrcBlock + iSrcOffset, eDataType,
+ bytesPerDataPixel * iSrcXInc,
+ ((GByte *) pData) + iBufOffset, eBufType,
+ bytesPerBufPixel * iSrcXInc, wordCount );
+ }
else
- GDALCopyWords( ((GByte *) pData) + iBufOffset, eBufType, 0,
- pabySrcBlock + iSrcOffset, eDataType, 0,
- 1 );
+ {
+ GDALCopyWords( ((GByte *) pData) + iBufOffset, eBufType,
+ bytesPerBufPixel * iSrcXInc,
+ pabySrcBlock + iSrcOffset, eDataType,
+ bytesPerDataPixel * iSrcXInc, wordCount );
+ }
+ /* adjust the loop counter to account for copying multiple
+ words - ugly I know */
+ iBufXOff += (wordCount - 1);
+ iBufOffset += (nPixelSpace * wordCount);
}
-
- iBufOffset += nPixelSpace;
}
}
-------------------------------------------------------------------------------------------------------------------------
Interesting timing data for converting a 100 MB image with Float32 data to
Byte data (using our LAS driver that hasn't been released yet):
Quick summary: the original code took 49 seconds; the patched code took 27
seconds.
*****Original timing:
-------------------------------------------------------------------------
Summary of ideal time data (ideal)--
9537549636: Total number of instructions executed
12281429794: Total computed cycles
49.126: Total computed execution time (secs.)
1.288: Average cycles / instruction
-------------------------------------------------------------------------
Function list, in descending order by exclusive ideal time
-------------------------------------------------------------------------
[index]  excl.secs  excl.%  cum.%      cycles  instructions     calls  function (dso: file, line)
    [1]     25.279   51.5%  51.5%  6319710000    5075915000  25020000  GDALCopyWords (libgdal.1.1.so: rasterio.cpp, 369)
    [2]     18.711   38.1%  89.5%  4677785000    2951650000     10000  GDALRasterBand::IRasterIO(GDALRWFlag,int,int,int,int,void*,int,int,GDALDataType,int,int) (libgdal.1.1.so: rasterio.cpp, 114)
    [3]      4.934   10.0%  99.6%  1233501577    1460605855  50030709  memcpy (libc.so.1: bcopy.s, 329)
    [4]      0.019    0.0%  99.6%     4874107       5695887      5062  memset (libc.so.1: bzero.s, 98)
***** Timing with the patch:
-------------------------------------------------------------------------
Summary of ideal time data (ideal)--
5839329727: Total number of instructions executed
6783873165: Total computed cycles
27.135: Total computed execution time (secs.)
1.162: Average cycles / instruction
-------------------------------------------------------------------------
Function list, in descending order by exclusive ideal time
-------------------------------------------------------------------------
[index]  excl.secs  excl.%  cum.%      cycles  instructions     calls  function (dso: file, line)
    [1]     21.979   81.0%  81.0%  5494875000    4326065000     25000  GDALCopyWords (libgdal.1.1.so: rasterio.cpp, 400)
    [2]      4.934   18.2%  99.2%  1233501840    1460606163  50030721  memcpy (libc.so.1: bcopy.s, 329)
    [3]      0.019    0.1%  99.3%     4874107       5695887      5062  memset (libc.so.1: bzero.s, 98)
    [4]      0.018    0.1%  99.3%     4515000       2710000     10000  GDALRasterBand::IRasterIO(GDALRWFlag,int,int,int,int,void*,int,int,GDALDataType,int,int) (libgdal.1.1.so: rasterio.cpp, 114)
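As a sanity check on the two summaries, the total times appear to be the total computed cycles divided by the 250 MHz clock rate of the test machine. That relationship is inferred from the numbers rather than stated by the profiler, and the helper names below are made up for the sketch.

```cpp
#include <cassert>

// Clock rate of the test machine as given in this message (250 MHz).
constexpr double kClockHz = 250.0e6;

// Converts a cycle count to seconds at the assumed clock rate.
double cyclesToSeconds(double cycles)
{
    assert(cycles >= 0.0);
    return cycles / kClockHz;
}

// Ratio of original to patched run time; values above 1 mean a speedup.
double speedup(double originalCycles, double patchedCycles)
{
    assert(patchedCycles > 0.0);
    return cyclesToSeconds(originalCycles) / cyclesToSeconds(patchedCycles);
}
```

Feeding in the totals from the two runs (12281429794 and 6783873165 cycles) reproduces the reported 49.126 and 27.135 seconds to within rounding, and gives a speedup of roughly 1.8.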
--------------------------------------------------------------------------
Tim Beckmann tbeckman at usgs.gov
Software Project Lead
SAIC
EROS Data Center, Sioux Falls, SD 57198
605-594-2521 Phone
605-594-6940 Fax