[Gdal-dev] trivial (?) optimization for gdal.i:GDALReadRaster()

Wed Feb 4 06:38:01 EST 2004

Frank,

i've started analysing the performance of my gdal-related python code with 
oprofile (http://oprofile.sf.net) and found what looks like an easy 
optimization for a ~20% speed increase on reads.

i have a simple python script that reads a 500Mb CInt16 dataset in ~40Mb 
chunks. the file fits comfortably in the linux kernel cache on my machine, 
and i repeatably get the following timing on an otherwise idle system:

real    0m36.355s
user    0m26.925s
sys     0m8.593s

oprofile breaks down the cpu time as:

alan:$ opreport -t 1 -l
CPU: CPU with timer interrupt, speed 1000.06 MHz (estimated)
Profiling through timer interrupt
vma      samples  %           app name                 symbol name
001323b0 8035     22.1380     libgdal.so.1.1.9         GDALCopyWords
00077990 7829     21.5705     libc-2.3.2.so            memcpy
000772e0 6211     17.1125     libc-2.3.2.so            memmove
c01af1b0 4162     11.4671     vmlinux                  fast_clear_page
00077370 3487      9.6074     libc-2.3.2.so            memset
c01af6d0 2266      6.2433     vmlinux                  __copy_to_user_ll
001322d0 855       2.3557     libgdal.so.1.1.9         GDALSwapWords
c013bd50 412       1.1351     vmlinux                  buffered_rmqueue
c0117d30 366       1.0084     vmlinux                  do_page_fault

which says that my script is spending most of its time basically copying 
around the initial data (fetched from the kernel memory by __copy_to_user_ll, 
which accounts for only 6% of the total time). the call to GDALCopyWords is 
ok because it also does type conversion to CFloat32.

the attached patch (you need to regenerate the gdal_wrap.c file with swig) cuts 
the real time well below 30 seconds (~20% better) by skipping one of the 
copies of the original buffer. the call to PyString_AsString() is legitimate 
according to the python C-API manual because we just created the object.
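the idea of skipping a copy can be illustrated in plain python. this is only an analogy, not the patch itself (that lives in the C wrapper generated from gdal.i): the two function names below are made up for the illustration, and io.BytesIO stands in for the raster source. the first variant reads into a temporary and then copies into the result, the second fills the result buffer directly.

```python
import io

def read_with_extra_copy(stream, nbytes):
    """Hypothetical helper: read into a temporary, then copy into the result."""
    tmp = stream.read(nbytes)     # copy 1: source -> temporary bytes object
    out = bytearray(nbytes)       # allocate (and zero) the result buffer
    out[:len(tmp)] = tmp          # copy 2: temporary -> result
    return out

def read_in_place(stream, nbytes):
    """Hypothetical helper: let the reader fill the result buffer directly."""
    out = bytearray(nbytes)       # allocate the result buffer first
    stream.readinto(out)          # single copy: source -> result
    return out

# stand-in data; a real run would read raster blocks instead
data = bytes(range(256)) * 4
a = read_with_extra_copy(io.BytesIO(data), len(data))
b = read_in_place(io.BytesIO(data), len(data))
print(a == b)
```

both variants return the same bytes; the second one just never materialises the temporary.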

results are:

real    0m28.409s
user    0m20.557s
sys     0m6.678s
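
as a rough check of the ~20% figure, the two timings can be compared directly. the snippet below only redoes the arithmetic on the numbers quoted in this message; percent_saved is a made-up helper name, not part of gdal.

```python
# timings in seconds, copied from the two `time` outputs in this message
before = {"real": 36.355, "user": 26.925, "sys": 8.593}
after = {"real": 28.409, "user": 20.557, "sys": 6.678}

def percent_saved(old, new):
    """Relative reduction in percent (hypothetical helper)."""
    return 100.0 * (old - new) / old

savings = {key: percent_saved(before[key], after[key]) for key in before}
for key, value in savings.items():
    print(f"{key}: {value:.1f}% less time")
```

all three components come out a little above 20%.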

CPU: CPU with timer interrupt, speed 1000.06 MHz (estimated)
Profiling through timer interrupt
vma      samples  %           app name                 symbol name
001323b0 7959     28.3329     libgdal.so.1.1.9         GDALCopyWords
000772e0 6226     22.1637     libc-2.3.2.so            memmove
00077370 3558     12.6660     libc-2.3.2.so            memset
c01af1b0 2722      9.6899     vmlinux                  fast_clear_page
c01af6d0 2236      7.9598     vmlinux                  __copy_to_user_ll
00077990 1464      5.2116     libc-2.3.2.so            memcpy
001322d0 820       2.9191     libgdal.so.1.1.9         GDALSwapWords

memcpy usage dropped by 6300+ samples and also fast_clear_page usage goes down 
a bit.
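
the per-symbol change can be tabulated from the two opreport listings in this message. the snippet below is just that subtraction; it only covers symbols that appear in both listings (buffered_rmqueue and do_page_fault are missing from the second one, so they are left out).

```python
# (samples before, samples after), copied from the two opreport listings
samples = {
    "GDALCopyWords": (8035, 7959),
    "memcpy": (7829, 1464),
    "memmove": (6211, 6226),
    "fast_clear_page": (4162, 2722),
    "memset": (3487, 3558),
    "__copy_to_user_ll": (2266, 2236),
    "GDALSwapWords": (855, 820),
}

# negative delta means fewer samples after the patch
deltas = {name: new - old for name, (old, new) in samples.items()}

# print the biggest reductions first
for name, delta in sorted(deltas.items(), key=lambda item: item[1]):
    print(f"{name:20s} {delta:+6d}")
```

memcpy and fast_clear_page account for nearly all of the reduction; the other symbols barely move.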

i think i know where the remaining memmove-memset are sitting (see 
pytmod/gdalnumeric.py:154), but i have no work-around for that.

this is my first time playing with python extensions, so Frank please 
double check the patch. anyhow, it has been tested and works here.

cheers,
alessandro

BTW: which version of swig are you using? i cannot produce the gdal_wrap.c 
file with version 1.3.19.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: python-read-speed.diff
Type: text/x-diff
Size: 789 bytes
Desc: not available
Url : http://lists.osgeo.org/pipermail/gdal-dev/attachments/20040204/820b72ef/python-read-speed.bin