[gdal-dev] Python WriteArray()

Even Rouault even.rouault at mines-paris.org
Thu Jul 14 13:18:05 EDT 2011


Quoting "Jay L." <jzl5325 at psu.edu>:

It is always difficult to diagnose performance issues from a distance, but one
explanation could be cache thrashing when you alternate reads and writes. When
you instead read several blocks and write them afterwards, you benefit from the
blocks already cached by the reads. You could try increasing the GDAL block
cache size, or better align your read and write windows with the natural block
dimensions of your datasets. Ideally, your output dataset would also have the
same block dimensions as your input dataset.
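
As a rough illustration of both suggestions, here is a minimal sketch. The
helper names (block_windows, process_by_blocks), the 512 MB cache value and
the choice of the GTiff driver are assumptions for the example, not something
taken from your script. It raises the cache with gdal.SetCacheMax() and reads
and writes windows aligned on the input band's GetBlockSize(). The output
block layout is left to the driver defaults here; matching it to the input
is format specific.

```python
def block_windows(xsize, ysize, block_x, block_y):
    """Yield (xoff, yoff, width, height) windows aligned on the block grid."""
    for yoff in range(0, ysize, block_y):
        height = min(block_y, ysize - yoff)      # last row of blocks may be short
        for xoff in range(0, xsize, block_x):
            width = min(block_x, xsize - xoff)   # last column may be narrow
            yield xoff, yoff, width, height


def process_by_blocks(src_path, dst_path, func, cache_bytes=512 * 1024 * 1024):
    """Apply func to each block of src_path and write the result to dst_path.

    Hypothetical helper: func takes and returns a numpy array of the same shape.
    """
    # Imported here so block_windows() above can be used without GDAL installed.
    from osgeo import gdal

    gdal.SetCacheMax(cache_bytes)  # assumed value; tune it to the available RAM

    src = gdal.Open(src_path)
    first_band = src.GetRasterBand(1)
    block_x, block_y = first_band.GetBlockSize()  # natural block dimensions

    driver = gdal.GetDriverByName("GTiff")        # assumed output format
    dst = driver.Create(dst_path, src.RasterXSize, src.RasterYSize,
                        src.RasterCount, first_band.DataType)
    dst.SetGeoTransform(src.GetGeoTransform())
    dst.SetProjection(src.GetProjection())

    for j in range(1, src.RasterCount + 1):
        in_band = src.GetRasterBand(j)
        out_band = dst.GetRasterBand(j)
        for xoff, yoff, width, height in block_windows(
                src.RasterXSize, src.RasterYSize, block_x, block_y):
            data = in_band.ReadAsArray(xoff, yoff, width, height)
            out_band.WriteArray(func(data), xoff, yoff)

    dst.FlushCache()
    dst = None  # dropping the reference closes the dataset
    src = None
```

Something like process_by_blocks("in.tif", "out.tif", lambda a: a) would then
copy the raster block by block (the file names are placeholders). Whether this
helps in your case depends on the block layout of your files, so it is worth
printing GetBlockSize() for both datasets first.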


> I am working on processing some raster images and have been really
> struggling with processing times.  I have tracked the issue down to GDAL's
> WriteArray() function.  I am cutting the image into blocks and processing in
> small chunks.  This is to facilitate working with really large images where
> the entire array cannot be loaded into memory.
>
> If I call WriteArray() after processing each block, processing time is
> 3+ hours for a 1GB image.  By collecting the processed blocks (numpy arrays)
> in a Python list and then writing several of them out at once, I have cut
> processing time to under 2 minutes.
>
> What is GDAL doing when it calls WriteArray() that requires so much
> time?  Has anyone else encountered this and been able to speed up GDAL's
> array writing?
>
> Thanks Jay
>
> Here is an example in pseudocode:
>
> open the raster
>
> iterate through the bands using dataset.GetRasterBand(j)
>
> iterate through the rows
>     iterate through the columns
>
>     ReadAsArray(a block of rows and columns)
>
>     process the array in numpy
>
>     # The more often this is called, the slower the entire script runs... why?
>     write out the array with outdataset.GetRasterBand(j).WriteArray(the proper place to insert the modified array)
>



