[gdal-dev] Extracting cell values from a big image and many smaller images

Lucena, Ivan ivan.lucena at pmldnet.com
Fri Feb 29 14:02:23 EST 2008


Frank, Limei,

I am doing something very similar in Python. The goal is to test some 
alternative data storage options for large time series data.

The Python script should go through all the bands of a BigTIFF, or 
through all the files of a single-band format, and read a small window 
of pixel values, e.g.:

% profiler.py -d \\NAS\DATA_FOLDER -a [[100,100],[101,101]]
file_1,10,19,10,20
file_2,10,19,10,20
file_3,40,19,16,20
file_4,10,89,10,20
file_5,10,19,10,10
...

or for multi-band:

% profiler.py -a [[100,100],[101,101]] file_multb.tif
band_1,10,19,10,20
band_2,10,19,10,20
band_3,40,19,16,20
band_4,10,89,10,20
band_5,10,19,10,10
...

Not that I don't trust your advice to use BigTIFF or IMG. :) I just want 
to prove it, so I intend to time both approaches and compare their 
performance.

Therefore, any advice on the Python + GDAL + numpy reading technique 
would be much appreciated.
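
A minimal sketch of such a window read for the multi-band case. It 
assumes the -a argument lists two opposite corner pixels as [x, y] 
pairs with inclusive corners (the message does not say which axis comes 
first; inclusive corners would match the four values per line in the 
sample output). The names parse_window, format_row and profile_bands 
are hypothetical and are not taken from profiler.py.

```python
import ast


def parse_window(area):
    """Turn '[[x0,y0],[x1,y1]]' into (xoff, yoff, xsize, ysize).

    Assumption: corners are inclusive [x, y] pixel coordinates.
    """
    (x0, y0), (x1, y1) = ast.literal_eval(area)
    xoff, yoff = min(x0, x1), min(y0, y1)
    return xoff, yoff, abs(x1 - x0) + 1, abs(y1 - y0) + 1


def format_row(label, values):
    """Build one output line such as 'band_1,10,19,10,20'."""
    return ",".join([label] + [str(v) for v in values])


def profile_bands(path, window):
    """Read the same small window from every band of one file."""
    from osgeo import gdal  # imported here so the helpers above work without GDAL

    ds = gdal.Open(path)
    if ds is None:
        raise IOError("could not open %s" % path)
    xoff, yoff, xsize, ysize = window
    rows = []
    for i in range(1, ds.RasterCount + 1):
        # One ReadAsArray call per band returns the whole window as a numpy array.
        arr = ds.GetRasterBand(i).ReadAsArray(xoff, yoff, xsize, ysize)
        if arr is None:
            raise IOError("read failed for band %d of %s" % (i, path))
        rows.append(format_row("band_%d" % i, arr.ravel().tolist()))
    return rows
```

For the single-band case the same read could be applied to band 1 of 
each file in the folder, using the file name as the row label.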

Do you think that reducing the cache would be a good idea in that case?
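
One way the question could be explored, sketched under the assumption 
that "the cache" means GDAL's raster block cache: time the same read 
several times with different cache limits. The helper names time_reads 
and run_with_cache_limit are hypothetical. The operating system's own 
file cache also affects repeated reads, so the numbers are only a rough 
guide.

```python
import time


def time_reads(read_fn, repeats=3):
    """Call read_fn several times and return the elapsed seconds of each call."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        read_fn()
        timings.append(time.perf_counter() - start)
    return timings


def run_with_cache_limit(nbytes, fn):
    """Run fn with GDAL's block cache limit set to nbytes, then restore it."""
    from osgeo import gdal  # imported here so time_reads works without GDAL

    old_limit = gdal.GetCacheMax()
    gdal.SetCacheMax(nbytes)
    try:
        return fn()
    finally:
        gdal.SetCacheMax(old_limit)
```

For example, run_with_cache_limit(1024 * 1024, lambda: 
time_reads(my_read)) would time a hypothetical my_read function under 
a 1 MB cache limit.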

Best regards,

Ivan


Frank Warmerdam wrote:
> Limei Ran wrote:
>>
>> Hi:
>>
>> I am using GDAL cpp library to create a program. The program will 
>> ultimately generate a statistic table with cell values from a very big 
>> modeling grid domain image (almost whole US) and many smaller land use 
>> images within the big image.
>>
>> I need to go through all the small image pixels to match grid cell 
>> values in the big image. There are many ways to read image data in 
>> line and blocks from GDALRasterBand class.
>>
>> Since I am new in using GDAL libraries, I appreciate any suggestion 
>> you might have in accessing the images efficiently.
> 
> Limei,
> 
> I'm not exactly clear on what you want to do, but a couple hints:
> 
>  - Avoid doing many one pixel reads with RasterIO().  There is quite
>    a bit of overhead in each call and so you should only do one pixel
>    reads when that is all you really need.  I believe even with caching
>    using one pixel reads to read a whole scanline would be easily an
>    order of magnitude slower than doing one full scanline read.
> 
>  - If you will be accessing your huge image in local chunks, consider
>    organizing it as a tiled image.  Perhaps an Imagine (HFA) file or
>    a BigTIFF with tiling.
> 
>  - If you need precision, and your small land use images are for
>    reasonably small areas, I would suggest just loading all the data from
>    the big image that matches the area for the small image in one gulp
>    (one RasterIO() call).  Then do your matching analysis and then move
>    on to the next.
> 
> Best regards,
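
Frank's third hint (one read per small image) can be sketched in Python 
as follows. This assumes both images are north-up and share the same 
coordinate system, and it does not clamp the window to the bounds of 
the big image. The names overlap_window and read_big_for_small are 
hypothetical; ReadAsArray is the Python binding's counterpart of a 
RasterIO() read.

```python
import math


def overlap_window(big_gt, small_gt, small_xsize, small_ysize):
    """Pixel window (xoff, yoff, xsize, ysize) of the big image covering the small one.

    big_gt and small_gt are GDAL geotransform tuples; rotation terms are
    assumed to be zero (north-up images).
    """
    # Georeferenced extent of the small image.
    minx = small_gt[0]
    maxy = small_gt[3]
    maxx = minx + small_xsize * small_gt[1]
    miny = maxy + small_ysize * small_gt[5]  # small_gt[5] is negative
    # Convert the extent to pixel/line coordinates of the big image.
    xoff = int(math.floor((minx - big_gt[0]) / big_gt[1]))
    yoff = int(math.floor((maxy - big_gt[3]) / big_gt[5]))
    xend = int(math.ceil((maxx - big_gt[0]) / big_gt[1]))
    yend = int(math.ceil((miny - big_gt[3]) / big_gt[5]))
    return xoff, yoff, xend - xoff, yend - yoff


def read_big_for_small(big_path, small_path, band=1):
    """Load, in a single read, the part of the big image under the small image."""
    from osgeo import gdal  # imported here so overlap_window works without GDAL

    big = gdal.Open(big_path)
    small = gdal.Open(small_path)
    if big is None or small is None:
        raise IOError("could not open input images")
    window = overlap_window(big.GetGeoTransform(), small.GetGeoTransform(),
                            small.RasterXSize, small.RasterYSize)
    return big.GetRasterBand(band).ReadAsArray(*window)
```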

