<html><head></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><div>Michael,</div><div>Scott is right. &nbsp;Not sure if this is the preferred approach, but I accomplished this for large datasets by specifying buffer sizes for ReadAsArray. &nbsp;The doc I consulted is here:&nbsp;<a href="http://gdal.org/python/osgeo.gdal_array-module.html#BandReadAsArray">http://gdal.org/python/osgeo.gdal_array-module.html#BandReadAsArray</a>. &nbsp;</div><div>I used masked arrays to exclude nodata values - you may not need to worry about this.</div><div>-David</div><div><br></div><div>Excerpt from my script:</div><div><br></div><div>import numpy<br>from osgeo import gdal<br><br>src_ds = gdal.Open(src_fn, gdal.GA_ReadOnly)<br>b = src_ds.GetRasterBand(1)<br>ndv = b.GetNoDataValue()<br>ns = src_ds.RasterXSize<br>nl = src_ds.RasterYSize<br><br>#Don't want to load the entire dataset for stats computation</div><div>#This is the maximum dimension for the reduced resolution array<br>max_dim = 1024.<br><br>scale_ns = ns/max_dim<br>scale_nl = nl/max_dim<br>scale_max = max(scale_ns, scale_nl)<br><br>if scale_max &gt; 1:<br>&nbsp; &nbsp; #Buffer sizes must be integers<br>&nbsp; &nbsp; nl = int(round(nl/scale_max))<br>&nbsp; &nbsp; ns = int(round(ns/scale_max))<br><br>#The buf_xsize and buf_ysize parameters determine the final array dimensions<br>bm = numpy.ma.masked_equal(numpy.array(b.ReadAsArray(buf_xsize=ns, buf_ysize=nl)), ndv)</div><div><br></div><div><br></div><div><div>On Apr 11, 2012, at 11:17 AM, Scott Arko wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite">Hi Michael,<div><br></div><div><br></div><div>I may be missing your question, but why aren't you just using ReadAsArray? &nbsp;It has an option to return a smaller array from the input array. &nbsp;Now, I'm not sure how it does the resampling (you could look to see), but you can make a call like:</div>

<div><br></div><div>data = banddata.ReadAsArray(0,0,filehandle.RasterXSize,filehandle.RasterYSize,xsize,ysize)</div><div><br></div><div>where xsize and ysize are smaller than the true RasterXSize or RasterYSize. &nbsp;I haven't looked at this in a while, but I'm pretty sure this will work. &nbsp;Did I miss the point of what you were asking?</div>
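<div><br></div><div>For what it's worth, one way to pick xsize and ysize is to cap the longer side at some maximum and scale the other side to match. &nbsp;This is only a rough sketch: the helper name reduced_shape and the default of 1024 are made up for illustration and are not part of GDAL.</div>
<pre>
```python
def reduced_shape(xsize, ysize, max_dim=1024):
    """Return (buf_xsize, buf_ysize) with the longer side capped at max_dim.

    Hypothetical helper, not a GDAL function. The aspect ratio is kept
    (up to rounding) and rasters that already fit are left unchanged.
    """
    scale = max(xsize, ysize) / float(max_dim)
    if scale > 1:
        # Buffer sizes must be integers of at least one pixel.
        xsize = max(1, int(round(xsize / scale)))
        ysize = max(1, int(round(ysize / scale)))
    return xsize, ysize


# Assumed usage with an open band, mirroring the call above:
#   xsize, ysize = reduced_shape(filehandle.RasterXSize, filehandle.RasterYSize)
#   data = banddata.ReadAsArray(0, 0, filehandle.RasterXSize,
#                               filehandle.RasterYSize, xsize, ysize)
```
</pre>
<div>With these numbers, a 20000 x 10000 raster would be requested as a 1024 x 512 buffer, and an 800 x 600 raster would be read at full size.</div>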

<div><br></div><div><br></div><div>Thanks,</div><div>Scott</div><div><br><br><div class="gmail_quote">On Wed, Apr 11, 2012 at 6:31 AM, K.-Michael Aye <span dir="ltr">&lt;<a href="mailto:kmichael.aye@gmail.com">kmichael.aye@gmail.com</a>&gt;</span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Dear all,<br>

<br>

is there a Python API for downsampling a huge dataset?<br>

What I would like to do:<br>

<br>

* get my dataset<br>

* read out RasterXSize and RasterYSize<br>

* calculate how many lines and columns I need to skip to get a quick overview image, e.g. 10 lines to skip.<br>

* Have a ReadAsArray interface where I can say something like this:<br>

** data = ds.ReadAsArray(xoffset, yoffset, 10000, 10000, skipping=10)<br>

<br>

which in numpy terms would give me every 10th line like this: array[::10]<br>
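<div><br></div><div>The skip calculation in the steps above could look roughly like the sketch below. &nbsp;The names skip_step and overview_shape and the target of 1000 pixels are assumptions for illustration only. &nbsp;Note that numpy-style slicing such as array[::step, ::step] only applies once the full array is already in memory, so on its own it does not avoid the large read.</div>
<pre>
```python
def skip_step(xsize, ysize, target_dim=1000):
    """Smallest integer stride that keeps both overview sides at or below target_dim.

    Hypothetical helper; uses integer ceiling division.
    """
    longest = max(xsize, ysize)
    return max(1, (longest + target_dim - 1) // target_dim)


def overview_shape(xsize, ysize, step):
    """Shape (lines, samples) that array[::step, ::step] would have."""
    return ((ysize + step - 1) // step, (xsize + step - 1) // step)


# Stand-in for a raster: 10 lines of 10 samples each, as nested lists.
rows = [[line * 10 + sample for sample in range(10)] for line in range(10)]
step = skip_step(10, 10, target_dim=2)
# Keep every step-th line and every step-th sample.
overview = [row[::step] for row in rows[::step]]
```
</pre>
<div>For a 20000 x 10000 image and a target of 1000 pixels this gives a stride of 20 and an overview of 500 lines by 1000 samples.</div>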

<br>

I really don't need quality at all, just speed, for a rough overview for further zooming in with lassos, as the images I deal with sometimes have more than 200 MPixels.<br>

<br>

Is this possible in Python?<br>

I was thinking now, maybe one could use numpy's memmap somehow for this, don't know much about it, though…<br>

<br>

Thanks for any hints!<br>

<br>

Best regards,<br>

Michael<br>

<br>

<br>

_______________________________________________<br>

gdal-dev mailing list<br>

<a href="mailto:gdal-dev@lists.osgeo.org" target="_blank">gdal-dev@lists.osgeo.org</a><br>

<a href="http://lists.osgeo.org/mailman/listinfo/gdal-dev" target="_blank">http://lists.osgeo.org/mailman/listinfo/gdal-dev</a><br>

</blockquote></div><br><br><br>

</div>

</blockquote></div><br></body></html>