[Qgis-developer] Large raster (ecw) identify very long

Even Rouault even.rouault at mines-paris.org
Wed Aug 22 05:37:35 PDT 2012


Quoting Radim Blazek <radim.blazek at gmail.com>:

> On Wed, Aug 22, 2012 at 12:29 PM, haubourg
> <regis.haubourg at eau-adour-garonne.fr> wrote:
> >
> > Radim Blazek-2 wrote
> >>
> >>
> >> QGIS is using GDALRasterIO() which reads a single pixel on original
> >> resolution. AFAIK, ECW is using tiles internally so it should be all
> >> very fast. I can imagine 2 problems:
> >>
> >> - the tiles in ECW file are too big - can you verify somehow how big
> >> are the tiles?
> >>
> > Hi Radim, I'm not an ECW specialist, but I understand it is a wavelet
> > compression algorithm, not a tiled one.
> > There is one big image with pyramids inside.
>
> I am not sure, but I thought that it is using wavelet, pyramids and
> tiles. Without tiles it could not be fast enough on higher resolution.
>
> > Radim Blazek-2 wrote
> >>
> >> - GDAL is reading bigger portion of data than necessary (cc to GDAL list)
> >>
> >> Is it drawing of the raster also so slow (on raster resolution zoom)?
> >>
> >> Radim
> >>
> >>
> > No, everything is fine when drawing data, whatever resolution I use; this is
> > the big thing with ECW.
> > I must say I find that it was a bit faster with 1.7.4 than with 1.8 and later.
> > But this could be related to other factors like internal network
> > capacity. I haven't benchmarked it yet.
>
> I found in GDAL's ecwdataset.cpp that it treats single-row
> requests in IRasterIO in a special way:
>     if( nYSize == 1 )
>     {
>         return GDALRasterBand::IRasterIO(eRWFlag, nXOff, nYOff, nXSize, nYSize,
>                                          pData, nBufXSize, nBufYSize,
>                                          eBufType, nPixelSpace, nLineSpace );
>     }


Regis,

What are the dimensions (in pixels) of your big ECW?

Radim,

I tried the following Python script, which should be representative of how QGIS
does picking (I guess it calls RasterIO(..., x, y, 1, 1, ..., 1, 1)):

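from osgeo import gdal
import random
import sys

# Open the raster given on the command line.
ds = gdal.Open(sys.argv[1])
xsize = ds.RasterXSize
ysize = ds.RasterYSize

# Read random single pixels forever (stop with Ctrl+C) to watch memory usage.
while True:
    # randint() is inclusive at both ends, so stay within [0, size - 1].
    x = random.randint(0, xsize - 1)
    y = random.randint(0, ysize - 1)
    data = ds.ReadRaster(x, y, 1, 1)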

After some time, the Python process occupies 1/4 of the total RAM. This is
the default behaviour of the ECW SDK, as documented in
http://gdal.org/frmt_ecw.html. I then set ECW_CACHE_MAXMEM=1000000 (1 million
bytes) and the memory usage stayed very small, as expected. So I believe there
is no memory leak in the driver. Note: in the old ECW SDK 3.3 (I don't know
about the newer ones), there was a bug in some cases: if the RAM was > 8 GB,
then RAM / 4 > 2 GB, which overflowed a 32-bit integer, resulting in the SDK
allocating memory without limit.
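
To illustrate the arithmetic behind that overflow, here is a minimal sketch. It
assumes the cache size was held in a signed 32-bit integer, which I have not
checked in the SDK; the function name is made up for the example.

```python
import ctypes

GIB = 1024 ** 3


def default_cache_as_int32(total_ram_bytes):
    """Return RAM / 4 as it would look when stored in a signed 32-bit integer.

    Assumption: the cache size is a signed 32-bit value. This only shows the
    arithmetic; it is not the SDK's actual code.
    """
    quarter = total_ram_bytes // 4
    # ctypes.c_int32 wraps out-of-range values silently, like C integer overflow.
    return ctypes.c_int32(quarter).value


for ram_gib in (4, 8, 16):
    print(ram_gib, "GiB RAM ->", default_cache_as_int32(ram_gib * GIB), "bytes")
```

With 4 GiB of RAM the quarter still fits; from 8 GiB upwards the stored value
is no longer a meaningful limit.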

The ECW driver is quite complex in its reading strategy for establishing
"views", but from what I've captured, and as you noticed, when you ask for a
window with a height of 1 pixel, it goes to IReadBlock(), which fetches one
entire line and puts it in the GDAL cache. This is to avoid issuing a SetView()
each time a line is read. The intent here is to be clever for the common
line-by-line access pattern. The drawback is that if the raster is very wide and
you only want to read one single pixel, the cost of reading the whole line might
be much greater than the cost of establishing SetView() for your single pixel.

I suppose your workaround in QGIS will be to read a 1x2 pixel window or
something like that.
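
A minimal sketch of such a workaround in Python, untested against a real ECW
file. The function name is hypothetical; it assumes ds is a dataset returned by
gdal.Open() and that a two-row window is enough to avoid the single-line path.

```python
def read_pixel_via_two_rows(ds, x, y):
    """Return the raw bytes of pixel (x, y) for each band, using 1x2 windows.

    ds is expected to behave like a GDAL dataset (RasterYSize, RasterCount,
    GetRasterBand()). The two-row trick is an assumption, not a verified fix.
    """
    bands = range(1, ds.RasterCount + 1)  # GDAL band numbers start at 1
    if ds.RasterYSize < 2:
        # Too short for a two-row window: fall back to a plain 1x1 read.
        return [bytes(ds.GetRasterBand(b).ReadRaster(x, y, 1, 1)) for b in bands]
    # Start the window at y, or one row earlier when y is the last row.
    y0 = min(y, ds.RasterYSize - 2)
    row = y - y0  # 0 or 1: which row of the window holds the wanted pixel
    values = []
    for b in bands:
        buf = ds.GetRasterBand(b).ReadRaster(x, y0, 1, 2)
        size = len(buf) // 2  # bytes per pixel in the band's native data type
        values.append(bytes(buf[row * size:(row + 1) * size]))
    return values
```

With GDAL installed, it would be called as
read_pixel_via_two_rows(gdal.Open(path), x, y); decoding the returned bytes
depends on each band's data type.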

Ultimately, we would probably need the following GDAL patch. I can't really say
if it will improve performance a lot because my biggest ECW is "just"
10000x10000.

Index: ecwdataset.cpp
===================================================================
--- ecwdataset.cpp	(revision 24824)
+++ ecwdataset.cpp	(working copy)
@@ -246,8 +246,10 @@
 /*      We will drop down to the block oriented API if only a single    */
 /*      scanline was requested. This is based on the assumption that    */
 /*      doing lots of single scanline windows is expensive.             */
+/*      But for single pixel reading (picking use case), this is not a  */
+/*      good strategy for big rasters.                                  */
 /* -------------------------------------------------------------------- */
-    if( nYSize == 1 )
+    if( nYSize == 1 && nXSize != 1 )
     {
 #ifdef NOISY_DEBUG
         CPLDebug( "ECWRasterBand",
@@ -1038,7 +1040,7 @@
 /*      If we are requesting a single line at 1:1, we do a multi-band   */
 /*      AdviseRead() and then TryWinRasterIO() again.                   */
 /* -------------------------------------------------------------------- */
-    if( nYSize == 1 && nBufYSize == 1 && nBandCount > 1 )
+    if( nYSize == 1 && nXSize != 1 && nBufYSize == 1 && nBandCount > 1 )
     {
         CPLErr eErr;

@@ -1064,8 +1066,8 @@
 /*      band case where we post a big window for the view, and allow    */
 /*      sequential reads.                                               */
 /* -------------------------------------------------------------------- */
-    if( nXSize < nBufXSize || nYSize < nBufYSize || nYSize == 1
-        || nBandCount > 100 || nBandCount == 1 || nBufYSize == 1
+    if( nXSize < nBufXSize || nYSize < nBufYSize || (nYSize == 1 && nXSize != 1)
+        || nBandCount > 100 || nBandCount == 1 || (nBufYSize == 1 && nBufXSize != 1)
         || nBandCount > GetRasterCount() )
     {
         return
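
To make the effect of the first hunk concrete, here is a toy model in Python of
the band-level test before and after the change. The function names and the
cost model (the line path fetches the full raster width per requested row) are
my own simplifications, not measurements of the driver.

```python
def band_uses_line_path_before(nXSize, nYSize):
    # Original test: any one-row window drops down to the block (line) path.
    return nYSize == 1


def band_uses_line_path_after(nXSize, nYSize):
    # Patched test: one-row windows still do, except a single pixel.
    return nYSize == 1 and nXSize != 1


def pixels_fetched(raster_width, nXSize, nYSize, uses_line_path):
    # Toy cost model (assumption): the line path fetches whole scanlines,
    # the direct path fetches only the requested window.
    if uses_line_path:
        return raster_width * nYSize
    return nXSize * nYSize


raster_width = 10000
for nXSize, nYSize in [(1, 1), (512, 1), (512, 512)]:
    before = band_uses_line_path_before(nXSize, nYSize)
    after = band_uses_line_path_after(nXSize, nYSize)
    print((nXSize, nYSize),
          pixels_fetched(raster_width, nXSize, nYSize, before),
          pixels_fetched(raster_width, nXSize, nYSize, after))
```

Only the 1x1 window changes path; single-line and multi-line requests behave
as before.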



Best regards,

Even

