[Gdal-dev] gdal_translate versus mrsiddecode

Frank Warmerdam fwarmerdam at gmail.com
Fri Jul 22 10:13:02 EDT 2005


On 7/21/05, Chris G. Nicholas <cgn at globexplorer.com> wrote:
> Just checked out the latest/greatest CVS snapshot, as well as the same for MrSid DSDK, and not sure why, but mrsiddecode is *_way_* faster for some simple cases; I'm just trying to make thumbnails of some monster images we have sitting around, and doing some simple non-geographic extracts.
> 
> Something like:
> 
> gdal_translate -of jpeg -outsize 500 500 /data/f04/contentapu1/apu01_2ft_aus_2.sid /tmp/foo.jpg
> 
> Input file size is 188933, 47333
> 
> goes out to lunch for several minutes. mrsiddecode does the equivalent in under a second.

Chris, 

The problem has to do with how the data is requested by gdal_translate
from the MrSID driver and how the MrSID driver handles that.  The 
above request results in a call to the CreateCopy() method on the JPEG
driver which then requests the data one scanline at a time.  This results
in 500 individual calls into the MrSID driver asking for a chunk of imagery
to be rendered down to one line.  

Each request results in a separate windowed request into the MrSID library.
And setting up a request window in the MrSID API is relatively expensive.  

If you set the config option GDAL_FORCE_CACHING to YES then GDAL will
instead skip the "optimized" access and read the MrSID data into the block 
cache in fairly big chunks (usually 1024x128 chunks), then satisfy requests
from this cache.  In a somewhat similar test case that I ran, using this
option reduced a gdal_translate run from 26s to 1.5s.  In this case it basically
filled a 767x512 "overview level" cache in 4 MrSID requests.  All scanline 
requests then came from this cache. 
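
As a rough check on the "4 MrSID requests" figure: assuming the cache is
filled in 1024x128 blocks and every block touching the 767x512 overview is
read exactly once, the number of reads is a ceiling division per axis.  The
helper name below is made up for illustration and is not part of GDAL.

```cpp
#include <cassert>

// Hypothetical helper, not a GDAL function: number of cache blocks needed
// to cover a raster of the given size with fixed-size blocks.
int BlocksToCover(int nRasterXSize, int nRasterYSize,
                  int nBlockXSize, int nBlockYSize)
{
    assert(nRasterXSize > 0 && nRasterYSize > 0);
    assert(nBlockXSize > 0 && nBlockYSize > 0);

    // Ceiling division: a partially covered block still has to be read.
    const int nBlocksX = (nRasterXSize + nBlockXSize - 1) / nBlockXSize;
    const int nBlocksY = (nRasterYSize + nBlockYSize - 1) / nBlockYSize;
    return nBlocksX * nBlocksY;
}
```

BlocksToCover(767, 512, 1024, 128) works out to 1 * 4 = 4 block reads,
compared with 500 separate windowed requests for scanline-at-a-time access.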

However, in other circumstances going through the cache is a bad idea.
For instance, MapServer makes its entire read in one request, and it is
better to set a window specific to that request and do the read from the
MrSID library in one go.  If caching is on, then the request gets broken into
a lot of little requests. 

The MrSID driver *attempts* to decide which approach to use with some
heuristics.  Currently the logic is to use block cached IO if the request
is for only one scanline from the input file, or if the total request size
is less than 100 pixels.  The aim is to route small requests through the
caching mechanism. 

In our case of reducing to a 500 x 500 file, though, the request windows
are much more than one line even though the output of each request
is a single scanline.  For instance, with CPL_DEBUG set to ON I 
get reports like this in my case:

MrSID: RasterIO() - using optimized dataset level IO.
MrSID: Dataset:IRasterIO(0,8317 12268x17 -> 767x2 -> 500x1, zoom=16)

This indicates that the application requested a 12268x17 window to be
read into a 500x1 buffer.  Because the input window was 17 lines the
logic handles it as a direct windowed read.  BTW, the 767x2 intermediate
buffer is due to having to do the read from a specific intermediate resolution
level from which a further subsampling step is done.
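
The 767x2 figure can be reproduced by hand if zoom=16 in the debug line is
read as a 16x reduction factor and the window is rounded up to whole pixels
at that level.  Both of those are assumptions drawn from the log output, not
checked against the driver source, and the type and function names are
hypothetical.

```cpp
#include <cassert>

// Hypothetical type and helper for illustration only (not GDAL API).
struct BufSize { int nXSize; int nYSize; };

// Size of a source window once reduced by an integer factor, rounding up
// so that partially covered pixels at the reduced level are kept.
BufSize ReducedWindow(int nXSize, int nYSize, int nZoom)
{
    assert(nXSize > 0 && nYSize > 0 && nZoom > 0);
    return BufSize{ (nXSize + nZoom - 1) / nZoom,
                    (nYSize + nZoom - 1) / nZoom };
}
```

ReducedWindow(12268, 17, 16) gives 767x2, matching the intermediate buffer
in the debug output; the last step down to 500x1 is the further subsampling.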

For highly subsampled requests, I think it would be good for me to 
also look at the size of the output buffer, so I have added the following
rule and committed it this morning. 

    if( nBufYSize == 1 || nBufXSize * ((double) nBufYSize) < 100.0 )
        bUseBlockedIO = TRUE;

This forces "cached io" if the output buffer is only one scanline or
if the output window is less than 100 pixels.  Hopefully this will not
have too many unanticipated negative effects!  
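
Put together, the decision can be sketched as a standalone function.  Only
the second test is the committed rule quoted above; the first is
reconstructed from the prose description of the earlier heuristic, and the
function name and signature are hypothetical rather than the driver's
actual code.

```cpp
#include <cassert>

// Hypothetical standalone version of the choice between block cached IO
// and a direct windowed read.  nXSize/nYSize describe the source window,
// nBufXSize/nBufYSize the output buffer.
bool UseBlockedIO(int nXSize, int nYSize, int nBufXSize, int nBufYSize)
{
    assert(nXSize > 0 && nYSize > 0 && nBufXSize > 0 && nBufYSize > 0);

    // Earlier rule as described in prose (assumed form): one source
    // scanline, or fewer than 100 source pixels.
    if (nYSize == 1 || nXSize * static_cast<double>(nYSize) < 100.0)
        return true;

    // Added rule, as quoted above: one output scanline, or fewer than
    // 100 output pixels.
    if (nBufYSize == 1 || nBufXSize * static_cast<double>(nBufYSize) < 100.0)
        return true;

    return false;  // otherwise use a direct windowed read
}
```

For the logged request (a 12268x17 window into a 500x1 buffer) the first
test fails but the second matches, so the request goes through the block
cache.  A single large read into a multi-line buffer still returns false
and stays a direct windowed read.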

In general, a number of drivers suffer from this sort of "cached/blocked
vs. direct window request" tradeoff, notably the ECW, JPEG2000 and
OGDI drivers.  The GDAL_FORCE_CACHING option should force caching 
with them all. 

There is also a configuration option for "encouraging" direct windowed
reads.  That is the GDAL_ONE_BIG_READ config option.  This basically
tells the drivers that we are just making one big request, so they should
try to satisfy it efficiently in isolation and not expect it to be part of a
pattern of additional requests.  In effect this is the opposite of forcing
cached IO, but it is only noticed by a couple of drivers ... the MrSID driver
being one of them.  So I sometimes advise MapServer users to set this
config option, and I may make MapServer do it explicitly someday.

BTW, config options can be set as environment variables, programmatically
with CPLSetConfigOption(), or on the command line with the --config option
to most programs. 

eg. 

gdal_translate --config GDAL_ONE_BIG_READ YES  in.tif out.tif

Hopefully this advice will help others with MrSID, ECW and JPEG2000
performance issues to see some of what can happen "under the covers".

Best regards,
-- 
---------------------------------------+--------------------------------------
I set the clouds in motion - turn up   | Frank Warmerdam, warmerdam at pobox.com
light and sound - activate the windows | http://pobox.com/~warmerdam
and watch the world go round - Rush    | Geospatial Programmer for Rent



