[gdal-dev] Re: CUDA PyCUDA and GDAL
Frank Warmerdam
warmerdam at pobox.com
Mon Dec 7 13:24:57 EST 2009
Doug_Newcomb at fws.gov wrote:
>
> Jukka,
> I remember seeing someone mention on the mailing list (I can't recall
> who at the moment) that setting GDAL_CACHEMAX close to the maximum size
> of the input files gave the best performance. I did try bumping
> GDAL_CACHEMAX up to 2000 to see what would happen (while dropping -wm
> down to 3000, since there is only 6GB of RAM on that computer), but
> none of the input files I was processing were larger than 500MB and I
> saw no increase in performance. For the -wm parameter, I just gave it
> the rest of the RAM available on the computer, and I did not benchmark
> while varying that number.
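
(A quick way to apply that rule of thumb, assuming GNU coreutils and
the same */*.tif layout as in the command below; a sketch, not a
tested recipe:)

  # Print the size (in MB) and name of the largest source tiff; set
  # GDAL_CACHEMAX near that value per the rule of thumb above.
  du -m */*.tif | sort -n | tail -1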
>
> Doug
>
>
> Doug Newcomb
> USFWS
> Raleigh, NC
> 919-856-4520 ext. 14 doug_newcomb at fws.gov
> ---------------------------------------------------------------------------------------------------------
> The opinions I express are my own and are not representative of the
> official policy of the U.S. Fish and Wildlife Service or Dept. of the
> Interior. Life is too short for undocumented, proprietary data formats.
>
> "Rahkonen Jukka" <Jukka.Rahkonen at mmmtike.fi> wrote on 12/07/2009
> 05:06 AM, to <gdal-dev at lists.osgeo.org> and <Doug_Newcomb at fws.gov>,
> subject "Re: CUDA PyCUDA and GDAL":
>
> > <Doug_Newcomb <at> fws.gov> writes:
> >
> > Hi Folks,
> > Here's the gdal command (gdal 1.6.2) I used to merge ~3500 1 meter
> > NAIP quarter quads (uncompressed GeoTIFF, in 3 UTM projections) into
> > one BigTIFF image in the USGS Albers projection. It took about 15
> > hours (on a 3 year old Intel Core2 Duo 64 bit CentOS 5.3 Linux box
> > with 6GB RAM) and created an uncompressed, tiled, BigTIFF file of
> > 485 GB. About 32 GB/hr.
> >
> > gdalwarp -t_srs "+proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=23.0
> > +lon_0=-96 +x_0=0 +y_0=0 +ellps=GRS80 +datum=NAD83 +units=m +no_defs"
> > -wo "SKIP_NOSOURCE" --config GDAL_CACHEMAX 500 -wm 5000
> > -co "TILED=YES" */*.tif /biggis/albers/nc_naip2008.tif
> >
> > In the above command:
> > -t_srs "+proj=aea ..." indicates the target projection;
> > -wo "SKIP_NOSOURCE" says don't write in areas for which the current
> > input file has no data;
> > --config GDAL_CACHEMAX 500 sets the cache memory to 500MB (set this
> > close to the maximum input file size);
> > -wm 5000 sets the warp memory to 5000MB;
> > -co "TILED=YES" creates a tiled tiff as output;
> > */*.tif uses all of the tiffs in all of the subdirectories as input
> > files (in this case there was one directory for each of the 3 UTM
> > zones); and
> > /biggis/albers/nc_naip2008.tif gives the output file name and
> > location.
> >
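
(A variation tying this to Jukka's VRT approach below: since
gdalbuildvrt requires a single source projection per VRT, one could
build one VRT per UTM-zone directory and then warp the three VRTs
together. The directory names here are hypothetical:)

  # Hypothetical layout: one directory per UTM zone, e.g. utm16/,
  # utm17/, utm18/. Each VRT then has a uniform SRS, and gdalwarp
  # takes the three VRTs as sources for the single Albers BigTIFF.
  for z in utm16 utm17 utm18; do
      gdalbuildvrt $z.vrt $z/*.tif
  done
  gdalwarp -t_srs "+proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=23.0 +lon_0=-96 +x_0=0 +y_0=0 +ellps=GRS80 +datum=NAD83 +units=m +no_defs" \
      -wo "SKIP_NOSOURCE" --config GDAL_CACHEMAX 500 -wm 5000 \
      -co "TILED=YES" utm16.vrt utm17.vrt utm18.vrt \
      /biggis/albers/nc_naip2008.tif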
> Hi Doug,
>
> I finally tried your parameters and they did work fine for me also. I
> had something like a hundred geotiffs, 400 MB each, and I was pushing
> them into a BigTIFF mosaic. I tried first with your *.tif selection and
> then again by using a virtual raster file as source, created from a
> MapServer tileindex shapefile with gdalbuildvrt. My Windows computer
> was handling about 20 GB/hour with cubic resampling (-rc) this time.
> The parameters
> -wo "SKIP_NOSOURCE" --config GDAL_CACHEMAX 500 -wm 5000
> seem to have a big influence on efficiency. I wonder if there are some
> rules of thumb for selecting values of GDAL_CACHEMAX and -wm. You said
> cachemax is good to be close to the maximum input file size; how about
> -wm?
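
(On the tileindex-to-VRT step Jukka mentions: a minimal, hypothetical
sketch with gdaltindex standing in for the MapServer tileindex tool,
and made-up file names; "location" is the default index field name:)

  # Build a tile index shapefile over the tiffs, then wrap it into a
  # single virtual mosaic that gdalwarp can read as one source.
  gdaltindex naip_index.shp *.tif
  gdalbuildvrt -tileindex location naip_mosaic.vrt naip_index.shp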
Folks,
I would note that large -wm values can be very counterproductive when
used in combination with SKIP_NOSOURCE. The problem is that the larger
the chunk size, the greater the chance that a big processing window will
intersect only a small amount of source data, in which case the whole
window ends up being processed - most of it without any real work to do.
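
So with sparse coverage it can be worth benchmarking a smaller warp
memory; an illustrative (untested) sketch, with hypothetical file names:

  # Smaller chunks give SKIP_NOSOURCE more windows that touch no
  # source data at all, which can then be skipped outright.
  gdalwarp -wo "SKIP_NOSOURCE" --config GDAL_CACHEMAX 500 -wm 500 \
      -co "TILED=YES" sparse_inputs.vrt mosaic.tif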
Best regards,
--
---------------------------------------+--------------------------------------
I set the clouds in motion - turn up | Frank Warmerdam, warmerdam at pobox.com
light and sound - activate the windows | http://pobox.com/~warmerdam
and watch the world go round - Rush | Geospatial Programmer for Rent