[GRASS-dev] disabling compression
Benjamin Ducke
benducke at fastmail.fm
Wed Apr 25 04:25:20 EDT 2012
According to the docs, GRASS_INT_ZLIB can be set to use
zlib instead of RLE compression. But as far as I can see,
there is not yet a way to completely prevent rasters from
being compressed at creation time?
Ben
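
For readers skimming the thread, here is a minimal sketch of the kind of
environment-variable dispatch Jim describes in the quoted message below. It
is not his patch and does not call the GRASS raster library: choose_mode()
and open_new() are hypothetical stand-ins, and treating the variable as
"set to anything means no compression" is an assumption, since the thread
only shows it being exported as 1 or unset.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical result type: which write path a new raster would take. */
enum open_mode { OPEN_COMPRESSED, OPEN_UNCOMPRESSED };

/* Assumption: the variable merely being set (to any value, even an
 * empty string) disables compression; unset keeps the default. */
static enum open_mode choose_mode(const char *env_value)
{
    return env_value != NULL ? OPEN_UNCOMPRESSED : OPEN_COMPRESSED;
}

/* Hypothetical stand-in for an "open new raster" entry point. A real
 * implementation would hand off to the compressed or uncompressed
 * library routine here; this one only reports the chosen path. */
static enum open_mode open_new(const char *name)
{
    enum open_mode mode;

    assert(name != NULL);
    mode = choose_mode(getenv("GRASS_NO_COMPRESSION"));
    printf("%s: writing %s\n", name,
           mode == OPEN_UNCOMPRESSED ? "uncompressed" : "compressed");
    return mode;
}
```

Calling open_new("foo") after `export GRASS_NO_COMPRESSION=1` would take the
uncompressed branch, and after `unset GRASS_NO_COMPRESSION` the compressed
one. Keeping the decision in one small function is what would make it easy
to fold into a single existing variable, as Maris suggests.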
On 04/25/2012 09:24 AM, Maris Nartiss wrote:
> There's already an existing GRASS_INT_ZLIB env variable. There should
> be only one env variable to enable/disable raster compression.
>
> Just my 0.002 Verizon cents.
> Maris.
>
> On 24 April 2012 at 18:02, Jim Regetz <regetz at nceas.ucsb.edu> wrote:
>> Chagrined by a performance hit apparently involving zlib compression, I
>> patched my local GRASS 7.0 to accept an environment variable that disables
>> raster compression. At least for the particular DCELL rasters I've been
>> using, this yields a ~5x improvement in run time during write operations, at
>> a cost of some extra disk usage that I'm often more than happy to incur. See
>> sample timing outputs below my sig.
>>
>> Admittedly, the speedup factor drops to ~2.5x if the timing comparisons
>> include a forced sync to disk, because uncompressed output means more IO.
>> But that's still a nice speedup, and the disk IO cost may be of little
>> consequence in cases where the raster can fit comfortably in the OS page
>> cache and is an intermediate output that gets read back in during a
>> subsequent step of a particular processing workflow (and perhaps then
>> removed before ever being flushed to disk).
>>
>> My demo-purposes patch is attached. It just adds a GRASS_NO_COMPRESSION
>> environment variable and then injects a new conditional dispatch into each
>> of the three Rast_open{_,_fp_,_c_}new functions. For cleaner semantics, it
>> might be better to keep the original functions but rename them as
>> *_compressed (paralleling the existing *_uncompressed versions) for callers
>> who really want/need to force compression (e.g., r.compress, which my patch
>> in some sense "breaks" when the environment variable is set), but I didn't
>> do this here. And I haven't looked hard to see if other modules/etc truly
>> depend on the existing compression behavior.
>>
>> Any chance something like this could make it into trunk?
>>
>> As a real world example, I recently wrote a Python module that relies on
>> r.mapcalc, r.neighbors, and r.samp.stats. With GRASS_NO_COMPRESSION set,
>> total runtime dropped from 20 minutes to 10 minutes on a 12K by 12K input
>> raster, with a disk usage differential that peaked at ~4GB during
>> processing. Outputs were identical other than compression.
>>
>> Cheers,
>> Jim
>>
>> ------------------------------
>> James Regetz, Ph.D.
>> Scientific Programmer/Analyst
>> National Center for Ecological Analysis & Synthesis
>> 735 State St, Suite 300
>> Santa Barbara, CA 93101
>>
>>
>> # timings performed on Ubuntu 10.04 with ample RAM and a recent
>> # build of GRASS 7.0-svn with the applied patch
>>
>> # describe the 'test' raster used below; based on some 90m SRTM
>> # data coerced to double precision
>> GRASS 7.0.svn (tmp):~> r.info -g test
>> ...
>> rows=4801
>> cols=4801
>> cells=23049601
>> datatype=DCELL
>>
>> GRASS 7.0.svn (tmp):~> r.univar test
>> total null and non-null cells: 23049601
>> total null cells: 0
>>
>> Of the non-null cells:
>> ----------------------
>> n: 23049601
>> minimum: 500
>> maximum: 3139
>> range: 2639
>> mean: 1445.04
>> mean of absolute values: 1445.04
>> standard deviation: 336.437
>> ...
>>
>>
>> # using (default) zlib compression on write
>> GRASS 7.0.svn (tmp):~> g.gisenv set="OVERWRITE=1"
>> GRASS 7.0.svn (tmp):~> g.region rast=test
>> GRASS 7.0.svn (tmp):~> unset GRASS_NO_COMPRESSION
>> GRASS 7.0.svn (tmp):~> sync; echo 3 > /proc/sys/vm/drop_caches
>> GRASS 7.0.svn (tmp):~> time r.mapcalc "foo = test" --quiet
>>
>> real 0m13.209s
>> user 0m12.660s
>> sys 0m0.400s
>>
>>
>> # after disabling compression on write
>> GRASS 7.0.svn (tmp):~> g.gisenv set="OVERWRITE=1"
>> GRASS 7.0.svn (tmp):~> g.region rast=test
>> GRASS 7.0.svn (tmp):~> export GRASS_NO_COMPRESSION=1
>> GRASS 7.0.svn (tmp):~> sync; echo 3 > /proc/sys/vm/drop_caches
>> GRASS 7.0.svn (tmp):~> time r.mapcalc "foo = test" --quiet
>>
>> real 0m2.514s
>> user 0m2.320s
>> sys 0m0.170s
>>
>>
>> _______________________________________________
>> grass-dev mailing list
>> grass-dev at lists.osgeo.org
>> http://lists.osgeo.org/mailman/listinfo/grass-dev
--
Benjamin Ducke
{*} Geospatial Consultant
{*} GIS Developer
benducke at fastmail.fm