[GRASS-dev] disabling compression

Jim Regetz regetz at nceas.ucsb.edu
Tue Apr 24 11:02:17 EDT 2012


Chagrined by a performance hit apparently involving zlib compression, I 
patched my local GRASS 7.0 to accept an environment variable that 
disables raster compression. At least for the particular DCELL rasters 
I've been using, this yields a ~5x improvement in run time during write 
operations, at a cost of some extra disk usage that I'm often more than 
happy to incur. See sample timing outputs below my sig.

Admittedly, the speedup factor drops to ~2.5x if the timing comparisons 
include a forced sync to disk, because uncompressed output means more 
IO. But that's still a nice speedup, and the disk IO cost may be of 
little consequence in cases where the raster can fit comfortably in the 
OS page cache and is an intermediate output that gets read back in 
during a subsequent step of a particular processing workflow (and 
perhaps then removed before ever being flushed to disk).
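
For a rough sense of how much data the page cache would need to hold, here is a back-of-the-envelope sketch in Python. It assumes 8 bytes per DCELL cell (double precision) and counts only the raw cell payload, ignoring any header or null-file overhead. The function names are made up for illustration.

```python
DCELL_BYTES = 8  # a DCELL cell is a double-precision float


def raw_raster_bytes(rows, cols, bytes_per_cell=DCELL_BYTES):
    """Raw (uncompressed) cell payload in bytes; ignores file overhead."""
    return rows * cols * bytes_per_cell


def to_mib(n_bytes):
    """Convert a byte count to mebibytes."""
    return n_bytes / 2**20


if __name__ == "__main__":
    # The 4801 x 4801 'test' raster from the timing session below.
    size = raw_raster_bytes(4801, 4801)
    print(f"{size} bytes, about {to_mib(size):.1f} MiB uncompressed")
```

Rasters of this size should fit easily in the page cache on a machine with a few GB of free RAM, which is the situation described above.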

My demo-purposes patch is attached. It just adds a GRASS_NO_COMPRESSION 
environment variable and then injects a new conditional dispatch into 
each of the three Rast_open{_,_fp_,_c_}new functions. For cleaner 
semantics, it might be better to keep the original functions but rename 
them as *_compressed (paralleling the existing *_uncompressed versions) 
for callers who really want/need to force compression (e.g., r.compress, 
which my patch in some sense "breaks" when the environment variable is 
set), but I didn't do this here. And I haven't looked hard to see if 
other modules truly depend on the existing compression behavior.
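
The dispatch idea can be sketched as follows. This is only a Python illustration of the logic, not the attached patch (which is C inside the raster library). The two opener functions are hypothetical stand-ins, and treating any set value of the variable as "disable compression" is an assumption.

```python
import os


def open_new_compressed(name):
    # Hypothetical stand-in for the existing compressed-open code path.
    return (name, "compressed")


def open_new_uncompressed(name):
    # Hypothetical stand-in for the existing *_uncompressed code path.
    return (name, "uncompressed")


def open_new(name, environ=os.environ):
    """Pick the write path based on GRASS_NO_COMPRESSION.

    Assumption: the variable merely needs to be set (to anything)
    for compression to be skipped.
    """
    if environ.get("GRASS_NO_COMPRESSION") is not None:
        return open_new_uncompressed(name)
    return open_new_compressed(name)


if __name__ == "__main__":
    print(open_new("foo", {}))
    print(open_new("foo", {"GRASS_NO_COMPRESSION": "1"}))
```

Keeping the compressed opener as a separately callable function is what would let a caller such as r.compress bypass the environment check.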

Any chance something like this could make it into trunk?

As a real-world example, I recently wrote a Python module that relies on 
r.mapcalc, r.neighbors, and r.resamp.stats. With GRASS_NO_COMPRESSION set, 
total runtime dropped from 20 minutes to 10 minutes on a 12K by 12K 
input raster, with a disk usage differential that peaked at ~4GB during 
processing. Outputs were identical other than compression.
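
In a Python workflow like that, one way to scope the setting to just the intermediate steps is a small context manager. This is a minimal sketch: the helper name is hypothetical, and GRASS_NO_COMPRESSION only has an effect on a GRASS build carrying the patch described here.

```python
import os
from contextlib import contextmanager


@contextmanager
def no_compression(enabled=True):
    """Temporarily set (or clear) GRASS_NO_COMPRESSION, then restore it."""
    key = "GRASS_NO_COMPRESSION"
    previous = os.environ.get(key)
    if enabled:
        os.environ[key] = "1"
    else:
        os.environ.pop(key, None)
    try:
        yield
    finally:
        # Put the environment back exactly as it was found.
        if previous is None:
            os.environ.pop(key, None)
        else:
            os.environ[key] = previous


if __name__ == "__main__":
    with no_compression():
        # Child processes started here inherit the variable.
        print(os.environ.get("GRASS_NO_COMPRESSION"))
    print(os.environ.get("GRASS_NO_COMPRESSION"))
```

Processes launched inside the `with` block inherit the modified environment by default, so intermediate rasters would be written uncompressed while final outputs written outside the block keep the default behavior.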

Cheers,
Jim

------------------------------
James Regetz, Ph.D.
Scientific Programmer/Analyst
National Center for Ecological Analysis & Synthesis
735 State St, Suite 300
Santa Barbara, CA 93101


# timings performed on Ubuntu 10.04 with ample RAM and a recent
# build of GRASS 7.0-svn with the applied patch

# describe the 'test' raster used below; based on some 90m SRTM
# data coerced to double precision
GRASS 7.0.svn (tmp):~ > r.info -g test
...
rows=4801
cols=4801
cells=23049601
datatype=DCELL

GRASS 7.0.svn (tmp):~ > r.univar test
total null and non-null cells: 23049601
total null cells: 0

Of the non-null cells:
----------------------
n: 23049601
minimum: 500
maximum: 3139
range: 2639
mean: 1445.04
mean of absolute values: 1445.04
standard deviation: 336.437
...


# using (default) zlib compression on write
GRASS 7.0.svn (tmp):~ > g.gisenv set="OVERWRITE=1"
GRASS 7.0.svn (tmp):~ > g.region rast=test
GRASS 7.0.svn (tmp):~ > unset GRASS_NO_COMPRESSION
GRASS 7.0.svn (tmp):~ > sync; echo 3 > /proc/sys/vm/drop_caches
GRASS 7.0.svn (tmp):~ > time r.mapcalc "foo = test" --quiet

real	0m13.209s
user	0m12.660s
sys	0m0.400s


# after disabling compression on write
GRASS 7.0.svn (tmp):~ > g.gisenv set="OVERWRITE=1"
GRASS 7.0.svn (tmp):~ > g.region rast=test
GRASS 7.0.svn (tmp):~ > export GRASS_NO_COMPRESSION=1
GRASS 7.0.svn (tmp):~ > sync; echo 3 > /proc/sys/vm/drop_caches
GRASS 7.0.svn (tmp):~ > time r.mapcalc "foo = test" --quiet

real	0m2.514s
user	0m2.320s
sys	0m0.170s

-------------- next part --------------
A non-text attachment was scrubbed...
Name: nocompression.patch
Type: text/x-patch
Size: 2062 bytes
Desc: not available
Url : http://lists.osgeo.org/pipermail/grass-dev/attachments/20120424/c200dc38/nocompression-0001.bin

