[GRASS-dev] disabling compression
Jim Regetz
regetz at nceas.ucsb.edu
Tue Apr 24 11:02:17 EDT 2012
Chagrined by a performance hit apparently involving zlib compression, I
patched my local GRASS 7.0 to accept an environment variable that
disables raster compression. At least for the particular DCELL rasters
I've been using, this yields a ~5x improvement in run time during write
operations, at a cost of some extra disk usage that I'm often more than
happy to incur. See sample timing outputs below my sig.
Admittedly, the speedup factor drops to ~2.5x if the timing comparisons
include a forced sync to disk, because uncompressed output means more
I/O. That is still a nice speedup, though, and the extra disk I/O may
matter little when the raster fits comfortably in the OS page cache and
is an intermediate output that a later step of the processing workflow
reads back in (and perhaps removes before it is ever flushed to disk).
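To make the compress-versus-sync tradeoff concrete, here is a minimal
Python sketch. It is not GRASS code: the timed_write helper and the
synthetic payload are assumptions made up for illustration. It only shows
how one might time a zlib-compressed write against a raw write, with and
without a forced sync to disk.

```python
import array
import os
import tempfile
import time
import zlib


def timed_write(payload, compress, force_sync):
    """Write payload to a temporary file; return (seconds, bytes_written).

    Hypothetical helper for illustration only, not part of GRASS.
    """
    fd, path = tempfile.mkstemp()
    try:
        start = time.perf_counter()
        data = zlib.compress(payload) if compress else payload
        with os.fdopen(fd, "wb") as fh:
            fh.write(data)
            if force_sync:
                fh.flush()
                os.fsync(fh.fileno())  # block until the data reaches the disk
        return time.perf_counter() - start, len(data)
    finally:
        os.remove(path)


# Synthetic stand-in for double-precision raster data: one million doubles.
payload = array.array(
    "d", (500.0 + (i % 2640) for i in range(1_000_000))
).tobytes()

for compress in (True, False):
    for force_sync in (False, True):
        seconds, size = timed_write(payload, compress, force_sync)
        print(f"compress={compress} sync={force_sync}: "
              f"{seconds:.3f}s, {size} bytes")
```

The numbers it prints depend entirely on the machine, the filesystem, and
how compressible the payload is, so treat them as a way to explore the
tradeoff rather than as a reproduction of the GRASS timings.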
My demo-purposes patch is attached. It just adds a GRASS_NO_COMPRESSION
environment variable and then injects a new conditional dispatch into
each of the three Rast_open{_,_fp_,_c_}new functions (that is,
Rast_open_new, Rast_open_fp_new, and Rast_open_c_new). For cleaner
semantics, it might be better to keep the original functions but rename
them as *_compressed (paralleling the existing *_uncompressed versions)
for callers who really want or need to force compression (e.g.,
r.compress, which my patch in some sense "breaks" when the environment
variable is set), but I didn't do this here. I also haven't looked hard
to see whether other modules truly depend on the existing compression
behavior.
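The dispatch idea described above can be sketched in a few lines of
Python. This is only an illustration of the control flow, not the attached
C patch: the function names are hypothetical stand-ins for the GRASS
raster-open functions, and the rule that the variable counts as "set"
whenever it is present (whatever its value) is an assumption.

```python
import os

ENV_VAR = "GRASS_NO_COMPRESSION"


def open_new_compressed(name):
    """Hypothetical stand-in for the original (compressing) open function."""
    return {"name": name, "compressed": True}


def open_new_uncompressed(name):
    """Hypothetical stand-in for an existing *_uncompressed variant."""
    return {"name": name, "compressed": False}


def open_new(name, environ=None):
    """Pick the uncompressed path when the environment variable is set."""
    environ = os.environ if environ is None else environ
    # Assumption: presence of the variable is enough; its value is ignored.
    if ENV_VAR in environ:
        return open_new_uncompressed(name)
    return open_new_compressed(name)


print(open_new("foo", {ENV_VAR: "1"}))
print(open_new("foo", {}))
```

Keeping a separately named compressed entry point, as in this sketch, is
what would let a caller such as r.compress bypass the environment variable
and force compression.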
Any chance something like this could make it into trunk?
As a real world example, I recently wrote a Python module that relies on
r.mapcalc, r.neighbors, and r.samp.stats. With GRASS_NO_COMPRESSION set,
total runtime dropped from 20 minutes to 10 minutes on a 12K by 12K
input raster, with a disk usage differential that peaked at ~4GB during
processing. Outputs were identical other than compression.
Cheers,
Jim
------------------------------
James Regetz, Ph.D.
Scientific Programmer/Analyst
National Center for Ecological Analysis & Synthesis
735 State St, Suite 300
Santa Barbara, CA 93101
# timings performed on Ubuntu 10.04 with ample RAM and a recent
# build of GRASS 7.0-svn with the applied patch
# describe the 'test' raster used below; based on some 90m SRTM
# data coerced to double precision
GRASS 7.0.svn (tmp):~ > r.info -g test
...
rows=4801
cols=4801
cells=23049601
datatype=DCELL
GRASS 7.0.svn (tmp):~ > r.univar test
total null and non-null cells: 23049601
total null cells: 0
Of the non-null cells:
----------------------
n: 23049601
minimum: 500
maximum: 3139
range: 2639
mean: 1445.04
mean of absolute values: 1445.04
standard deviation: 336.437
...
# using (default) zlib compression on write
GRASS 7.0.svn (tmp):~ > g.gisenv set="OVERWRITE=1"
GRASS 7.0.svn (tmp):~ > g.region rast=test
GRASS 7.0.svn (tmp):~ > unset GRASS_NO_COMPRESSION
GRASS 7.0.svn (tmp):~ > sync; echo 3 > /proc/sys/vm/drop_caches
GRASS 7.0.svn (tmp):~ > time r.mapcalc "foo = test" --quiet
real 0m13.209s
user 0m12.660s
sys 0m0.400s
# after disabling compression on write
GRASS 7.0.svn (tmp):~ > g.gisenv set="OVERWRITE=1"
GRASS 7.0.svn (tmp):~ > g.region rast=test
GRASS 7.0.svn (tmp):~ > export GRASS_NO_COMPRESSION=1
GRASS 7.0.svn (tmp):~ > sync; echo 3 > /proc/sys/vm/drop_caches
GRASS 7.0.svn (tmp):~ > time r.mapcalc "foo = test" --quiet
real 0m2.514s
user 0m2.320s
sys 0m0.170s
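# For reference, the arithmetic behind the figures above can be checked
# with a short Python sketch. It uses only the numbers reported in this
# transcript; the 8 bytes per cell is the size of a double-precision
# (DCELL) value, and the size estimate covers raw cell data only, not
# null bitmaps or header files.

```python
rows, cols = 4801, 4801
bytes_per_cell = 8  # DCELL is double precision

cells = rows * cols
uncompressed_bytes = cells * bytes_per_cell

real_compressed = 13.209    # seconds, default zlib compression
real_uncompressed = 2.514   # seconds, GRASS_NO_COMPRESSION=1

speedup = real_compressed / real_uncompressed

print(f"cells: {cells}")
print(f"raw cell data: {uncompressed_bytes / 2**20:.1f} MiB")
print(f"wall-clock speedup: {speedup:.2f}x")
```

# The wall-clock ratio comes out a little above 5x, consistent with the
# ~5x figure quoted in the message.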
-------------- next part --------------
A non-text attachment was scrubbed...
Name: nocompression.patch
Type: text/x-patch
Size: 2062 bytes
Desc: not available
Url : http://lists.osgeo.org/pipermail/grass-dev/attachments/20120424/c200dc38/nocompression-0001.bin