[GRASS-dev] [GRASS GIS] #2750: LZ4 when writing raster rows; better than double I/O bound r.mapcalc speed
GRASS GIS
trac at osgeo.org
Tue Nov 10 14:21:31 PST 2015
--------------------------+---------------------------
Reporter: sprice | Owner: grass-dev@…
Type: enhancement | Status: new
Priority: normal | Milestone: 7.1.0
Component: Raster | Version: svn-trunk
Resolution: | Keywords: ZLIB LZ4 ZSTD
CPU: OSX/Intel | Platform: MacOSX
--------------------------+---------------------------
Comment (by mmetz):
Replying to [comment:20 wenzeslaus]:
> Replying to [comment:19 mmetz]:
> > I have implemented something like this recently and added support for
> > LZ4 (and BZIP2) compression to my local copy of GRASS trunk.
> >
> > I am attaching a patch for trunk r66775 and an archive with new files
> > to go to lib/gis
>
> The design in the patch looks really good. I did tests and benchmark but
> it was not as successful as I hoped for.
>
> The benchmark was the same as [comment:10 before] but modified for this
> patch. It is more for testing than benchmark anyway. It was on 30,000,000
> cells but perhaps the previous one was on more and it is not completely
> precise overall due to some other computations running at the same time
> (although the result is from 10 runs aggregated by ''perf'').
>
> || type || write || read ||
> || NONE || 2.58 || 0.72 ||
> || ZLIB || 1.52 || 0.93 ||
> || LZ4 || 1.56 || 0.85 ||
>

Some more explanation about the proposed new mechanism:

The proposed `G_compress()` interface provides a generic mechanism for data
compression in libgis that is not restricted to raster data. Built-in
compression methods would be no compression, RLE, ZLIB and LZ4. BZIP2
compression would be available if GRASS is configured with --with-bzlib.
Other compression methods could be added by cloning lib/gis/flate.c and
adding new `G_*_compress()` and `G_*_expand()` functions to
lib/gis/compress.[h|c]; the raster library itself would no longer need to
be modified.

As before, the rasterlib makes only partial use of the generic compression
methods: no compression and RLE are handled internally by the rasterlib,
and RLE is not supported for fp maps. Creating uncompressed raster maps
has been, and should remain, possible only with
`Rast_open_new_uncompressed()`. This means that the behaviour of the
rasterlib with GRASS_COMPRESSOR=NONE needs to be defined: should it really
use no compression, or use the default compression instead? Using
GRASS_COMPRESSOR=RLE affects only new CELL maps. For fp maps,
compression_type = 1 (RLE) is, as before, interpreted as ZLIB compression.

If you want to know the original amount of data passed to any compressor,
you need to use
{{{
GRASS_COMPRESSOR=ZLIB
GRASS_ZLIB_LEVEL=0
}}}
ZLIB level 0 tells ZLIB to copy the data as is from source to
destination. With CELL maps, the rasterlib will then still trim high zero
bytes with trim_bytes(), which can already reduce the data size
considerably, but ZLIB will not compress the data.

I modified gislib_compressor_benchmark.sh to use
{{{
GRASS_COMPRESSOR=ZLIB
GRASS_ZLIB_LEVEL=0
}}}
for no compression, and I discarded RLE because it is inefficient for CELL
maps and not supported for fp maps. I also tested ZLIB levels 1 (fastest)
and 6 (ZLIB default).

I used the nc_basic_spm_grass7 location and set the region with
{{{
g.region -p rast=elevation res=2.5
}}}
resulting in 32,000,000 cells.

The test raster was generated with
{{{
r.mapcalc expression="test_rast_z_base = rand(double(-200.), 900)" seed=100
}}}
These random numbers are very difficult to compress.

Write and read times in the tables below are in seconds.
|| compressor || size MB || size % || write || read ||
|| NONE || 259.2 || 100 || 5.2 || 1.4 ||
|| ZLIB 1 || 247.9 || 95.6 || 14.3 || 2.5 ||
|| ZLIB 6 || 246.9 || 95.3 || 16.3 || 2.4 ||
|| LZ4 || 259.2 || 100 || 4.5 || 1.1 ||
|| BZIP2 || 249.4 || 96.2 || 63.4 || 19.9 ||

LZ4 is the fastest method, but none of the methods is really better than
no compression, because these random numbers could not be compressed to
less than 95% of the original size.

The next test raster was generated with
{{{
r.mapcalc expression="test_rast2_z_base = elevation"
}}}
The result has 4x4 blocks of identical raster values and should be easy to
compress.
|| compressor || size MB || size % || write || read ||
|| NONE || 259.2 || 100 || 4.2 || 1.1 ||
|| ZLIB 1 || 41.8 || 16.1 || 4.7 || 1.6 ||
|| ZLIB 6 || 32.1 || 12.4 || 11.1 || 1.4 ||
|| LZ4 || 71.5 || 27.6 || 2.2 || 0.9 ||
|| BZIP2 || 49.8 || 19.2 || 28.0 || 7.1 ||

LZ4 was again the fastest, and ZLIB level 6 compressed best. Here, the
performance of BZIP2 was not convincing: it was by far the slowest and
compressed less well than either ZLIB level.

Then I tested with MODIS land surface temperature (LST) for Europe, a bit
more than 400,000,000 cells:

LST as CELL
|| compressor || size MB || size % || write || read ||
|| NONE || 829 || 100 || 28.5 || 14.8 ||
|| ZLIB 1 || 269 || 32.4 || 30.5 || 14.9 ||
|| ZLIB 6 || 261 || 31.5 || 36.7 || 16.5 ||
|| LZ4 || 366 || 44.1 || 20.0 || 13.0 ||
|| BZIP2 || 175 || 21.1 || 89.8 || 29.2 ||

LZ4 was the fastest, and BZIP2 compressed best.

LST as DCELL
|| compressor || size MB || size % || write || read ||
|| NONE || 3300 || 100 || 85.5 || 52.9 ||
|| ZLIB 1 || 503 || 15.2 || 62.7 || 23.9 ||
|| ZLIB 6 || 370 || 11.2 || 129.8 || 21.6 ||
|| LZ4 || 629 || 19.1 || 29.5 || 13.9 ||
|| BZIP2 || 196 || 5.9 || 221 || 51.1 ||

Again, LZ4 was the fastest, and BZIP2 compressed best.

I am interested in having BZIP2 because for these LST data its output is
30 - 50% smaller than that of the second best method (ZLIB level 6).
--
Ticket URL: <https://trac.osgeo.org/grass/ticket/2750#comment:23>
GRASS GIS <https://grass.osgeo.org>