[GRASS-dev] [GRASS GIS] #2750: LZ4 when writing raster rows; better than double I/O bound r.mapcalc speed

GRASS GIS trac at osgeo.org
Tue Nov 10 14:21:31 PST 2015


#2750: LZ4 when writing raster rows; better than double I/O bound r.mapcalc speed
--------------------------+---------------------------
  Reporter:  sprice       |      Owner:  grass-dev@…
      Type:  enhancement  |     Status:  new
  Priority:  normal       |  Milestone:  7.1.0
 Component:  Raster       |    Version:  svn-trunk
Resolution:               |   Keywords:  ZLIB LZ4 ZSTD
       CPU:  OSX/Intel    |   Platform:  MacOSX
--------------------------+---------------------------

Comment (by mmetz):

 Replying to [comment:20 wenzeslaus]:
 > Replying to [comment:19 mmetz]:
 > > I have implemented something like this recently and added support for
 > > LZ4 (and BZIP2) compression to my local copy of GRASS trunk.
 > >
 > > I am attaching a patch for trunk r66775 and an archive with new files
 > > to go to lib/gis
 >
 > The design in the patch looks really good. I did tests and benchmark but
 > it was not as successful as I hoped for.
 >
 > The benchmark was the same as [comment:10 before] but modified for this
 > patch. It is more for testing than benchmark anyway. It was on 30,000,000
 > cells but perhaps the previous one was on more and it is not completely
 > precise overall due to some other computations running at the same time
 > (although the result is from 10 runs aggregated by ''perf'').
 >
 > || type || write || read ||
 > || NONE || 2.58 || 0.72 ||
 > || ZLIB || 1.52 || 0.93 ||
 > || LZ4 || 1.56 || 0.85 ||
 >

 Some more explanation about the proposed new mechanism:

 The proposed `G_compress()` interface provides a generic mechanism for data
 compression in libgis that is not restricted to raster data. Built-in
 compression methods would be no compression, RLE, ZLIB, and LZ4. BZIP2
 compression would be available if GRASS is configured `--with-bzlib`. Other
 compression methods could be added by cloning lib/gis/flate.c and adding
 new `G_*_compress()` and `G_*_expand()` functions to
 lib/gis/compress.[h|c]. The raster lib does not need to be modified any
 more when a method is added.
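
 As a rough illustration of the dispatch idea (not the patch itself, which
 is C code in lib/gis), the sketch below keeps a table of compress/expand
 pairs keyed by method name. The names `COMPRESSORS`, `compress()`,
 `expand()` and `register()` are hypothetical and only mimic the role of
 `G_compress()` and the `G_*_compress()` / `G_*_expand()` functions. RLE and
 LZ4 are left out only because the Python standard library has no binding
 for them.

```python
import bz2
import zlib

# Hypothetical registry: method name -> (compress function, expand function).
# This only mirrors the idea of a generic interface; it is not the libgis API.
COMPRESSORS = {
    "NONE": (lambda data: data, lambda data: data),
    "ZLIB": (lambda data: zlib.compress(data, 6), zlib.decompress),
    "BZIP2": (bz2.compress, bz2.decompress),
}


def compress(name, data):
    """Compress bytes with the named method."""
    return COMPRESSORS[name][0](data)


def expand(name, data):
    """Reverse compress() for the same method name."""
    return COMPRESSORS[name][1](data)


def register(name, compress_fn, expand_fn):
    """Add a new method without touching any caller of compress()/expand()."""
    COMPRESSORS[name] = (compress_fn, expand_fn)


if __name__ == "__main__":
    row = bytes(range(256)) * 16
    for name in COMPRESSORS:
        packed = compress(name, row)
        assert expand(name, packed) == row  # every method must round-trip
        print(name, len(row), "->", len(packed))
```

 Callers only ever see `compress()` and `expand()`, so a new method is one
 `register()` call away, which is the property the proposal aims for in the
 raster lib.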

 As before, the rasterlib makes only partial use of the generic compression
 methods: no compression and RLE are handled by the rasterlib internally,
 and RLE is not supported for fp maps. Creating uncompressed raster maps
 has been and should be only possible with `Rast_open_new_uncompressed()`.

 That means that the behaviour of the rasterlib when using
 GRASS_COMPRESSOR=NONE needs to be defined: really use no compression, or
 use default compression instead? Using GRASS_COMPRESSOR=RLE affects only
 new CELL maps. For fp maps, compression_type = 1 (RLE) is, as before,
 interpreted as ZLIB compression.

 If you want to know the original amount of data passed to any compressor,
 you need to use

 {{{
 GRASS_COMPRESSOR=ZLIB
 GRASS_ZLIB_LEVEL=0
 }}}

 ZLIB level = 0 tells ZLIB to copy the data as is from source to
 destination. With CELL maps, the rasterlib will then still trim high zero
 bytes with trim_bytes() which can already reduce the data size
 considerably, but ZLIB will not compress the data.
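
 The two effects can be seen in isolation with a small sketch. It assumes
 non-negative cell values and a simplified trimming rule (use as many bytes
 per cell as the largest value in the row needs); `trim_high_zero_bytes()`
 is a hypothetical stand-in and not the rasterlib's `trim_bytes()`. The
 zlib call is the Python standard library binding, where level 0 stores the
 data without compressing it.

```python
import struct
import zlib


def trim_high_zero_bytes(values):
    """Pack non-negative ints big-endian, dropping high bytes that are zero
    for every value (illustrative only, not the rasterlib implementation)."""
    nbytes = max(1, (max(values).bit_length() + 7) // 8)
    return nbytes, b"".join(v.to_bytes(nbytes, "big") for v in values)


if __name__ == "__main__":
    cells = [0, 17, 300, 65535, 1024] * 200          # every value fits in 2 bytes
    full = struct.pack(">%di" % len(cells), *cells)  # 4 bytes per cell
    nbytes, trimmed = trim_high_zero_bytes(cells)

    stored = zlib.compress(trimmed, 0)  # level 0: data is stored, not compressed
    assert zlib.decompress(stored) == trimmed

    print("full row:", len(full), "bytes")
    print("trimmed :", len(trimmed), "bytes at", nbytes, "bytes per cell")
    print("level 0 :", len(stored), "bytes (payload plus zlib framing)")
```

 The level 0 output is slightly larger than its input because of the zlib
 framing, so any size reduction seen with this setting comes from the
 trimming step alone.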

 I modified gislib_compressor_benchmark.sh to use
 {{{
 GRASS_COMPRESSOR=ZLIB
 GRASS_ZLIB_LEVEL=0
 }}}
 for no compression and discarded RLE because it is inefficient for CELL
 maps and not supported for fp maps. I also tested ZLIB levels 1 (fastest)
 and 6 (ZLIB default).

 I used the nc_basic_spm_grass7 location and set the region with
 {{{
 g.region -p rast=elevation res=2.5
 }}}
 resulting in 32,000,000 cells.

 The test raster was generated with
 {{{
 r.mapcalc expression="test_rast_z_base = rand(double(-200.), 900)"
 seed=100
 }}}
 These random numbers are very difficult to compress.

 The write and read columns in the tables below are in seconds.

 || compressor || size MB || size % || write || read ||
 || NONE || 259.2 || 100 || 5.2 || 1.4 ||
 || ZLIB 1 || 247.9 || 95.6 || 14.3 || 2.5 ||
 || ZLIB 6 || 246.9 || 95.3 || 16.3 || 2.4 ||
 || LZ4 || 259.2 || 100 || 4.5 || 1.1 ||
 || BZIP2 || 249.4 || 96.2 || 63.4 || 19.9 ||

 LZ4 is the fastest method, but no method is really the best because these
 random numbers could not be compressed to less than 95% of the original
 size.

 The next test raster was generated with
 {{{
 r.mapcalc expression="test_rast2_z_base = elevation"
 }}}
 This raster has 4x4 blocks of identical raster values and should be easy
 to compress.

 || compressor || size MB || size % || write || read ||
 || NONE || 259.2 || 100 || 4.2 || 1.1 ||
 || ZLIB 1 || 41.8 || 16.1 || 4.7 || 1.6 ||
 || ZLIB 6 || 32.1 || 12.4 || 11.1 || 1.4 ||
 || LZ4 || 71.5 || 27.6 || 2.2 || 0.9 ||
 || BZIP2 || 49.8 || 19.2 || 28.0 || 7.1 ||

 LZ4 was again the fastest, and ZLIB level 6 compressed best. Here, the
 performance of BZIP2 was not convincing: it was by far the slowest and did
 not compress as well as ZLIB.
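
 The contrast between the random raster and the raster with 4x4 blocks can
 be reproduced in miniature. The sketch below uses synthetic data rather
 than the nc_basic_spm_grass7 maps and assumes that each row is compressed
 on its own; all function names are hypothetical. Only ZLIB and BZIP2 are
 shown because they ship with the Python standard library, and the
 percentages it prints will not match the tables above.

```python
import bz2
import random
import struct
import zlib


def random_raster(rows, cols, rng):
    """Every cell is an independent random double."""
    return [[rng.uniform(-200.0, 900.0) for _ in range(cols)] for _ in range(rows)]


def blocky_raster(rows, cols, rng, block=4):
    """Each block x block square of cells shares one value."""
    coarse = random_raster(rows // block, cols // block, rng)
    return [[coarse[r // block][c // block] for c in range(cols)]
            for r in range(rows)]


def size_percent(raster, compress):
    """Compressed size as a percentage of raw size, compressing row by row."""
    raw = packed = 0
    for row in raster:
        data = struct.pack(">%dd" % len(row), *row)  # big-endian doubles
        raw += len(data)
        packed += len(compress(data))
    return 100.0 * packed / raw


if __name__ == "__main__":
    rng = random.Random(100)
    rasters = {
        "random": random_raster(64, 400, rng),
        "4x4 blocks": blocky_raster(64, 400, rng),
    }
    methods = {
        "ZLIB 1": lambda d: zlib.compress(d, 1),
        "ZLIB 6": lambda d: zlib.compress(d, 6),
        "BZIP2": bz2.compress,
    }
    for raster_name, raster in rasters.items():
        for method_name, fn in methods.items():
            print("%-10s %-6s %5.1f %%" % (raster_name, method_name,
                                           size_percent(raster, fn)))
```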

 Then I tested with MODIS land surface temperature for Europe, a bit more
 than 400,000,000 cells:

 LST as CELL
 || compressor || size MB || size % || write || read ||
 || NONE || 829 || 100 || 28.5 || 14.8 ||
 || ZLIB 1 || 269 || 32.4 || 30.5 || 14.9 ||
 || ZLIB 6 || 261 || 31.5 || 36.7 || 16.5 ||
 || LZ4 || 366 || 44.1 || 20.0 || 13.0 ||
 || BZIP2 || 175 || 21.1 || 89.8 || 29.2 ||

 LZ4 was the fastest, BZIP2 compressed best.

 LST as DCELL
 || compressor || size MB || size % || write || read ||
 || NONE || 3300 || 100 || 85.5 || 52.9 ||
 || ZLIB 1 || 503 || 15.2 || 62.7 || 23.9 ||
 || ZLIB 6 || 370 || 11.2 || 129.8 || 21.6 ||
 || LZ4 || 629 || 19.1 || 29.5 || 13.9 ||
 || BZIP2 || 196 || 5.9 || 221 || 51.1 ||

 Again, LZ4 was the fastest, BZIP2 compressed best.

 I am interested in having BZIP2 because for these LST data its output is
 30 - 50% smaller than that of the second best (ZLIB level 6).
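
 The 30 - 50% range follows from the sizes in the two LST tables. A short
 worked calculation, using only the ZLIB 6 and BZIP2 rows (the helper name
 `percent_smaller()` is made up for this example):

```python
# Sizes in MB taken from the LST tables above.
SIZES_MB = {
    "CELL": {"ZLIB 6": 261, "BZIP2": 175},
    "DCELL": {"ZLIB 6": 370, "BZIP2": 196},
}


def percent_smaller(reference_mb, candidate_mb):
    """How much smaller the candidate is, as a percentage of the reference."""
    return 100.0 * (reference_mb - candidate_mb) / reference_mb


if __name__ == "__main__":
    for map_type, sizes in SIZES_MB.items():
        gain = percent_smaller(sizes["ZLIB 6"], sizes["BZIP2"])
        print("%s: BZIP2 output is %.1f%% smaller than ZLIB 6" % (map_type, gain))
```

 This gives roughly 33% for the CELL map and 47% for the DCELL map.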

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/2750#comment:23>
GRASS GIS <https://grass.osgeo.org>


