[GRASS-dev] [GRASS GIS] #2750: LZ4 when writing raster rows; better than double I/O bound r.mapcalc speed

GRASS GIS trac at osgeo.org
Mon Nov 9 14:07:11 PST 2015

#2750: LZ4 when writing raster rows; better than double I/O bound r.mapcalc speed
  Reporter:  sprice       |      Owner:  grass-dev@…
      Type:  enhancement  |     Status:  new
  Priority:  normal       |  Milestone:  7.1.0
 Component:  Raster       |    Version:  svn-trunk
Resolution:               |   Keywords:  ZLIB LZ4 ZSTD
       CPU:  OSX/Intel    |   Platform:  MacOSX

Comment (by mmetz):

 Replying to [comment:14 glynn]:
 > Replying to [comment:13 wenzeslaus]:
 > > My question is if it wouldn't be more advantageous to create some
 > > wrapper which would take the all necessary inputs including compression
 > > type and do the necessary switches and format specific things.
 > Agreed.
 > In practical terms, there are only two distinct cases: uncompressed
 > (where the size of the data read or written matches the size of the data
 > stored in the file) and compressed (where the sizes differ). Everything
 > else is just options.

 Without knowing about this ticket, I have implemented something like this
 recently and added support for LZ4 (and BZIP2) compression to my local
 copy of GRASS trunk. The motivation was to have a compressor that is
 substantially faster than ZLIB but still better than RLE, and another
 compressor that is substantially better (higher compression) than ZLIB but
 not exceedingly slow. XZ with LZMA2 1) is too slow, 2) uses too much
 memory, and 3) does not compress binary raster data better than BZIP2.

 In particular, I have added support for the new compressors to gislib, not
 rasterlib. As before, gislib does the actual compression, not rasterlib. My
 gislib now also handles LZ4 and BZIP2 compression. The actual change to
 rasterlib is to replace `G_zlib_compress()` with `G_compress(..., int
 compressor)` and `G_zlib_expand()` with `G_expand(..., int compressor)`.
 `G_zlib_write()` and `G_zlib_read()` are now `G_write_compressed(..., int
 compressor)` and `G_read_compressed(..., int compressor)`. Here, "..."
 means the same arguments as before. The new argument "compressor" is simply
 "cellhd.compressed", with the same meaning as before. The internal function
 `zlib_compress` is no longer needed.

 As before, the compressor type is encoded in cellhd.compressed: previously
 0 (no compression), 1 (RLE), and 2 (ZLIB); now also 3 (LZ4) and 4 (BZIP2).
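
 To illustrate the generic interface and the compressor codes above, here
 is a minimal, self-contained sketch in C. It is not the attached patch:
 `sketch_compress()`, `toy_rle_compress()`, and the `COMP_*` constants are
 hypothetical names, and a toy byte-wise RLE (not the GRASS RLE format)
 stands in for the real library calls so that the example builds without
 zlib, LZ4, or bzip2.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Compressor codes as listed in the message (cellhd.compressed).
 * The constant names are hypothetical. */
enum { COMP_NONE = 0, COMP_RLE = 1, COMP_ZLIB = 2, COMP_LZ4 = 3, COMP_BZIP2 = 4 };

/* Toy byte-wise RLE: emits (count, value) pairs.
 * Stand-in backend only, not the GRASS RLE format. */
static int toy_rle_compress(const unsigned char *src, int src_len,
                            unsigned char *dst, int dst_len)
{
    int i = 0, out = 0;

    while (i < src_len) {
        int run = 1;

        while (i + run < src_len && run < 255 && src[i + run] == src[i])
            run++;
        if (out + 2 > dst_len)
            return -1;          /* destination buffer too small */
        dst[out++] = (unsigned char)run;
        dst[out++] = src[i];
        i += run;
    }
    return out;
}

/* Hypothetical generic entry point: one call, backend chosen by code.
 * Returns the number of bytes written to dst, or -1 on error or for a
 * code that this sketch does not implement. */
int sketch_compress(const unsigned char *src, int src_len,
                    unsigned char *dst, int dst_len, int compressor)
{
    assert(src != NULL && dst != NULL);

    switch (compressor) {
    case COMP_NONE:
        if (src_len > dst_len)
            return -1;
        memcpy(dst, src, (size_t)src_len);
        return src_len;
    case COMP_RLE:
        return toy_rle_compress(src, src_len, dst, dst_len);
    default:
        /* ZLIB, LZ4, and BZIP2 would call their libraries here. */
        return -1;
    }
}
```

 A caller would pass cellhd.compressed as the last argument. Supporting one
 more compressor then means adding one more case to the switch, with little
 or no change on the rasterlib side.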

 r.univar results for CELL, FCELL, and DCELL maps are identical,
 independent of the compressor. The new gislib interface to compress data
 is generic and it is easy to add any other compressor, e.g. LZ4HC or ZSTD.

 Generally, any new compression method should go into gislib and not into
 rasterlib, just like ZLIB compression has been done by gislib. This keeps
 changes to the rasterlib to a minimum and makes debugging easier.

 For fast storage devices with plenty of space, LZ4 is by far the fastest,
 while still providing reasonable compression where possible.

 For slow storage devices, e.g. those accessed over a network, BZIP2
 compression is the fastest (yes, faster than LZ4) because it produces the
 least data (50% to 70% of the ZLIB size). That reduces network traffic and
 saves disk space.
 For my work, it would be a big advantage to use LZ4 for actual processing
 on fast local disks and BZIP2 for storing the final results on sometimes
 very slow network attached storage.

 The compressor type for new raster maps could be selected with an
 additional environment variable, GRASS_COMPRESSOR, e.g. GRASS_COMPRESSOR=LZ4.
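
 One possible way to read such a variable is sketched below. The function
 names, the accepted spellings (RLE, ZLIB, LZ4, BZIP2), and the fallback to
 ZLIB for unset or unknown values are assumptions of this sketch, not the
 behaviour of the patch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Codes from the message: 1 RLE, 2 ZLIB, 3 LZ4, 4 BZIP2. */
#define DEFAULT_COMPRESSOR 2    /* ZLIB, the current G7 default */

/* Map a compressor name to its code; return -1 for unknown names.
 * The accepted spellings are an assumption of this sketch. */
int compressor_code_from_name(const char *name)
{
    static const struct { const char *name; int code; } table[] = {
        {"RLE", 1}, {"ZLIB", 2}, {"LZ4", 3}, {"BZIP2", 4}
    };
    size_t i;

    assert(name != NULL);
    for (i = 0; i < sizeof(table) / sizeof(table[0]); i++)
        if (strcmp(name, table[i].name) == 0)
            return table[i].code;
    return -1;
}

/* Read GRASS_COMPRESSOR; fall back to ZLIB if it is unset or unknown. */
int compressor_code_from_env(void)
{
    const char *value = getenv("GRASS_COMPRESSOR");
    int code;

    if (value == NULL)
        return DEFAULT_COMPRESSOR;
    code = compressor_code_from_name(value);
    return code < 0 ? DEFAULT_COMPRESSOR : code;
}
```

 Falling back to ZLIB instead of failing keeps existing setups working when
 the variable is missing or misspelled; a real implementation might prefer
 to print a warning in the second case.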

 I am not sure about the pros and cons of using compressors other than
 ZLIB. ZLIB is a good compromise between speed and compression. Adding other
 compressors to G7.1 means that raster data compressed with a new method
 cannot be opened by G7.0 or earlier. New compression types should, if
 added to G7.1, be clearly marked as "use it only if you really know what
 you are doing". I would profit from the choice of other compressors, but
 on a standard laptop/desktop system the current G7 default of ZLIB is
 probably the best all-round solution.

 I am attaching a patch for trunk r66775 and an archive with the new files
 that go into lib/gis.

Ticket URL: <https://trac.osgeo.org/grass/ticket/2750#comment:19>
GRASS GIS <https://grass.osgeo.org>
