[GRASS-dev] [GRASS GIS] #2750: LZ4 when writing raster rows; better than double I/O bound r.mapcalc speed
GRASS GIS
trac at osgeo.org
Mon Nov 9 14:07:11 PST 2015
#2750: LZ4 when writing raster rows; better than double I/O bound r.mapcalc speed
--------------------------+---------------------------
Reporter: sprice | Owner: grass-dev@…
Type: enhancement | Status: new
Priority: normal | Milestone: 7.1.0
Component: Raster | Version: svn-trunk
Resolution: | Keywords: ZLIB LZ4 ZSTD
CPU: OSX/Intel | Platform: MacOSX
--------------------------+---------------------------
Comment (by mmetz):
Replying to [comment:14 glynn]:
> Replying to [comment:13 wenzeslaus]:
>
> > My question is if it wouldn't be more advantageous to create some
> > wrapper which would take the all necessary inputs including compression
> > type and do the necessary switches and format specific things.
>
> Agreed.
>
> In practical terms, there are only two distinct cases: uncompressed
> (where the size of the data read or written matches the size of the data
> stored in the file) and compressed (where the sizes differ). Everything
> else is just options.
Without knowing about this ticket, I have implemented something like this
recently and added support for LZ4 (and BZIP2) compression to my local
copy of GRASS trunk. The motivation was to have a compressor that is
substantially faster than ZLIB but still better than RLE, and another
compressor that is substantially better (higher compression) than ZLIB but
not exceedingly slow. XZ with LZMA2 is too slow, uses too much memory, and
does not compress binary raster data better than BZIP2.
I have added support for the new compressors to gislib, not rasterlib. As
before, gislib does the actual compression, not rasterlib; my gislib now
also handles LZ4 and BZIP2. The actual change to rasterlib is to replace
`G_zlib_compress()` with `G_compress(..., int compressor)` and
`G_zlib_expand()` with `G_expand(..., int compressor)`. `G_zlib_write()`
and `G_zlib_read()` are now `G_write_compressed(..., int compressor)` and
`G_read_compressed(..., int compressor)`. Here, "..." means the same
arguments as before. The new argument "compressor" takes the value of
"cellhd.compressed", with the same meaning as before. The internal
function `zlib_compress` is no longer needed.
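To illustrate the idea of a single entry point that switches on the compressor code, here is a minimal, dependency-free sketch. It is not the attached patch: the names `demo_compress()` and `demo_expand()`, their argument lists, and the return codes are assumptions made for this example, and the toy byte-wise RLE only stands in for a real codec (it is not the GRASS RLE format). The ZLIB, LZ4 and BZIP2 branches are left as "not built in" so the example compiles without external libraries.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Compressor codes as listed in this comment for cellhd.compressed. */
enum demo_compressor {
    DEMO_NONE = 0, DEMO_RLE = 1, DEMO_ZLIB = 2, DEMO_LZ4 = 3, DEMO_BZIP2 = 4
};

/* Toy RLE stand-in: (count, value) byte pairs, count in 1..255.
 * Returns bytes written, or -1 if dst is too small. */
static int demo_rle_compress(const unsigned char *src, int src_sz,
                             unsigned char *dst, int dst_sz)
{
    int i = 0, n = 0;

    while (i < src_sz) {
        int run = 1;

        while (i + run < src_sz && run < 255 && src[i + run] == src[i])
            run++;
        if (n + 2 > dst_sz)
            return -1;
        dst[n++] = (unsigned char)run;
        dst[n++] = src[i];
        i += run;
    }
    return n;
}

/* Inverse of demo_rle_compress(); returns bytes written or -1 on error. */
static int demo_rle_expand(const unsigned char *src, int src_sz,
                           unsigned char *dst, int dst_sz)
{
    int i, n = 0;

    if (src_sz % 2 != 0)
        return -1;
    for (i = 0; i < src_sz; i += 2) {
        int run = src[i];

        if (run == 0 || n + run > dst_sz)
            return -1;
        memset(dst + n, src[i + 1], (size_t)run);
        n += run;
    }
    return n;
}

/* Uncompressed case: stored size equals data size. */
static int demo_copy(const unsigned char *src, int src_sz,
                     unsigned char *dst, int dst_sz)
{
    if (src_sz > dst_sz)
        return -1;
    memcpy(dst, src, (size_t)src_sz);
    return src_sz;
}

/* Hypothetical generic entry point: same buffers for every codec, plus
 * the compressor code. Returns bytes written, -1 if dst is too small,
 * -2 if the requested compressor is not available in this sketch. */
int demo_compress(const unsigned char *src, int src_sz,
                  unsigned char *dst, int dst_sz, int compressor)
{
    assert(src != NULL && dst != NULL && src_sz >= 0 && dst_sz >= 0);
    switch (compressor) {
    case DEMO_NONE: return demo_copy(src, src_sz, dst, dst_sz);
    case DEMO_RLE:  return demo_rle_compress(src, src_sz, dst, dst_sz);
    default:        return -2; /* ZLIB, LZ4, BZIP2 would dispatch here */
    }
}

/* Hypothetical counterpart for reading; same return convention. */
int demo_expand(const unsigned char *src, int src_sz,
                unsigned char *dst, int dst_sz, int compressor)
{
    assert(src != NULL && dst != NULL && src_sz >= 0 && dst_sz >= 0);
    switch (compressor) {
    case DEMO_NONE: return demo_copy(src, src_sz, dst, dst_sz);
    case DEMO_RLE:  return demo_rle_expand(src, src_sz, dst, dst_sz);
    default:        return -2;
    }
}
```

With this shape, a caller in the raster code only passes the compressor code along, and adding another codec means adding one `case` in each of the two functions.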
As before, the compressor type is encoded in cellhd.compressed. The
existing values are 0: no compression, 1: RLE, 2: ZLIB; the new values are
3: LZ4 and 4: BZIP2.
r.univar results for CELL, FCELL, and DCELL maps are identical,
independent of the compressor. The new gislib interface to compress data
is generic and it is easy to add any other compressor, e.g. LZ4HC or ZSTD.
Generally, any new compression method should go into gislib and not into
rasterlib, just like ZLIB compression has been done by gislib. This keeps
changes to the rasterlib to a minimum and makes debugging easier.
For fast storage devices with plenty of space, LZ4 is by far the fastest,
at the same time providing some reasonable compression where possible.
For slow storage devices, e.g. storage accessed over a network, BZIP2
compression is the fastest (yes, faster than LZ4) because the amount of
data is the smallest (50% to 70% of the ZLIB-compressed size). That reduces
network traffic and saves disk space.
For my work, it would be a big advantage to use LZ4 for actual processing
on fast local disks and BZIP2 for storing the final results on sometimes
very slow network attached storage.
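The trade-off between codec speed and storage speed can be made concrete with a back-of-the-envelope model: time to write is roughly the time to compress plus the time to push the compressed bytes to storage. The function below is only a sketch of that reasoning. Its name and parameters are invented for this example, and any numbers fed into it are placeholders, not measurements from GRASS.

```c
/* Rough model, not a benchmark:
 *   seconds = raw_mb / compress_mb_s + (raw_mb * ratio) / storage_mb_s
 * raw_mb        uncompressed data size in MB
 * ratio         compressed size / raw size (smaller means better compression)
 * compress_mb_s codec throughput in MB/s (placeholder input)
 * storage_mb_s  storage or network throughput in MB/s (placeholder input) */
double demo_write_seconds(double raw_mb, double ratio,
                          double compress_mb_s, double storage_mb_s)
{
    return raw_mb / compress_mb_s + (raw_mb * ratio) / storage_mb_s;
}
```

With a slow `storage_mb_s`, the second term dominates and the codec with the smaller `ratio` wins even if it compresses slowly; with fast storage, the first term dominates and the faster codec wins.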
The compressor type for new raster maps could be selected with an
additional environment variable, GRASS_COMPRESSOR, e.g.
GRASS_COMPRESSOR=LZ4.
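A possible way to turn such a variable into a compressor code is sketched below. Only `GRASS_COMPRESSOR=LZ4` appears in this comment; the other accepted names, the case-sensitive matching, the fallback to ZLIB for unknown values, and all `demo_` function names are assumptions for illustration.

```c
#include <stdlib.h>
#include <string.h>

#define DEMO_DEFAULT_COMPRESSOR 2 /* ZLIB, the current default */

/* Map a compressor name to its cellhd.compressed code, or -1 if the
 * name is NULL or unknown. The name spellings are assumed. */
int demo_compressor_from_name(const char *name)
{
    static const struct { const char *name; int code; } table[] = {
        {"NONE", 0}, {"RLE", 1}, {"ZLIB", 2}, {"LZ4", 3}, {"BZIP2", 4}
    };
    size_t i;

    if (name == NULL)
        return -1;
    for (i = 0; i < sizeof table / sizeof table[0]; i++)
        if (strcmp(name, table[i].name) == 0)
            return table[i].code;
    return -1;
}

/* Pick the code for a given variable value, falling back to ZLIB when
 * the value is missing or not recognized. */
int demo_choose_compressor(const char *value)
{
    int code = demo_compressor_from_name(value);

    return code < 0 ? DEMO_DEFAULT_COMPRESSOR : code;
}

/* Read GRASS_COMPRESSOR from the environment and pick the code. */
int demo_default_compressor(void)
{
    return demo_choose_compressor(getenv("GRASS_COMPRESSOR"));
}
```

Falling back to ZLIB rather than failing keeps existing setups unchanged when the variable is unset or misspelled.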
I am not sure about the pros and cons of using compressors other than
ZLIB. ZLIB is a good compromise between speed and compression. Adding other
compressors to G7.1 means that raster data compressed with a new method
cannot be opened by G7.0 or earlier. New compression types should, if
added to G7.1, be clearly marked as "use it only if you really know what
you are doing". I would profit from the choice of other compressors, but
on a standard laptop/desktop system the current G7 default of ZLIB is
probably the best all-round solution.
I am attaching a patch for trunk r66775 and an archive with new files to
go into lib/gis.
--
Ticket URL: <https://trac.osgeo.org/grass/ticket/2750#comment:19>
GRASS GIS <https://grass.osgeo.org>