[GRASS-dev] [GRASS GIS] #2750: LZ4 when writing raster rows; better than double I/O bound r.mapcalc speed

GRASS GIS trac at osgeo.org
Sat Sep 26 16:08:11 PDT 2015


#2750: LZ4 when writing raster rows; better than double I/O bound r.mapcalc speed
---------------------------+-------------------------
 Reporter:  sprice         |      Owner:  grass-dev@…
     Type:  enhancement    |     Status:  new
 Priority:  normal         |  Milestone:  7.1.0
Component:  Raster         |    Version:  svn-trunk
 Keywords:  ZLIB LZ4 ZSTD  |        CPU:  OSX/Intel
 Platform:  MacOSX         |
---------------------------+-------------------------
 I've added support for reading and writing raster rows in the LZ4, LZ4HC,
 and ZSTD compression formats (in addition to the existing RLE and ZLIB).
 LZ4 and ZSTD are extremely fast (LZ4 compresses about an order of
 magnitude faster than ZLIB) and are a better fit for modern fast hard
 drives and SSDs.

 I've attached a .tgz file with the added and changed files. The new
 algorithms can be enabled or disabled with environment variables, the same
 way as ZLIB, as described in the r.compress documentation:
 https://grass.osgeo.org/grass70/manuals/r.compress.html
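
 The test session further down switches methods by exporting one variable
 and unsetting it before the next run. A minimal Python sketch of that
 pattern follows. The variable names are the ones used in that session; the
 helper name env_for and the treatment of "zlib" as "no switch set" are
 assumptions made for illustration, not part of the attached patch.

```python
import os

# Variable names as they appear in the test session of this ticket.
COMPRESSION_VARS = {
    "lz4": "GRASS_INT_LZ4",
    "lz4hc": "GRASS_INT_LZ4HC",
    "zstd": "GRASS_INT_ZSTD",
}


def env_for(method, base_env=None):
    """Return a copy of the environment with at most one switch set.

    Hypothetical helper: "zlib" is assumed to mean that none of the new
    switches is set, as in the first two runs of the session.
    """
    env = dict(os.environ if base_env is None else base_env)
    # Mirror the `unset` steps: clear every switch before setting one.
    for var in COMPRESSION_VARS.values():
        env.pop(var, None)
    if method != "zlib":
        env[COMPRESSION_VARS[method]] = "1"
    return env


if __name__ == "__main__":
    print(sorted(k for k in env_for("lz4") if k.startswith("GRASS_INT_")))
```

 Passing the result as env= to subprocess.run() would reproduce one step of
 the session, provided the command is launched from inside a GRASS session.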

 Algorithm summary:

 LZ4 produces a slightly worse compression ratio than ZLIB, but it
 compresses about an order of magnitude faster than ZLIB. It decompresses
 even faster.

 LZ4-HC is supposed to produce a compression ratio similar to ZLIB at about
 the same compression speed, while decompressing as fast as regular LZ4
 (>2 GB/s). Unfortunately, the improved compression ratio doesn't show in
 my tests, probably because each row is compressed individually. Results
 may differ for floating point data if someone wants to test it.
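
 To illustrate why compressing each row on its own can limit the ratio,
 here is a small Python sketch. It uses zlib from the standard library as a
 stand-in (LZ4 is not in the standard library) and synthetic 16-bit rows,
 so it only shows the general effect: a compressor that sees one row at a
 time cannot reuse matches from earlier rows. It is not a measurement of
 the attached code, and the raster dimensions are made up.

```python
import struct
import zlib

NROWS, NCOLS = 200, 512  # made-up raster size


def make_row(r):
    """Build one row of synthetic 16-bit cells.

    Values change slowly along the row and across rows, so neighbouring
    rows are often identical.
    """
    values = [1000 + c // 8 + r // 16 for c in range(NCOLS)]
    return struct.pack("<%dH" % NCOLS, *values)


rows = [make_row(r) for r in range(NROWS)]

# Each row compressed independently, as when rows are written one by one.
per_row_size = sum(len(zlib.compress(row)) for row in rows)

# All rows compressed as a single buffer, so earlier rows can be referenced.
whole_size = len(zlib.compress(b"".join(rows)))

print("raw bytes:            ", NROWS * NCOLS * 2)
print("compressed per row:   ", per_row_size)
print("compressed as a whole:", whole_size)
```

 Row-wise compression keeps random row access cheap, which is presumably
 why the raster format works this way; the sketch only shows the cost in
 ratio.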

 ZSTD is a new algorithm by the author of LZ4 that is intended to replace
 ZLIB. It compresses and decompresses extremely quickly while maintaining
 a compression ratio similar to ZLIB's. Unfortunately, it's still in beta.

 They are all under the BSD license. Links below show more info &
 performance numbers.

 https://github.com/Cyan4973/zstd

 https://github.com/Cyan4973/lz4

 http://www.lz4.org


 I recommend incorporating the attached changes into GRASS and leaving the
 new formats optional for users. At some point in the future, after
 testing, GRASS should move to using ZSTD as the default. Power users who
 want the best performance '''now''' (and have the disk space) can use LZ4
 immediately.

 It would be trivial to further alter get_row.c & put_row.c to use LZ4 for
 floating point compression. (And I would recommend someone do that.)

 Note: I've decided to use LZ4_decompress_fast() instead of
 LZ4_decompress_safe(). In my test, it was noticeably faster. According to
 the LZ4 documentation, the fast variant is not protected against malformed
 or malicious compressed input. If this is a serious concern for the GRASS
 GIS internal data structures, switch the commented-out lines in get_row.c
 to use the safer function.

 On my computer, using LZ4 more than halves (!) the runtime of an r.mapcalc
 identity operation. Below are my tests on a RapidEye scene.

 {{{
 > time r.mapcalc expression="out_test_zlib=out_test_lz4hc" --overwrite
  100%

 real    3m33.503s
 user    3m25.451s
 sys     0m6.750s
 > time r.mapcalc expression="out_test_zlib=out_test_lz4hc" --overwrite
  100%

 real    3m34.398s
 user    3m26.684s
 sys     0m6.138s
 > export GRASS_INT_LZ4=1
 > time r.mapcalc expression="out_test_lz4=out_test_lz4hc" --overwrite
  100%

 real    1m31.222s
 user    1m25.379s
 sys     0m5.035s
 > time r.mapcalc expression="out_test_lz4=out_test_lz4hc" --overwrite
  100%

 real    1m29.792s
 user    1m24.029s
 sys     0m4.858s
 > unset GRASS_INT_LZ4
 > export GRASS_INT_LZ4HC=1
 > time r.mapcalc expression="out_test_lz4hc2=out_test_lz4hc" --overwrite
  100%

 real    3m5.332s
 user    2m58.610s
 sys     0m5.603s
 > time r.mapcalc expression="out_test_lz4hc2=out_test_lz4hc" --overwrite
  100%

 real    3m3.710s
 user    2m56.606s
 sys     0m5.858s
 > unset GRASS_INT_LZ4HC
 > export GRASS_INT_ZSTD=1
 > time r.mapcalc expression="out_test_zstd=out_test_lz4hc" --overwrite
  100%

 real    1m38.322s
 user    1m32.654s
 sys     0m4.897s
 > time r.mapcalc expression="out_test_zstd=out_test_lz4hc" --overwrite
  100%

 real    1m42.370s
 user    1m35.487s
 sys     0m5.282s
 > unset GRASS_INT_ZSTD
 > ls -l vrt_test/PERMANENT/cell/out_test_*
 -rw-r--r--  1 sprice  staff  4080217012 Sep 26 14:01 vrt_test/PERMANENT/cell/out_test_lz4
 -rw-r--r--  1 sprice  staff  4069728048 Sep 26 13:34 vrt_test/PERMANENT/cell/out_test_lz4hc
 -rw-r--r--  1 sprice  staff  4069728048 Sep 26 14:08 vrt_test/PERMANENT/cell/out_test_lz4hc2
 -rw-r--r--  1 sprice  staff  3737100577 Sep 26 13:57 vrt_test/PERMANENT/cell/out_test_zlib
 -rw-r--r--  1 sprice  staff  3811356101 Sep 26 14:12 vrt_test/PERMANENT/cell/out_test_zstd
 > time r.univar out_test_zlib
  100%
 total null and non-null cells: 3526771952
 total null cells: 1502448926

 Of the non-null cells:
 ----------------------
 n: 2024323026
 minimum: 807
 maximum: 32767
 range: 31960
 mean: 9385.79
 mean of absolute values: 9385.79
 standard deviation: 6620.52
 variance: 4.38312e+07
 variation coefficient: 70.5377 %
 sum: 18999862195879

 real    1m29.980s
 user    1m27.111s
 sys     0m2.589s
 > time r.univar out_test_lz4
  100%
 total null and non-null cells: 3526771952
 total null cells: 1502448926

 Of the non-null cells:
 ----------------------
 n: 2024323026
 minimum: 807
 maximum: 32767
 range: 31960
 mean: 9385.79
 mean of absolute values: 9385.79
 standard deviation: 6620.52
 variance: 4.38312e+07
 variation coefficient: 70.5377 %
 sum: 18999862195879

 real    1m9.883s
 user    1m7.559s
 sys     0m2.210s
 > time r.univar out_test_lz4hc
  100%
 total null and non-null cells: 3526771952
 total null cells: 1502448926

 Of the non-null cells:
 ----------------------
 n: 2024323026
 minimum: 807
 maximum: 32767
 range: 31960
 mean: 9385.79
 mean of absolute values: 9385.79
 standard deviation: 6620.52
 variance: 4.38312e+07
 variation coefficient: 70.5377 %
 sum: 18999862195879

 real    1m10.199s
 user    1m7.902s
 sys     0m2.173s
 > time r.univar out_test_zstd
  100%
 total null and non-null cells: 3526771952
 total null cells: 1502448926

 Of the non-null cells:
 ----------------------
 n: 2024323026
 minimum: 807
 maximum: 32767
 range: 31960
 mean: 9385.79
 mean of absolute values: 9385.79
 standard deviation: 6620.52
 variance: 4.38312e+07
 variation coefficient: 70.5377 %
 sum: 18999862195879

 real    1m25.206s
 user    1m21.351s
 sys     0m2.518s
 }}}
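
 The numbers in the session above can be condensed with a few lines of
 Python. The sketch below copies the 'real' times and file sizes from the
 session and expresses them relative to ZLIB; the function and variable
 names are mine. Note that every r.mapcalc run reads the same LZ4HC input,
 so its speedup mostly reflects the write side, while r.univar measures
 reading only.

```python
import re


def seconds(t):
    """Convert a `time` value such as '3m33.503s' to seconds."""
    m = re.fullmatch(r"(\d+)m([\d.]+)s", t)
    return int(m.group(1)) * 60 + float(m.group(2))


def mean_seconds(times):
    return sum(seconds(t) for t in times) / len(times)


# 'real' times of the two r.mapcalc runs per output format.
mapcalc_real = {
    "zlib": ["3m33.503s", "3m34.398s"],
    "lz4": ["1m31.222s", "1m29.792s"],
    "lz4hc": ["3m5.332s", "3m3.710s"],
    "zstd": ["1m38.322s", "1m42.370s"],
}

# 'real' time of the single r.univar run per input format.
univar_real = {
    "zlib": "1m29.980s",
    "lz4": "1m9.883s",
    "lz4hc": "1m10.199s",
    "zstd": "1m25.206s",
}

# Sizes in bytes of the cell files listed by `ls -l`.
cell_bytes = {
    "zlib": 3737100577,
    "lz4": 4080217012,
    "lz4hc": 4069728048,
    "zstd": 3811356101,
}

mapcalc_mean = {m: mean_seconds(ts) for m, ts in mapcalc_real.items()}
mapcalc_speedup = {m: mapcalc_mean["zlib"] / s for m, s in mapcalc_mean.items()}
univar_speedup = {
    m: seconds(univar_real["zlib"]) / seconds(t) for m, t in univar_real.items()
}
size_ratio = {m: n / cell_bytes["zlib"] for m, n in cell_bytes.items()}

for m in mapcalc_real:
    print(
        "%-6s mapcalc x%.2f  univar x%.2f  size x%.3f"
        % (m, mapcalc_speedup[m], univar_speedup[m], size_ratio[m])
    )
```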

--
Ticket URL: <http://trac.osgeo.org/grass/ticket/2750>
GRASS GIS <https://grass.osgeo.org>


