[GRASS-dev] [GRASS GIS] #2750: LZ4 when writing raster rows; better than double I/O bound r.mapcalc speed

GRASS GIS trac at osgeo.org
Fri Oct 9 20:55:34 PDT 2015


#2750: LZ4 when writing raster rows; better than double I/O bound r.mapcalc speed
--------------------------+---------------------------
  Reporter:  sprice       |      Owner:  grass-dev@…
      Type:  enhancement  |     Status:  new
  Priority:  normal       |  Milestone:  7.1.0
 Component:  Raster       |    Version:  svn-trunk
Resolution:               |   Keywords:  ZLIB LZ4 ZSTD
       CPU:  OSX/Intel    |   Platform:  MacOSX
--------------------------+---------------------------

Comment (by wenzeslaus):

 The tests also run fine for me with `GRASS_INT_LZ4HC=1`,
 `GRASS_INT_ZSTD=1` and `GRASS_INT_ZLIB=0` (RLE). I haven't tried the tests
 with `GRASS_COMPRESS_NULLS=1`.

 Here is a (polished) report created by the attached benchmark script, which
 is based on what you posted and uses completely random data. For the
 benchmark I used `GRASS_COMPRESS_NULLS=1` and a region with 30,000,000
 cells. The disk was an SSD and the OS was Linux.
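
 As a rough illustration of how such a run could be set up (this is not the
 attached script), the sketch below builds the `perf stat` command lines for
 the map names that appear in the report. Several details are assumptions:
 `GRASS_INT_LZ4` is a guessed variable name that is not mentioned above, the
 empty setting for the ZLIB case assumes DEFLATE is used when no switch is
 set, and `--overwrite` and `-r 10` are my reading of the "(10 runs)" in the
 output. The sketch only builds the commands; running them needs an active
 GRASS session.

```python
import os
import subprocess

# Extra environment per output map. GRASS_INT_ZLIB, GRASS_INT_LZ4HC and
# GRASS_INT_ZSTD are quoted in the comment; GRASS_INT_LZ4 is an ASSUMED name,
# and the empty dict assumes DEFLATE is what you get without any switch.
METHOD_ENV = {
    "orig": {},
    "rle": {"GRASS_INT_ZLIB": "0"},
    "lz4": {"GRASS_INT_LZ4": "1"},
    "lz4hc": {"GRASS_INT_LZ4HC": "1"},
    "zstd": {"GRASS_INT_ZSTD": "1"},
}
BASE_MAP = "test_rast_z_base"
RUNS = 10


def write_benchmark(method):
    """Return (extra_env, argv) for timing one compressed write."""
    extra_env = {"GRASS_COMPRESS_NULLS": "1", **METHOD_ENV[method]}
    expression = f"test_rast_{method}=double({BASE_MAP})"
    # --overwrite is assumed so that repeated runs can replace the output map.
    argv = ["perf", "stat", "-r", str(RUNS),
            "r.mapcalc", "--overwrite", f"expression={expression}"]
    return extra_env, argv


def read_benchmark(method):
    """Return argv for timing a full read of the map written above."""
    return ["perf", "stat", "-r", str(RUNS), "r.univar", f"test_rast_{method}"]


def run(argv, extra_env=None):
    """Execute one command inside an already running GRASS session."""
    env = {**os.environ, **(extra_env or {})}
    return subprocess.run(argv, env=env, check=True)
```

 For example, `run(write_benchmark("zstd")[1], write_benchmark("zstd")[0])`
 would time the ZSTD write, and `run(read_benchmark("zstd"))` the matching
 read.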

 {{{
 #!rst
 ZLIB compression writing
 ========================

 Performance counter stats for 'r.mapcalc expression=test_rast_orig=double(test_rast_z_base)' (10 runs):

 ::


       10415.903368 task-clock (msec)         #    0.993 CPUs utilized            ( +-  0.36% )
               2798 context-switches          #    0.269 K/sec                    ( +-  2.11% )
                 23 cpu-migrations            #    0.002 K/sec                    ( +-  4.20% )
             325702 page-faults               #    0.031 M/sec                    ( +-  0.00% )
        30804357778 cycles                    #    2.957 GHz                      ( +-  0.42% )
         9055572140 stalled-cycles-frontend   #   29.40% frontend cycles idle     ( +-  1.81% )
        47982328290 instructions              #    1.56  insns per cycle
                                              #    0.19  stalled cycles per insn  ( +-  0.02% )
         7087070642 branches                  #  680.409 M/sec                    ( +-  0.02% )
          341325584 branch-misses             #    4.82% of all branches          ( +-  0.05% )

       10.489354952 seconds time elapsed                                          ( +-  0.41% )

 RLE compression writing
 =======================

 Performance counter stats for 'r.mapcalc expression=test_rast_rle=double(test_rast_z_base)' (10 runs):

 ::


       10367.674362 task-clock (msec)         #    0.999 CPUs utilized            ( +-  0.53% )
               1642 context-switches          #    0.158 K/sec                    ( +- 18.72% )
                 22 cpu-migrations            #    0.002 K/sec                    ( +-  5.32% )
             325702 page-faults               #    0.031 M/sec                    ( +-  0.00% )
        30666690391 cycles                    #    2.958 GHz                      ( +-  0.38% )
         8921313281 stalled-cycles-frontend   #   29.09% frontend cycles idle     ( +-  1.80% )
        47975696799 instructions              #    1.56  insns per cycle
                                              #    0.19  stalled cycles per insn  ( +-  0.02% )
         7085878436 branches                  #  683.459 M/sec                    ( +-  0.02% )
          340649966 branch-misses             #    4.81% of all branches          ( +-  0.04% )

       10.382500561 seconds time elapsed                                          ( +-  0.53% )

 LZ4 compression writing
 =======================

 Performance counter stats for 'r.mapcalc expression=test_rast_lz4=double(test_rast_z_base)' (10 runs):

 ::


        2490.815692 task-clock (msec)         #    0.999 CPUs utilized            ( +-  0.23% )
                321 context-switches          #    0.129 K/sec                    ( +- 13.63% )
                 20 cpu-migrations            #    0.008 K/sec                    ( +-  5.02% )
                684 page-faults               #    0.274 K/sec                    ( +-  0.12% )
         7259170408 cycles                    #    2.914 GHz                      ( +-  0.12% )
         2305705372 stalled-cycles-frontend   #   31.76% frontend cycles idle     ( +-  0.20% )
        13796117271 instructions              #    1.90  insns per cycle
                                              #    0.17  stalled cycles per insn  ( +-  0.06% )
         2790495244 branches                  # 1120.314 M/sec                    ( +-  0.05% )
           33371582 branch-misses             #    1.20% of all branches          ( +-  0.41% )

        2.492994675 seconds time elapsed                                          ( +-  0.23% )

 LZ4HC compression writing
 =========================

 Performance counter stats for 'r.mapcalc expression=test_rast_lz4hc=double(test_rast_z_base)' (10 runs):

 ::


        6867.635439 task-clock (msec)         #    0.999 CPUs utilized            ( +-  0.25% )
                648 context-switches          #    0.094 K/sec                    ( +-  0.29% )
                 21 cpu-migrations            #    0.003 K/sec                    ( +-  5.28% )
                745 page-faults               #    0.108 K/sec                    ( +-  0.18% )
        20199681252 cycles                    #    2.941 GHz                      ( +-  0.28% )
         6449729534 stalled-cycles-frontend   #   31.93% frontend cycles idle     ( +-  0.62% )
        31860120047 instructions              #    1.58  insns per cycle
                                              #    0.20  stalled cycles per insn  ( +-  0.03% )
         5196919230 branches                  #  756.726 M/sec                    ( +-  0.03% )
          184132785 branch-misses             #    3.54% of all branches          ( +-  0.04% )

        6.873512386 seconds time elapsed                                          ( +-  0.25% )

 ZSTD compression writing
 ========================

 Performance counter stats for 'r.mapcalc expression=test_rast_zstd=double(test_rast_z_base)' (10 runs):

 ::


        3540.287381 task-clock (msec)         #    0.999 CPUs utilized            ( +-  0.20% )
                382 context-switches          #    0.108 K/sec                    ( +-  3.67% )
                 24 cpu-migrations            #    0.007 K/sec                    ( +-  5.61% )
                776 page-faults               #    0.219 K/sec                    ( +-  0.13% )
        10367186950 cycles                    #    2.928 GHz                      ( +-  0.05% )
         3160263203 stalled-cycles-frontend   #   30.48% frontend cycles idle     ( +-  0.10% )
        19098247069 instructions              #    1.84  insns per cycle
                                              #    0.17  stalled cycles per insn  ( +-  0.04% )
         3831842251 branches                  # 1082.353 M/sec                    ( +-  0.04% )
           35124859 branch-misses             #    0.92% of all branches          ( +-  0.16% )

        3.543262199 seconds time elapsed                                          ( +-  0.20% )

 Original raster map test
 ========================

 Performance counter stats for 'r.univar test_rast_z_base' (10 runs):

 ::


        2024.195978 task-clock (msec)         #    0.998 CPUs utilized            ( +-  0.29% )
                646 context-switches          #    0.319 K/sec                    ( +-  0.64% )
                  0 cpu-migrations            #    0.000 K/sec
                457 page-faults               #    0.226 K/sec                    ( +-  0.04% )
         5934175598 cycles                    #    2.932 GHz                      ( +-  0.05% )
         1712911175 stalled-cycles-frontend   #   28.87% frontend cycles idle     ( +-  0.14% )
        11404604123 instructions              #    1.92  insns per cycle
                                              #    0.15  stalled cycles per insn  ( +-  0.00% )
         2280049632 branches                  # 1126.398 M/sec                    ( +-  0.00% )
           32906874 branch-misses             #    1.44% of all branches          ( +-  0.37% )

        2.029035083 seconds time elapsed                                          ( +-  0.28% )

 ZLIB compression reading
 ========================

 Performance counter stats for 'r.univar test_rast_orig' (10 runs):

 ::


        2000.246389 task-clock (msec)         #    0.998 CPUs utilized            ( +-  0.42% )
                640 context-switches          #    0.320 K/sec                    ( +-  1.09% )
                  0 cpu-migrations            #    0.000 K/sec
                458 page-faults               #    0.229 K/sec                    ( +-  0.02% )
         5930779846 cycles                    #    2.965 GHz                      ( +-  0.08% )
         1716273412 stalled-cycles-frontend   #   28.94% frontend cycles idle     ( +-  0.18% )
        11406021691 instructions              #    1.92  insns per cycle
                                              #    0.15  stalled cycles per insn  ( +-  0.01% )
         2280208665 branches                  # 1139.964 M/sec                    ( +-  0.01% )
           32553520 branch-misses             #    1.43% of all branches          ( +-  0.24% )

        2.005018871 seconds time elapsed                                          ( +-  0.42% )

 RLE compression reading
 =======================

 Performance counter stats for 'r.univar test_rast_rle' (10 runs):

 ::


        2016.279711 task-clock (msec)         #    0.998 CPUs utilized            ( +-  0.34% )
                653 context-switches          #    0.324 K/sec                    ( +-  1.34% )
                  0 cpu-migrations            #    0.000 K/sec                    ( +- 50.92% )
                458 page-faults               #    0.227 K/sec                    ( +-  0.04% )
         5931202618 cycles                    #    2.942 GHz                      ( +-  0.07% )
         1711592367 stalled-cycles-frontend   #   28.86% frontend cycles idle     ( +-  0.13% )
        11406103365 instructions              #    1.92  insns per cycle
                                              #    0.15  stalled cycles per insn  ( +-  0.01% )
         2280223560 branches                  # 1130.906 M/sec                    ( +-  0.01% )
           32763877 branch-misses             #    1.44% of all branches          ( +-  0.41% )

        2.021075900 seconds time elapsed                                          ( +-  0.34% )

 LZ4 compression reading
 =======================

 Performance counter stats for 'r.univar test_rast_lz4' (10 runs):

 ::


         690.267191 task-clock (msec)         #    0.998 CPUs utilized            ( +-  0.37% )
                235 context-switches          #    0.341 K/sec                    ( +-  1.55% )
                  0 cpu-migrations            #    0.000 K/sec
                449 page-faults               #    0.650 K/sec                    ( +-  0.04% )
         2003905382 cycles                    #    2.903 GHz                      ( +-  0.11% )
          586090598 stalled-cycles-frontend   #   29.25% frontend cycles idle     ( +-  0.29% )
         3982189156 instructions              #    1.99  insns per cycle
                                              #    0.15  stalled cycles per insn  ( +-  0.04% )
          971667430 branches                  # 1407.669 M/sec                    ( +-  0.02% )
              64052 branch-misses             #    0.01% of all branches          ( +-  2.47% )

        0.691904075 seconds time elapsed                                          ( +-  0.37% )

 LZ4HC compression reading
 =========================

 Performance counter stats for 'r.univar test_rast_lz4hc' (10 runs):

 ::

         692.453563 task-clock (msec)         #    0.998 CPUs utilized            ( +-  0.18% )
                243 context-switches          #    0.351 K/sec                    ( +-  0.96% )
                  0 cpu-migrations            #    0.000 K/sec
                449 page-faults               #    0.649 K/sec                    ( +-  0.03% )
         1999520778 cycles                    #    2.888 GHz                      ( +-  0.09% )
          581415687 stalled-cycles-frontend   #   29.08% frontend cycles idle     ( +-  0.23% )
         3982233099 instructions              #    1.99  insns per cycle
                                              #    0.15  stalled cycles per insn  ( +-  0.04% )
          971675561 branches                  # 1403.236 M/sec                    ( +-  0.02% )
              63306 branch-misses             #    0.01% of all branches          ( +-  1.98% )

        0.694124867 seconds time elapsed                                          ( +-  0.18% )

 ZSTD compression reading
 ========================


 Performance counter stats for 'r.univar test_rast_zstd' (10 runs):

 ::

        1168.682507 task-clock (msec)         #    0.998 CPUs utilized            ( +-  0.41% )
                377 context-switches          #    0.323 K/sec                    ( +-  1.35% )
                  0 cpu-migrations            #    0.000 K/sec
                460 page-faults               #    0.394 K/sec                    ( +-  0.06% )
         3397563517 cycles                    #    2.907 GHz                      ( +-  0.06% )
          780090112 stalled-cycles-frontend   #   22.96% frontend cycles idle     ( +-  0.28% )
         8084726269 instructions              #    2.38  insns per cycle
                                              #    0.10  stalled cycles per insn  ( +-  0.02% )
         1325103816 branches                  # 1133.844 M/sec                    ( +-  0.02% )
             426732 branch-misses             #    0.03% of all branches          ( +-  1.51% )

        1.171226998 seconds time elapsed                                          ( +-  0.41% )

 Check of types and compression
 ==============================

 ::

     <test_rast_z_base> is compressed (level 2: DEFLATE). Data type: <DCELL>
     <test_rast_orig> is compressed (level 2: DEFLATE). Data type: <DCELL>
     <test_rast_rle> is compressed (level 1: RLE). Data type: <DCELL>
     <test_rast_lz4> is compressed (level 3: LZ4). Data type: <DCELL>
     <test_rast_lz4hc> is compressed (level 4: LZ4HC). Data type: <DCELL>
     <test_rast_zstd> is compressed (level 5: ZSTD). Data type: <DCELL>

 File sizes
 ==========

 ::

  240045009 Oct  9 23:26 fcell/test_rast_lz4
  240045009 Oct  9 23:27 fcell/test_rast_lz4hc
  229517654 Oct  9 23:24 fcell/test_rast_orig
  229517654 Oct  9 23:26 fcell/test_rast_rle
  229517654 Oct  9 23:22 fcell/test_rast_z_base
  227636390 Oct  9 23:28 fcell/test_rast_zstd
  105009 Oct  9 23:26 cell_misc/test_rast_lz4/null2
  105009 Oct  9 23:27 cell_misc/test_rast_lz4hc/null2
  125009 Oct  9 23:24 cell_misc/test_rast_orig/null2
  125009 Oct  9 23:26 cell_misc/test_rast_rle/null2
  125009 Oct  9 23:22 cell_misc/test_rast_z_base/null2
  175009 Oct  9 23:28 cell_misc/test_rast_zstd/null2
 }}}
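
 To make the report easier to compare at a glance, the following sketch
 derives speedups relative to ZLIB and file sizes relative to the raw data.
 The timings and sizes are copied from the report above; the raw size is an
 assumption based on 8 bytes per DCELL cell and the 30,000,000-cell region,
 and it ignores any per-file overhead.

```python
# Mean elapsed seconds ("seconds time elapsed") copied from the report.
WRITE_SECONDS = {
    "ZLIB": 10.489354952,
    "RLE": 10.382500561,
    "LZ4": 2.492994675,
    "LZ4HC": 6.873512386,
    "ZSTD": 3.543262199,
}
READ_SECONDS = {
    "ZLIB": 2.005018871,
    "RLE": 2.021075900,
    "LZ4": 0.691904075,
    "LZ4HC": 0.694124867,
    "ZSTD": 1.171226998,
}
# fcell file sizes in bytes, copied from the "File sizes" listing.
FCELL_BYTES = {
    "ZLIB": 229517654,
    "RLE": 229517654,
    "LZ4": 240045009,
    "LZ4HC": 240045009,
    "ZSTD": 227636390,
}
# ASSUMPTION: 30,000,000 cells stored as 8-byte doubles, no file overhead.
RAW_BYTES = 30_000_000 * 8


def speedup_over(baseline, seconds):
    """How many times faster each method is than the baseline method."""
    base = seconds[baseline]
    return {name: base / value for name, value in seconds.items()}


write_speedup = speedup_over("ZLIB", WRITE_SECONDS)
read_speedup = speedup_over("ZLIB", READ_SECONDS)
size_ratio = {name: size / RAW_BYTES for name, size in FCELL_BYTES.items()}

for name in WRITE_SECONDS:
    print(f"{name:6s} write x{write_speedup[name]:.2f}  "
          f"read x{read_speedup[name]:.2f}  size {size_ratio[name]:.3f}")
```

 With these inputs, LZ4 writes come out roughly 4.2 times faster than ZLIB
 and LZ4 reads roughly 2.9 times faster, while the LZ4 and LZ4HC files are no
 smaller than the raw values, which is unsurprising for completely random
 data.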

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/2750#comment:10>
GRASS GIS <https://grass.osgeo.org>