[GRASS-dev] [GRASS GIS] #2764: corrupt data written to FCELL and DCELL rasters, hard to re-produce

GRASS GIS trac at osgeo.org
Mon Jan 8 13:15:16 PST 2018


#2764: corrupt data written to FCELL and DCELL rasters, hard to re-produce
---------------------+-------------------------
  Reporter:  dylan   |      Owner:  grass-dev@…
      Type:  defect  |     Status:  new
  Priority:  normal  |  Milestone:  7.2.3
 Component:  Raster  |    Version:  unspecified
Resolution:          |   Keywords:
       CPU:  x86-64  |   Platform:  Linux
---------------------+-------------------------

Comment (by dylan):

 Replying to [comment:28 mmetz]:
 > This is a very simple r.mapcalc expression. You should be able to
 > trigger the error by simply creating a number of maps in parallel with
 > r.mapcalc and then (serially) testing the outputs with r.univar. With
 > r.mapcalc in daily-rad.sh, you are running several independent instances
 > of r.mapcalc in parallel. With daily-rad.sh called by beam-rad-at-tile.sh,
 > you are not testing whether GRASS is thread-safe; instead, you are testing
 > whether your OS, filesystem and hard drive can handle multiple simultaneous
 > IO requests. Please check your system messages and the health of your hard
 > drives (e.g. with smartctl) first, before you proceed.

 Yes, that is what I suspected, and it is essentially what the original test
 scripts do. As stated in [https://trac.osgeo.org/grass/ticket/2764#comment:2
 my update] about 2 years ago, these tests run fine on both the RAID1 array
 and the SSD in this machine.
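 A minimal Python sketch of the parallel-write / serial-check procedure
 suggested in the quoted reply (this is not the original test scripts). It
 assumes it is started from inside a GRASS 7 session so that `r.mapcalc` and
 `r.univar` are on PATH, and that `r.univar` exits with a non-zero status when
 it hits an unreadable row (as with the "Error reading raster data for row"
 failures). The map-name prefix `stress_test_` and the expression are
 hypothetical placeholders.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_grass(argv):
    """Run one GRASS module as a separate process and return its exit status."""
    return subprocess.run(argv, capture_output=True, text=True).returncode


def stress_test(n_maps, run=run_grass, workers=8):
    """Write n_maps FCELL maps in parallel, then read each one back serially.

    Returns the names of maps whose write or read-back failed.
    The name prefix and expression are hypothetical placeholders.
    """
    names = [f"stress_test_{i}" for i in range(n_maps)]

    # Step 1: independent r.mapcalc processes writing floating-point maps at once.
    write_cmds = [
        ["r.mapcalc", "--overwrite",
         f"expression={name} = float(row()) * col() + {i}"]
        for i, name in enumerate(names)
    ]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map keeps results in the same order as write_cmds.
        write_status = list(pool.map(run, write_cmds))

    # Step 2: one r.univar at a time, so reads do not compete for IO.
    bad = []
    for name, status in zip(names, write_status):
        if status != 0 or run(["r.univar", "-g", f"map={name}"]) != 0:
            bad.append(name)
    return bad
```

 Threads are enough here because each worker only waits on its own GRASS
 process; the actual parallelism comes from the OS processes, much like GNU
 `parallel`. Running e.g. `stress_test(40)` once with `GRASS_COMPRESSOR=LZ4`
 exported and once with `GRASS_COMPRESSOR=ZLIB` (assuming the GRASS version in
 use honours that variable) would give a per-compressor tally comparable to
 the one reported later in this comment.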

 I don't see any troubling messages reported by `dmesg` or `smartctl`. Note
 that I don't have issues with any other GRASS commands or (as far as I can
 tell) with general usage of this machine. I only see these errors with GRASS
 commands that:

   * take a long time to run: `r.sun` or `t.rast.mapcalc`
 ([http://osgeo-org.1560.x6.nabble.com/Error-reading-raster-data-for-row-xxx-only-when-using-r-series-and-t-rast-series-td5229569.html
 e.g. a couple of years ago])
   * operate on moderately large, floating-point maps
   * are done in parallel, either via GNU `parallel` or as implemented in
 the temporal suite of modules

 ...hence the extreme difficulty in reproducing the errors or debugging
 further.

 For the record, I have been getting corrupt rows in 1 of 8 attempts with
 LZ4 compression vs. 1 of 2 attempts with ZLIB compression.

 After the next error, I'll recompile without OpenMP (I'm not using it, but
 just in case) and with `-g -Wall` in CFLAGS.

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/2764#comment:29>
GRASS GIS <https://grass.osgeo.org>


