[GRASS-dev] [GRASS GIS] #2764: corrupt data written to FCELL and DCELL rasters, hard to re-produce

GRASS GIS trac at osgeo.org
Mon Jan 8 08:00:17 PST 2018


#2764: corrupt data written to FCELL and DCELL rasters, hard to re-produce
---------------------+-------------------------
  Reporter:  dylan   |      Owner:  grass-dev@…
      Type:  defect  |     Status:  new
  Priority:  normal  |  Milestone:  7.2.3
 Component:  Raster  |    Version:  unspecified
Resolution:          |   Keywords:
       CPU:  x86-64  |   Platform:  Linux
---------------------+-------------------------

Comment (by mmetz):

 Replying to [comment:25 dylan]:
 > I'll post an example set of tile data shortly.

 OK.
 >
 > Most of the errors are encountered in the final call to `r.series` within:
 >
 > {{{
 > bash beam-rad-at-tile.sh $tile_i
 > }}}

 That means r.sun has created corrupt output. BTW, since you are running
 several instances of r.sun in parallel: did you compile GRASS with OpenMP,
 and do you use the nprocs option of r.sun? If so, each of the parallel
 r.sun instances would itself be multi-threaded, which gives no speed gain
 and adds more potential sources of errors.
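
 To illustrate the point about nested parallelism, here is a minimal
 sketch of how one might budget threads. The helper name
 `max_parallel_jobs` and the example numbers are hypothetical and not part
 of GRASS; the only assumption is that the number of parallel r.sun
 instances times the threads per instance should not exceed the available
 cores.

```shell
# Hypothetical helper (not part of GRASS): number of jobs to run at once so
# that jobs * threads_per_job does not exceed the available cores.
max_parallel_jobs() {
    mpj_cores=$1      # cores available, e.g. as reported by `nproc`
    mpj_threads=$2    # threads used by each job, e.g. the nprocs value
    mpj_jobs=$((mpj_cores / mpj_threads))
    # Always allow at least one job, even if it is oversubscribed.
    if [ "$mpj_jobs" -lt 1 ]; then
        mpj_jobs=1
    fi
    echo "$mpj_jobs"
}

# Example: 8 cores, 4 threads per r.sun instance
max_parallel_jobs 8 4   # prints 2
```

 Note that with hyper-threading enabled the reported core count is the
 number of logical cores, not physical cores.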

 >
 > Well crud, just got this after a 10 hour run, returned by `r.series`:
 >
 > {{{
 > WARNING: LZ4 decompression error
 > ERROR: Error uncompressing fp raster data for row 3929 of <beam.106>: error
 >        code -1
 > }}}
 >
 > That is the first error using LZ4 compression after many successful
 > tiles. I wonder if the faster compression results in a lower probability
 > of row corruption? Within my current project, I seem to be encountering
 > corrupt rows about 0.0001% of the time: 2 rows out of (5000 rows * 365
 > calls to `r.sun`).
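
 The quoted rate can be checked with a few lines of shell. This is only a
 sketch of the arithmetic, using the numbers from the paragraph above; the
 variable names are made up for illustration.

```shell
# Corruption rate implied by the numbers quoted above.
rows_per_map=5000    # rows in each output raster
maps=365             # one r.sun call per day of the year
corrupt_rows=2       # rows reported as corrupt by r.series

total_rows=$((rows_per_map * maps))
rate_pct=$(awk -v c="$corrupt_rows" -v t="$total_rows" \
    'BEGIN { printf "%.5f", 100 * c / t }')

echo "total rows written: $total_rows"
echo "corrupt rows: ${rate_pct}%"
```

 That is 2 out of 1825000 rows, or roughly 0.0001%, which matches the
 estimate.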

 It is very unlikely that ZLIB and LZ4 share the same bug. This looks
 more like a write error that occurs when several files are written at
 (nearly) the same time.
 >
 > This makes me wonder about my hardware and OS...
 >
 > However, I recall having this kind of error (365 maps generated in
 > parallel, and 1 or 2 rows in the entire set with corruption "discovered"
 > by `r.series`) nearly every time I have used `r.sun` over the last 10
 > years. It just so happens that this time I am performing the same
 > operation 54 times vs. the typical single run. In every situation I was
 > using some version of Debian or Xubuntu on fairly common (multi-processor
 > or multi-core) hardware.
 >
 > Thinking back, all of the errors have been encountered with
 > [https://en.wikipedia.org/wiki/Hyper-threading hyper-threading] enabled:
 > both on a dual Xeon and currently i7 950.

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/2764#comment:26>
GRASS GIS <https://grass.osgeo.org>


