[GRASS-dev] [GRASS GIS] #2764: corrupt data written to FCELL and DCELL rasters, hard to re-produce
GRASS GIS
trac at osgeo.org
Mon Jan 8 08:00:17 PST 2018
---------------------+-------------------------
Reporter: dylan | Owner: grass-dev@…
Type: defect | Status: new
Priority: normal | Milestone: 7.2.3
Component: Raster | Version: unspecified
Resolution: | Keywords:
CPU: x86-64 | Platform: Linux
---------------------+-------------------------
Comment (by mmetz):
Replying to [comment:25 dylan]:
> I'll post an example set of tile data shortly.
OK.
>
> Most of the errors are encountered in the final call to `r.series` within:
>
> {{{
> bash beam-rad-at-tile.sh $tile_i
> }}}
That means r.sun has created corrupt output. BTW, considering that you are
running several instances of r.sun in parallel, I wonder if you compiled
GRASS with OpenMP and are using the nprocs option of r.sun. In that case you
would have several instances of r.sun, each of them multi-threaded: no speed
gain, but more sources of potential errors.
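One way to avoid that oversubscription is to split the cores between the
parallel jobs so that (jobs x threads per job) does not exceed the number of
cores. A minimal sketch in POSIX sh; `threads_per_job` is a hypothetical
helper, and its result is meant for r.sun's nprocs option (with one r.sun
per core this simply means nprocs=1):

```shell
#!/bin/sh
# Hypothetical helper: number of OpenMP threads (r.sun nprocs=) each of
# N parallel jobs can use without running more threads than there are cores.
threads_per_job() {
  # $1 = available cores, $2 = number of parallel r.sun jobs
  t=$(( $1 / $2 ))
  # Integer division gives 0 when there are more jobs than cores;
  # every job still needs at least one thread.
  if [ "$t" -lt 1 ]; then
    t=1
  fi
  echo "$t"
}
```

For example, `threads_per_job 8 8` prints 1 and `threads_per_job 8 2` prints 4.
On Linux the core count could come from `nproc` (GNU coreutils), which counts
logical CPUs, i.e. hyper-threads are included.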
>
> Well crud, just got this after a 10 hour run, returned by `r.series`:
>
> {{{
> WARNING: LZ4 decompression error
> ERROR: Error uncompressing fp raster data for row 3929 of <beam.106>: error code -1
> }}}
>
> That is the first error using LZ4 compression after many successful
> tiles. I wonder if the faster compression results in a lower probability
> of row corruption? Within my current project, I seem to be encountering
> corrupt rows about 0.0001% of the time: 2 rows out of (5000 rows * 365
> calls to `r.sun`).
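The rate quoted above checks out: 5000 rows x 365 maps is 1,825,000 rows, so
2 corrupt rows is about 1.1e-6 of all rows, roughly 0.0001%. A small sketch
of that arithmetic (`corrupt_rate_percent` is a hypothetical helper name):

```shell
#!/bin/sh
# Percentage of corrupt rows among all rows written by the r.sun runs.
# $1 = corrupt rows, $2 = rows per map, $3 = number of maps
corrupt_rate_percent() {
  awk -v b="$1" -v r="$2" -v m="$3" \
    'BEGIN { printf "%.5f\n", 100 * b / (r * m) }'
}

# 2 corrupt rows out of 5000 rows x 365 maps (1,825,000 rows in total)
corrupt_rate_percent 2 5000 365   # prints 0.00011
```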
Chances are very small that ZLIB and LZ4 have the same bug. It seems more
likely to be a write error when several files are written at (nearly) the
same time.
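If it is a write problem, it may help to read each map back right after a
parallel batch finishes, so that only the affected days need to be
recomputed instead of finding out in the final r.series call. A minimal
sketch, to be run inside a GRASS session, assuming that r.univar reads every
row of the map and exits with a non-zero status on a decompression error;
`find_unreadable_maps` is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: print the names of maps that cannot be read back
# completely. Arguments are raster map names, e.g. beam.1 ... beam.365.
find_unreadable_maps() {
  for m in "$@"; do
    # r.univar reads all rows; a corrupt row should make it fail.
    if ! r.univar -g map="$m" >/dev/null 2>&1; then
      echo "$m"
    fi
  done
}
```

Any map it prints could then be removed and regenerated by re-running r.sun
for that day only.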
>
> This makes me wonder about my hardware and OS...
>
> However, I recall having this kind of error (365 maps generated in
> parallel, and 1 or 2 rows in the entire set with corruption "discovered"
> by `r.series`) nearly every time I have used `r.sun` over the last 10
> years. It just so happens that this time I am performing the same
> operation 54 times vs. the typical single run. In every situation I was
> using some version of Debian or Xubuntu on fairly common (multi-processor
> or multi-core) hardware.
>
> Thinking back, all of the errors have been encountered with
> [https://en.wikipedia.org/wiki/Hyper-threading hyper-threading] enabled:
> both on a dual Xeon and currently i7 950.
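Whether hyper-threading is active can be checked on Linux with `lscpu`: its
"Thread(s) per core" field is greater than 1 when it is enabled. A minimal
sketch, assuming util-linux `lscpu` output (newer versions indent that line,
which the pattern allows for); `threads_per_core` is a hypothetical helper:

```shell
#!/bin/sh
# Print the "Thread(s) per core" value from lscpu-style text on stdin.
# A value greater than 1 means hyper-threading (SMT) is enabled.
threads_per_core() {
  awk -F: '/^[ \t]*Thread[(]s[)] per core:/ {
    v = $2
    gsub(/[ \t]/, "", v)   # strip the column padding
    print v
  }'
}

# Usage on a live system:
#   lscpu | threads_per_core
```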
--
Ticket URL: <https://trac.osgeo.org/grass/ticket/2764#comment:26>
GRASS GIS <https://grass.osgeo.org>