[GRASS-dev] [GRASS GIS] #2764: corrupt data written to FCELL and DCELL rasters, hard to re-produce

GRASS GIS trac at osgeo.org
Wed Jan 10 14:04:08 PST 2018


#2764: corrupt data written to FCELL and DCELL rasters, hard to re-produce
---------------------+-------------------------
  Reporter:  dylan   |      Owner:  grass-dev@…
      Type:  defect  |     Status:  new
  Priority:  normal  |  Milestone:  7.2.3
 Component:  Raster  |    Version:  unspecified
Resolution:          |   Keywords:
       CPU:  x86-64  |   Platform:  Linux
---------------------+-------------------------

Comment (by mmetz):

 Replying to [comment:33 dylan]:
 > Replying to [comment:32 mmetz]:
 >
 > > Markus Neteler in particular spent a lot of time fixing various
 systems for parallel execution of GRASS commands. GRASS itself was never
 the problem; instead, the main problem was that the multiple outputs to be
 written to a single storage device were too much for that storage device.
 >
 > OK. Good to know. Are there any other diagnostics for these kinds of
 problems, other than looking through the output from `dmesg` or kernel
 messages? I typically run `dstat` while developing parallel processing
 scripts, but I haven't noticed any disk-thrashing in this case.

 Markus N might know of other places to look for problematic messages.

 >
 > **Update**
 > Looking at disk and I/O stats at a 1 second granularity via:
 > {{{
 > dstat -m -c --top-cpu --top-bio --top-io -d -D sdd1 -r --disk-util  1
 > }}}
 > I see that the SSD (sdd1) is idle for most of the time and then spikes
 to 30-80% of its "disk utilization" as reported by `dstat --disk-util`
 when several `r.sun` jobs finish (?).

 After `r.sun` comes `r.mapcalc` with a simple expression, which produces
 output quite fast. This could be the reason for the spikes.
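 As one more place to look besides `dmesg`, the per-device counters that
 tools like `dstat` and `iostat` build on can be read directly from
 `/proc/diskstats` on Linux. A minimal sketch, assuming the standard column
 layout described in the kernel's iostats documentation (column 8 = writes
 completed, column 13 = milliseconds spent doing I/O); the function names
 are hypothetical, and the utilization figure is computed the way `iostat`
 computes `%util`, which may differ slightly from `dstat --disk-util`:

```python
import time


def read_counters(diskstats_text, device):
    """Return (ms_doing_io, writes_completed) for `device` from /proc/diskstats text.

    Assumed column layout (0-based, after splitting on whitespace):
    2 = device name, 7 = writes completed, 12 = milliseconds spent doing I/O.
    """
    for line in diskstats_text.splitlines():
        fields = line.split()
        if len(fields) >= 14 and fields[2] == device:
            return int(fields[12]), int(fields[7])
    raise KeyError(f"device {device!r} not found in diskstats")


def utilization(before, after, interval_s):
    """Percent of the interval during which the device had I/O in flight."""
    busy_ms = after[0] - before[0]
    return 100.0 * busy_ms / (interval_s * 1000.0)


def sample(device, interval_s=1.0):
    """Take two /proc/diskstats snapshots `interval_s` apart (Linux only).

    Returns (percent utilization, write requests completed per second).
    """
    with open("/proc/diskstats") as f:
        before = read_counters(f.read(), device)
    time.sleep(interval_s)
    with open("/proc/diskstats") as f:
        after = read_counters(f.read(), device)
    writes_per_s = (after[1] - before[1]) / interval_s
    return utilization(before, after, interval_s), writes_per_s
```

 For example, `sample("sdd1")` once per second alongside the `r.sun` jobs.
 Note that "writes completed" counts requests after the kernel has merged
 adjacent writes, so it is typically much lower than the number of rows
 written by the modules.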

 > As a comparison, making a .tgz on the SSD yields ~ 12% disk utilization.
 >
 > Each instance of `r.sun` is writing out ~5000 rows of data over a period
 of about 8 minutes, so that is 5000 rows * 8 processes / 8 minutes * 1/60
 minutes per second = ~80 write operations per second (assuming rows are
 written as processed). It would appear (from the `dstat` output) that rows
 are written in batches?
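 The rate estimate above can be checked with a few lines of Python. This is
 a minimal sketch that assumes each job spreads its rows evenly over its
 run time; the function name is hypothetical:

```python
def rows_written_per_second(rows_per_job, n_jobs, minutes_per_job):
    """Average row-write rate, assuming each job writes its rows evenly over its run."""
    rows_per_minute = n_jobs * rows_per_job / minutes_per_job
    return rows_per_minute / 60.0


# Figures from the comment: ~5000 rows per job, 8 concurrent r.sun jobs, ~8 minutes each.
rate = rows_written_per_second(5000, 8, 8)  # about 83 rows per second
```

 This is the rate at which rows are handed to the operating system; the
 device itself usually sees fewer, larger requests once the kernel has
 buffered and merged them.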

 No, rows are written out one at a time. The operating system (through its
 page cache) and the drive decide when pending changes are actually written
 out from the cache to the disk. As mentioned above, `r.mapcalc` might
 produce output quite fast.
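 To illustrate why one-row-at-a-time writes can still show up as bursts in
 `dstat`: on Linux, `write()` normally returns once the data is in the
 kernel's page cache, and the flush to the device happens later unless it
 is forced with `fsync()`. A minimal Python sketch that writes a plain
 binary file of doubles (not GRASS's raster format or its C I/O code; the
 function name is hypothetical):

```python
import os
import struct


def write_rows(path, rows, fsync_every=0):
    """Write each row of floats as little-endian doubles, one os.write() per row.

    fsync_every=0: the kernel decides when the data reaches the device.
    fsync_every=N: force the data out with os.fsync() after every N rows.
    Returns the total number of bytes written.
    """
    total = 0
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        for i, row in enumerate(rows, start=1):
            view = memoryview(struct.pack(f"<{len(row)}d", *row))
            while view:
                # Normally returns as soon as the bytes are in the page cache;
                # loop in case of a partial write.
                n = os.write(fd, view)
                total += n
                view = view[n:]
            if fsync_every and i % fsync_every == 0:
                # Block until this file's data has been flushed to the device.
                os.fsync(fd)
        if fsync_every:
            os.fsync(fd)  # flush any rows left after the last full batch
    finally:
        os.close(fd)
    return total
```

 Calling it with `fsync_every=1` forces every row out immediately, which is
 much slower and only makes sense as a diagnostic, e.g. to see whether the
 `dstat` pattern changes when writes are not left to the page cache.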

 >
 > > > I know this is a lot to ask, but did you try testing using ZLIB
 compression and running it multiple times? It took a couple of tiles
 before I noticed the error.
 > >
 > > I did use ZLIB compression when running the test with the data and
 scripts provided. Do you mean I should run the test several times with the
 same data?
 >
 > Yes (please). It wasn't until I had run a couple of tiles that I
 encountered errors.

 I have run the test again, letting it overwrite the results of the
 previous run (no prior g.remove because you stated earlier that this helps
 trigger the errors), and it finished successfully again.
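 Because the errors only showed up after several tiles, repeated runs are
 easier to compare if each run's outputs are checksummed: with identical
 inputs and settings, the same module should produce identical cell values,
 so an output whose digest changes between runs is the one to inspect. A
 minimal Python sketch; it assumes each output raster has first been
 exported to a file that contains no run-dependent metadata (e.g. the cell
 values written with `r.out.ascii`), and the function names are
 hypothetical:

```python
import hashlib
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large exports need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def inconsistent_outputs(runs):
    """Return the output names whose digest differs between runs.

    `runs` is a list with one dict per run, mapping output name -> digest.
    """
    seen = defaultdict(set)
    for run in runs:
        for name, digest in run.items():
            seen[name].add(digest)
    return sorted(name for name, digests in seen.items() if len(digests) > 1)
```

 For example, after each run, build `{name: file_digest(path)}` for every
 exported tile and pass the list of those dicts to `inconsistent_outputs`.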

 My system will be busy with other parallel GRASS processing tasks for the
 next couple of days, so no capacity here to test again soon.

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/2764#comment:34>
GRASS GIS <https://grass.osgeo.org>


