[GRASS-dev] [GRASS GIS] #2764: corrupt data written to FCELL and DCELL rasters, hard to re-produce
GRASS GIS
trac at osgeo.org
Wed Jan 10 14:04:08 PST 2018
#2764: corrupt data written to FCELL and DCELL rasters, hard to re-produce
---------------------+-------------------------
Reporter: dylan | Owner: grass-dev@…
Type: defect | Status: new
Priority: normal | Milestone: 7.2.3
Component: Raster | Version: unspecified
Resolution: | Keywords:
CPU: x86-64 | Platform: Linux
---------------------+-------------------------
Comment (by mmetz):
Replying to [comment:33 dylan]:
> Replying to [comment:32 mmetz]:
>
> > Markus Neteler in particular spent a lot of time fixing various
> > systems for parallel execution of GRASS commands. GRASS itself was never
> > the problem; instead, the main problem was that the multiple outputs to be
> > written to a single storage device were too much for that storage device.
>
> OK. Good to know. Are there any other diagnostics for this kind of
> problem, other than looking through the output from `dmesg` or kernel
> messages? I typically run `dstat` while developing parallel processing
> scripts, but I haven't noticed any disk-thrashing in this case.
Markus N might know of other places to look for problematic messages.
>
> **Update**
> Looking at disk and I/O stats at a 1 second granularity via:
> {{{
> dstat -m -c --top-cpu --top-bio --top-io -d -D sdd1 -r --disk-util 1
> }}}
> I see that the SSD (sdd1) is idle for most of the time and then spikes
> to 30-80% of its "disk utilization" as reported by `dstat --disk-util`
> when several `r.sun` jobs finish (?).
After `r.sun` comes `r.mapcalc` with a simple expression, producing
output quite fast. This could be the reason for the spikes.
> As a comparison, making a .tgz on the SSD yields ~ 12% disk utilization.
>
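For readers who want a cross-check of the `dstat --disk-util` figures, here is a minimal Python sketch of how a "busy percentage" can be derived from two snapshots of `/proc/diskstats` on Linux. It relies on the tenth per-device counter (milliseconds spent doing I/O). The function names and the two sample snapshots are made up for illustration, and this is not claimed to be how `dstat` computes its number.

```python
def io_ticks_ms(diskstats_text, device):
    """Return the cumulative milliseconds the device has spent doing I/O."""
    for line in diskstats_text.splitlines():
        fields = line.split()
        # Layout: major, minor, device name, then the per-device counters.
        # The tenth counter (index 12) is "time spent doing I/Os (ms)".
        if len(fields) >= 13 and fields[2] == device:
            return int(fields[12])
    raise KeyError(device)


def utilization_percent(ticks_before, ticks_after, interval_s):
    """Share of the sampling interval during which the device was busy."""
    busy_ms = ticks_after - ticks_before
    return min(100.0, 100.0 * busy_ms / (interval_s * 1000.0))


# Two invented snapshots taken one second apart (illustrative numbers only).
before = "   8  49 sdd1 100 0 800 50 200 0 1600 300 0 1000 350"
after = "   8  49 sdd1 100 0 800 50 280 0 2240 700 0 1450 750"

util = utilization_percent(
    io_ticks_ms(before, "sdd1"), io_ticks_ms(after, "sdd1"), 1.0
)
print(f"sdd1 busy {util:.0f}% of the interval")
```

On a live system the two strings would come from reading `/proc/diskstats` twice, one sampling interval apart.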
> Each instance of `r.sun` is writing out ~5000 rows of data over a period
> of about 8 minutes, so that is 5000 rows * 8 processes / (8 minutes *
> 60 seconds per minute) = ~80 write operations per second
> (assuming rows are written as processed). It would appear (from the
> `dstat` output) that rows are written in batches?
No, rows are written out one at a time. The hard drive decides when
pending changes are actually written out from the disk cache to the actual
disk. As before, `r.mapcalc` might produce output quite fast.
>
> > > I know this is a lot to ask, but did you try testing using ZLIB
> > > compression and running it multiple times? It took a couple of tiles
> > > before I noticed the error.
> >
> > I did use ZLIB compression when running the test with the data and
> > scripts provided. Do you mean I should run the test several times with the
> > same data?
>
> Yes (please). It wasn't until I had run a couple of tiles that I
> encountered errors.
I have run the test again, letting it overwrite the results of the
previous run (no prior g.remove because you stated earlier that this helps
trigger the errors), and it finished successfully again.
My system will be busy with other parallel GRASS processing tasks for the
next couple of days, so no capacity here to test again soon.
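Since the errors reportedly show up only after several runs, repeating the test could be automated along these lines. This is a minimal Python sketch: `run_repeatedly` is a hypothetical helper, the command shown is a placeholder for the actual test script, and it assumes that whatever check detects the problem exits with a non-zero status. Exit status alone may not reveal corrupt raster data.

```python
import subprocess
import sys


def run_repeatedly(command, repetitions):
    """Run `command` up to `repetitions` times.

    Returns the 1-based index of the first run that exits with a non-zero
    status, or None if every run succeeded.
    """
    for attempt in range(1, repetitions + 1):
        result = subprocess.run(command)
        if result.returncode != 0:
            return attempt
    return None


# Placeholder command: substitute the actual test script here.
command = [sys.executable, "-c", "print('tile processed')"]
first_failure = run_repeatedly(command, repetitions=3)
if first_failure is None:
    print("all runs succeeded")
else:
    print(f"run {first_failure} failed")
```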
--
Ticket URL: <https://trac.osgeo.org/grass/ticket/2764#comment:34>
GRASS GIS <https://grass.osgeo.org>