[GRASS-user] r.sun.daily with multiple CPU cores: error uncompressing raster data ...

Dylan Beaudette dylan.beaudette at gmail.com
Wed Jan 3 11:41:43 PST 2018


Update: after applying the latest patch, I now see

ERROR: Decompression failed with error -1

I found the map that fails decompression. Is there any way to inspect
the map in order to search for more clues as to what is wrong with it
or how it might have happened?
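
One low-level check that does not depend on GRASS is to fingerprint the
map's files on disk and confirm that repeated reads return the same
bytes. Below is a minimal Python sketch of that idea. The function names
are hypothetical, and the demo uses a temporary file as a stand-in:
where a given map's data and NULL files live inside the mapset is an
assumption to verify locally. Repeated reads may also be served from the
OS page cache, so identical digests do not rule out a storage problem.

```python
import hashlib
import os
import tempfile


def fingerprint(path, chunk_size=1 << 20):
    """Return (size_in_bytes, sha256_hex) for one file, read in chunks."""
    digest = hashlib.sha256()
    size = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
            size += len(chunk)
    return size, digest.hexdigest()


def stable_on_disk(path, passes=3):
    """Return True if several consecutive reads yield identical bytes."""
    prints = {fingerprint(path) for _ in range(passes)}
    return len(prints) == 1


if __name__ == "__main__":
    # Stand-in file; replace with the (hypothetical) path to the files
    # of the failing raster map inside the mapset directory.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"\x00\x01\x02" * 1000)
        demo_path = tmp.name
    print(fingerprint(demo_path), stable_on_disk(demo_path))
    os.remove(demo_path)
```

If the digests stay identical while decompression still fails, the bytes
on disk are at least stable between reads, which would hint at a problem
introduced when the file was written rather than when it is read.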


All of the maps in this project are using the default ZLIB
compression, along with compressed NULL files. Looking over the zlib
manual (https://www.zlib.net/manual.html), I see several references to
an error code of "-1":

----------------------------
#define Z_ERRNO        (-1)

Z_ERRNO if there is an error writing the flushed data

Z_ERRNO on a file operation error

ZEXTERN const char * ZEXPORT gzerror OF((gzFile file, int *errnum));

Returns the error message for the last error which occurred on the
given compressed file. errnum is set to zlib error number. If an error
occurred in the file system and not in the compression library, errnum
is set to Z_ERRNO and the application may consult errno to get the
exact error code.
----------------------------

The last note is interesting and suggests that this specific "problem"
may be happening at the file system or OS level. That said, I have
only encountered this error in the context of FCELL or DCELL
maps--which makes me think that it is some combination of GRASS and
the underlying file system.
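
For reference, the zlib return codes can be exercised in memory. The
following is a small Python sketch that uses the standard zlib module
rather than GRASS's own raster reading code, and the compressed "row" is
synthetic. It parses the numeric code that CPython embeds in its
zlib.error message, which is an implementation detail of those messages.
Whether the "-1" in the GRASS error message is zlib's Z_ERRNO or a
return value of GRASS's own wrapper is not established here.

```python
import re
import zlib

# Return codes as defined in zlib.h.
ZLIB_CODES = {
    0: "Z_OK",
    1: "Z_STREAM_END",
    2: "Z_NEED_DICT",
    -1: "Z_ERRNO",
    -2: "Z_STREAM_ERROR",
    -3: "Z_DATA_ERROR",
    -4: "Z_MEM_ERROR",
    -5: "Z_BUF_ERROR",
    -6: "Z_VERSION_ERROR",
}


def decompress_status(blob):
    """Return 0 if blob decompresses, else the zlib code from the error.

    CPython formats failures as 'Error <code> while decompressing data';
    None is returned if no code can be parsed from the message.
    """
    try:
        zlib.decompress(blob)
        return 0
    except zlib.error as exc:
        match = re.search(r"Error (-?\d+)", str(exc))
        return int(match.group(1)) if match else None


# Synthetic stand-in for one compressed raster row.
row = bytes(range(256)) * 64
good = zlib.compress(row)
bad_header = b"\x00" + good[1:]        # first header byte overwritten
truncated = good[: len(good) // 2]     # second half of the stream missing

for label, blob in [("good", good),
                    ("bad header", bad_header),
                    ("truncated", truncated)]:
    code = decompress_status(blob)
    print(label, code, ZLIB_CODES.get(code, "unknown"))
```

In this sketch a damaged header and a truncated stream are reported as
Z_DATA_ERROR and Z_BUF_ERROR respectively, not as Z_ERRNO, which is
consistent with the manual's description of Z_ERRNO as a file operation
error.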

Some OS information:

3.13.0-106-generic #153-Ubuntu SMP Tue Dec 6 15:44:32 UTC 2016 x86_64
x86_64 x86_64 GNU/Linux
Description:    Ubuntu 14.04.5 LTS
Release:        14.04


Thanks!
Dylan



On Sat, Dec 30, 2017 at 9:51 PM, Dylan Beaudette
<dylan.beaudette at gmail.com> wrote:
> Dang, this appears to be happening with any module run in parallel
> that generates FCELL or DCELL maps. In this case, it was r.horizon
> run in parallel.
>
> I have added additional commentary to #2764
>
> https://trac.osgeo.org/grass/ticket/2764#comment:9
>
> After application of Markus' latest patch the error message is now
> "Decompression failed with error 0".
>
> Dylan
>
> On Sat, Dec 30, 2017 at 9:34 PM, Dylan Beaudette
> <dylan.beaudette at gmail.com> wrote:
>> On Sat, Dec 30, 2017 at 2:32 PM, Markus Metz
>> <markus.metz.giswork at gmail.com> wrote:
>>>
>>>
>>> On Fri, Dec 29, 2017 at 5:07 PM, Dylan Beaudette <dylan.beaudette at gmail.com>
>>> wrote:
>>>>
>>>> Hi everyone,
>>>>
>>>> First of all, thanks in advance for the r.sun.daily module which is a
>>>> nice replacement for my amateurish attempts over the last 12 years.
>>>>
>>>> I am currently working on an annual beam radiance map for a large
>>>> geographic region, at 30m res: 70,953 x 46,964 cells. This is far too
>>>> large for a single pass of r.horizon or r.sun on my machine so I have
>>>> split the data into 5,000 x 5,000 cell tiles with 100 cells of
>>>> overlap. This seems to be sufficient for my purposes and the edge
>>>> effects are tolerable.
>>>>
>>>> At 8-15 minutes / tile / day (r.sun) and 54 tiles this job calls for
>>>> multiple CPU cores. All of the parallel processing that I use is (as
>>>> far as I know) contained within the same region.
>>>>
>>>>
>>>> I have had good success with running r.horizon in parallel via GNU
>>>> parallel like this:
>>>>
>>>> # 1: start angle
>>>> # 2: angle step
>>>> # 3: elevation tile
>>>> seq 0 $step 330 | parallel --gnu --progress "bash make-hz-maps.sh {} $step $elev"
>>>>
>>>> Which is just a wrapper around r.horizon and run in parallel "within"
>>>> tiles.
>>>>
>>>> Next, I run r.sun.daily (8 CPU cores) within tiles:
>>>>
>>>> r.sun.daily --overwrite elevation=$elev aspect=$asp slope=$slope \
>>>> start_day=1 end_day=365 beam_rad=$beam horizon_basename=hzangle \
>>>> horizon_step=$step nprocs=8
>>>>
>>>>
>>>> The r.sun.daily module finishes without error about 50-60% of the
>>>> time, and the results look good. The other 40-50% of the time I get this:
>>>>
>>>>
>>>> ERROR: Error uncompressing raster data for row 2605 of
>>>>        <r_sun_crop8255_beam_rad_tmp_352>
>>>> *** buffer overflow detected ***: g.message terminated
>>>> ======= Backtrace: =========
>>>> /lib/x86_64-linux-gnu/libc.so.6(+0x7329f)[0x7f8b9a70129f]
>>>> /lib/x86_64-linux-gnu/libc.so.6(__fortify_fail+0x5c)[0x7f8b9a79c83c]
>>>> /lib/x86_64-linux-gnu/libc.so.6(+0x10d710)[0x7f8b9a79b710]
>>>> /lib/x86_64-linux-gnu/libc.so.6(+0x10cc19)[0x7f8b9a79ac19]
>>>> /lib/x86_64-linux-gnu/libc.so.6(_IO_default_xsputn+0xbc)[0x7f8b9a70961c]
>>>> /lib/x86_64-linux-gnu/libc.so.6(_IO_vfprintf+0x1cc5)[0x7f8b9a6d9905]
>>>> /lib/x86_64-linux-gnu/libc.so.6(__vsprintf_chk+0x84)[0x7f8b9a79aca4]
>>>> /usr/local/grass-7.5.svn/lib/libgrass_gis.7.5.svn.so(+0x1343c)[0x7f8b9aa6a43c]
>>>> /usr/local/grass-7.5.svn/lib/libgrass_gis.7.5.svn.so(G_fatal_error+0xbf)[0x7f8b9aa6accf]
>>>> g.message(main+0x254)[0x400dd4]
>>>> /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7f8b9a6aff45]
>>>> g.message[0x400ea2]
>>>>
>>>>
>>>> I can't seem to replicate the problem, as subsequent runs with the
>>>> same parameters and in the same tile are successful! This leads me to
>>>> think that:
>>>>
>>>> * some aspect of this approach is not thread safe
>>>> * there is something wrong with my computer
>>>> * there is a subtle bug in the raster writing / reading code when
>>>> invoked in parallel
>>>>
>>>>
>>>> I have encountered similar raster reading errors in the past,
>>>> typically in the context of parallel processing:
>>>>
>>>> https://trac.osgeo.org/grass/ticket/2762
>>>>
>>>> https://trac.osgeo.org/grass/ticket/2764
>>>>
>>>>
>>>> http://osgeo-org.1560.x6.nabble.com/Error-reading-raster-data-for-row-xxx-only-when-using-r-series-and-t-rast-series-td5229569i20.html
>>>>
>>>> https://lists.osgeo.org/pipermail/grass-dev/2015-July/075691.html
>>>>
>>>>
>>>> Here is some information on my system and version of GRASS:
>>>>
>>>>  ./configure  --enable-64bit --with-libs=/usr/lib --without-pthread
>>>> --without-odbc --without-mysql --with-readline --with-cxx
>>>> --enable-largefile --with-freetype
>>>> --with-freetype-includes=/usr/include/freetype2 --with-sqlite
>>>> --with-python --with-geos=/usr/local/bin/geos-config --without-opencl
>>>> --with-opencl-includes=/usr/include/CL/ --with-postgres
>>>> --with-postgres-includes=/usr/include/postgresql/
>>>> --with-postgres-libs=/usr/lib/
>>>> --with-proj-share=/usr/local/share/proj/
>>>>
>>>> version=7.5.svn
>>>> date=2017
>>>> revision=r71964
>>>> build_date=2017-12-21
>>>> build_platform=x86_64-pc-linux-gnu
>>>> build_off_t_size=8
>>>>
>>>>
>>>> Any ideas?
>>>
>>> Please check whether the patch attached to ticket #2764 helps to get
>>> closer to the problem.
>>>
>>> Markus M
>>>
>>
>> Hi Markus,
>>
>> Thank you for the quick reply and patch. I have recompiled with the
>> patch from #2764 and am now waiting to see what happens.
>>
>> Happy New Year!
>>
>> Dylan

