[GRASS-dev] what is the meaning of: "Error reading raster data for row 239 of <MASK>"

Moritz Lennert mlennert at club.worldonline.be
Tue Jul 14 02:00:56 PDT 2015


On 14/07/15 09:46, Glynn Clements wrote:
>
> Moritz Lennert wrote:
>
>>>> I don't know how to debug this...
>>>
>>> Can you identify a repeatable test case?
>>>
>>> If I could make it happen, I could debug it.
>>
>> You can get a location named TEST here:
>>
>> http://tomahawk.ulb.ac.be/moritz/mask_bug_testlocation.tgz
>>
>> This contains only a PERMANENT mapset.
>>
>> In that mapset, launch the following command:
>>
>> r.mask vect=hull; for map in $(g.list rast pat="firm_rate*"); do echo
>> $map ; r.mapcalc "temp_prob = float($map) / sum_rates" --o --q; done;
>> r.mask -r
>>
>> I get the error arbitrarily for different firm_rate_* maps, sometimes
>> only for one, sometimes for many, but at each run it's for different
>> maps.
>
> So it's non-deterministic (I'm getting one error for every 10-20
> passes over the data, i.e. every 1200-2500 commands), and only applies
> to r.mapcalc.
>
> My first guess was a race condition related to pthreads. I tried
>
> 	export WORKERS=0
>
> before running the test, and it hasn't happened since.
>
> And actually I'm now fairly certain as to the specific cause.
>
> When compiled with pthread support, r.mapcalc has a mutex for each map
> to prevent concurrent access to a single map from multiple threads.
>
> Concurrent access to different maps (and to core lib/gis and
> lib/raster functionality) from different threads is supposed to be
> safe (see r34485 and the interval surrounding it), but the MASK was
> overlooked.
>
> If a MASK is in use, reading a row from any raster map will read the
> corresponding row from the MASK, and there's nothing to prevent
> different threads from concurrently accessing two different maps and
> thus accessing the MASK.
>
> So, in read_data_{compressed,uncompressed,fp_compressed} in
> lib/raster/get_row.c we have code like:
>
>      if (lseek(fcb->data_fd, (off_t) row * bufsize, SEEK_SET) == -1)
> 	G_fatal_error(_("Error reading raster data for row %d of <%s>"),
> 		      row, fcb->name);
>
>      if (read(fcb->data_fd, data_buf, bufsize) != bufsize)
> 	G_fatal_error(_("Error reading raster data for row %d of <%s>"),
> 		      row, fcb->name);
>
> If multiple threads execute this code concurrently, you can end up
> with the calls being interleaved like so:
>
> 	Thread 1	Thread 2
>
> 	lseek
> 			lseek
> 			read
> 	read
>
> meaning that the file offset has changed between the lseek() and the
> read() (this is why X/Open and POSIX added pread(), but that's still
> relatively new).
>
> This only results in an error at the end of the file (the first read()
> will leave the file offset at EOF, so the second read() fails), but in
> other situations it's likely causing the wrong row of the MASK to be
> read.
>
> A possible quick fix:
>
> 	if (R__.auto_mask > 0)
> 	    putenv("WORKERS=0");
>
> A slightly better fix would be to check for masking and if it's
> enabled, have a single mutex which guards *all* raster reads so that
> even concurrent access to different maps is blocked. Unlike the above
> hack, this still allows computations to be executed in parallel.
>
> Better still would be to guard access to the MASK so that the other
> aspects of raster input can be parallelised (raster I/O is still a
> major bottleneck, and mostly because of processing rather than actual
> disc access).
>
> But that would involve either adding pthread code directly into the
> base raster input code in lib/raster/get_row.c (undesirable) or at
> least adding a mechanism to allow r.mapcalc to hook into it to provide
> the mutex.
>
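
For reference, the lseek()/read() interleaving described above can be
avoided with pread(), which takes the file offset as an argument and
leaves the descriptor's shared offset untouched. What follows is only a
minimal sketch, not the code in lib/raster/get_row.c: read_row_at() is a
hypothetical helper, and fd, row and bufsize merely stand in for
fcb->data_fd and the row geometry used there. It also assumes fixed-size
(uncompressed) rows.

```c
#define _XOPEN_SOURCE 700   /* ask for the POSIX declaration of pread() */
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of GRASS): read row number `row` of a
 * file made of fixed-size rows of `bufsize` bytes each.
 *
 * pread() performs the positioning and the read as one call and does
 * not move the descriptor's file offset, so two threads sharing `fd`
 * cannot disturb each other's position.
 *
 * Returns 0 on success, -1 on error or short read.
 */
static int read_row_at(int fd, int row, void *buf, size_t bufsize)
{
    off_t offset = (off_t) row * (off_t) bufsize;
    ssize_t n = pread(fd, buf, bufsize, offset);

    return (n == (ssize_t) bufsize) ? 0 : -1;
}
```

A caller would treat a -1 return the same way the existing code treats a
failed lseek() or read(), i.e. report the row and map name.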

Thanks for the detailed analysis and explanation!

So, for me, the best solution at this stage is to just set WORKERS to 0?

The rest of your proposed solutions are above my head, so I couldn't help
with the implementation.

Moritz
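
To make the "single mutex which guards *all* raster reads" idea from the
quoted analysis a bit more concrete, here is a rough sketch of what such
a guard might look like. It is an illustration under assumptions, not
the r.mapcalc or lib/raster code: raster_read_lock and locked_read_row()
are hypothetical names, and the rows are assumed to be fixed-size.

```c
#define _XOPEN_SOURCE 700
#include <pthread.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical process-wide lock shared by every raster read. */
static pthread_mutex_t raster_read_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Hypothetical helper: seek to row `row` and read `bufsize` bytes while
 * holding the lock, so that no other thread using this helper can move
 * the file offset between the lseek() and the read().
 *
 * Returns 0 on success, -1 on seek failure, read error or short read.
 */
static int locked_read_row(int fd, int row, void *buf, size_t bufsize)
{
    int ok = 0;

    pthread_mutex_lock(&raster_read_lock);
    if (lseek(fd, (off_t) row * (off_t) bufsize, SEEK_SET) != (off_t) -1 &&
        read(fd, buf, bufsize) == (ssize_t) bufsize)
        ok = 1;
    pthread_mutex_unlock(&raster_read_lock);

    return ok ? 0 : -1;
}
```

Because only the seek and the read sit inside the lock, the per-cell
computation can still run in parallel; only the file access itself is
serialised.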


