[GRASS-dev] grass 7 and pixel random access

Glynn Clements glynn at gclements.plus.com
Tue Apr 29 11:55:16 EDT 2008


Hamish wrote:

> > >  I would like to know the planned changes for the raster library,
> > >  especially the random access of pixels in the raster.
> Markus:
> > Not sure if all of those are actually planned, but here is a list:
> > http://grass.osgeo.org/wiki/GRASS_7_ideas_collection#Raster
> > 
> > >  I wanted to work on it some months back, but my daily job got more
> > >  intense.
> > >  In the coming future, we will need to access easily any row for
> > >  parallel processing.
> 
> One thing I wonder about for parallel processing of (serial) raster
> modules- do we really need random read access to send each individual row
> into a separate thread? The overhead with that seems counter-productive.
> Couldn't we read some GRASS_NPROC environment variable and then split the
> overall number of rows by that number and create a small number of
> threads, i.e. matching the system?

If you just want to speed up top-to-bottom processing, that doesn't
require random access, just a scrolling window (which several modules
already use, either via rowio or with their own cache).
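
As a rough illustration of the "split the rows by the number of
processors" idea quoted above, here is a minimal sketch. GRASS_NPROC is
only the name proposed in the quoted text, and worker_count() and
row_band() are hypothetical helpers, not part of any GRASS library:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical: read the worker count from the GRASS_NPROC environment
 * variable suggested in the quoted text; fall back to 1. */
int worker_count(void)
{
    const char *s = getenv("GRASS_NPROC");
    int n = s ? atoi(s) : 1;

    return n > 0 ? n : 1;
}

/* Give worker w (0-based) of nworkers the half-open row range
 * [*first, *last). The first (nrows % nworkers) bands get one extra row,
 * so every row is assigned to exactly one worker. */
void row_band(int nrows, int nworkers, int w, int *first, int *last)
{
    int base, extra;

    assert(nworkers > 0 && w >= 0 && w < nworkers);
    base = nrows / nworkers;
    extra = nrows % nworkers;
    *first = w * base + (w < extra ? w : extra);
    *last = *first + base + (w < extra ? 1 : 0);
}
```

Each worker then reads its own band top-to-bottom, so a scrolling
window per worker is enough and no random access is needed.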

For random access, the main issue is that you want to avoid performing
the decompression, format conversion and resampling steps more than
once. In practice, this means making a temporary "raw" copy of the
data, and then caching it.

Exactly how you cache it depends upon your expected access pattern. 
For truly random access, you probably want to cache it in rows. Where
there is some degree of locality, tiles will tend to produce better
results.
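
To make the row/tile distinction concrete, here is a small sketch of how
a cell address could be mapped onto a tiled "raw" copy. The tile size
and the names (TILE, tile_addr, tile_locate) are assumptions for
illustration only, not an existing GRASS interface:

```c
#include <assert.h>
#include <stddef.h>

#define TILE 64   /* illustrative tile edge length, in cells */

struct tile_addr {
    size_t tile;    /* index of the tile, tiles stored row-major */
    size_t offset;  /* cell index inside that tile, row-major */
};

/* Map (row, col) of a raster with ncols columns to a tile and an offset
 * inside it. Tiles on the right edge are assumed to be padded to TILE. */
struct tile_addr tile_locate(size_t row, size_t col, size_t ncols)
{
    struct tile_addr a;
    size_t tiles_per_row;

    assert(ncols > 0 && col < ncols);
    tiles_per_row = (ncols + TILE - 1) / TILE;
    a.tile = (row / TILE) * tiles_per_row + (col / TILE);
    a.offset = (row % TILE) * TILE + (col % TILE);
    return a;
}
```

With a row cache, two cells in the same column but adjacent rows sit in
different cache entries; with tiles they usually share one, which is
where the benefit for spatially local access comes from.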

> Another thing I still wonder about (see thread from a month or so back)
> is where to start. Modify the libs to support the concept, then tackle
> each module on its own? I.e. concentrate on the modules that are not
> I/O-limited, where all we can do is throw more processors at the problem,
> and leave the non-number-crunching modules alone? Concentrate on areas
> where we'll get the most bang for the buck / pick off low-hanging fruit /
> etc.?

It depends upon whether we want to make the raster I/O operations
thread-safe. If we do, that could involve a significant amount of
work, particularly if we don't want to reduce efficiency.

One efficiency issue is that the library keeps a decompressed copy of
the last row which was read. This means that if you're up-sampling the
data (the current region has finer resolution than the raster),
adjacent rows which correspond to the same source row don't require
reading and de-compressing the data.

[However, the re-sampling and the conversion to the requested type
(CELL/FCELL/DCELL) are repeated for each row. Although callers will
almost always request the same format and the same resolution for
every row, the library can't assume that they will.]
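
The effect of the single cached row can be sketched as follows. This is
a simplified model, not the library's code: the row mapping uses plain
integer scaling (an assumption; the real mapping depends on the region
and raster geometry), and "reading" is reduced to a counter:

```c
#include <assert.h>

struct row_slot {
    int src_row;  /* source row currently held, -1 if empty */
    int reads;    /* number of simulated read + decompress steps */
};

/* Simplified mapping from a row of the current region to a source row. */
int source_row(int region_row, int region_rows, int src_rows)
{
    assert(region_rows > 0 && region_row >= 0 && region_row < region_rows);
    return (int)((long)region_row * src_rows / region_rows);
}

/* Read and decompress only if the slot holds a different source row. */
void fetch_row(struct row_slot *slot, int src_row)
{
    if (slot->src_row != src_row) {
        slot->reads++;
        slot->src_row = src_row;
    }
}

/* Count the reads needed for one top-to-bottom pass over the region. */
int count_reads(int region_rows, int src_rows)
{
    struct row_slot slot = { -1, 0 };
    int r;

    for (r = 0; r < region_rows; r++)
        fetch_row(&slot, source_row(r, region_rows, src_rows));
    return slot.reads;
}
```

In this model, a region with 8 rows over a 4-row raster needs 4 reads
rather than 8, because each source row serves two region rows.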

If you are trying to parallelise a top-to-bottom module, and one
thread requests a row that is in the middle of being read by another
thread, should it perform a redundant read, or simply wait for the
original thread to de-compress the row?

Also, the approach of having a single "slot" for the most recent row
won't extend to multiple threads. E.g. if you have 10 threads and
you're up-scaling the data 2:1, you would need 5 slots (each source
row will be consumed by two threads).

Parallelising the output is simpler. However, if you want to support
compressed files, there would need to be a critical section so that
each thread can reliably determine the offset at which its data is
written. Regardless of whether you want compressed files, if you don't
have pwrite(), you would need to make lseek() + write() into a
critical section.
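
A minimal sketch of the three cases, using POSIX threads. The function
names, the single global lock and the scratch file in demo() are
assumptions for illustration; none of this is existing GRASS code:

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

static pthread_mutex_t io_lock = PTHREAD_MUTEX_INITIALIZER;
static off_t next_offset = 0;

/* Compressed output: row sizes vary, so each thread reserves its file
 * offset inside a critical section before writing. */
off_t reserve_offset(size_t len)
{
    off_t off;

    pthread_mutex_lock(&io_lock);
    off = next_offset;
    next_offset += (off_t)len;
    pthread_mutex_unlock(&io_lock);
    return off;
}

/* With pwrite(): the offset is passed explicitly, so the shared file
 * position is never touched and no lock is needed. */
ssize_t put_row_pwrite(int fd, const void *buf, size_t len, off_t off)
{
    assert(off >= 0);
    return pwrite(fd, buf, len, off);
}

/* Without pwrite(): lseek() + write() must not be interleaved with
 * another thread's lseek(), so the pair forms a critical section. */
ssize_t put_row_locked(int fd, const void *buf, size_t len, off_t off)
{
    ssize_t n = -1;

    assert(off >= 0);
    pthread_mutex_lock(&io_lock);
    if (lseek(fd, off, SEEK_SET) != (off_t)-1)
        n = write(fd, buf, len);
    pthread_mutex_unlock(&io_lock);
    return n;
}

/* Usage sketch: write two "rows" out of order into a scratch file and
 * read them back. Returns 0 on success, -1 on failure. */
int demo(void)
{
    char path[] = "/tmp/grass_rows_XXXXXX";
    char back[8];
    int fd = mkstemp(path);
    int ok;

    if (fd < 0)
        return -1;
    unlink(path);
    ok = put_row_pwrite(fd, "BBBB", 4, 4) == 4
        && put_row_locked(fd, "AAAA", 4, 0) == 4
        && pread(fd, back, 8, 0) == 8
        && memcmp(back, "AAAABBBB", 8) == 0;
    close(fd);
    return ok ? 0 : -1;
}
```

For uncompressed output every row has the same size, so the offset is
simply the row number times the row length and reserve_offset() isn't
needed. For compressed output the rows would also have to be recorded
in the row index in row order, which this sketch doesn't attempt.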

If you have pwrite() and don't need compressed files, there are no
inherent concurrency issues. There might be issues with the existing
code using pre-allocated buffers, but those can be fixed.

BTW, for 7.x, can we assume that alloca() is available? It would make
it much easier to write re-entrant code by avoiding the need to
pre-allocate buffers (the alternative is lots of calls to malloc/free,
which could be a significant performance hit).
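
For illustration, a re-entrant per-row function using alloca() might
look like the sketch below. The function itself is hypothetical, and
<alloca.h> is where glibc declares alloca(); other platforms differ.
Note that alloca() cannot report failure, so a very wide row could
overflow the stack:

```c
#include <alloca.h>
#include <assert.h>
#include <string.h>

/* Hypothetical per-row operation: work on a private scratch copy of the
 * row and return the sum of its cells. The scratch buffer lives on the
 * calling thread's stack and is released automatically on return, so
 * there is no shared pre-allocated buffer and no malloc()/free() pair. */
long row_sum_scratch(const int *row, int ncols)
{
    int *tmp;
    long sum = 0;
    int i;

    assert(row != NULL && ncols > 0);
    tmp = alloca((size_t)ncols * sizeof *tmp);
    memcpy(tmp, row, (size_t)ncols * sizeof *tmp);
    for (i = 0; i < ncols; i++)
        sum += tmp[i];
    return sum;
}
```

Because nothing outlives the call, several threads can run this
function at once without any locking.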

-- 
Glynn Clements <glynn at gclements.plus.com>

