[GRASS-user] Parallel processes
Dylan Beaudette
dylan.beaudette at gmail.com
Mon Oct 19 13:11:51 PDT 2015
On Mon, Oct 19, 2015 at 12:05 PM, Glynn Clements
<glynn at gclements.plus.com> wrote:
>
> Dylan Beaudette wrote:
>
>> Are there any reasons to prefer sequential operations (that do not
>> alter the region) vs. parallel operations?
>
> Running additional jobs in parallel is only worthwhile if the
> resources which they would use (CPU, memory, I/O bandwidth) would
> otherwise be idle.
>
> Once you get to the point that a resource is saturated and jobs are
> contending for it, parallel execution will be less efficient than
> serial execution.
>
> Maybe the "parallel" command takes these factors into account
> sufficiently. If it only considers CPU cores (i.e. one job per core),
> you'd need to confirm that you aren't saturating I/O bandwidth or
> thrashing memory or CPU caches. Try running the same sequence of tasks
> with varying numbers of parallel jobs to determine the optimal value.
> Needless to say, this will vary according to the nature of the task
> (e.g. I/O-bound versus CPU-bound).
>
Thank you, Glynn. Your advice confirms some of my empirical notes:

1. Parallel processes that read data from external USB disks quickly
saturate the capacity of the bus or the drive mechanism.
2. Parallel processes that read data from an internal SSD can generally
saturate all 8 cores of my Intel i7.
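
For reference, here is a minimal sketch of the kind of test Glynn suggests: run the same set of tasks with different numbers of parallel jobs and compare the wall-clock time. The names `task`, `run_batch` and `sweep` are made up for illustration, and the `sleep` call is only a stand-in for a real command. Timing uses whole seconds from `date +%s`, so it is only meaningful for tasks that take at least a few seconds each.

```shell
# Hypothetical stand-in for one unit of work; replace the body with the real
# command (for example a GRASS module call). Not from the original thread.
task() {
    sleep 1
}

# Run $2 tasks with at most $1 running at once, then print the elapsed time
# in whole seconds. Tasks are started in batches of $1, and each batch is
# waited for in full before the next one starts.
# The body runs in a subshell so its variables do not leak to the caller.
run_batch() (
    max_jobs=$1
    total=$2
    start=$(date +%s)
    launched=0
    while [ "$launched" -lt "$total" ]; do
        n=0
        while [ "$n" -lt "$max_jobs" ] && [ "$launched" -lt "$total" ]; do
            task "$launched" &
            n=$((n + 1))
            launched=$((launched + 1))
        done
        wait    # block until every job in this batch has finished
    done
    end=$(date +%s)
    echo $((end - start))
)

# Time the same workload ($1 tasks) at each of the job counts that follow.
sweep() {
    total=$1
    shift
    for j in "$@"; do
        echo "jobs=$j elapsed=$(run_batch "$j" "$total")s"
    done
}
```

Calling `sweep 8 1 2 4 8` would run eight tasks at each of four job counts and print one `jobs=N elapsed=Ns` line per count. Because jobs are started in fixed batches, the slowest job in a batch delays the next batch, so this probably understates what a proper job scheduler such as GNU parallel can achieve.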

My main motivation for asking this question was to identify cases
where parallel operations in GRASS are _not_ safe. From my reading of
the wiki, the manual pages, and your recent comments on GRASS-dev, it
would appear that the following operations may not be safe:

1. region-altering operations
2. calculations in the presence of a MASK
3. reading "external" (r.external) GDAL sources (?)
4. some mapcalc expressions

To simplify my testing, I have disabled pthread support and achieve
"parallelization" either by backgrounding jobs or by using GNU
parallel. My examples use GNU parallel because of the tremendous
(apparent) utility of this tool: most "bash for loops" can be directly
converted into "smart" parallel jobs.
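
To illustrate the two approaches, here is a rough sketch of a serial loop and its backgrounded equivalent. `process_map`, `run_serial`, `run_background` and the map names are placeholders invented for this example; in a real session the function body would be a GRASS command that writes a uniquely named output per input and does not alter the region. The GNU parallel form is left as a comment because it assumes GNU parallel is installed.

```shell
# Hypothetical per-map step (placeholder, not from the original thread).
process_map() {
    echo "processed $1"
}

# Serial version: handle one map at a time.
run_serial() {
    for m in "$@"; do
        process_map "$m"
    done
}

# Backgrounded version: start one job per map, then wait for all of them.
# Output order is not guaranteed, since the jobs run concurrently.
run_background() {
    for m in "$@"; do
        process_map "$m" &
    done
    wait
}

run_background elev_a elev_b elev_c

# GNU parallel equivalent of the loop (bash only; assumes GNU parallel is
# installed, and the function must be exported so child shells can see it):
#   export -f process_map
#   parallel process_map ::: elev_a elev_b elev_c
```

The backgrounded version starts every job at once, which is only reasonable for a small number of inputs; GNU parallel limits the number of simultaneous jobs (one per CPU core by default).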
Thanks,
Dylan