[GRASS-user] v.net parallelisation issues
Mark Wynter
mark at dimensionaledge.com
Fri Feb 13 04:40:07 PST 2015
Hi Moritz
With the second approach (the code I shared in my post), I have 3500 discrete jobs, and I set the number of batches equal to the number of CPUs. Each batch job is despatched to a CPU, where it pulls job IDs from a queue and processes them serially within that batch job. The idea behind this approach was to spread the jobs across the available CPUs as separate batch processes.
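A minimal Python sketch of this static batching scheme, for illustration only. It assumes a hypothetical `run_job(job_id)` that wraps a single v.net run (the real call is not shown in this post), and the helpers `run_batch` and `make_batches` are likewise hypothetical:

```python
"""Static batching: one batch per CPU, each batch runs its jobs serially."""
import os
from multiprocessing import Process


def run_job(job_id: int) -> None:
    # Hypothetical placeholder for one v.net job; replace with the real call.
    pass


def run_batch(job_ids: list[int]) -> None:
    # Each batch process works through its own pre-assigned jobs one after another.
    for job_id in job_ids:
        run_job(job_id)


def make_batches(job_ids: list[int], n_batches: int) -> list[list[int]]:
    # Round-robin split: batch k gets jobs at positions k, k + n, k + 2n, ...
    return [job_ids[k::n_batches] for k in range(n_batches)]


if __name__ == "__main__":
    jobs = list(range(1, 3501))  # 3500 discrete job IDs
    n_cpus = os.cpu_count() or 1
    procs = [Process(target=run_batch, args=(b,)) for b in make_batches(jobs, n_cpus)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```

With 3500 jobs and 32 CPUs, each batch gets 109 or 110 jobs. A known drawback of fixed batches is that if job run times vary, the batch holding the slowest jobs finishes last while the other CPUs sit idle, which is one reason a shared, dynamic queue is usually preferred.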
The other, preferred approach is to launch a single batch job and let GNU Parallel draw down the list of 3500 jobs, assigning each job to a worker function as CPUs become available. I’ve had much success with this code pattern when parallelising PostGIS queries and the like.
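For comparison, a rough Python analogue of the dynamic pattern: a fixed pool of worker processes, each taking the next job ID as soon as it is free, much as GNU Parallel starts a new job when a slot opens. This is a sketch, not the GNU Parallel invocation from the original post, and `run_job` and `run_all` are hypothetical names:

```python
"""Dynamic scheduling: a fixed pool of workers pulls the next job when free."""
import os
from concurrent.futures import ProcessPoolExecutor


def run_job(job_id: int) -> int:
    # Hypothetical placeholder for one v.net job; returns the job ID when done.
    return job_id


def run_all(job_ids: list[int], n_workers: int) -> list[int]:
    # chunksize=1 hands out one job at a time, so a worker that finishes early
    # immediately picks up more work instead of sitting idle.
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(run_job, job_ids, chunksize=1))


if __name__ == "__main__":
    done = run_all(list(range(1, 3501)), os.cpu_count() or 1)
    print(f"{len(done)} jobs finished")
```

`ProcessPoolExecutor.map` returns results in input order even though jobs may complete out of order; it stands in here for GNU Parallel's job slots purely to illustrate the scheduling idea.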
As you suspected, I get no benefit from additional CPUs.
Unfortunately, time is not on my side, and parallelisation is critical. A fallback is to spin up a cluster of 16 two-CPU machines, pre-allocate job IDs to the machines, and then write the results back to the master node, but this is not ideal and it is a path I am reluctant to go down.
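If the cluster fallback were taken, pre-allocating job IDs could be as simple as cutting the list into 16 near-equal contiguous blocks. A sketch with a hypothetical `allocate()` helper and hypothetical node names; how each node runs its block and copies results back to the master is left out:

```python
"""Pre-allocate job IDs to a fixed set of machines as contiguous blocks."""


def allocate(job_ids: list[int], n_machines: int) -> list[list[int]]:
    # Split into n_machines contiguous blocks whose sizes differ by at most one.
    base, extra = divmod(len(job_ids), n_machines)
    blocks, start = [], 0
    for m in range(n_machines):
        size = base + (1 if m < extra else 0)
        blocks.append(job_ids[start:start + size])
        start += size
    return blocks


if __name__ == "__main__":
    jobs = list(range(1, 3501))
    for m, block in enumerate(allocate(jobs, 16)):
        # Hypothetical node names; each node would process its block on its 2 CPUs.
        print(f"node{m:02d}: jobs {block[0]}-{block[-1]} ({len(block)} jobs)")
```

For 3500 jobs on 16 machines this gives twelve blocks of 219 jobs and four of 218.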
Do you know anyone who may have attempted to parallelise v.net?
I guess the most important question right now is: is it possible to do poor man’s parallelisation with v.net? Anyone?
Mark
On 13 Feb 2015, at 7:56 pm, Moritz Lennert <mlennert at club.worldonline.be> wrote:
> On 13/02/15 08:39, Mark Wynter wrote:
>> I’ve encountered a bottleneck somewhere with v.net when
>> scaling out with GNU Parallel… not sure if it’s an underlying issue with
>> v.net or the way I’m calling the batch jobs?
>>
>> I’ve got 32 CPUs and commensurate RAM. What I’m observing is v.net
>> CPU utilisation dropping off in proportion to the number of
>> jobs running.
>>
>
> Does this mean that you don't get any gain in duration? Could it be that as you divide the work into more batches, each batch is smaller and thus needs less CPU?
>
> Moritz
>