[GRASS5] GRASS on OpenMOSIX cluster
Markus Neteler
neteler at itc.it
Tue Jan 14 13:26:24 EST 2003
On Mon, Jan 13, 2003 at 10:59:08PM -0800, j harrop wrote:
> I should start by qualifying that we don't use Mosix, but I have a fairly
> good guess about what it's doing. We are running a more conventional
> Beowulf, and the programs we have parallelized were done largely with MPI,
> although we also run entire serial programs on multiple nodes - similar to
> what Mosix is doing. The task Mosix takes on is, to say the least,
> daunting. Having sequential codes run with automatic load
> balancing across available, heterogeneous nodes is no small
> undertaking! When the sequential codes make certain assumptions about
> having a machine to themselves, Mosix may have problems.
The cluster running here consists of 20 identical machines, so it
should be fine (should!). There seem to be some issues with NFS
and MFS, but that's probably unrelated to GRASS.
> I assume that you run a script with your "launcher" being called once per
> image, and Mosix takes care of distributing this work across the nodes.
Right. We launch two GRASS jobs per node as there are two CPUs on each
machine.
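Roughly, such a launcher can be as simple as a loop that backgrounds the
jobs and lets Mosix migrate them (process_image.sh stands for a
hypothetical wrapper around the per-image GRASS commands; the image list
handling is only sketched):

  #!/bin/sh
  # keep 2 jobs per node busy, Mosix migrates the processes
  NODES=20
  JOBS=`expr $NODES \* 2`          # two CPUs per machine
  i=0
  for IMG in $IMAGELIST ; do       # $IMAGELIST: list of images to process
      ./process_image.sh $IMG &    # hypothetical per-image wrapper
      i=`expr $i + 1`
      if [ `expr $i % $JOBS` -eq 0 ] ; then
          wait                     # crude throttle: wait for the batch
      fi
  done
  wait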
> I think what's happening is that it's doing this with a common binary AND a
> common data system. This means that when the first launcher starts, it begins
> executing on node 1 with data shared across all nodes. Setting the region
> is no problem and it goes on to process the first image. Perhaps, once
> the region is read at the beginning of the processing routine, it is kept
> in memory and the routine is able to complete regardless of what might
> happen to the region data stored on disk.
This is unfortunately not true for GRASS.
The sequence
(
g.region something
i.smap something
r.colors something
)
run as a parallel job causes problems within a single mapset, because
each command reads the current region (G_get_region()) at the
beginning, and that region may already have been changed by another job.
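One workaround (just a sketch, not tested here) is to give every parallel
job its own mapset, since the current region lives in the per-mapset WIND
file. Assuming GISDBASE and LOCATION_NAME are already exported, each job
could do something like this (the mapset name is only an example):

  # per-job mapset so each job gets its own WIND (region) file
  JOB=$1
  MAPSET=parallel_$JOB
  mkdir -p $GISDBASE/$LOCATION_NAME/$MAPSET
  cp $GISDBASE/$LOCATION_NAME/PERMANENT/DEFAULT_WIND \
     $GISDBASE/$LOCATION_NAME/$MAPSET/WIND
  # job-private GISRC pointing at that mapset
  export GISRC=/tmp/grassrc5.$JOB
  echo "GISDBASE: $GISDBASE"           >  $GISRC
  echo "LOCATION_NAME: $LOCATION_NAME" >> $GISRC
  echo "MAPSET: $MAPSET"               >> $GISRC
  g.region something
  i.smap something
  r.colors something

That way each job only changes its own region.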
> The next instance of the
> launcher does the same on node 2, but since the data is common, the region
> is corrupted for node 1. The remaining routines effectively have bad
> regions. That lag between setting the region and having it corrupted could
> explain why the i.smap seems to generate mostly correct results.
Yes.
> I presume you have added the various exports at the beginning of the script
> to perform the equivalent of the grass5 command. (I thought about using
> grass5 with command line settings in a script, but I gather that it starts
> a new shell, so grass5 cannot be used in a script.)
You can simply set the variables in a script of your own.
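Roughly what such a script exports (the paths are examples and depend on
where GRASS is installed):

  # environment normally set up by the grass5 start script
  export GISBASE=/usr/local/grass5             # example install path
  export PATH=$GISBASE/bin:$GISBASE/scripts:$PATH
  export LD_LIBRARY_PATH=$GISBASE/lib:$LD_LIBRARY_PATH
  export GISRC=$HOME/.grassrc5                 # holds GISDBASE, LOCATION_NAME, MAPSET
  # after that the GRASS commands can be called directly, e.g.
  g.region -p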
> I suspect you need the
> export so that Mosix knows about the variables when it creates a new
> environment on the remote nodes. I don't know exactly how Mosix decides
> how much to run on each node. If only the i.smap is run on the other nodes
> it would fail. PBS and other similar distributed systems have ways of
> being told what environment variables need to be created on the
> remote/slave nodes. Perhaps Mosix just copies the existing environment
> from outside the launcher.
>
> The way I'm looking at running multiple nodes is to share the binaries by
> NFS, but have local data. Then when you invoke a launcher, there is a
> completely independent set of region and other system files. While one
> part of the problem has become simpler, others have not. Your launcher
> script would need to make some choices about how much to assign to each
> node, and perhaps try to overlap communication and calculation by not
> sending all the images before starting the processing.
But it takes some more effort to put the results together.
At the moment NFS causes some problems for us.
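As a sketch of that step (mapset and map names are only examples):
assuming each job wrote its result into its own mapset, the maps can be
collected and, if they tile the area, patched into one map:

  # collect the per-job results into the current mapset
  for JOB in 1 2 3 ; do
      g.copy rast=result_$JOB@parallel_$JOB,result_$JOB
  done
  # set a region covering all tiles first, then merge them
  r.patch input=result_1,result_2,result_3 output=result_all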
> Alternatively, you
> might use a master/slave load balancing strategy and only assign an image
> when a node indicates that it has finished the previous one. I expect that
> it would be quite difficult to get ideal load balancing by a priori
> assignments across 20 nodes. But the speedup would be significant and the
> loss in efficiency might be of more academic than practical interest.
>
> This would not be using Mosix in its best role, but if you can execute
> commands on specific nodes and have rsync or equivalent I think you should
> be able to use this approach. I'll let you know how ours goes. We are
> under pressure currently to get ready for a Mining and Exploration
> Conference in Vancouver, but I may have this running before that. It would
> give me another interesting grass example for our booth at the trade show
> part ;-)
Yes, please let me know how it goes.
Regards,
Markus Neteler