[GRASS-user] Multicore Processing and Temporary File Cleanup

Joseph Chipperfield joechip90 at googlemail.com
Wed Feb 13 13:46:16 EST 2008


Thank you, Markus; your wiki entry is most helpful.

It seems I need to make a few changes to my files and set up a large 
number of mapsets in each location.  Is it appropriate, then, to have 
multiple mapsets (one for each node) in a given location?  If so, is 
there a way to generate the mapsets automatically, so that each process 
can jump straight into GRASS with a script along the following lines 
(I will have thousands of processes)?

#!/bin/bash

# Some allocated process number, e.g. $SGE_TASK_ID on Sun Grid Engine
declare -r PROCESS_NUM=__

# Other non-GRASS commands here - in my script there is a call to an
# external database to download parameter values

grass62 -text database/location/${PROCESS_NUM}_mapset <<!
    # Some grass commands here
!
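
Perhaps something along these lines would do for creating the mapsets 
beforehand?  This is only a guess on my part, based on a GRASS 6 mapset 
apparently being a plain directory whose region file (WIND) can be 
seeded from PERMANENT ("database" and "location" stand in for my real 
paths):

#!/bin/bash
# Untested sketch: create one mapset per process before starting GRASS
GISDBASE=database
LOCATION=location
MAPSET=${PROCESS_NUM}_mapset

# A mapset is just a directory; copy the default region from PERMANENT
mkdir -p "${GISDBASE}/${LOCATION}/${MAPSET}"
cp "${GISDBASE}/${LOCATION}/PERMANENT/DEFAULT_WIND" \
   "${GISDBASE}/${LOCATION}/${MAPSET}/WIND"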

Each mapset would then contain the spatial data that its process 
will use.  You suggest then copying the output into a single shared 
mapset such as PERMANENT.  For my purposes I'll probably just save the 
results as text files (the data then gets transferred to another 
program for the next stages of processing).
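
For the text export I imagine something like

r.out.ascii input=result_${PROCESS_NUM} output=result_${PROCESS_NUM}.txt

inside each GRASS session would do, with result_${PROCESS_NUM} standing 
in for my real map names.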

Again many thanks,

Markus Neteler wrote:
> Joseph,
>
> I am using a cluster right now, based on PBS, to process MODIS
> satellite data. Some answers below:
>
> On Feb 13, 2008 2:43 PM, joechip90 <joechip90 at googlemail.com> wrote:
>   
>> Dear All,
>>
>> I have looked around at other postings, and it appears that the majority (if
>> not all) of the GRASS libraries are NOT thread-safe.
>>     
>
> Yes, unfortunately true.
>
>   
>> Unfortunately I have a
>> very large processing job that would benefit from cluster processing.  I
>> have written a script that can be run on multiple processors whilst being
>> very careful not to allow different processes to try to modify the same data
>> at any point.  In fact, the same raster file is never accessed by more
>> than one process.
>>     
>
> Yes, fine. Essentially there are at least two "poor man's"
> parallelization approaches that need no changes to the GRASS source code:
>
> - split the map into spatial chunks (possibly with overlap, to get smooth results)
> - time series: run the processing of each map on a different node.
>
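(For the first approach, I suppose each process would simply set its 
own region before running the module; since the region lives in each 
mapset's WIND file, every node can use a different window.  An untested 
sketch, with invented coordinates and a hypothetical input map:

# Give this process its own row band (coordinates invented),
# including a small overlap at the band edges
g.region n=5100 s=3900 w=0 e=10000
# ...then run the real analysis module on that window, writing a
# per-process output map
r.neighbors input=input_map output=chunk_${PROCESS_NUM} size=5

)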
>   
>> However, I also realise that alone might not solve all my problems.  In any
>> one process some temporary files are created (by GRASS libraries) and then
>> these are deleted on startup (cleaning temporary files...).  Now I was
>> wondering what these temporary files were and if there might be a problem
>> with one process creating temporary files that it needs whilst another
>> process starts up GRASS and deletes them.  Is there any way to call GRASS in
>> a way that doesn't delete the temporary files?
>>     
>
> You could just modify the start script and remove the call to "clean_temp".
> BUT:
> I am currently processing some thousand maps for the same region (a time
> series). I process each map in the same location but in a different mapset
> (simply using the map name as the mapset name). At the end of the run I
> call a second batch job which only contains g.copy, to copy the result into
> a common mapset. There is a small risk of a race condition here in case two
> nodes finish at the same time, but even this could be trapped in a loop
> which checks whether the target mapset is locked and, if needed, launches
> g.copy again until it succeeds.
>
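(If I understand the loop idea correctly, it might look roughly like 
this - untested, and it assumes grass62 exits non-zero while the shared 
mapset is locked by another session; the map and mapset names are 
stand-ins:

#!/bin/bash
# Second batch job: copy this node's result into the shared mapset,
# retrying while another node holds the lock on it
while true ; do
    grass62 -text database/location/common_mapset <<!
g.copy rast=result@${PROCESS_NUM}_mapset,result_${PROCESS_NUM}
!
    [ $? -eq 0 ] && break
    sleep 10   # mapset locked - wait and try again
done

)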
>   
>> I appreciate that I'm trying to do something that GRASS doesn't really
>> support, but I was hoping that it might be possible to fiddle around and find
>> a way.  Any help would be gratefully received.
>>     
>
> To some extent GRASS supports what you need.
> I have drafted a related wiki page at:
> http://grass.gdf-hannover.de/wiki/Parallel_GRASS_jobs
>
> Feel free to hack that page!
>
> Good luck,
> Markus
>
>   

