[GRASS-user] Multicore Processing and Temporary File Cleanup
Joseph Chipperfield
joechip90 at googlemail.com
Wed Feb 13 13:46:16 EST 2008
Thank you, Markus; your wiki entry is most helpful.
It seems I need to make a few changes to my files and set up a large
number of mapsets in each location. Is it appropriate, then, to have
multiple mapsets (one for each node) in a given location? If so, is
there a way to generate multiple mapsets in a location automatically,
so that each of my processes (there will be thousands of them) can jump
straight into GRASS with a script along the lines of the following?
#!/bin/bash
declare -r PROCESS_NUM=__  # Some allocated process number -
                           # $SGE_TASK_ID for Sun Grid Engine
# Other non-GRASS commands here - in my script there is a call to an
# external database to download parameter values
grass62 -text database/location/${PROCESS_NUM}_mapset <<!
# Some GRASS commands here
!
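
Perhaps the mapsets could be pre-created in a setup step before the
jobs are submitted? Something like the following sketch, on the
assumption (untested on my part) that a new mapset needs nothing more
than a directory containing a WIND file copied from PERMANENT (the
paths here are hypothetical):

#!/bin/bash
declare -r GISDBASE=/path/to/database   # hypothetical paths
declare -r LOCATION=location
declare -r N_PROCESSES=1000
for I in $(seq 1 ${N_PROCESSES}); do
    MAPSET=${GISDBASE}/${LOCATION}/${I}_mapset
    mkdir -p "${MAPSET}"
    # A minimal mapset: inherit the default region from PERMANENT
    cp "${GISDBASE}/${LOCATION}/PERMANENT/DEFAULT_WIND" "${MAPSET}/WIND"
done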
Each mapset would then contain the spatial data that its process will
use. You suggest then copying the output into a single shared mapset
such as PERMANENT; for my purposes I'll probably just save the results
as text files (the data is then transferred to another program for the
next stages of processing).
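
For the export, something along these lines at the end of each process
should do, assuming r.out.ascii suits my raster results (the map name
is just a placeholder):

r.out.ascii input=result_map output=result_${PROCESS_NUM}.txt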
Again many thanks,
Markus Neteler wrote:
> Joseph,
>
> I am currently using a PBS-based cluster to process MODIS satellite
> data. Some answers below:
>
> On Feb 13, 2008 2:43 PM, joechip90 <joechip90 at googlemail.com> wrote:
>
>> Dear All,
>>
>> I have looked around on other postings and it appears that the majority (if
>> not all) of the GRASS libraries are NOT thread safe.
>>
>
> Yes, unfortunately true.
>
>
>> Unfortunately I have a
>> very large processing job that would benefit from cluster processing. I
>> have written a script that can be run on multiple processors whilst being
>> very careful not to allow different processes to try to modify the same data
>> at any point. The same raster file is not accessed by different processes
>> at all in fact.
>>
>
> Yes, fine. Essentially there are at least two approaches to "poor man's"
> parallelization that avoid modifying the GRASS source code:
>
> - split the map into spatial chunks (possibly with overlap to obtain
>   smooth results) - see the sketch below
> - time series: run the processing of each map on a different node.
>
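> For the first approach, a rough and untested sketch: compute strip
> boundaries from the current region and submit one job per strip. The
> per-strip job script "process_chunk.sh" is hypothetical; each job
> would first run g.region with its strip boundaries (clamped to the
> full region) and then do the actual analysis:
>
> #!/bin/bash
> eval `g.region -g`                     # sets $n $s $e $w $nsres $ewres
> STRIPS=4
> OVERLAP=`echo "10 * $nsres" | bc -l`   # 10 cells of overlap
> STEP=`echo "($n - $s) / $STRIPS" | bc -l`
> for I in `seq 0 $((STRIPS - 1))`; do
>     SOUTH=`echo "$s + $I * $STEP - $OVERLAP" | bc -l`
>     NORTH=`echo "$s + ($I + 1) * $STEP + $OVERLAP" | bc -l`
>     qsub -v NORTH=$NORTH,SOUTH=$SOUTH process_chunk.sh
> done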
>
>> However, I also realise that alone might not solve all my problems. In any
>> one process some temporary files are created (by GRASS libraries) and then
>> these are deleted on startup (cleaning temporary files...). Now I was
>> wondering what these temporary files were and if there might be a problem
>> with one process creating temporary files that it needs whilst another
>> process starts up GRASS and deletes them. Is there any way to call GRASS in
>> a way that doesn't delete the temporary files?
>>
>
> You could just modify the start script and remove the call to "clean_temp".
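> (In a standard GRASS 6 installation that call sits in the init script,
> $GISBASE/etc/Init.sh; the line to comment out looks roughly like the
> following, though the exact wording may differ in your copy:
>
>     "$ETC/clean_temp" > /dev/null
> )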
> BUT:
> I am currently processing some thousands of maps for the same region (a
> time series). I process each map in the same location but in a different
> mapset (simply using the map name as the mapset name). At the end of the
> processing I call a second batch job which only contains g.copy, to copy
> the result into a common mapset. There is a small risk of a race
> condition here if two nodes finish at the same time, but even this could
> be trapped in a loop which checks whether the target mapset is locked
> and, if necessary, launches g.copy again until it succeeds.
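> A sketch of such a retry loop (untested; it uses an mkdir-based lock
> as a portable mutex instead of inspecting GRASS's own .gislock file,
> and "common" is a hypothetical name for the shared target mapset):
>
> # GISDBASE, LOCATION, PROCESS_NUM as in the per-node job script
> LOCK=${GISDBASE}/${LOCATION}/common/.copy_lock
> until mkdir "$LOCK" 2>/dev/null; do
>     sleep 5   # another node holds the lock; wait and try again
> done
> grass62 -text ${GISDBASE}/${LOCATION}/common <<!
> g.copy rast=result@${PROCESS_NUM}_mapset,result_${PROCESS_NUM}
> !
> rmdir "$LOCK"   # release the lock for the next node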
>
>
>> I appreciate that I'm trying to do something that GRASS doesn't really
>> support but I was hoping that it might be possible to fiddle around and find
>> a way. Any help would be gratefully received.
>>
>
> To some extent GRASS supports what you need.
> I have drafted a related wiki page at:
> http://grass.gdf-hannover.de/wiki/Parallel_GRASS_jobs
>
> Feel free to hack that page!
>
> Good luck,
> Markus
>
>