[STATSGRASS] propagating temporary files

rsadler at cyllene.uwa.edu.au rsadler at cyllene.uwa.edu.au
Wed Jun 7 03:23:47 EDT 2006


Dear statsgrassians,

Temporary files belonging to GRASS seem to be accumulating in both the
GRASS database mapset's .tmp directory and R's /tmp directory. For
example,

# 1930 categories
clump of test51 in test

0.00 0.00 0.00 0.00

appears in GRASS's .tmp directory,

whereas whole rasters, dumped as ASCII files, appear in R's /tmp.

Neither is deleted once the command-line code has finished executing,
which, as I understand it, should happen automatically.

Since I am running lots of simulations this quickly junks up the
available disk space, and I believe it is contributing to a slowdown
in performance.

I can delete these files at run-time, but the strange thing is that
this didn't appear to be a problem a couple of weeks ago.
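For reference, a minimal run-time cleanup sketch, assuming a typical
GRASS 6 layout (the database path, location, and mapset names below
are placeholders for my setup, not anything canonical):

```shell
#!/bin/sh
# Sketch only -- the paths are placeholders; adjust to your own setup.
# GRASS 6 keeps per-session scratch files under the mapset's
# .tmp/<hostname> directory.

GISDBASE="${GISDBASE:-$HOME/grassdata}"   # placeholder database root
LOCATION="test"                           # placeholder location name
MAPSET="test"                             # placeholder mapset name

SCRATCH="$GISDBASE/$LOCATION/$MAPSET/.tmp/$(hostname)"

# Delete scratch files older than 60 minutes, so files still in use
# by a live session are left alone (-mmin is a GNU find test).
if [ -d "$SCRATCH" ]; then
    find "$SCRATCH" -type f -mmin +60 -delete
fi

echo "cleaned: $SCRATCH"
```

On the R side, the rough equivalent at the end of each batch run would
be unlink(list.files(tempdir(), full.names = TRUE)), though the real
question remains why the files stop being cleaned up automatically.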

Any suggestions before I reinstall both pieces of software and their
packages?

Regards
Rohan

  PhD Student
  School of Plant Biology
  School of Mathematics and Statistics
  Bushfire Cooperative Research Centre
  The University of Western Australia

BTW: Joel and Roger, cheers for the below. Moving the GRASS location
to the local hard drives made all the difference (duh!). A working
rate of 20000 simulations an hour on the cluster was very useful,
until this latest morsel arrived.

######################################################################
> Without knowing much about the R interface I would guess that it may  
> be slow due to all machines trying to access data off of the same  
> disk. Have you got any way to measure disk reads and the network  
> bandwidth to the machine hosting the disk?

I agree - I think the original process was also disk-bound; that is,
500 per hour looks very much like 30 times 20 per hour. This would
imply that 10 machines would do 50-60 an hour each, so spreading the
compute load doesn't help, if this is the case.
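The arithmetic behind that guess, spelled out (the numbers are the
ones reported in the original post):

```shell
#!/bin/sh
# Back-of-envelope check that the aggregate rate stays roughly
# constant, which is what you'd expect if a shared disk is the
# bottleneck rather than CPU.
single=500       # simulations/hour on one machine running alone
per_machine=20   # simulations/hour each when all 30 run concurrently
machines=30

aggregate=$((machines * per_machine))
echo "aggregate with $machines machines: $aggregate/hour (vs $single alone)"

# If ~500-600/hour is all the shared disk can serve, then with 10
# machines each would get roughly:
echo "expected per-machine rate on 10 machines: $((aggregate / 10))/hour"
```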

Roger

>
> -Joel
>
> On Friday 17 March 2006 2:39 pm, rsadler at cyllene.uwa.edu.au wrote:
> > Dear List,
> >
> > I have implemented Monte Carlo inference for a random closed set model
> > that mimics the different "phases" of vegetation patterning to be
> > found in images of a semi-arid grassland in north west Australia. The
> > procedure is implemented using the multiple sessions capability of
> > grass60 on a computing cluster with shared disk space. The problem is
> > that when running a single machine alone I can generate 500
> > simulations an hour. However when I run all 30 machines concurrently
> > simulation rate drops dramatically to 20 sessions an hour for a single
> > machine (all machines are the same).
> >
> > I am first contacting the statsgrass list because the procedure uses
> > the grass/R interface for a number of separate tasks. What I don't
> > know is whether the slowdown is a result of the grass/R interface or
> > whether it occurs on the GRASS side of things, where there
> > is some shared file that all sessions use (like .grass.bashrc, but not
> > that). The program is run as an R batch file using --vanilla and --slave,
> > with all output being written to separate text files. All sessions
> > use different locations and therefore different mapsets.
> >
> > Please advise
> >
> > Regards
> > Rohan Sadler
> >





