[STATSGRASS] propagating temporary files
rsadler at cyllene.uwa.edu.au
Wed Jun 7 03:23:47 EDT 2006
Dear statsgrassians,
Temporary files that belong to GRASS seem to be propagating in both the
GRASS database mapset's .tmp directory and R's /tmp directory. For
example, a file containing
# 1930 categories
clump of test51 in test
0.00 0.00 0.00 0.00
is left behind in GRASS's .tmp directory, whereas whole rasters exported
as ASCII files are left in R's /tmp. Neither set of files is deleted once
the command-line code has finished, which is what I understand should
happen.
Since I am running lots of simulations this quickly junks up the available
disk space, and I believe it is contributing to a slowdown in performance.
I can delete these files at run time, but the strange thing is that
this didn't appear to be a problem a couple of weeks ago.
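For what it's worth, the run-time cleanup I am currently doing looks
roughly like the sketch below. The .tmp path construction and the use of
tempdir() are assumptions about my own setup rather than anything the
GRASS/R interface guarantees, so treat it as an illustration only:

## rough per-iteration cleanup sketch (paths are assumptions about my
## setup, not part of the GRASS/R interface)
g <- function(var) system(paste("g.gisenv get=", var, sep = ""), intern = TRUE)
grass_tmp <- file.path(g("GISDBASE"), g("LOCATION_NAME"), g("MAPSET"), ".tmp")

## delete GRASS temporary files left under the mapset's .tmp directory
unlink(list.files(grass_tmp, full.names = TRUE), recursive = TRUE)

## delete leftover ASCII raster dumps from R's per-session temp directory;
## if the dumps land directly in /tmp rather than under tempdir(), the
## path would need adjusting
unlink(list.files(tempdir(), full.names = TRUE))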
Any suggestions before I reinstall both pieces of software and the packages?
Regards
Rohan
PhD Student
School of Plant Biology
School of Mathematics and Statistics
Bushfire Cooperative Research Centre
The University of Western Australia
BTW: Joel and Roger, cheers for the exchange quoted below. Moving the GRASS
location to the local hard drives made all the difference (duh!). A working
rate on the cluster of 20000 simulations an hour was very useful,
until this latest morsel arrived.
######################################################################
> Without knowing much about the R interface I would guess that it may
> be slow due to all machines trying to access data off of the same
> disk. Have you got any way to measure disk reads and the network
> bandwidth to the machine hosting the disk?
I agree - I think the original process was also disk-bound; that is, 500
per hour looks very much like 30 times 20 per hour, which suggests the shared
disk caps aggregate throughput at roughly 500-600 simulations an hour however
many machines share it. This would imply that 10 machines would do 50-60 an
hour each. So spreading the compute load doesn't help, if this is the case.
Roger
>
> -Joel
>
> On Friday 17 March 2006 2:39 pm, rsadler at cyllene.uwa.edu.au wrote:
> > Dear List,
> >
> > I have implemented Monte Carlo inference for a random closed set model
> > that mimics the different "phases" of vegetation patterning to be
> > found in images of a semi-arid grassland in north west Australia. The
> > procedure is implemented using the multiple sessions capability of
> > grass60 on a computing cluster with shared disk space. The problem is
> > that when running a single machine alone I can generate 500
> > simulations an hour. However, when I run all 30 machines concurrently
> > the simulation rate drops dramatically to 20 an hour for a single
> > machine (all machines are the same).
> >
> > I am first contacting the statsgrass list because the procedure uses
> > the grass/R interface for a number of separate tasks. What I don't
> > know is whether the slowdown is a result of the grass/R interface or
> > whether it occurs on the grass side of things, where there
> > is some shared file that all sessions use (like .grass.bashrc but not
> > that). The program is run as an R batch file using --vanilla and --slave,
> > with all output being written to separate text files. All sessions
> > use different locations and therefore different mapsets.
> >
> > Please advise
> >
> > Regards
> > Rohan Sadler
> >