[GRASSLIST:9002] Re: r.statistics vs v.rast.stats vs ??

David Finlayson david.p.finlayson at gmail.com
Thu Nov 10 23:49:38 EST 2005


I was able to reduce these updates from about 3 seconds per pass
through the loop to less than 1 second (undetectable) by writing an
SQL update statement to file and then piping that into sqlite3
directly. For the 650 plots in my data set that saves me 30 minutes
per run. I could maybe save more by accumulating all of the updates
into a transaction and then committing the transaction in the final
part of the script, but that is probably splitting hairs.
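
Here is a minimal sketch of the batched variant described above (the
table name, column name, and file paths are hypothetical placeholders;
adjust them to your own setup):

# accumulate every UPDATE into one file, wrapped in a single transaction
SQL_FILE=/tmp/updates.sql
DB_PATH=$HOME/grassdata/mylocation/mymapset/sqlite.db  # hypothetical path

echo "BEGIN TRANSACTION;" > $SQL_FILE
for i in $CATSLIST ; do
 # ... compute $mean for plot $i with r.univar as before ...
 echo "UPDATE mytable SET mean_elev = $mean WHERE cat = $i;" >> $SQL_FILE
done
echo "COMMIT;" >> $SQL_FILE

# one sqlite3 call applies all of the updates at once
sqlite3 $DB_PATH < $SQL_FILE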

Using SQL directly requires knowledge of the back end, so this
modification may not be useful for others. Still, v.db.update is really
slow with the sqlite driver. Can this be improved?

David

On 11/10/05, David Finlayson <david.p.finlayson at gmail.com> wrote:
> In my case (very large DEM, small vector areas) the creation of the
> MASK during each pass through the loop was the rate-limiting step. I
> added the following lines to reduce the size of the region prior to
> creating the MASK. It helped dramatically:
>
> # Reduce size of region prior to calculating mask
> g.remove vect=${VECTOR}_PTEMP
> v.extract input=${VECTOR} output=${VECTOR}_PTEMP type=area new=1 list=$i
> g.region vect=${VECTOR}_PTEMP
>
> I need to do more testing to see if it is getting the same answer as before.
>
> Calculating the statistics multiple times is redundant, but for my
> dataset it was not slow (less than 1 second for all of them). I wonder
> whether it would take longer if my areas were larger.
>
> Now the only slow part is updating sqlite. It seems to take a second or
> so for each update call.
>
> David
>
> On 11/10/05, Hamish <hamish_nospam at yahoo.com> wrote:
> > > I am trying to calculate statistics such as mean elevation, slope,
> > > maximum drainage area, etc. for a very large number of small plots. The
> > > plots are in a vector file and I have been trying to use v.rast.stats
> > > on the various rasters that represent elevation, slope, drainage area,
> > > etc.
> > >
> > > The problem is that the rasters are very high resolution, the plots
> > > are quite small, and I have more than 600 to process. If you look at
> > > the code for v.rast.stats, it converts each plot into a raster to use
> > > as a mask and then runs r.univar on the entire map (now mostly null
> > > values), once for each statistic. In my situation it can take several
> > > minutes for each plot. In other words, hours for each of the above
> > > rasters that I am interested in, and more than 2 days for the whole
> > > set.
> > >
> > > Is there a more efficient way of calculating these statistics? I have
> > > tried to use r.statistics and can't make sense of the inputs and
> > > outputs. Alternatively, if I could control the region size in the
> > > critical part of the v.rast.stats script, I might be able to improve
> > > its speed.
> >
> >
> > for i in $CATSLIST ; do
> >  echo "Processing category $i ($CURRNUM/$NUMBER)"
> >  g.remove MASK > /dev/null 2> /dev/null
> >  r.mapcalc "MASK=if(${VECTOR}_${TMPNAME} == $i, 1, null())" 2> /dev/null
> >  #n, min, max, range, mean, stddev, variance, coeff_var, sum
> >  n=`r.univar -g $RASTER | cut -d'=' -f2 | tr '\n' ' ' | cut -d' ' -f1`
> >  min=`r.univar -g $RASTER | cut -d'=' -f2 | tr '\n' ' ' | cut -d' ' -f2`
> >  max=`r.univar -g $RASTER | cut -d'=' -f2 | tr '\n' ' ' | cut -d' ' -f3`
> >  range=`r.univar -g $RASTER | cut -d'=' -f2 | tr '\n' ' ' | cut -d' ' -f4`
> >  mean=`r.univar -g $RASTER | cut -d'=' -f2 | tr '\n' ' ' | cut -d' ' -f5`
> >  stddev=`r.univar -g $RASTER | cut -d'=' -f2 | tr '\n' ' ' | cut -d' ' -f6`
> >  variance=`r.univar -g $RASTER | cut -d'=' -f2 | tr '\n' ' ' | cut -d' ' -f7`
> >  cf_var=`r.univar -g $RASTER | cut -d'=' -f2 | tr '\n' ' ' | cut -d' ' -f8`
> >  sum=`r.univar -g $RASTER | cut -d'=' -f2 | tr '\n' ' ' | cut -d' ' -f9`
> >
> >
> > A couple of things:
> >
> > 1) r.univar is running 9 times when it only needs to be run once:
> > eval `r.univar -g $RASTER`
> >
> > After the eval, what the script calls $cf_var is named $coeff_var, so
> > that rename needs to be carried through the rest of the script.
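> >
> > Something like this (a sketch; cf_var keeps the name the rest of the
> > script expects):
> >
> >  #n, min, max, range, mean, stddev, variance, coeff_var, sum
> >  eval `r.univar -g $RASTER`
> >  cf_var=$coeff_var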
> >
> >
> > 2) if g.region zoom= respects MASK, we can do:
> > g.region save=${TMPNAME}_rastres
> >
> > for i in $CATSLIST ; do
> >  echo "Processing category $i ($CURRNUM/$NUMBER)"
> >  g.remove MASK > /dev/null 2> /dev/null
> >  r.mapcalc "MASK=if(${VECTOR}_${TMPNAME} == $i, 1, null())" 2> /dev/null
> >
> > +g.region zoom=$RASTER
> >
> >  #n, min, max, range, mean, stddev, variance, coeff_var, sum
> >  eval `r.univar -g $RASTER`
> >
> > +g.region region=${TMPNAME}_rastres
> >
> >  ...
> > done
> >
> > g.remove region=${TMPNAME}_rastres
> >
> >
> >
> > but g.region zoom= can take a _long_ time, so maybe there is no
> > advantage, or it is even worse?
> >
> > Would we need to check/reset NSRES and EWRES to the raster's values,
> > or is that redundant? (What about g.region -a?) Some checking needs to
> > be done that we're not screwing up the cells; at minimum, a
> > before/after check to see that the same answer comes out both ways.
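> >
> > A sketch of that before/after check (it assumes MASK is already set
> > for the current plot and writes the two outputs to temporary files):
> >
> >  r.univar -g $RASTER > /tmp/stats_full.txt
> >  g.region zoom=$RASTER
> >  r.univar -g $RASTER > /tmp/stats_zoom.txt
> >  g.region region=${TMPNAME}_rastres
> >  diff /tmp/stats_full.txt /tmp/stats_zoom.txt && echo "same answer"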
> >
> >
> > 3) rm twice??
> > cleanup()
> > {
> >  #restore settings:
> >  g.region region=${TMPNAME}
> >  g.remove region=${TMPNAME} > /dev/null
> >
> > twice??
> >
> > 4) isn't  -p > /dev/null  circular?
> > g.region nsres=$NSRES ewres=$EWRES -ap > /dev/null
> >
> >
> > 5) cleanup should happen before   echo "Done."
> >
> >
> > Hamish
> >


--
David Finlayson
Marine Geology & Geophysics
School of Oceanography
Box 357940
University of Washington
Seattle, WA  98195-7940
USA

Office: Marine Sciences Building, Room 112
Phone: (206) 616-9407
Web: http://students.washington.edu/dfinlays



