[GRASS-user] r.neighbors velocity

Sat Jun 29 06:50:34 PDT 2013

Hi,
I have implemented a "real" average neighborhood algorithm that runs in
parallel using OpenMP. The source code and the benchmark shell script are
attached.

The neighbor program computes a moving-window average with a window of
arbitrary size. The size of the map (rows x cols) and the size of the square
moving window (mw_size, an odd number of cells per side) can be specified:

./neighbor rows cols mw_size
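
For readers without the attachment, a moving-window average of this kind
can be sketched as below. This is a minimal illustration, not the code from
main.c: the function name moving_average, the row-major double array
layout, and the clipping of the window at the map edges are all
assumptions. In this brute-force form every map cell visits every window
cell, so a 5000 x 5000 map with a 23 x 23 window needs roughly
25,000,000 * 529 = 13,225,000,000 additions, which makes it a reasonable
stress test for compiler optimizations.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not taken from the attached main.c:
 * writes the mean of the mw_size x mw_size window centred on each
 * cell of `in` into `out`. Windows are clipped at the map edges,
 * so border cells average over fewer values. */
void moving_average(const double *in, double *out,
                    int rows, int cols, int mw_size)
{
    assert(rows > 0 && cols > 0);
    assert(mw_size > 0 && mw_size % 2 == 1);   /* window size must be odd */
    int half = mw_size / 2;
    int r;

    /* Each output row is independent, so rows can be split across threads. */
#pragma omp parallel for schedule(static)
    for (r = 0; r < rows; r++) {
        for (int c = 0; c < cols; c++) {
            double sum = 0.0;
            int count = 0;
            for (int dr = -half; dr <= half; dr++) {
                int rr = r + dr;
                if (rr < 0 || rr >= rows)
                    continue;               /* window row outside the map */
                for (int dc = -half; dc <= half; dc++) {
                    int cc = c + dc;
                    if (cc < 0 || cc >= cols)
                        continue;           /* window column outside the map */
                    sum += in[(size_t)rr * cols + cc];
                    count++;
                }
            }
            out[(size_t)r * cols + c] = sum / count;
        }
    }
}
```

Compiled with -fopenmp, the outer loop is split across OMP_NUM_THREADS
threads; without it the pragma is ignored and the function runs serially.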

IMHO the new program is better suited for comparing compilers and for
measuring the performance of neighborhood operations.

This is the benchmark on my 5-year-old AMD Phenom 4-core computer using 1,
2 and 4 threads:

gcc -Wall -fopenmp -lgomp -Ofast main.c -o neighbor
export OMP_NUM_THREADS=1
time ./neighbor 5000 5000 23
real 0m37.211s
user 0m36.998s
sys 0m0.196s

export OMP_NUM_THREADS=2
time ./neighbor 5000 5000 23
real 0m19.907s
user 0m38.890s
sys 0m0.248s

export OMP_NUM_THREADS=4
time ./neighbor 5000 5000 23
real 0m10.170s
user 0m38.466s
sys 0m0.192s
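
To put these timings in perspective, speedup and parallel efficiency can be
computed from the "real" (wall-clock) times. The two helper functions below
are only an illustrative sketch; their names are made up for this example
and are not part of the attached program.

```c
#include <assert.h>

/* Speedup of a parallel run relative to the single-thread run. */
double speedup(double t_serial, double t_parallel)
{
    assert(t_serial > 0.0 && t_parallel > 0.0);
    return t_serial / t_parallel;
}

/* Parallel efficiency: speedup divided by the number of threads. */
double efficiency(double t_serial, double t_parallel, int threads)
{
    assert(threads > 0);
    return speedup(t_serial, t_parallel) / threads;
}
```

With the times above, speedup(37.211, 19.907) is about 1.87 and
speedup(37.211, 10.170) is about 3.66, which corresponds to efficiencies of
roughly 93% on 2 threads and 91% on 4 threads.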

Happy hacking, compiling and testing. :)

Best regards
Soeren

2013/6/29 Markus Metz <markus.metz.giswork at gmail.com>

> On Sat, Jun 29, 2013 at 1:26 PM, Hamish <hamish_b at yahoo.com> wrote:
> > Markus Metz wrote:
> >
> >> Some more results with Sören's test program on an Intel(R) Core(TM) i5
> >> CPU M450 @ 2.40GHz (2 real cores, 4 fake cores) with gcc 4.7.2 and
> >> clang 3.3
> >>
> >> gcc -O3
> >> v is 2.09131e+13
> >>
> >> real    2m0.393s
> >> user    1m57.610s
> >> sys    0m0.003s
> >>
> >> gcc -Ofast
> >> v is 2.09131e+13
> >>
> >> real    0m7.218s
> >> user    0m7.018s
> >> sys    0m0.017s
> >
> >
> > Nice. One thing we need to remember, though, is that it's not entirely
> > free: one thing -Ofast turns on is -ffast-math,
> > """
> >  This option is not turned on by any -O option besides -Ofast since it
> >  can result in incorrect output for programs that depend on an exact
> >  implementation of IEEE or ISO rules/specifications for math functions.
> >  It may, however, yield faster code for programs that do not require the
> >  guarantees of these specifications.
> > """
> >
> > which may not be fit for our purposes.
> >
> >
> > With the ifort compiler there is '-fp-model precise' which allows only
> > optimizations which don't harm the results. Maybe gcc has something
> > similar.
>
> In gcc, you can turn off -ffoo with -fno-foo, so maybe you can
> use -Ofast -fno-fast-math to preserve IEEE specifications.
> >
> > Glad to see -floop-parallelize-all in gcc 4.7, it will help us identify
> > places to focus OpenMP work on.
> >
> >
> > Hamish
> >
>
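
As a small illustration of why -ffast-math can change results: in gcc it
implies -fassociative-math, which lets the compiler reorder floating-point
sums, and IEEE addition is not associative. The sketch below is not from
main.c; the function names are made up, and volatile is used only to pin
the evaluation order so the effect is visible regardless of optimization
flags.

```c
/* Sums three values left to right: (a + b) + c. */
double sum_left(double a, double b, double c)
{
    volatile double t = a + b;   /* volatile forces this rounding step to happen */
    return t + c;
}

/* Sums the same values right to left: a + (b + c). */
double sum_right(double a, double b, double c)
{
    volatile double t = b + c;   /* a small c can be absorbed by a large b here */
    return a + t;
}
```

With a = 1e20, b = -1e20 and c = 1.0, sum_left returns 1.0 while sum_right
returns 0.0, because 1.0 is lost when it is added to -1e20 first. A sum over
a large moving window can be affected in the same way once the compiler is
free to reorder it.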
-------------- next part --------------
A binary attachment was scrubbed...
File name   : benchmark.sh
File type   : application/x-sh
File size   : 223 bytes
Description : not available
URL         : <http://lists.osgeo.org/pipermail/grass-user/attachments/20130629/0d2ad429/attachment.sh>
-------------- next part --------------
A binary attachment was scrubbed...
File name   : main.c
File type   : text/x-csrc
File size   : 3664 bytes
Description : not available
URL         : <http://lists.osgeo.org/pipermail/grass-user/attachments/20130629/0d2ad429/attachment.c>