[GRASS-user] r.neighbors velocity
Sören Gebbert
soerengebbert at googlemail.com
Sat Jun 29 06:50:34 PDT 2013
Hi,
I have implemented a "real" average neighborhood algorithm that runs in
parallel using OpenMP. The source code and the benchmark shell script are
attached.
The neighbor program computes a moving-window average for a window of
arbitrary size. The size of the map (rows x cols) and the size of the square
moving window (an odd number, cols == rows) can be specified:
./neighbor rows cols mw_size
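For readers without the attachment, here is a minimal sketch of what such a moving-window average could look like. It is not the attached main.c: the function name window_mean, the row-major double arrays, and the choice to skip window cells that fall outside the map are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Mean of the mw_size x mw_size window centred on each cell of a
 * row-major rows x cols map. Window cells outside the map are skipped
 * (an assumption; the attached main.c may treat edges differently). */
void window_mean(const double *in, double *out, int rows, int cols, int mw_size)
{
    int half = mw_size / 2;
    int r;

    assert(mw_size > 0 && mw_size % 2 == 1);   /* window size must be odd */

    /* Each output row is independent of the others, so the row loop can be
     * split across threads. Without -fopenmp the pragma is ignored and the
     * loop simply runs serially. */
#pragma omp parallel for schedule(static)
    for (r = 0; r < rows; r++) {
        for (int c = 0; c < cols; c++) {
            double sum = 0.0;
            int n = 0;

            for (int dr = -half; dr <= half; dr++) {
                int rr = r + dr;
                if (rr < 0 || rr >= rows)
                    continue;
                for (int dc = -half; dc <= half; dc++) {
                    int cc = c + dc;
                    if (cc < 0 || cc >= cols)
                        continue;
                    sum += in[(size_t)rr * cols + cc];
                    n++;
                }
            }
            /* n >= 1 because the centre cell is always inside the map. */
            out[(size_t)r * cols + c] = sum / n;
        }
    }
}
```

Calling window_mean(in, out, 5000, 5000, 23) on two preallocated arrays would correspond to the "./neighbor 5000 5000 23" run, under the assumptions above. The work per cell grows with mw_size squared, which is why a 23 x 23 window makes a useful stress test.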
IMHO the new program is better suited for comparing compilers and for
measuring the performance of neighborhood operations.
This is the benchmark on my five-year-old AMD Phenom quad-core computer
using 1, 2 and 4 threads:
gcc -Wall -fopenmp -lgomp -Ofast main.c -o neighbor
export OMP_NUM_THREADS=1
time ./neighbor 5000 5000 23
real 0m37.211s
user 0m36.998s
sys 0m0.196s
export OMP_NUM_THREADS=2
time ./neighbor 5000 5000 23
real 0m19.907s
user 0m38.890s
sys 0m0.248s
export OMP_NUM_THREADS=4
time ./neighbor 5000 5000 23
real 0m10.170s
user 0m38.466s
sys 0m0.192s
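To put the wall-clock ("real") times above in perspective, speedup and parallel efficiency can be computed as in the small sketch below. The helper names speedup and efficiency are only illustrative. With the numbers above this gives roughly 1.87x on 2 threads and 3.66x on 4 threads, i.e. a bit over 90% efficiency in both cases.

```c
#include <assert.h>

/* Speedup of an n-thread run relative to the single-thread run. */
double speedup(double t1_seconds, double tn_seconds)
{
    assert(tn_seconds > 0.0);
    return t1_seconds / tn_seconds;
}

/* Parallel efficiency: speedup divided by the number of threads used. */
double efficiency(double t1_seconds, double tn_seconds, int threads)
{
    assert(threads > 0);
    return speedup(t1_seconds, tn_seconds) / threads;
}
```

For example, efficiency(37.211, 10.170, 4) uses the 1-thread and 4-thread "real" times from the benchmark. The nearly constant "user" time across the three runs shows that the total CPU work stays about the same and is just spread over more cores.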
Happy hacking, compiling and testing. :)
Best regards
Soeren
2013/6/29 Markus Metz <markus.metz.giswork at gmail.com>
> On Sat, Jun 29, 2013 at 1:26 PM, Hamish <hamish_b at yahoo.com> wrote:
> > Markus Metz wrote:
> >
> >> Some more results with Sören's test program on an Intel(R) Core(TM) i5
> >> CPU M450 @ 2.40GHz (2 real cores, 4 fake cores) with gcc 4.7.2 and
> >> clang 3.3
> >>
> >> gcc -O3
> >> v is 2.09131e+13
> >>
> >> real 2m0.393s
> >> user 1m57.610s
> >> sys 0m0.003s
> >>
> >> gcc -Ofast
> >> v is 2.09131e+13
> >>
> >> real 0m7.218s
> >> user 0m7.018s
> >> sys 0m0.017s
> >
> >
> > Nice. One thing we need to remember, though, is that it's not entirely
> > free: one thing -Ofast turns on is -ffast-math,
> > """
> > This option is not turned on by any -O option besides -Ofast since it can
> > result in incorrect output for programs that depend on an exact
> > implementation of IEEE or ISO rules/specifications for math functions. It
> > may, however, yield faster code for programs that do not require the
> > guarantees of these specifications.
> > """
> >
> > which may not be fit for our purposes.
> >
> >
> > With the ifort compiler there is '-fp-model precise' which allows only
> > optimizations which don't harm the results. Maybe gcc has something
> > similar.
>
> In gcc, you can turn off -ffoo with -fno-foo, maybe this way you can
> use -Ofast -fno-fast-math to preserve IEEE specifications.
> >
> > Glad to see -floop-parallelize-all in gcc 4.7, it will help us identify
> > places to focus OpenMP work on.
> >
> >
> > Hamish
> >
>
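On the -ffast-math concern quoted above: one reason the results can change is that floating-point addition is not associative, and -ffast-math allows the compiler to reorder sums as if it were. The sketch below (function names are illustrative only) shows two mathematically equal groupings that give different answers in IEEE double precision when compiled without -ffast-math. A neighborhood average is a long sum, so reordering it, for instance for vectorization, may change the low-order digits of the result.

```c
/* Sums three doubles as (a + b) + c. */
double sum_left(double a, double b, double c)
{
    return (a + b) + c;
}

/* Sums the same three doubles as a + (b + c). */
double sum_right(double a, double b, double c)
{
    return a + (b + c);
}
```

With a = 1e20, b = -1e20 and c = 1.0, sum_left yields 1.0, while sum_right yields 0.0 because the 1.0 is lost when it is added to -1e20 first. Whether differences of this kind matter for a given raster operation depends on the value range and the window size.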
-------------- next part --------------
A non-text attachment was scrubbed...
Name: benchmark.sh
Type: application/x-sh
Size: 223 bytes
Desc: not available
URL: <http://lists.osgeo.org/pipermail/grass-user/attachments/20130629/0d2ad429/attachment.sh>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: main.c
Type: text/x-csrc
Size: 3664 bytes
Desc: not available
URL: <http://lists.osgeo.org/pipermail/grass-user/attachments/20130629/0d2ad429/attachment.c>