[GRASS-user] r.neighbors velocity

Sat Jun 29 03:19:28 PDT 2013

Some more results with Sören's test program on an Intel(R) Core(TM) i5
CPU M450 @ 2.40GHz (2 physical cores, 4 hardware threads) with gcc 4.7.2
and clang 3.3:

gcc -O3
v is 2.09131e+13

real	2m0.393s
user	1m57.610s
sys	0m0.003s

gcc -Ofast
v is 2.09131e+13

real	0m7.218s
user	0m7.018s
sys	0m0.017s

gcc -Ofast -floop-parallelize-all is as fast as gcc -Ofast

clang -Ofast
v is 2.09131e+13

real	0m18.701s
user	0m18.285s
sys	0m0.000s

Markus M


On Sat, Jun 29, 2013 at 8:35 AM, Hamish <hamish_b at yahoo.com> wrote:
> Hi,
>
> here are the same results for Soeren's test program, with the Open64
> compiler from AMD:
>
>  - Same AMD X6 CPU as below.
>  - Open64 compiler 4.5.2.1 from AMD  (GPLv2, LGPL)
>
> I just downloaded the pre-built RHEL5 binary tarball and it worked on
> Debian/squeeze; all I had to do was make an alias to the executable in
> the untarred bin/ dir.
>  see also http://wiki.open64.net/index.php/Installation_on_Ubuntu
> Source is available of course, but according to the Debian ITP ticket
> it's a bit of a pain to build there.
>
>
> straight opencc:
>
> real  0m59.015s | 0m58.972s | 0m58.963s
> user  0m58.760s | 0m58.812s | 0m58.624s
> sys   0m0.248s  | 0m0.136s  | 0m0.300s
> --
>
> opencc -O3:
>
> real    0m35.203s | 0m35.173s | 0m35.204s
> user    0m35.206s | 0m35.174s | 0m35.206s
> sys     0m0.000s  | 0m0.000s  | 0m0.000s
> --
>
> opencc -Ofast (with or without -march=auto for native code generation):
>
> real  0m13.389s | 0m13.402s | 0m13.435s
> user  0m13.389s | 0m13.405s | 0m13.437s
> sys   0m0.000s  | 0m0.000s  | 0m0.000s
> --
>
> opencc -Ofast -march=auto -apo on a 6-(real)-core CPU
> v is 2.09131e+13
>
> real  0m2.552s  | 0m2.595s  | 0m2.591s
> user  0m14.857s | 0m14.725s | 0m14.725s
> sys   0m0.008s  | 0m0.024s  | 0m0.016s
>
>
> '-apo' is autoparallelization. It's poorly documented, but it works!
> It adds OpenMP pragmas where it thinks it can and where it expects
> a gain; I'm glad to see it's not just for the Fortran compiler
> anymore.
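
A quick way to read the -apo timings above: "user" divided by "real"
gives roughly how many cores were busy on average, and the serial "real"
time divided by the parallel "real" time gives the wall-clock speedup.
The sketch below only does that arithmetic in C; the function names are
made up for illustration and are not part of any tool mentioned here.

```c
#include <assert.h>
#include <stdio.h>

/* Average number of busy cores: CPU time summed over all threads
 * ("user") divided by wall-clock time ("real"). */
double busy_cores(double user_s, double real_s)
{
    assert(real_s > 0.0);
    return user_s / real_s;
}

/* Wall-clock speedup of the parallel run over the serial run. */
double speedup(double serial_real_s, double parallel_real_s)
{
    assert(parallel_real_s > 0.0);
    return serial_real_s / parallel_real_s;
}

/* Fraction of the ideal (linear) speedup reached on ncores cores. */
double efficiency(double serial_real_s, double parallel_real_s, int ncores)
{
    assert(ncores > 0);
    return speedup(serial_real_s, parallel_real_s) / ncores;
}

/* Print all three figures for one serial/parallel pair of runs. */
void report(double serial_real_s, double par_real_s, double par_user_s,
            int ncores)
{
    printf("busy cores: %.2f\n", busy_cores(par_user_s, par_real_s));
    printf("speedup:    %.2f\n", speedup(serial_real_s, par_real_s));
    printf("efficiency: %.2f\n",
           efficiency(serial_real_s, par_real_s, ncores));
}
```

Calling report(13.389, 2.552, 14.857, 6) with the first opencc -Ofast
run and the first -apo run quoted above gives about 5.8 busy cores and
a speedup of about 5.2 on the 6-core CPU.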
>
>
> So the Open64 compiler is not quite as fast as Intel's for this test
> case, but it's pretty close, with the more versatile gcc trailing far
> behind. Executable file size for all of the above was less than 12 kB,
> since it can link to local OS shared libs.
>
> I haven't tried it with llvm/clang.
>
> Now I wonder which flags to use to recreate -Ofast in gcc to make it
> a fairer comparison.
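
On recreating -Ofast: as far as I know, gcc only gained -Ofast in 4.6,
where it is -O3 plus -ffast-math, so "gcc -O3 -ffast-math" should be the
closest match on gcc 4.4.5. A large part of what -ffast-math allows is
reordering floating-point operations, which lets loops that accumulate
into a single variable be vectorized. The sketch below is not Soeren's
test program; it only shows by hand the kind of reordering involved.
The function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Strict left-to-right sum. Each addition depends on the previous one,
 * and without -ffast-math the compiler has to keep this exact order. */
double sum_strict(const double *x, size_t n)
{
    assert(x != NULL || n == 0);
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += x[i];
    return s;
}

/* The same sum with four independent accumulators. This changes the
 * order of the additions, so the rounding can differ slightly, but the
 * four chains can run in parallel in SIMD lanes. */
double sum_reassoc(const double *x, size_t n)
{
    assert(x != NULL || n == 0);
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += x[i];
        s1 += x[i + 1];
        s2 += x[i + 2];
        s3 += x[i + 3];
    }
    for (; i < n; i++)      /* leftover elements */
        s0 += x[i];
    return (s0 + s1) + (s2 + s3);
}
```

With plain -O3, gcc has to leave sum_strict in order; with
-O3 -ffast-math it is allowed to turn it into something like
sum_reassoc on its own. The two results agree to within rounding error,
which is why the printed "v" can stay the same at the displayed
precision.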
>
>
> Hamish
>
>
>> I also ran it on an AMD Phenom II X6 1090T  (icc -xHost --> -xSSSE3 ?)
>> All times "real"; all output was "v is 2.09131e+13".
>>
>> gcc 4.4.5 with standard-opts: (7kb binary)
>>  == near parity in single-threaded performance between the 2-year-old
>>     AMD Phenom with an older gcc (stock Debian/squeeze) and the new i7 chip!
>>   1m16.175s | 1m15.634s | 1m16.029s
>>
>> icc 12.1 with standard-opts:
>>   0m32.975s | 0m33.079s | 0m33.249s
>>
>> icc with "-fast" opt: (700kb binary)
>>   0m9.577s | 0m9.572s | 0m9.583s
>>
>> icc with -parallel auto-MP: (31kb binary)
>>  == again near parity with the new i7 chip, even with the Intel-biased
>>     compiler. "user" CPU time was actually less: the advantage of 6 real
>>     cores vs 4 real + 4 virtual ones.*
>>   real  0m6.406s  | 0m6.404s  | 0m6.404s
>>   user  0m37.106s | 0m37.170s | 0m37.106s
>>   sys   0m0.044s  | 0m0.040s  | 0m0.028s
>>
>> icc with -fast and -parallel: (2mb binary)
>>   real  0m2.002s  | 0m2.002s  | 0m2.002s
>>   user  0m10.765s | 0m10.769s | 0m10.769s
>>   sys   0m0.016s  | 0m0.012s  | 0m0.008s