<div dir="ltr">Hey Folks,<div>many thanks for pointing out the important influence of different compilers and compiler options. But please be aware that my tiny little program is not representative of a neighbor analysis implementation; it was simply a demonstration of <span style="font-family:arial,sans-serif;font-size:13px">12 billion ops</span>:</div>

<div><br></div><div>1. I use fixed loop sizes, which are really easy for a compiler to optimize</div><div>2. It is pretty simple to parallelize, since only a simple reduction is done in the inner loop</div><div><br></div>

<div>3. Most important: Ivan's statement was a window size of 501 ... as MarkusN IMHO correctly interpreted, this leads to a moving window of 501x501 pixels, if this is an option for r.neighbors. It is not the total number of cells of a rectangular moving window, since it must be an even number in this case. Shapes other than rectangular are more complex to implement. </div>

<div><br></div><div>To be diplomatic, I decided to use 501 pixels, which might represent a 23x21 pixel moving window, to show that this "small" number of operations needs a considerable amount of time on modern CPUs.</div>
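<div><br></div><div>To make the cost per pixel concrete, here is a minimal sketch of a naive rectangular moving-window sum. It is not the demonstration program and not the r.neighbors implementation; the function name <code>window_sum</code>, the clamping at the raster edges and the tiny test raster are assumptions for illustration only. It shows that every interior pixel costs one read per window cell, so a 23x21 window means 483 reads per pixel.</div>

```python
def window_sum(raster, win_rows, win_cols):
    """Naive moving-window sum. Returns (result, cell_reads).

    Hypothetical helper for illustration; window sizes are assumed odd.
    """
    rows, cols = len(raster), len(raster[0])
    half_r, half_c = win_rows // 2, win_cols // 2
    result = [[0.0] * cols for _ in range(rows)]
    reads = 0
    for r in range(rows):
        for c in range(cols):
            total = 0.0
            # Clamp the window at the raster edges (one possible edge policy).
            for rr in range(max(0, r - half_r), min(rows, r + half_r + 1)):
                for cc in range(max(0, c - half_c), min(cols, c + half_c + 1)):
                    total += raster[rr][cc]  # the simple reduction
                    reads += 1
            result[r][c] = total
    return result, reads


# Tiny 4x5 raster of ones with a 3x3 window, just to exercise the loops.
raster = [[1.0] * 5 for _ in range(4)]
sums, reads = window_sum(raster, 3, 3)

# Reads per interior pixel for the 23x21 window mentioned above.
reads_per_pixel = 23 * 21
```

<div>Away from the edges the read count is simply the number of raster cells times the number of window cells, which is why the window size dominates the runtime.</div>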

<div><br></div><div>If you use a 501x501 pixel moving window, the computational effort is roughly 501 times 12 billion ops. IMHO in this case a GPU or a neighbor-algorithm-specific FPGA/ASIC may be able to perform this operation in 2/3 seconds.</div>
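<div><br></div><div>The factor of 501 above can be checked with a few lines of arithmetic. This is only a sketch: the constant names are made up for illustration, and it assumes the work scales linearly with the number of cells in the window, ignoring edge effects and any smarter algorithm.</div>

```python
BASE_OPS = 12_000_000_000   # ops in the demonstration (501 cells per window)
SMALL_WINDOW = 501          # cells per window in the demonstration
LARGE_WINDOW = 501 * 501    # cells in a 501x501 pixel window

# Assumption: cost per pixel is proportional to the window's cell count.
ratio = LARGE_WINDOW // SMALL_WINDOW
total_ops = BASE_OPS * ratio

print(ratio)
print(f"{total_ops:.3e}")
```

<div>Under that assumption the larger window needs about 6 trillion ops for the same raster.</div>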

<br><div>Best regards</div><div>Soeren</div></div>