[GRASS-dev] lidar tools update in grass7
Soeren Gebbert
soerengebbert at googlemail.com
Wed Apr 28 12:08:29 EDT 2010
Hello,
>> Slightly off-topic, but while we are thinking about this code ...
>>
>> A while back I did some crude profiling of the v.lidar tools and found
>> that it was spending lots and lots of time in the Tcholetsky decomposition
>> loop (3-for loops deep).
>>
>
> That's better now because the matrix is a bit smaller.
Yes, its a nice symmetric band matrix. Was a bit puzzling for me to
understand the creation of the matrix without any documentation!
So, i have implemented some conversion functions into the gmath
library while moving the tchol* code from lidar lib into the gmath
lib: now transformation of qudratic <-> band <-> sparse matrices can
be performed. A new CG band solver, the matrix type conversion and a
new band matrix -vector multiplication function are available with
tests in the svn-trunk of grass 7.
>>
>> It seemed like a good & simple test case for using OpenMP to multi-thread
>> it, but I got stuck with it segfaulting. AFAIR the problem was that OpenMP
>> wanted you to malloc the entire thing before starting, and this could
>> get big.
>>
>> if it interests you, see the attached patch and
>> https://trac.osgeo.org/grass/ticket/657
>> http://lists.osgeo.org/pipermail/grass-dev/2009-June/044709.html
>> http://lists.osgeo.org/pipermail/grass-dev/2009-June/044705.html
>> http://grass.osgeo.org/wiki/OpenMP
>>
>
> Hmm, if I understand the code right, only the innermost for loop can be
> executed in parallel because the first two for-loops need results of
> previous runs (not possible to run i + 1 before i, same for j). But I don't
> know that parallel stuff, I would in any case first optimize the code
> without parallelization, only then add multithreading.
I fully agree. The band cholesky solver is well suited for the job but
not designed for parallelization. Thus i have implemented an iterative
Conjugate Gradient (CG) solver for band matrices (just by replacing
the matrix - vector multiplication) to replace the cholesky band
solver. Well, the cholesky solver out-performes the CG solver by a
factor of 10+. I have partly parallelized the CG solver for band
matrices, but even the speedup on a 8 core system is to small to
compete with the cholesky band solver. Besides of that, the created
band matrices are of bad condition, so the CG solver may fail for
subregions.
Hence, i would suggest to use MPI parallelization or something else on
process base, to concurrently compute the subregions which are created
by default. Maybe a python script using overlapping tiling and
subprocess can do the job?
Best regards
Soeren
More information about the grass-dev
mailing list