[GRASS-dev] lidar tools update in grass7

Soeren Gebbert soerengebbert at googlemail.com
Sat May 1 07:58:25 EDT 2010

Hi Hamish,

2010/5/1 Hamish <hamish_b at yahoo.com>:
> Soeren wrote:
>> The computation time of overlapping regions should be the
>> same in python as in C.
>> IMHO it is easier to program a python module which computes
>> the tiles, split up the vector points based on the tiles and
>> distribute the data across a grid or cloud for computation.
>> After the distributed processes are finished, the script can
>> gather the data and merge it together.
>> If you implement this into the lidar modules, each module
>> needs to be changed to support MPI. Using MPI is IMHO much more
>> complex than just running the modules on different datasets
>> on different nodes, gathering and patching the data.
> ...
>> Maybe the overlapping algorithm can be simplified in the
>> python version?
>> In my vision, the LIDAR computation could be run in an
>> elastic computation cloud (EC2) as WPS backend using
>> recursive tiling algorithm for data distribution.
> thinking big!

Thinking practical. :)

Ignoring the WPS overhead, such an approach may work for every lidar
module and for other SIMD-style (single instruction, multiple data)
modules where the computational effort is much larger than the
tiling, distribution and gathering effort (RST interpolation, ...).
A tiling - distribution - gathering approach can be implemented very
simply in Python (multiprocessing, pympi, ...).
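To make the idea concrete, here is a minimal sketch of the tiling - distribution - gathering pattern using only the standard library's multiprocessing module. The tile size, the overlap width, the point format and the per-tile computation (a stand-in that just averages z) are placeholder assumptions for illustration, not code from any GRASS lidar module:

```python
# Sketch: split points into overlapping tiles, process the tiles in
# parallel, then gather the per-tile results back together.
from multiprocessing import Pool

def make_tiles(points, tile_size, overlap):
    """Assign each (x, y, z) point to every tile whose
    overlap-expanded bounding box contains it."""
    tiles = {}
    for x, y, z in points:
        ix, iy = int(x // tile_size), int(y // tile_size)
        # A point near a tile edge may fall into neighbouring
        # tiles' overlap zones as well, so check the 3x3 block.
        for tx in (ix - 1, ix, ix + 1):
            for ty in (iy - 1, iy, iy + 1):
                x0, y0 = tx * tile_size, ty * tile_size
                if (x0 - overlap <= x < x0 + tile_size + overlap and
                        y0 - overlap <= y < y0 + tile_size + overlap):
                    tiles.setdefault((tx, ty), []).append((x, y, z))
    return tiles

def process_tile(item):
    """Stand-in for the real per-tile computation (e.g. an
    interpolation); here it simply averages the z values."""
    key, pts = item
    return key, sum(p[2] for p in pts) / len(pts)

def run(points, tile_size=100.0, overlap=10.0, workers=4):
    tiles = make_tiles(points, tile_size, overlap)
    with Pool(workers) as pool:
        # Distribution: each tile is an independent job.
        results = pool.map(process_tile, tiles.items())
    # Gathering: merge the per-tile results into one structure.
    return dict(results)
```

Replacing Pool.map with a cluster or cloud job scheduler would not change the structure: the tiling and gathering steps stay the same, only the distribution mechanism differs.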

> for my 2c (fwiw I run about 160 computational-hours per day
> split up and dynamically sent out to a couple of clusters using
> MPI), I would first ask how much processing are we talking
> about?  If the typical large-case lidar tools job only took ~3
> hours to run on a modern PC, I'd much prefer the simplicity of
> OpenMP and run it on the same quad-core desktop or 8-core Xeon
> server that the GIS is running on.
> If the jobs could take 12-24 hours and have to happen often I
> would more seriously consider using MPI. Setting up the mpd
> daemon and organizing the mpirun jobs is a lot of overhead,
> which for smaller jobs seems slightly counter-productive to me.

If the GRASS project has developers who have the time and the
knowledge to parallelize the lidar modules using OpenMP or MPI,
that's fine. But I do not see that we have such resources; I have
neither the knowledge nor the time to implement it. IMHO many modules
would benefit from the more general, generic approach described
above. Such an approach can run on multi-core machines, on a cluster
or in a cloud.

I do not have a cluster available, only a 4-core machine. If I needed
to run a huge LIDAR job in a short time (which I never have before!),
I would need to buy additional computational power; Amazon EC2, for
example, would be a quite reasonable choice for that.

> also, I think it is quite natural to have the control and
> management scripts in python, but get worried about splitting
> up low-level processing into a hybrid python/C mix. for one
> thing it makes the code harder to follow/understand, for another
> you are at risk from different component versions interacting
> in different ways. (see wxNviz, vdigit problems on WinGrass..)

Using, e.g., MPI for parallelism often means implementing two
versions of the program, a serial and a parallel one. OpenMP works
only on multi-core or ccNUMA machines (SGI Altix), not on a cluster
or in a cloud. Many parallelized programs are more complex than, and
largely rewritten from, the serial versions they are based on. IMHO
implementing a single Python approach with additional Python
dependencies is less risky than implementing and maintaining many
modules of highly complex parallel C code alongside additional serial
versions.

A tiling - distribution - gathering Python approach will have a much
lower speedup than pure OpenMP or MPI versions, but the benefit grows
with the size of the computational problem.

Best regards

> (and of course this has to be an optional extra, and I'm all for
> exploring all these different methods .. we don't know which
> one will win out/survive in a few years time when 16-core
> workstations are the norm)
> regards,
> Hamish
