[GRASS-dev] Re: [GRASS-user] RE: [GRASSLIST:1174] Working with very large data sets

Fri Aug 25 11:53:43 EDT 2006

Hello Hamish

On Thu, 24 Aug 2006, Hamish wrote:

[...]
> * Vector operations when dealing with several million features can be
> quite memory intensive. Several work-arounds have been put in place to
> deal with this though. The only datasets I know of where this is an
> issue are vector/point data from LIDAR or swath sonar. By skipping the
> creation of a database and leaving topology unbuilt, massive datasets
> can be pulled in. I don't think we've found the cap to that yet. Maybe
> Helena & Andrew have found it? Core modules for dealing with such
> data have been modified to deal with the topology-free case (i.e.
> v.surf.rst). LFS is supported by vectors AFAIK (well, as much as
> anywhere else).

Just now I have also updated v.surf.idw to read vector points files
without topology. Since it indexes the points itself, there really was
no need for it to use any functions that require topology (in this case
Vect_get_num_lines() and Vect_read_line()); I guess this just wasn't yet
an issue when the module was converted to read the new vector format. I
basically just got rid of the call to Vect_get_num_lines() and used
Vect_read_next_line() instead of Vect_read_line(), so the module now
works through all the vector points sequentially in a while loop instead
of a for loop.
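
To illustrate the shape of that change, here is a minimal self-contained
sketch. It is not the v.surf.idw source and does not call the GRASS
vector library: Point, read_next_point() and sum_points() are
hypothetical stand-ins, the input is a plain text stream of "x y z"
records, and the return codes are chosen for this sketch only. The point
is just that a sequential reader lets the loop run until end of input
without knowing the number of features up front.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for one vector point (not a GRASS type). */
typedef struct { double x, y, z; } Point;

/* Read the next "x y z" record from the stream.
 * Returns 1 if a point was read, -2 at end of input
 * (return codes are an assumption of this sketch). */
static int read_next_point(FILE *fp, Point *p)
{
    if (fscanf(fp, "%lf %lf %lf", &p->x, &p->y, &p->z) == 3)
        return 1;
    return -2;
}

/* Visit every point sequentially in a while loop. No total count is
 * requested beforehand; it is simply accumulated along the way. */
static long sum_points(FILE *fp, double *zsum)
{
    Point p;
    long n = 0;

    assert(fp != NULL && zsum != NULL);
    *zsum = 0.0;
    while (read_next_point(fp, &p) > 0) {
        *zsum += p.z;
        n++;
    }
    return n;
}
```

The old for-loop pattern would instead ask for the number of features
first and then fetch each one by its id, and that is the part that
needs topology to be built.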

>> Has anyone gotten GRASS working with an MP setup for things like
>> mapcalc?
>
> GRASS is a group of modular little programs, so it is often possible to
> spawn off processes, but it is not threaded. i.e. a single module will
> not use both processors at once. There have been some efforts to add
> threading support to some GRASS modules, e.g. parallelized s.surf.idw:
>  http://grass.itc.it/download/addons.php

I know about that, but it still uses the old, inefficient non-indexing
algorithm. For each cell in the output map, it searches through every
input point to find the 12 closest (from which the weighted average is
computed). Obviously that algorithm could benefit from being
parallelised, but IMHO a better solution was to index the points and
avoid the need for all that redundant searching. That's what the current
v.surf.idw does (unless you use the -n switch).
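
As a rough sketch of the difference (again, not the actual v.surf.idw
code), the example below compares a brute-force search over all points
with a search through a simple uniform grid index. Everything in it is
an assumption made for illustration: the names, the unit-square domain,
the 4x4 index, and K = 3 neighbours instead of 12. The indexed version
looks at the query's own cell first and then at growing rings of cells,
stopping once no unvisited cell can hold a closer point.

```c
#include <assert.h>
#include <stdlib.h>

#define K 3      /* neighbours per estimate (the module default is 12) */
#define NC 4     /* index is NC x NC cells over the unit square */
#define MAXP 64  /* capacity per index cell, enough for this sketch */

typedef struct { double x, y, z; } Pt;
typedef struct { int n[NC][NC]; Pt p[NC][NC][MAXP]; } Index;
/* The K nearest candidates so far, sorted by squared distance. */
typedef struct { int n; double d2[K], z[K]; } Best;

static double sq(double v) { return v * v; }

/* Map a coordinate in [0,1] to an index cell. */
static int cell(double v)
{
    int c = (int)(v * NC);
    return c < 0 ? 0 : (c >= NC ? NC - 1 : c);
}

/* Insert a candidate if it is among the K nearest seen so far. */
static void offer(Best *b, double d2, double z)
{
    int i;
    if (b->n == K && d2 >= b->d2[K - 1]) return;
    i = (b->n < K) ? b->n++ : K - 1;
    while (i > 0 && b->d2[i - 1] > d2) {
        b->d2[i] = b->d2[i - 1]; b->z[i] = b->z[i - 1]; i--;
    }
    b->d2[i] = d2; b->z[i] = z;
}

/* Inverse squared-distance weighted average of the candidates. */
static double idw(const Best *b)
{
    double sw = 0.0, swz = 0.0;
    assert(b->n > 0);
    for (int i = 0; i < b->n; i++) {
        if (b->d2[i] == 0.0) return b->z[i];  /* query sits on a point */
        sw += 1.0 / b->d2[i];
        swz += b->z[i] / b->d2[i];
    }
    return swz / sw;
}

/* Old approach: scan every input point for every output cell. */
static double idw_brute(const Pt *pts, int n, double x, double y)
{
    Best b = {0};
    for (int i = 0; i < n; i++)
        offer(&b, sq(pts[i].x - x) + sq(pts[i].y - y), pts[i].z);
    return idw(&b);
}

/* Build the index once; ix must be zero-initialised by the caller. */
static void build(Index *ix, const Pt *pts, int n)
{
    for (int i = 0; i < n; i++) {
        int r = cell(pts[i].y), c = cell(pts[i].x);
        assert(ix->n[r][c] < MAXP);
        ix->p[r][c][ix->n[r][c]++] = pts[i];
    }
}

/* Indexed approach: search rings of cells around the query. */
static double idw_indexed(const Index *ix, double x, double y)
{
    Best b = {0};
    int r0 = cell(y), c0 = cell(x);
    double w = 1.0 / NC;  /* cell size */
    for (int ring = 0; ring < NC; ring++) {
        for (int r = r0 - ring; r <= r0 + ring; r++)
            for (int c = c0 - ring; c <= c0 + ring; c++) {
                if (r < 0 || r >= NC || c < 0 || c >= NC) continue;
                /* skip cells already visited in a smaller ring */
                if (abs(r - r0) != ring && abs(c - c0) != ring) continue;
                for (int i = 0; i < ix->n[r][c]; i++) {
                    const Pt *p = &ix->p[r][c][i];
                    offer(&b, sq(p->x - x) + sq(p->y - y), p->z);
                }
            }
        /* unvisited cells are at least ring * w away from the query */
        if (b.n == K && b.d2[K - 1] <= sq(ring * w)) break;
    }
    return idw(&b);
}
```

Both functions should give the same estimate when there are no ties for
the K-th nearest point; the indexed one just gets there without touching
most of the input for each output cell.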

It would be nice though to somehow use the vector topology as an index 
instead of creating a custom index structure within the module. But then 
again with the large memory requirements of topology that probably isn't 
feasible either.

Paul