[GRASS-dev] vector libs: file based spatial index

Markus GRASS markus.metz.giswork at googlemail.com
Thu Jun 25 03:21:01 EDT 2009


Hamish wrote:
> Moritz wrote:
>   
>> The largest file I have used is about 125000 areas with a
>> topo file weighing 42M, so taking your worst estimation,
>> this would mean around 200MB of spatial index, which is
>> still largely acceptable for me.
>>     
>
> lidar and swath bathymetry data will easily have millions of points,
> and as time goes on this will only expand. I seem to recall that one of
> Radim's big disappointments was that the need to handle this technology/
> data density only really became apparent just when GRASS's new vector
> engine was nearing completion. With some earlier notice it could have
> been designed to scale better. Still, there is much tuning which can
> be done with the present model to reduce the memory overheads, etc.
>   
Yes. As an example, for a 2D point dataset, the topo file should be
about 4 times as large as the coor file, and the same holds for the
spatial index. This is because each x,y coordinate pair is stored 3
times in the topo file, plus some other information that is not needed
for points, e.g. area/isle to the left and to the right, start node and
end node (start node = end node for points/centroids). Each x,y
coordinate pair is stored 2 times in the spatial index (a rectangle of
size zero with N, S, E, W where N = S and E = W). I see some potential
for cleaning up.
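To illustrate the duplication in the spatial index, here is a minimal
sketch. It assumes coordinates are packed as little-endian doubles; the
names point_as_rect and packed_sizes are hypothetical, and this is not
the actual GRASS on-disk layout. It only shows that a zero-size
rectangle carries each coordinate twice.

```python
import struct


def point_as_rect(x, y):
    """Bounding box (N, S, E, W) of a 2D point: a rectangle of size zero."""
    return (y, y, x, x)  # N = S = y, E = W = x


def packed_sizes(x, y):
    """Bytes needed for the bare x,y pair vs. its zero-size rectangle."""
    coor = struct.pack("<2d", x, y)                   # one x,y pair
    rect = struct.pack("<4d", *point_as_rect(x, y))   # N, S, E, W
    return len(coor), len(rect)


if __name__ == "__main__":
    print(packed_sizes(1.0, 2.0))
```

Under these assumptions the rectangle takes twice the bytes of the bare
coordinate pair, before any index node overhead is counted.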
> FWIW the sites type (now vector points) in GRASS 4/5 scales well, just
> as much as you can fit in the text file. (not sure if fseeks are 64bit-
> proof there, probably not)
>   
I guess that was without topo?
> the biggest lidar file used that I know about is Doug's 379GB dataset
> (14.5 billion points). 
Frightening.
> you might look at libLAS for lidar data, an OSGeo semi-affiliated
> project: http://liblas.org/  It is my understanding that Howard is
> currently adding spatial index support in the development version.
> You might check out his approach.
>   
Will do.
> I have been, and still am ignorant of what advantage a spatial index
> gives you for point data. ... interested to learn why "topology" would
> be useful for points-only data.
>   
Strictly speaking, topology and spatial index are two different things;
you could have a spatial index without topo. I also cannot see the
usefulness of topology for point data. A spatial index may be useful to
extract a subset (v.select), but in this case you could just as well go
through the points in the coor file, read one at a time, and select the
ones that fall into the study area. That should be slower than with a
spatial index, but then you're not dragging along the spatial index.
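The sequential alternative could look roughly like the sketch below. It
assumes a hypothetical flat file of x,y pairs stored as little-endian
doubles, which is not the real coor file format, and the function names
are made up for illustration. Only one record is held in memory at a
time.

```python
import io
import struct

# Hypothetical record layout: one x,y pair as two little-endian doubles.
RECORD = struct.Struct("<2d")


def select_points(stream, west, south, east, north):
    """Yield (x, y) for each point inside the box, one record at a time."""
    while True:
        chunk = stream.read(RECORD.size)
        if len(chunk) < RECORD.size:
            break  # end of data (or a truncated trailing record)
        x, y = RECORD.unpack(chunk)
        if west <= x <= east and south <= y <= north:
            yield (x, y)


def demo():
    """Scan four in-memory points against the box x, y in [2, 9]."""
    pts = [(0.0, 0.0), (5.0, 5.0), (10.0, 10.0), (3.0, 8.0)]
    data = b"".join(RECORD.pack(x, y) for x, y in pts)
    return list(select_points(io.BytesIO(data), 2.0, 2.0, 9.0, 9.0))


if __name__ == "__main__":
    print(demo())
```

Every point is visited once, so the cost grows linearly with the number
of points, but no index has to be built, stored or loaded.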
>
> In general I'm fairly happy with the no-topology solution for lidar
> data in grass, but a few targeted modules (eg v.info) really need to
> be modified to deal with them.
>
>
>
> Hamish
>
>
> ps- we still need to hunt through the archives for Radim's posts on these
> issues which explain quite a bit.
>   
I remember one comment where he said that the spatial index is not
written out because of time and space concerns. Space should not be an
issue today, and opening an old vector is faster if the spatial index is
available in a file. Of course I would like a solution that needs less
memory and is faster when modifying a spatial index, but I haven't the
faintest idea how to do that. Maybe Paul Kelly's tip on memory mapping
can help.
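As a rough idea of what memory mapping a file-based index might look
like, here is a small sketch. The record layout (N, S, E, W as four
little-endian doubles) and all function names are assumptions for
illustration only, not the GRASS format or API. The point is that a
single record can be read from the file without loading the whole
index into memory.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical index record: N, S, E, W as four little-endian doubles.
RECT = struct.Struct("<4d")


def write_rects(path, rects):
    """Write rectangles back to back into a flat binary file."""
    with open(path, "wb") as f:
        for r in rects:
            f.write(RECT.pack(*r))


def read_rect(path, i):
    """Read record i through a read-only memory map of the file."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            # Only the pages actually touched need to be brought in by the OS.
            return RECT.unpack_from(m, i * RECT.size)


def demo():
    """Write two rectangles to a temporary file and read the second one."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        write_rects(path, [(1.0, 1.0, 2.0, 2.0), (3.0, 3.0, 4.0, 4.0)])
        return read_rect(path, 1)
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

Whether this helps when the index is modified, rather than just read, is
a separate question that the sketch does not address.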

Markus M

