[GRASS-dev] vector libs: file based spatial index

Mon Jul 13 08:44:52 EDT 2009

Moritz Lennert wrote:
> On 25/06/09 08:51, Markus GRASS wrote:
>> I would suggest that I first implement a new version where the spatial
>> index is always written out when a new or modified vector is closed.
>> Intermediate data are still stored in memory. Opening an old vector in
>> read-only mode would then be faster; opening an old vector in update
>> mode would work as it does now, with the spatial index loaded
>> to memory. This can then be tested and polished, and once that is
>> stable, an env var could be added to keep the spatial index in file when
>> modifying (Vect_open_new or Vect_open_update). This would only be needed
>> for massive vectors.
>
> +1
Now in trunk r38390, time to make distclean again...

To work with an existing vector in grass7, topology needs to be rebuilt
(e.g. with v.build) because one support file, the spatial index, is
missing. After that everything is fine, and grass6 can still read the
vector as it is.

The vector spatial index is now built in memory and written out to file,
like topology and the category index. When opening an old vector, only
the header of the spatial index file is loaded; searches are done in the
file. When opening an old vector for update, the spatial index is loaded
from file to memory, modified there and then written out, like topology
and the category index.
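
To illustrate the read-only case, here is a minimal Python sketch of the
general idea: build the index in memory, write it out with a small
header, then answer searches by seeking in the file. This is not the
GRASS implementation or its file format. The layout, the "SIDX" magic,
the fixed fanout and all names (write_index, FileIndex) are assumptions
made up for this example, and the tree is a simple two-level packed
structure rather than an R*-tree.

```python
import os
import struct
import tempfile

# Hypothetical on-disk layout, invented for this sketch only.
HEADER = struct.Struct("<4sIQ")     # magic, number of leaves, root offset
ROOT_ENTRY = struct.Struct("<Q4d")  # leaf offset + leaf bounding box
LEAF_ENTRY = struct.Struct("<i4d")  # feature id + feature bounding box
COUNT = struct.Struct("<I")         # number of entries in a leaf
FANOUT = 4                          # max entries per leaf


def overlaps(a, b):
    """True if boxes given as (xmin, ymin, xmax, ymax) intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def write_index(path, features):
    """Build the index in memory, then write it out in one pass.

    features: iterable of (id, xmin, ymin, xmax, ymax).
    """
    feats = sorted(features, key=lambda f: f[1])  # pack by xmin
    leaves = [feats[i:i + FANOUT] for i in range(0, len(feats), FANOUT)]
    with open(path, "wb") as f:
        f.write(HEADER.pack(b"SIDX", 0, 0))  # placeholder, rewritten below
        root = []
        for leaf in leaves:
            offset = f.tell()
            f.write(COUNT.pack(len(leaf)))
            for fid, *box in leaf:
                f.write(LEAF_ENTRY.pack(fid, *box))
            bbox = (min(e[1] for e in leaf), min(e[2] for e in leaf),
                    max(e[3] for e in leaf), max(e[4] for e in leaf))
            root.append((offset, bbox))
        root_offset = f.tell()
        for offset, bbox in root:
            f.write(ROOT_ENTRY.pack(offset, *bbox))
        f.seek(0)
        f.write(HEADER.pack(b"SIDX", len(root), root_offset))


class FileIndex:
    """Read-only view: only the header is kept in memory."""

    def __init__(self, path):
        self.f = open(path, "rb")
        magic, self.n_leaves, self.root_offset = HEADER.unpack(
            self.f.read(HEADER.size))
        if magic != b"SIDX":
            raise ValueError("not an index file")

    def search(self, query):
        """Return sorted ids of features whose box overlaps the query box."""
        hits = []
        for i in range(self.n_leaves):
            # Read one root entry from the file; skip leaves that cannot match.
            self.f.seek(self.root_offset + i * ROOT_ENTRY.size)
            offset, *bbox = ROOT_ENTRY.unpack(self.f.read(ROOT_ENTRY.size))
            if not overlaps(bbox, query):
                continue
            self.f.seek(offset)
            (count,) = COUNT.unpack(self.f.read(COUNT.size))
            for _ in range(count):
                fid, *box = LEAF_ENTRY.unpack(self.f.read(LEAF_ENTRY.size))
                if overlaps(box, query):
                    hits.append(fid)
        return sorted(hits)

    def close(self):
        self.f.close()


# Ten small boxes along the diagonal, indexed and then searched from file.
features = [(i, float(i), float(i), i + 0.5, i + 0.5) for i in range(10)]
path = os.path.join(tempfile.mkdtemp(), "sidx")
write_index(path, features)
idx = FileIndex(path)
result = idx.search((2.2, 2.2, 4.1, 4.1))
idx.close()
```

In this sketch, opening the index costs one 16-byte header read
regardless of how many features were indexed, which is the property that
makes read-only opening cheap. An update mode along the lines described
above would instead read all nodes back into memory, modify them there,
and call the writer again on close.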

The new spatial index algorithm (R*-tree) is a bit faster than the old
algorithm (RTree). Breaking lines benefits from it, and therefore so do
v.in.ogr and v.clean.

v.build is now a bit faster: sometimes the same speed, sometimes twice
as fast, with generally better performance for more complicated
geometry. v.what is now generally faster by a factor of 6 to 30,
depending on the vector.

The authors of the R*-tree claim that the R*-tree's search performance
is better, particularly for massive point datasets. Using
elev_lid792_bepts in nc_spm_08 (not really massive), v.what now takes
about 0.16s here instead of 4.3s, ~25x faster, which is the combined
improvement of the file-based index and the better index algorithm.

Still, for massive point datasets I would recommend not building
topology, because all three support structures (topology, spatial index
and category index) can become massive. Keeping one in file and loading
the other two to memory doesn't help much.

I hope I didn't mess up too much...

Markus M