[GRASS-dev] vector libs: file based spatial index

Wed Jun 24 03:25:26 EDT 2009

On 23/06/09 20:28, Markus GRASS wrote:
> Paolo Cavallini wrote:
>> Markus GRASS ha scritto:
>>   
>>> What to do now? Leave it all in memory as in grass6, build in memory
>>> then write out (risk of running out of memory on massive datasets), or
>>> keep it always in file? I'll not commit any time soon (also waiting for
>>> the lib/raster commotion to settle down), I need feedback on how to
>>> proceed or if I should forget about it.
>>>     
>> I think advice from Radim would be very useful here.
>> All the best.
>>   
> OK, let me rephrase. I think I have two alternatives to the current
> implementation of the vector spatial index and would like to know if
> grass7 should get 1) faster vector display and lower memory consumption
> at the cost of (sometimes) slower vector processing [1], 2) faster
> vector display, a similar speed in vector processing but keep the risk
> of running out of memory when processing large datasets, or 3) no
> changes to the spatial index. IMHO this should be a general decision of
> the GRASS community, not of one or two developers.

At what size of vector data does running out of memory become a real
risk? Would it be possible to implement both 1) and 2), with 2) as the
default and a flag to switch to 1) for very large vectors?

You wrote:
> Considering that a file based spatial index is only useful for massive
> vectors where memory can become a limiting factor, I hesitate to commit
> to trunk.

Well, I don't know what you call massive, but one of the main problems
with the memory-based index is that the spatial index is currently
rebuilt for each run of v.what, as it is stored nowhere. This makes
querying large maps (e.g. 20,000-30,000 polygons) _very_ slow when using
the GUI (which will be the only option in grass7 as xmons have
disappeared).

I'm not sure I understand everything correctly, but I have the feeling
that there are two separate questions here:

1) Should we have a file-based storage of the spatial index? This can
then be read into memory when necessary, which should still be faster
than rebuilding it each time.

To this I clearly say +1.
The question here is: how do we make sure the index is up to date?
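
One possible answer, as a rough sketch only: the index file could carry
a small header recording the size and modification time of the data
file it was built from, and a reader would treat the index as stale
whenever those no longer match. The header layout and the function names
below are my own assumptions for illustration, not existing GRASS code.

```python
import os
import struct

# Hypothetical header layout (an assumption for this sketch): size and
# mtime (whole seconds) of the data file when the index was written,
# stored as two little-endian signed 64-bit integers.
HEADER = struct.Struct("<qq")


def write_index(index_path, data_path, payload=b""):
    """Write the index, prefixed with the data file's size and mtime."""
    st = os.stat(data_path)
    with open(index_path, "wb") as f:
        f.write(HEADER.pack(st.st_size, int(st.st_mtime)))
        f.write(payload)


def index_is_current(index_path, data_path):
    """Return True if the stored header still matches the data file."""
    try:
        with open(index_path, "rb") as f:
            raw = f.read(HEADER.size)
    except FileNotFoundError:
        return False  # no index yet, so it has to be built
    if len(raw) < HEADER.size:
        return False  # truncated or damaged header
    size, mtime = HEADER.unpack(raw)
    st = os.stat(data_path)
    return size == st.st_size and mtime == int(st.st_mtime)
```

A module would call index_is_current() before reading the index and
rebuild it only when the check fails. Note that an edit which keeps both
the size and the whole-second mtime unchanged would slip through, so
this is a cheap check rather than a guarantee.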

2) Should the entire treatment of the index be file-based, i.e. the
index is never read into memory but always accessed via file, with the
speed penalties you spoke about?

To this I would say: if we can have a file-based, permanent storage of
the index that is read into memory for treatment unless a flag says not
to load it, then this would probably be the ideal, within my limited
understanding of the issue.
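
To make the idea concrete, here is a minimal sketch of one stored index
with two access modes. Everything in it is an assumption for
illustration: the record layout, the BoxIndex class and its in_memory
switch are hypothetical, and a flat list of bounding boxes with a linear
scan stands in for a real spatial index structure.

```python
import struct

# Hypothetical record layout (an assumption for this sketch):
# feature id, then bounding box as west, south, east, north.
RECORD = struct.Struct("<qdddd")


def write_boxes(path, boxes):
    """Store (id, west, south, east, north) tuples as fixed-size records."""
    with open(path, "wb") as f:
        for box in boxes:
            f.write(RECORD.pack(*box))


def iter_boxes(path):
    """Yield records one by one without holding the whole index in memory."""
    with open(path, "rb") as f:
        while True:
            raw = f.read(RECORD.size)
            if len(raw) < RECORD.size:
                break
            yield RECORD.unpack(raw)


class BoxIndex:
    def __init__(self, path, in_memory=True):
        self.path = path
        # Default mode: read the file once, answer later queries from memory.
        # With in_memory=False nothing is loaded and each query reads the file.
        self.boxes = list(iter_boxes(path)) if in_memory else None

    def query(self, x, y):
        """Return the ids of all boxes containing the point (x, y)."""
        if self.boxes is not None:
            source = self.boxes
        else:
            source = iter_boxes(self.path)
        return [fid for fid, w, s, e, n in source
                if w <= x <= e and s <= y <= n]
```

Both modes return the same answers; BoxIndex(path) pays the memory cost
once and is fast afterwards, while BoxIndex(path, in_memory=False) keeps
memory use flat at the price of reading the file on every query.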

Moritz