[GRASS-dev] vector libs: file based spatial index

Moritz Lennert mlennert at club.worldonline.be
Wed Jun 24 18:02:07 EDT 2009


On 24/06/09 22:49, Markus GRASS wrote:
> Moritz:
>> I'm not sure I understand everything correctly here, but I have the
>> feeling that there are two questions here:
>>
>> 1) Should we have file-based storage of the spatial index? It can
>> then be read into memory when necessary, which should still be faster
>> than rebuilding it each time.
> If an old vector is opened just for reading (v.what, v.info, probably
> also d.vect), the fastest solution is probably to load only the header
> of the spatial index, as is done for the coor file, and perform spatial
> queries on the file. This is very fast AFAICT.

Then the main issue is during editing? I guess it then depends on the 
use cases, but I don't know whether frequent editing happens very often 
on large files... IMHO, a slight speed penalty for the infrequent 
updates of vectors is acceptable compared to the huge advantage of not 
having to rebuild the index every time.
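
Just to make sure we are talking about the same thing, here is a very 
rough sketch of how I picture "querying in file": read only the index 
header when the map is opened, and fetch tree nodes from disk with 
fseek()/fread() as the search descends. All struct and function names 
below are invented for illustration, they are not the real sidx code:

#include <stdio.h>

#define MAXCARD 8                  /* hypothetical max entries per node */

struct sidx_header {               /* read once at open time            */
    long root_offset;              /* file offset of the root node      */
    int  n_nodes;
};

struct sidx_branch {
    double box[4];                 /* xmin, ymin, xmax, ymax            */
    long   child;                  /* offset of child node, or -id of a
                                      line for leaf entries             */
};

struct sidx_node {
    int level;                     /* 0 = leaf                          */
    int count;
    struct sidx_branch branch[MAXCARD];
};

static int boxes_overlap(const double *a, const double *b)
{
    return a[0] <= b[2] && b[0] <= a[2] && a[1] <= b[3] && b[1] <= a[3];
}

/* Recursively search the on-disk tree; only the nodes on the search
 * path are ever read into memory. */
static void sidx_search_file(FILE *fp, long offset, const double *qbox,
                             void (*found)(int id))
{
    struct sidx_node node;
    int i;

    fseek(fp, offset, SEEK_SET);
    if (fread(&node, sizeof(node), 1, fp) != 1)
        return;

    for (i = 0; i < node.count; i++) {
        if (!boxes_overlap(node.branch[i].box, qbox))
            continue;
        if (node.level > 0)
            sidx_search_file(fp, node.branch[i].child, qbox, found);
        else
            found((int)(-node.branch[i].child));
    }
}

As long as the tree stays reasonably shallow, a query like that only 
costs a handful of reads, which would explain why it is "very fast".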

>> To this I clearly say +1.
>> The question here is: how to make sure the index is up to date?
> Keeping it up to date is not a problem per se with any method. The real
> issue here is whether to keep it on file or to load it into memory when
> modifying it: speed vs. memory consumption. And this is where I would
> like to get feedback. What is your experience: do larger vectors use too
> much memory, or is vector processing relatively slow and should not get
> any slower?

I haven't yet used vector files that have caused memory problems, but I 
have had serious speed problems... So I would plead for whatever makes 
things faster.

> Hmm yes, I think these massive datasets are still the exception; now and
> then someone tries to work with huge vectors, but this is not the
> everyday case (maybe because it takes so long...).

They will probably become more common (cf. LiDAR), so we should plan with 
them in mind, without making everyone else "suffer" because a few people 
need to use them.

> I would rather hard-code
> the way of modifying a spatial index. There are different possibilities:
> 1) let the vector libs figure out what is best (very difficult)

-1

> 2) have
> an env variable (could work), 

As this retains flexibility for the future, I would favor this, but I have 
no idea what this entails in terms of added code complexity.
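
Just to sketch what option 2 could look like (the variable name 
GRASS_VECTOR_SIDX and its behaviour are made up here, nothing of this 
exists), the decision could be taken once, at map open time:

#include <stdlib.h>
#include <string.h>

/* Hypothetical selector: decide whether the spatial index is modified
 * on file (default) or loaded into memory. Name and semantics are
 * placeholders, not existing GRASS symbols. */
static int sidx_use_memory(void)
{
    const char *mode = getenv("GRASS_VECTOR_SIDX");

    return mode && strcmp(mode, "memory") == 0;
}

That would keep the knob out of the individual modules while still 
letting people with lots of RAM trade memory for speed.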

> 3) have a new flag in all vector modules
> (users shouldn't be bothered on module level with vector model details)

Yeah, I agree that this is probably not the best idea.

> 4) decide on a new standard method and hard-code that method, as was done
> for the coor file, which is never loaded into memory.

The largest file I have used has about 125,000 areas with a topo file 
weighing 42 MB, so taking your worst-case estimate this would mean around 
200 MB of spatial index, which is still largely acceptable for me.

I find it a bit difficult to give you a definitive answer on the basis of 
theory alone. Do you have any means of testing the impact of one choice 
over the other for different use cases (editing, v.build, v.what - the 
latter especially when used from the GUI)?

If the above is difficult, I would say go for your current preference, 
which seems to be file-based. Would it be possible (just thinking out 
loud, without any idea what this entails) to work on two levels, with 
high-level functions which can then call either file-based or 
memory-based low-level functions? This way you could create the 
high-level API with a file-based system behind it, but allow the future 
creation of a memory-based "backend" if the need arises, e.g. something 
like the high-level db functions, which you can call regardless of the 
actual db driver.
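
Purely as an illustration (all names below are invented, this is not a 
proposal for the actual API), the high level could go through a small 
table of function pointers, the way the db functions go through a driver:

/* Hypothetical two-level layout: the vector libs only see the table of
 * function pointers; a file-based or memory-based backend is plugged in
 * behind it when the map is opened. */
struct sidx_driver {
    int (*insert)(void *handle, int id, const double box[4]);
    int (*remove)(void *handle, int id, const double box[4]);
    int (*search)(void *handle, const double box[4],
                  void (*found)(int id));
};

/* Both backends implement the same table */
extern const struct sidx_driver sidx_file_driver;   /* queries on disk  */
extern const struct sidx_driver sidx_memory_driver; /* loaded in memory */

/* High-level entry point used by the rest of the vector libs */
static int sidx_insert(const struct sidx_driver *drv, void *handle,
                       int id, const double box[4])
{
    return drv->insert(handle, id, box);
}

That way the file vs. memory decision lives in one place, and a 
memory-based backend could be added later without touching the callers.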

Moritz

