[GRASS-dev] vector libs: file based spatial index

Paul Kelly paul-grass at stjohnspoint.co.uk
Thu Jun 25 02:58:50 EDT 2009


On Wed, 24 Jun 2009, Markus GRASS wrote:

> Paul Kelly wrote:
>> On Tue, 23 Jun 2009, Markus GRASS wrote:
>>
>>> My implementation is completely file based, also when creating or
>>> updating a vector. This obviously comes with a speed penalty because
>>> reading from memory is faster than reading from a file. With all sorts
>>> of tricks, and by relying on the system to cache files, the file-based
>>> spatial
>>
>> Hi Markus. The last point above sounds interesting. What sort of tricks?
> When inserting, deleting or searching, the search path is remembered
> (kept in memory) and walked back using a stack in a non-recursive
> function. This reduces file IO operations by about half and should be a
> faster algorithm than the recursive alternative. As an example,
> inserting a new item into an RTree that already holds about 50,000 items
> requires a total of 3 file reads and 3-5 file writes; each file IO
> operation reads or writes 528 bytes on a 32 bit system and 568 bytes on
> a 64 bit system. And when using intermediate files, the whole chunk of
> 528/568 bytes is read or written at once.
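
That sounds like a sensible approach. For anyone following along, here is
a rough sketch of what such a stack-based, non-recursive descent might
look like in C. The structures and helper functions are hypothetical
stand-ins, not the actual GRASS RTree code; the point is just that the
path down to the leaf is kept in memory, so walking back up needs no
further reads:

#include <stdio.h>

#define MAXLEVEL 32         /* more than enough for any realistic tree */
#define NODE_SIZE 528       /* bytes per node on a 32 bit system */

struct RTree_Node {         /* hypothetical on-disk node layout */
    unsigned char data[NODE_SIZE];
};

struct PathEntry {
    long offset;            /* file offset of the node */
    struct RTree_Node node; /* node contents kept in memory */
};

/* Read/write a node as one NODE_SIZE-byte chunk. */
static void read_node(FILE *fp, long offset, struct RTree_Node *n)
{
    fseek(fp, offset, SEEK_SET);
    fread(n, NODE_SIZE, 1, fp);
}

static void write_node(FILE *fp, long offset, const struct RTree_Node *n)
{
    fseek(fp, offset, SEEK_SET);
    fwrite(n, NODE_SIZE, 1, fp);
}

/* Stand-ins for the real RTree logic, assumed for this sketch. */
static long choose_subtree(const struct RTree_Node *n) { (void)n; return 0; }
static void insert_in_leaf(struct RTree_Node *leaf) { (void)leaf; }
static int update_parent_box(struct RTree_Node *p,
                             const struct RTree_Node *c)
{ (void)p; (void)c; return 0; }

static void insert_nonrecursive(FILE *fp, long root_offset, int levels)
{
    struct PathEntry path[MAXLEVEL];
    long offset = root_offset;
    int depth;

    /* Descend to the leaf, remembering every node on the way down. */
    for (depth = 0; depth < levels; depth++) {
        path[depth].offset = offset;
        read_node(fp, offset, &path[depth].node);
        if (depth < levels - 1)
            offset = choose_subtree(&path[depth].node);
    }

    /* Insert at the leaf, then walk the remembered path back up,
     * writing only nodes whose bounding boxes actually changed --
     * no node is read from file a second time. */
    insert_in_leaf(&path[levels - 1].node);
    write_node(fp, path[levels - 1].offset, &path[levels - 1].node);

    for (depth = levels - 2; depth >= 0; depth--) {
        if (!update_parent_box(&path[depth].node, &path[depth + 1].node))
            break;          /* box unchanged: ancestors need no update */
        write_node(fp, path[depth].offset, &path[depth].node);
    }
}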

>> Are you using memory-mapped files?
>
> What are memory-mapped files? Excuse my ignorance, I'm just a
> self-trained coder (learning by doing).

http://en.wikipedia.org/wiki/Memory-mapped_file
A chunk of a disk file is mapped directly into memory so you can access it 
using normal pointers, as if it were permanently in memory, and the OS 
transparently handles paging the relevant chunks of the file in and out 
of memory as required.
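
In C, basic usage looks something like this (a minimal POSIX sketch;
"sidx" is just a placeholder filename here, and most error handling is
omitted):

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    int fd = open("sidx", O_RDONLY);
    struct stat sb;

    fstat(fd, &sb);

    /* Map the whole file; the OS pages it in and out on demand. */
    unsigned char *p = mmap(NULL, sb.st_size, PROT_READ, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        return 1;

    /* The file contents can now be read through an ordinary pointer. */
    printf("first byte: %u\n", (unsigned)p[0]);

    munmap(p, sb.st_size);
    close(fd);
    return 0;
}
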
It can be a useful and elegant way of managing access to large files a 
section at a time. For files small enough to be completely cached in 
memory, the performance penalty over keeping the same data in memory 
(without a file) should be relatively small, and access will still be 
faster than going through the conventional fopen()/fread() etc. functions.
But how much of a performance gain it would give for very large files 
strongly depends on the nature of access to the file, which I know very 
little about in this case. E.g. for completely random access across the 
whole file there might not be a lot of gain. But if there were random 
access only within a certain section of the file, that section could be 
mapped into memory and access would then be quite fast. Overwriting "dead" 
areas of the index would be particularly simple, I think, if the file 
were memory-mapped.
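
For example, overwriting one section in place might look like the sketch
below (again assuming a POSIX system; note that the offset passed to
mmap() has to be a multiple of the page size, hence the alignment step):

#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Zero out 'len' bytes starting at 'section_start' by mapping only
 * the pages that cover that section of the file. */
int overwrite_section(const char *path, off_t section_start, size_t len)
{
    long pagesize = sysconf(_SC_PAGESIZE);
    off_t map_off = (section_start / pagesize) * pagesize; /* align down */
    size_t slack = section_start - map_off;

    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    unsigned char *p = mmap(NULL, len + slack, PROT_READ | PROT_WRITE,
                            MAP_SHARED, fd, map_off);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }

    /* Writing through the pointer is all it takes; the OS writes the
     * dirty pages back to the file. */
    memset(p + slack, 0, len);

    munmap(p, len + slack);
    close(fd);
    return 0;
}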

Paul

