[GRASS-dev] vector large file support

Markus Metz markus.metz.giswork at googlemail.com
Sun Feb 15 05:14:46 EST 2009


Moritz Lennert wrote:
> On 11/02/09 12:54, Markus Metz wrote:
>
>
>> Maybe I should rather try to fix bugs than to add new features... 
>
> here are my two top candidates:
>
> - "Keep topology and spatial index in file instead of in memory" in 
> the vector ToDo. The fact that it is in memory makes simple vector 
> querying almost unusable when dealing with larger maps.
Starting with the spatial index: there is a bug in the rtree library 
that causes a segfault when cleaning larger vectors (ca. 1 GB coor file 
size). I think I have a fix for the rtree, but it has two side effects. 
First, v.in.ogr, and therefore most vector cleaning procedures, is a bit 
faster. Second, the spatial index is much smaller. After importing my 
test shapefile once with the original grass7 and once with my patched 
grass7, I get exactly identical topo dumps with v.build 
option=build,dump, but the spatial index dumps from v.build 
option=build,sdump are very different: 7.4 MB with the original grass7 
and only 1.8 MB with my patched grass7. Very strange. Display is ok, 
also when displaying topo.

v.in.ogr uses the rtree routines in several different places: for a 
custom spatial index within v.in.ogr itself, for another custom spatial 
index in Vect_break_polygons(), and through the spatial index of the 
vector in Vect_break_lines(). Both of these functions are called by 
v.in.ogr. Other routines also use the spatial index (maybe most of the 
vector routines). Apparently all of that works with my small spatial 
index.

I suspect the bugs in rtree to be in index.c at line 119:

    b.child = (struct Node *)tid;

b.child is a pointer to the next node in the RTree; tid is the line 
number for which a new bounding box will be inserted in the RTree. This 
is a cast from integer to pointer, which gives a compile-time warning.

And at lines 284 - 286:

            RTreeInsertRect(&(tmp_nptr->branch[i].rect),
                    (int)tmp_nptr->branch[i].child,  /* <-- cast from pointer to integer of different size */
                    nn, tmp_nptr->level);

I understand the concept of RTrees better than this code, but I suspect 
that at line 119 the line number is used as the memory address for a 
pointer, and at line 285 the memory address of a pointer is used as a 
line number. Can any of the experienced developers confirm that?
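
To illustrate what I mean, here is a minimal standalone sketch of the 
two casts. It is not the code from index.c: the helper names 
id_to_child(), child_to_id() and pointer_survives_int() are made up for 
this mail, and I go through intptr_t/uintptr_t only to keep the compiler 
quiet. A small id stored in the pointer field comes back intact, but a 
real node address read back through int only survives if pointers are 
no wider than int, which is not the case on typical 64-bit systems.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

struct Node;    /* opaque here; only the pointer type matters */

/* Hypothetical helper: store an integer id in a pointer field,
 * as the cast at line 119 does. */
static struct Node *id_to_child(int tid)
{
    return (struct Node *)(intptr_t)tid;
}

/* Hypothetical helper: read the pointer field back as an int,
 * as the cast at line 285 does. */
static int child_to_id(struct Node *child)
{
    return (int)(intptr_t)child;
}

/* Returns 1 if the address p would survive being squeezed through an
 * unsigned int, 0 if high bits would be lost. */
static int pointer_survives_int(const void *p)
{
    uintptr_t full = (uintptr_t)p;
    uintptr_t back = (uintptr_t)(unsigned int)full;

    return full == back;
}

/* Prints the sizes that decide whether the cast is lossy here. */
static void report_sizes(void)
{
    printf("sizeof(int) = %u, sizeof(void *) = %u\n",
           (unsigned)sizeof(int), (unsigned)sizeof(void *));
}
```

Calling child_to_id(id_to_child(42)) gives 42 again, so leaf entries 
holding line numbers look fine. The trouble would start when the field 
holds a genuine address of a child node and is passed on as an int.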
Line 285 was the bug causing the segfault when cleaning large vectors. I 
gave the RTree Branch structure a new variable, its id, and now use the 
id where a line number is meant and the pointer where a child node is 
meant. That gave the results described above (and no more compile 
warnings).
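
Roughly, the idea looks like the sketch below. This is not my actual 
patch: the struct layouts are simplified, MAXCARD is set to an arbitrary 
small value, and set_leaf_branch() / set_internal_branch() are 
hypothetical helpers that only show which field is used in which case.

```c
#include <assert.h>
#include <stddef.h>

#define MAXCARD 4               /* assumed fan-out, for illustration only */

struct Rect {
    double boundary[4];         /* simplified 2D box: xmin, ymin, xmax, ymax */
};

struct Node;

struct Branch {
    struct Rect rect;
    struct Node *child;         /* used in internal nodes only */
    int id;                     /* used in leaf nodes only: the line number */
};

struct Node {
    int count;
    int level;                  /* 0 = leaf, > 0 = internal node */
    struct Branch branch[MAXCARD];
};

/* Leaf entry: keep the line number in id, leave the pointer empty. */
static void set_leaf_branch(struct Branch *b, const struct Rect *r, int tid)
{
    b->rect = *r;
    b->child = NULL;
    b->id = tid;
}

/* Internal entry: keep the child pointer, id is unused. */
static void set_internal_branch(struct Branch *b, const struct Rect *r,
                                struct Node *child)
{
    b->rect = *r;
    b->child = child;
    b->id = 0;
}
```

With two separate fields no cast between int and pointer is needed any 
more, at the cost of one extra int per branch.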

IMHO stuff like this needs to be sorted out before the vector libs get LFS.

What should I do now? Submit a patch to trac or commit to trunk or leave 
it open for discussion?

Regards,

Markus


