[GRASS-dev] vector large file support
Markus Metz
markus.metz.giswork at googlemail.com
Sun Feb 15 05:14:46 EST 2009
Moritz Lennert wrote:
> On 11/02/09 12:54, Markus Metz wrote:
>
>
>> Maybe I should rather try to fix bugs than to add new features...
>
> here are my two top candidates:
>
> - "Keep topology and spatial index in file instead of in memory" in
> the vector ToDo. The fact that it is in memory makes simple vector
> querying almost unusable when dealing with larger maps.
Starting with the spatial index: there is a bug in the rtree library
that causes a segfault when cleaning larger vectors (ca. 1 GB coor file
size). I think I have a fix for the rtree, but it has two side effects.
First, v.in.ogr, and therefore most vector cleaning procedures, is a bit
faster; second, the spatial index is much smaller. After importing my
test shapefile once with the original grass7 and once with my patched
grass7, I get exactly identical topo dumps with v.build
option=build,dump, but the spatial index dump with v.build
option=build,sdump is very different: 7.4 MB with the original grass7,
only 1.8 MB with my patched grass7. Very strange. Display is OK, also
when displaying topo.
v.in.ogr uses the rtree routines in several different places:
- a custom spatial index used within v.in.ogr itself,
- another custom spatial index used by Vect_break_polygons(), which is
  in turn used by v.in.ogr,
- the spatial index of the vector, used by Vect_break_lines(), which is
  also used by v.in.ogr.
Other routines also use the spatial index (maybe most of the vector
routines). Apparently all of that works with my small spatial index.
I suspect the bugs in rtree to be in index.c, first at line 119:
    b.child = (struct Node *)tid;
b.child is a pointer to the next node in the RTree. tid is the line
number for which a new bounding box will be inserted in the RTree. This
is a cast from integer to pointer, giving a compile-time warning.
The second place is at lines 284 - 286:
    RTreeInsertRect(&(tmp_nptr->branch[i].rect),
                    (int)tmp_nptr->branch[i].child,  <-- cast from pointer to integer of different size
                    nn, tmp_nptr->level);
I understand the concept of RTrees better than this code, but I suspect
that at line 119 the line number is used as the memory address of a
pointer, and at line 285 the memory address of a pointer is used as a
line number.
Can you expert developers confirm that?
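To make the suspicion concrete, here is a minimal sketch of why a pointer-sized value cannot in general be pushed through an int. The names fits_in_int() and narrow_checked() are hypothetical helpers made up for illustration; they are not part of the rtree library, and this is not a reproduction of the actual crash.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if an address-sized value can be
 * stored in an int without loss, 0 otherwise. On platforms where
 * pointers are wider than int, many addresses do not fit. */
int fits_in_int(uintptr_t value)
{
    return value <= (uintptr_t)INT_MAX;
}

/* Hypothetical helper: narrows an address-sized value to int, but
 * aborts instead of silently truncating when the value is too large.
 * A plain (int) cast, as in the quoted code, performs no such check. */
int narrow_checked(uintptr_t value)
{
    assert(fits_in_int(value));
    return (int)value;
}
```

A small line number stored in a pointer-sized slot passes the check, while any value above INT_MAX, such as a typical heap address on a 64-bit system, does not.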
Line 285 was the bug causing the segfault when cleaning large vectors. I
gave the RTree Branch structure a new variable, its id, and used the id
where appropriate and the pointer where appropriate, and got the results
described above (also no more compile warnings).
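The idea of the change can be sketched roughly as follows. This is only an illustration under assumptions: the Sketch* names, the field layout and the SKETCH_CARD constant are hypothetical and do not match the real rtree structures or the actual patch, and the bounding rectangle is left out for brevity.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_CARD 4              /* hypothetical branches per node */

struct SketchNode;                 /* forward declaration */

/* Hypothetical branch layout: the data id and the child pointer are
 * separate fields, so neither is ever cast to the other. */
struct SketchBranch {
    int id;                        /* line number, used in leaf nodes */
    struct SketchNode *child;      /* next node, used in internal nodes */
};

struct SketchNode {
    int level;                     /* 0 means leaf */
    int count;                     /* number of branches in use */
    struct SketchBranch branch[SKETCH_CARD];
};

/* Build a leaf branch: store the line number, leave the pointer unset. */
struct SketchBranch sketch_leaf_branch(int tid)
{
    struct SketchBranch b;

    b.id = tid;
    b.child = NULL;
    return b;
}

/* Build an internal branch: store the pointer, leave the id unset. */
struct SketchBranch sketch_internal_branch(struct SketchNode *child)
{
    struct SketchBranch b;

    assert(child != NULL);
    b.id = 0;
    b.child = child;
    return b;
}

/* Read the line number back from a leaf node without any cast. */
int sketch_branch_id(const struct SketchNode *n, int i)
{
    assert(n->level == 0);
    assert(i >= 0 && i < n->count);
    return n->branch[i].id;
}
```

With separate fields, reinserting a leaf entry would pass the id field directly instead of casting the child pointer back to an integer, which is the kind of cast the compiler warned about.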
IMHO stuff like this needs to be sorted out before the vector libs get LFS.
What should I do now? Submit a patch to trac or commit to trunk or leave
it open for discussion?
Regards,
Markus