[GRASS-dev] vector large file support

Markus Metz markus.metz.giswork at googlemail.com
Tue Feb 17 04:34:26 EST 2009


Glynn Clements wrote:
> Markus Metz wrote:
>
>   
>> That thing branch[i].child is a pointer to the next node in the tree, if 
>> I understand the code correctly.
>>     
>
> It's either a pointer to the child or, at the lowest level (where
> there are no children), an integer ("tid").
>   
Hmmm. What when RTreeInsertRect() is called by RTreeDeleteRect() during 
reinsertion? For internal nodes, tmp_nptr->b.child was a pointer to a 
node, but now it is cast to an integer, and this integer gets stuffed 
back into tmp_nptr->b.child?
>   
>> This is a standard tree design, not 
>> only of RTrees, but of many search trees. So your condition #2 would be 
>> violated, because a pointer is passed as an integer to RTreeInsertRect().
>>     
>
> Not if the "pointer" is actually an integer which was stuffed into the
> field, which is exactly what index.c:119 does:
>
> 	b.child = (struct Node *)tid;
>   
That's not the problem and works. The problem is if b.child is really a 
pointer to a node. b.child was originally stuffed with an integer only 
for level 0, not for higher levels.
In case of root split
b.child = *root; /* old root node */
and
b.child = newnode; /* new node just below root */
No integer is stuffed in there.
During insertion, a node somewhere between root and level 0 may need to 
be splitted. Then it says
b.child = n2; /* n2: pointer to node */
No integer is stuffed in there.
/* */ comments by me.

The bulk of the nodes in reInsertList are probably internal nodes, not 
end nodes, i.e. tmp_nptr->branch[i].child is a pointer to a node, an 
integer was not stuffed in there.
>   
>> I believe the compile warning was there for a reason.
>>     
>
> The compiler complains because, on a 64-bit architecture, casting a
> pointer to an "int" results in truncation. The compiler has no way of
> knowing whether the "pointer" is just an integer which was stuffed
> into the field.
>
> If the code used a union of an int and a "struct Node *", instead of
> just a "struct Node *", the resulting machine code would be identical,
> but the compiler wouldn't complain because the code would "explain
> itself" to the compiler's satisfaction.
>   
During deletion, (int)tmp_nptr->branch[i].child is done. I think the 
idea is to get both a node pointer if tmp_nptr is an internal node and 
the tid if tmp_nptr is on level 0.
During reinsertion I would really like to pass a pointer if b.child is a 
pointer to the next node. If tmp_nptr->branch[i].child is cast to an 
integer, and we want to keep that behaviour, we would need to use off_t 
and LFS for the rtree library. I can't judge if a union would work in 
this case without off_t usage: (off_t)tmp_nptr->branch[i].child, because 
the address of tmp_nptr->branch[i].child is needed for internal nodes.

The branch structure holds the rectangle (bounding box) and a pointer to 
a node. The rectangle is what the spatial index is really after, 
therefore I decided to put the feature id in the branch structure, next 
to the rectangle. I gave all internal nodes a feature id of 0, in the 
hope that e.g. line numbers start with 1 and not with 0. I'm sure there 
is a more elegant solution, but with this you could build in some more 
safety checks.
This solution is LFS-independent, but increases the size of the branch 
structure, thus requiring more memory. The required changes to the code 
are still minimal.

To me it seems that the solution interfering the least with the current 
code would be to make use of off_t (only some replacements of int with 
off_t) and adjust the Makefile for rtree accordingly. Somehow this 
appears to me more dangerous than my solution with an additional int id, 
but I can't give precise reasons why. Maybe because higher-level code 
would have to be adjusted too, at least everything using a 
SearchHitCallback with RTreeSearch().
As I said, I understand the concept better than the code. At least, I 
think that passing around pointer addresses as integer is not good 
coding practise.

In short, I think even when using a union of an int and a "struct Node 
*", additional changes are required. You want to give it a try?

BTW, I think it is possible that this is also the reason in the new 
ticket #494.



More information about the grass-dev mailing list