[GRASS-dev] vector large file support

Markus Metz markus.metz.giswork at googlemail.com
Fri Feb 6 05:28:13 EST 2009



Glynn Clements wrote:
> Markus Metz wrote:
>
>   
>>> [...]  
>>>       
>> How about
>>
>> extern off_t G_ftell(FILE *fp)
>> {
>> #ifdef HAVE_LARGEFILES
>>     return (ftello(fp);
>> #else
>>     return (ftell(fp);
>> #endif     
>> }
>>     
>
> Yep, other than the extraneous open parenthesis (2 open, 1 close). 
>   
Oops, my sloppy writing... What about off_t lseek(int fd, off_t offset, 
int whence)? From the GNU C library manual: "The lseek function is the 
underlying primitive for the fseek, fseeko, ftell, ftello and rewind 
functions [...]". lseek is used in libgis and in several modules, but I 
didn't see anything like the #ifdef construct above there. I assume it 
is not an option for the vector libs, because they would need to be 
largely rewritten to use lseek instead of fseek, read instead of fread 
and so on (i.e. file descriptors instead of stream pointers throughout). 
Probably a nonsense idea anyway.
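For reference, here is a minimal sketch of the wrapper quoted above with the stray parenthesis removed. It assumes HAVE_LARGEFILES is the configure macro that signals fseeko()/ftello() and a 64-bit off_t. G_fseek() is a hypothetical companion that follows the same pattern. Its name and its 0/-1 return convention are assumptions, not taken from libgis.

```c
#include <stdio.h>
#include <sys/types.h> /* off_t */

/* HAVE_LARGEFILES is assumed to be defined by configure when
 * fseeko()/ftello() are available and off_t is 64 bits wide. */

/* Return the current position of fp, or -1 on error. */
off_t G_ftell(FILE *fp)
{
#ifdef HAVE_LARGEFILES
    return ftello(fp);
#else
    return (off_t) ftell(fp);
#endif
}

/* Hypothetical companion: reposition fp to offset relative to whence.
 * Returns 0 on success, -1 on error (same convention as fseek). */
int G_fseek(FILE *fp, off_t offset, int whence)
{
#ifdef HAVE_LARGEFILES
    return fseeko(fp, offset, whence);
#else
    /* Without LFS, refuse offsets that would be truncated by the cast. */
    if ((off_t) (long) offset != offset)
        return -1;
    return fseek(fp, (long) offset, whence);
#endif
}
```

Without LFS, the fallback branch rejects offsets that do not fit in a long rather than silently truncating them.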
>   
>> [...]
>>     
>
> I think that the code which reads these files needs functions to
> read/write off_t values at the size used by the file, not the size
> used by the code.
>
> I.e. if the code is built for 64-bit off_t, it should still be able to
> directly read/write files using a 32-bit off_t. Code built for 32-bit
> off_t should also directly read/write files which use a 64-bit off_t,
> subject to the constraint that only 31 bits are non-zero (if you have
> a 32-bit off_t, attempting to open a file >=2GiB will fail, as will
> attempting to enlarge a file beyond that size).
>   
The problem I see is that offset values are stored in topo and cidx 
(e.g. the topo file records that line i is in the coor file at offset o). 
So if the topo file was written with a 64-bit off_t but the currently 
compiled library uses a 32-bit off_t, can this 32-bit library somehow get 
these 64-bit offset values out of the topo file, assuming the values are 
within the 32-bit range? I really have no idea if this can be done. My 
suggestion would be to rebuild topology if there is a mismatch between 
the off_t size used in the topo file and the off_t size used by the 
current library. The other way around, a 64-bit off_t library reading a 
topo file with 32-bit offset values, may be less problematic, as long as 
you know which off_t size was used to write the topo and cidx files.

And now the mess starts, I'm afraid. The header of the topo file would 
need to be modified so that it records the off_t size used to write the 
file. This information must be available before any attempt is made to 
retrieve an offset value from the topo file. Then some safety checks are 
needed to determine whether the offset values can be retrieved properly; 
if not, request a topology rebuild...

The functions needed to read/write off_t that you mention above will not 
be as easy as the current functions to read/write e.g. int or long, 
because the current code uses a fixed length for int and long (and all 
other types) that is independent of the system, to guarantee 
portability. The functions to read/write off_t need to work with a 
variable off_t size, because LFS may not always be requested when 
compiling GRASS, and because it would be nice if topo files written with 
an off_t size different from that of the current library could still be 
read.
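Following the quoted suggestion, the off_t size can be treated as a property of the file rather than of the build by always (de)serializing offsets byte by byte. This is a minimal sketch that assumes a little-endian on-disk layout and offset widths of 4 or 8 bytes. read_off_t() and write_off_t() are hypothetical names, not existing GRASS functions.

```c
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>

/* Largest value this build's signed off_t can hold. */
static uint64_t off_t_max(void)
{
    return ((uint64_t) 1 << (sizeof(off_t) * 8 - 1)) - 1;
}

/* Write offset as nbytes (4 or 8) little-endian bytes (assumed layout).
 * Returns 0 on success, -1 if nbytes is unsupported, the offset is
 * negative or needs more than 31 bits for a 4-byte field, or fwrite fails. */
int write_off_t(FILE *fp, off_t offset, int nbytes)
{
    unsigned char buf[8];
    uint64_t v;
    int i;

    if ((nbytes != 4 && nbytes != 8) || offset < 0)
        return -1;
    v = (uint64_t) offset;
    if (nbytes == 4 && v > 0x7fffffffu)
        return -1;
    for (i = 0; i < nbytes; i++)
        buf[i] = (unsigned char) (v >> (8 * i));
    return fwrite(buf, 1, (size_t) nbytes, fp) == (size_t) nbytes ? 0 : -1;
}

/* Read an nbytes-wide little-endian offset into *offset.
 * Returns 0 on success, -1 on short read, unsupported nbytes, or a value
 * that does not fit in this build's off_t (caller may rebuild topology). */
int read_off_t(FILE *fp, off_t *offset, int nbytes)
{
    unsigned char buf[8];
    uint64_t v = 0;
    int i;

    if (nbytes != 4 && nbytes != 8)
        return -1;
    if (fread(buf, 1, (size_t) nbytes, fp) != (size_t) nbytes)
        return -1;
    for (i = 0; i < nbytes; i++)
        v |= (uint64_t) buf[i] << (8 * i);
    if (v > off_t_max() || (nbytes == 4 && v > 0x7fffffffu))
        return -1;
    *offset = (off_t) v;
    return 0;
}
```

On a 32-bit off_t build, read_off_t() accepts an 8-byte field only when the value fits in 31 bits. This matches the constraint Glynn describes. A 4-byte field is limited to 31 bits in both directions.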

I get the impression that the code would have to be changed 
considerably. All sorts of safety checks would need to be built in, 
different off_t sizes would have to be supported independent of LFS 
presence, and the layout of the topo file would need to change (the 
layout of the cidx file maybe as well). This would also create some 
forward/backward compatibility problems when reading even small vectors 
with different GRASS versions. There were big changes in both the raster 
and the vector model before, and GRASS 7 would be an opportunity to 
introduce such changes. But this is getting a bit too much for me, and I 
think I should leave it to people in the know instead of making wild 
suggestions.

Markus M


