[GRASS-dev] vector large file support
Markus Metz
markus.metz.giswork at googlemail.com
Fri Feb 6 05:28:13 EST 2009
Glynn Clements wrote:
> Markus Metz wrote:
>
>
>>> [...]
>>>
>> How about
>>
>> extern off_t G_ftell(FILE *fp)
>> {
>> #ifdef HAVE_LARGEFILES
>> return (ftello(fp);
>> #else
>> return (ftell(fp);
>> #endif
>> }
>>
>
> Yep, other than the extraneous open parenthesis (2 open, 1 close).
>
Oops, my sloppy writing... What about off_t lseek(int fd, off_t offset,
int whence)? From the GNU C Library manual: "The lseek function is the
underlying primitive for the fseek, fseeko, ftell, ftello and rewind
functions [...]". lseek is used in libgis and several modules, and I
didn't see anything like the above #ifdef construct there. I assume it
is not an option for the vector libs, because they would need to be
largely rewritten to use lseek instead of fseek, read instead of fread
and so on (using a file descriptor instead of a stream pointer
throughout). Probably a nonsense idea anyway.
>
>> [...]
>>
>
> I think that the code which reads these files needs functions to
> read/write off_t values at the size used by the file, not the size
> used by the code.
>
> I.e. if the code is built for 64-bit off_t, it should still be able to
> directly read/write files using a 32-bit off_t. Code built for 32-bit
> off_t should also directly read/write files which use a 64-bit off_t,
> subject to the constraint that only 31 bits are non-zero (if you have
> a 32-bit off_t, attempting to open a file >=2GiB will fail, as will
> attempting to enlarge a file beyond that size).
>
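A minimal sketch of what Glynn describes: read an offset using the width
recorded for the file (4 or 8 bytes), independent of the off_t width of
the current build, and fail if the value does not fit. The function
names and the least-significant-byte-first storage order are
assumptions for illustration, not the actual GRASS portable format.

```c
#include <limits.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>

/* Largest value representable in this build's (signed) off_t. */
static uint64_t off_t_max(void)
{
    return (UINT64_C(1) << (sizeof(off_t) * CHAR_BIT - 1)) - 1;
}

/* Hypothetical: decode an offset stored in file_size bytes (4 or 8),
 * least significant byte first. Returns 0 on success, -1 if the width
 * is unsupported or the value does not fit in off_t. */
int decode_offset(const unsigned char *buf, int file_size, off_t *out)
{
    uint64_t v = 0;
    int i;

    if (file_size != 4 && file_size != 8)
        return -1;
    for (i = file_size - 1; i >= 0; i--)
        v = (v << 8) | buf[i];
    if (v > off_t_max())
        return -1;   /* e.g. a >= 2 GiB offset in a 32-bit off_t build */
    *out = (off_t) v;
    return 0;
}

/* Hypothetical: read one stored offset from the stream. */
int read_offset(FILE *fp, int file_size, off_t *out)
{
    unsigned char buf[8];

    if (file_size != 4 && file_size != 8)
        return -1;
    if (fread(buf, 1, (size_t) file_size, fp) != (size_t) file_size)
        return -1;
    return decode_offset(buf, file_size, out);
}
```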
The problem I see is that offset values are stored in the topo and cidx
files (e.g. the topo file records that line i is at offset o in the coor
file). So if the topo file was written with a 64-bit off_t but the
currently compiled library uses a 32-bit off_t, can this 32-bit library
somehow get these 64-bit offset values out of the topo file, provided
that the values are within the 32-bit range? I really have no idea
whether this can be done; my suggestion would be to rebuild topology if
there is a mismatch between the off_t size used in the topo file and the
off_t size used by the current library. The other way around, a 64-bit
off_t library and a topo file with 32-bit offset values, may be less
problematic, as long as you know which off_t size was used to write the
topo and cidx files. And this is where the mess starts, I'm afraid. The
header of the topo file would need to be modified so that it holds the
off_t size used to write the file. This information must be available
before any attempt is made to retrieve an offset value from the topo
file. Then some safety checks are needed to determine whether the offset
values can be properly retrieved; if not, request rebuilding topology...
The functions to read/write off_t that you mention above will not be as
simple as the current functions to read/write e.g. int or long, because
the current code uses a fixed, system-independent length for int and
long (and all other types) to guarantee portability. The functions to
read/write off_t need to handle a variable off_t size, because LFS may
not always be requested when compiling GRASS, and because it would be
nice if topo files written with an off_t size different from that of
the current library could still be read.
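The write side of such a variable-size scheme might look like the
following sketch: the caller passes the width chosen for the file
(presumably the one recorded in the header), and the value is written in
a fixed byte order regardless of sizeof(off_t). The names and the
least-significant-byte-first order are assumptions for illustration.

```c
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical: encode `value` into file_size bytes (4 or 8), least
 * significant byte first. Returns -1 if the width is unsupported or
 * the value cannot be represented as a signed field of that width. */
int encode_offset(off_t value, int file_size, unsigned char *buf)
{
    uint64_t v;
    int i;

    if (file_size != 4 && file_size != 8)
        return -1;
    if (value < 0)
        return -1;
    v = (uint64_t) value;
    if (file_size == 4 && v > UINT64_C(0x7FFFFFFF))
        return -1;   /* does not fit a 32-bit signed offset field */
    for (i = 0; i < file_size; i++) {
        buf[i] = (unsigned char) (v & 0xFF);
        v >>= 8;
    }
    return 0;
}

/* Hypothetical: write one offset to the stream at the file's width. */
int write_offset(FILE *fp, off_t value, int file_size)
{
    unsigned char buf[8];

    if (encode_offset(value, file_size, buf) != 0)
        return -1;
    return fwrite(buf, 1, (size_t) file_size, fp) == (size_t) file_size ? 0 : -1;
}
```

Because the byte order and width are fixed by the file rather than by
the compiler, a 32-bit and a 64-bit build produce identical bytes for
the same offset at the same width.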
I get the impression that the code would have to be changed
considerably: all sorts of safety checks would need to be built in,
different off_t sizes must be supported independent of LFS presence, the
layout of the topo file needs to be changed (the layout of the cidx file
maybe, maybe not), and it will create some forward/backward
compatibility problems when reading even small vectors with different
GRASS versions. There were big changes in both the raster and the vector
model before, and grass7 would be an opportunity to introduce such
changes, but this is getting a bit too much for me, and I think I should
leave this to people in the know instead of making wild suggestions.
Markus M