[GRASS-dev] vector large file support
Markus Metz
markus.metz.giswork at googlemail.com
Sun Feb 8 06:05:00 EST 2009
Glynn Clements wrote:
> Markus Metz wrote:
>
>
>> Do I understand right that fseeko and ftello are only needed on 32-bit
>> systems that want -D_FILE_OFFSET_BITS=64? fseek e.g. returns long, which
>> is 64-bit on my 64-bit Linux; I guess that's why I can write coor files
>> larger than 2GB with the current vector libs.
>>
>
> Yes. There's no point in using them unless off_t is larger than long
> (i.e. 64-bit off_t versus 32-bit long).
>
I like Ivan's point that off_t is the native type for file offsets.
Could G_fseek() then use fseeko() whenever fseeko() is available (ditto
for G_ftell() and ftello())?
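A minimal sketch of what such wrappers might look like. The signatures are only a guess at this point, HAVE_FSEEKO is assumed to come from a configure check, and on 32-bit systems the build is assumed to use -D_FILE_OFFSET_BITS=64 so that off_t is 64 bits wide:

```c
#include <stdio.h>
#include <sys/types.h>

/* Assumption for this sketch: normally a configure check would set this. */
#ifndef HAVE_FSEEKO
#define HAVE_FSEEKO 1
#endif

/* Hypothetical wrapper: report the current position as off_t. */
off_t G_ftell(FILE *fp)
{
#if HAVE_FSEEKO
    return ftello(fp);
#else
    return (off_t) ftell(fp);
#endif
}

/* Hypothetical wrapper: seek using off_t; returns 0 on success, -1 on error. */
int G_fseek(FILE *fp, off_t offset, int whence)
{
#if HAVE_FSEEKO
    return fseeko(fp, offset, whence);
#else
    long loff = (long) offset;

    /* Without fseeko, refuse offsets that do not fit into long. */
    if ((off_t) loff != offset)
        return -1;
    return fseek(fp, loff, whence);
#endif
}
```

Callers would then never touch fseek/ftell directly, so the choice between long and off_t stays in one place.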
>
>> So we do need G_fseek() and G_ftell()
>>
>
> Yes. Those would be useful regardless of anything related to the
> vector format.
>
So it's about time these functions get implemented ;-)
>
> Bear in mind that a GRASS database may be on a networked file system,
> and accessed by both 32- and 64-bit systems, and by both big- and
> little-endian systems.
>
> Also, the user shouldn't need write permission in order to read a map.
> Or, rather, don't assume that the user has write permission for a map
> which they are reading.
>
OK, the biggest problem is supporting reads of a vector written with
sizeof(off_t) == 8 when the libs use sizeof(off_t) == 4, without
rebuilding topology. As you suggested, two 32-bit reads can be done, and
depending on the endianness of the host system either the high word or
the low word is used. To read and write offsets, two new functions are
needed anyway in diglib/portable.c, something like dig__fread_port_O()
and dig__fwrite_port_O(). Type size and/or endianness mismatches are
already handled by the current code for other types. In this particular
case, reading the offset in two parts seems less of a hassle than I
first thought (reuse the argument for the number of reads, as used by
dig__fread_port_L() for reading long).
If a vector was written with sizeof(off_t) == 4 but the libs use
sizeof(off_t) == 8, reading could be handled the same way as is
currently done for long.
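To make the two cases concrete, here is a rough sketch of the reading side. It is not the existing portable.c code: read_offset() and decode_u32() are hypothetical names, the byte order of the file is passed in as a flag, and offsets are assumed to be non-negative. An 8-byte offset is taken as two 32-bit words, and if the native off_t is only 4 bytes the read fails unless the high word is zero:

```c
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>

/* Decode one 32-bit word stored in the file's byte order. */
static uint32_t decode_u32(const unsigned char *b, int big_endian)
{
    if (big_endian)
        return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
               ((uint32_t)b[2] << 8) | (uint32_t)b[3];
    return ((uint32_t)b[3] << 24) | ((uint32_t)b[2] << 16) |
           ((uint32_t)b[1] << 8) | (uint32_t)b[0];
}

/* Hypothetical helper: read one offset stored with file_size bytes (4 or 8).
 * Returns 0 on success, -1 on a short read or if the value does not fit
 * into the native off_t. Negative offsets are not handled in this sketch. */
static int read_offset(FILE *fp, int file_size, int big_endian, off_t *out)
{
    unsigned char buf[8];
    uint32_t lo, hi = 0;
    uint64_t value, max;

    if (file_size != 4 && file_size != 8)
        return -1;
    if (fread(buf, 1, (size_t)file_size, fp) != (size_t)file_size)
        return -1;

    if (file_size == 4) {
        lo = decode_u32(buf, big_endian);
    }
    else if (big_endian) {      /* high word is stored first */
        hi = decode_u32(buf, 1);
        lo = decode_u32(buf + 4, 1);
    }
    else {                      /* low word is stored first */
        lo = decode_u32(buf, 0);
        hi = decode_u32(buf + 4, 0);
    }

    /* Largest value representable both in the file type and in off_t. */
    max = (file_size == 4 || sizeof(off_t) < 8) ? (uint64_t)INT32_MAX
                                                : (uint64_t)INT64_MAX;
    value = ((uint64_t)hi << 32) | lo;
    if (value > max)
        return -1;

    *out = (off_t)value;
    return 0;
}
```

The point of the overflow check is that a 32-bit build can still read any large-file-capable vector whose coor file is actually smaller than 2GB.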
When writing offsets, it would be easiest (also safest?) to always use
the libs' sizeof(off_t). There will be no mix of different offset sizes
because topo and cidx are currently written anew whenever the vector is
updated.
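The writing side could then be as simple as the following sketch. write_offset() is a hypothetical name, and I assume the number of bytes used gets recorded somewhere in the file header so that readers know whether to expect 4 or 8 bytes:

```c
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: write one offset using the native sizeof(off_t),
 * in the requested byte order. Returns the number of bytes written
 * (to be recorded in the header), or -1 on error. */
static int write_offset(FILE *fp, off_t off, int big_endian)
{
    unsigned char buf[sizeof(off_t)];
    size_t n = sizeof(off_t), i;
    unsigned long long value;

    if (off < 0)                /* offsets are assumed non-negative */
        return -1;

    value = (unsigned long long)off;
    for (i = 0; i < n; i++) {
        /* Take the lowest byte and place it according to the byte order. */
        buf[big_endian ? n - 1 - i : i] = (unsigned char)(value & 0xff);
        value >>= 8;
    }

    return fwrite(buf, 1, n, fp) == n ? (int)n : -1;
}
```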