[GRASS-dev] vector large file support

Markus Metz markus.metz.giswork at googlemail.com
Sun Feb 8 06:05:00 EST 2009


Glynn Clements wrote:
> Markus Metz wrote:
>
>   
>> Do I understand correctly that fseeko and ftello are only needed on 32-bit 
>> systems built with -D_FILE_OFFSET_BITS=64? fseek, for example, takes a long 
>> offset, which is 64 bits on my 64-bit Linux; I guess that's why I can 
>> write coor files larger than 2GB with the current vector libs.
>>     
>
> Yes. There's no point in using them unless off_t is larger than long
> (i.e. 64-bit off_t versus 32-bit long).
>   
I like Ivan's point that off_t is the native type for file offsets. 
Could G_fseek() then use fseeko() whenever fseeko() is available (ditto 
for G_ftell() and ftello())?
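
A minimal sketch of such wrappers, to make the question concrete. The 
macro HAVE_FSEEKO is an assumption (it stands in for whatever a configure 
check would define), and the signatures shown for G_fseek() and G_ftell() 
are only one possible choice, not an existing GRASS API:

```c
/* Request a 64-bit off_t where the platform supports it; these must
   come before any system header is included. */
#ifndef _FILE_OFFSET_BITS
#define _FILE_OFFSET_BITS 64
#endif
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <stdio.h>
#include <sys/types.h>
#include <limits.h>
#include <errno.h>

/* Assumption: normally set by a configure check, hard-coded here. */
#ifndef HAVE_FSEEKO
#define HAVE_FSEEKO 1
#endif

/* Seek using off_t; fall back to fseek() when fseeko() is missing. */
int G_fseek(FILE *fp, off_t offset, int whence)
{
#ifdef HAVE_FSEEKO
    return fseeko(fp, offset, whence);
#else
    /* fseek() takes a long, so refuse offsets that do not fit. */
    if (offset > (off_t)LONG_MAX || offset < (off_t)LONG_MIN) {
        errno = EOVERFLOW;
        return -1;
    }
    return fseek(fp, (long)offset, whence);
#endif
}

/* Report the current position as off_t. */
off_t G_ftell(FILE *fp)
{
#ifdef HAVE_FSEEKO
    return ftello(fp);
#else
    return (off_t)ftell(fp);
#endif
}
```

Callers would then pass and store off_t everywhere, and only these two 
functions need to know which C library call is actually available.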
>   
>> So we do need G_fseek() and G_ftell()
>>     
>
> Yes. Those would be useful regardless of anything related to the
> vector format.
>   
So it's about time these functions get implemented ;-)
>
> Bear in mind that a GRASS database may be on a networked file system,
> and accessed by both 32- and 64-bit systems, and by both big- and
> little-endian systems.
>
> Also, the user shouldn't need write permission in order to read a map. 
> Or, rather, don't assume that the user has write permission for a map
> which they are reading.
>   
OK, the biggest problem is supporting the reading of a vector written with 
sizeof(off_t) == 8 when the libs use sizeof(off_t) == 4, without 
rebuilding topology. As you suggested, two 32-bit reads can be done and, 
depending on the endianness of the host system, either the high-word 
value or the low-word value is used. To read and write offsets, two new 
functions are needed anyway in diglib/portable.c, something like 
dig__fread_port_O() and dig__fwrite_port_O(). Type size mismatch and/or 
endianness mismatch is already handled by the current code for other 
types. In this particular case, reading the offset twice seems less of a 
hassle than I first thought (reuse the argument for the number of 
reads, as used by dig__fread_port_L() for reading long).
If a vector was written with sizeof(off_t) == 4 but the libs use 
sizeof(off_t) == 8, reading could be handled the same way as is 
currently done for long.
When writing offsets, it would be easiest (also safest?) to always use 
the sizeof(off_t) of the libs. There will be no mix of different offset 
sizes because topo and cidx are currently written anew whenever the 
vector is updated.
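
To illustrate the size and byte-order handling described above, here is a 
rough sketch that works on a byte buffer rather than a file. The names 
decode_port_offset() and encode_port_offset() and the byte_order enum are 
hypothetical and are not the proposed dig__fread_port_O() / 
dig__fwrite_port_O(). Instead of two literal 32-bit reads, the sketch 
assembles the value byte by byte, which has the same effect: an 8-byte 
offset is accepted by a 4-byte reader only if it fits in the low-order 
word.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical byte-order tag for the stored data. */
enum byte_order { ORDER_LITTLE, ORDER_BIG };

/* Store the low `size` bytes (4 or 8) of `value` in the given order. */
void encode_port_offset(int64_t value, size_t size, enum byte_order order,
                        unsigned char *buf)
{
    uint64_t u = (uint64_t)value;   /* modular conversion, well defined */
    size_t i;

    for (i = 0; i < size; i++) {
        /* Least significant byte goes first for little-endian data. */
        size_t idx = (order == ORDER_LITTLE) ? i : size - 1 - i;

        buf[idx] = (unsigned char)(u & 0xff);
        u >>= 8;
    }
}

/* Decode an offset stored in `file_size` bytes (4 or 8) for a reader
   whose off_t has `host_size` bytes (4 or 8).
   Returns 0 on success, -1 if the sizes are unsupported or the stored
   value does not fit into the reader's offset type. */
int decode_port_offset(const unsigned char *buf, size_t file_size,
                       enum byte_order order, size_t host_size,
                       int64_t *out)
{
    uint64_t u = 0;
    int64_t v;
    size_t i;

    if ((file_size != 4 && file_size != 8) ||
        (host_size != 4 && host_size != 8))
        return -1;

    /* Assemble from the most significant byte down. */
    for (i = 0; i < file_size; i++) {
        size_t idx = (order == ORDER_BIG) ? i : file_size - 1 - i;

        u = (u << 8) | buf[idx];
    }

    /* Sign-extend without relying on implementation-defined casts. */
    if (file_size == 4)
        v = (u & 0x80000000u) ? (int64_t)u - 4294967296LL : (int64_t)u;
    else
        v = (u > (uint64_t)INT64_MAX) ? -(int64_t)(~u) - 1 : (int64_t)u;

    /* A 4-byte reader can only use values that fit in 32 bits. */
    if (host_size == 4 && (v > INT32_MAX || v < INT32_MIN))
        return -1;

    *out = v;
    return 0;
}
```

With this approach a small vector written by 64-bit libs stays readable 
by 32-bit libs, and only offsets beyond the 32-bit range are rejected.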


