[GRASS-dev] vector large file support

Glynn Clements glynn at gclements.plus.com
Sun Feb 8 11:37:31 EST 2009


Markus Metz wrote:

> >> Do I understand correctly that fseeko and ftello are only needed on
> >> 32-bit systems that want -D_FILE_OFFSET_BITS=64? fseek/ftell, e.g., use
> >> long, which is 64 bits on my 64-bit Linux; I guess that's why I can
> >> write coor files larger than 2GB with the current vector libs.
> >>
> >
> > Yes. There's no point in using them unless off_t is larger than long
> > (i.e. 64-bit off_t versus 32-bit long).
> 
> I like Ivan's point that off_t is the native type for file offsets.
> Could G_fseek then use fseeko whenever fseeko is available (ditto for
> ftello)?

Well, that's the general idea. The only advantage of fseek/ftell is
that they are always available.
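A minimal sketch of such a wrapper, assuming a configure-time check defines HAVE_FSEEKO when fseeko/ftello exist (with _FILE_OFFSET_BITS=64 normally passed on the compiler command line). The names seek_off() and tell_off() are hypothetical; this is not the actual GRASS G_fseek()/G_ftell() code. Without HAVE_FSEEKO it falls back to fseek/ftell and refuses offsets that would not survive conversion to long.

```c
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical seek wrapper. HAVE_FSEEKO is assumed to come from a
   configure check; it is not defined by this sketch. */
int seek_off(FILE *fp, off_t offset, int whence)
{
#ifdef HAVE_FSEEKO
    /* fseeko takes an off_t, so it can reach offsets beyond LONG_MAX
       when off_t is 64 bits but long is only 32 bits. */
    return fseeko(fp, offset, whence);
#else
    /* Portable fallback: fseek takes a long, so reject any offset
       that would be truncated by the conversion. */
    if ((off_t) (long) offset != offset)
        return -1;
    return fseek(fp, (long) offset, whence);
#endif
}

/* Hypothetical tell wrapper, mirroring seek_off(). */
off_t tell_off(FILE *fp)
{
#ifdef HAVE_FSEEKO
    return ftello(fp);
#else
    return (off_t) ftell(fp);
#endif
}
```

The only cost of the fallback is the range check; everything else stays a plain fseek/ftell call.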

> > Bear in mind that a GRASS database may be on a networked file system,
> > and accessed by both 32- and 64-bit systems, and by both big- and
> > little-endian systems.
> >
> > Also, the user shouldn't need write permission in order to read a map. 
> > Or, rather, don't assume that the user has write permission for a map
> > which they are reading.
> 
> OK, the biggest problem is to support reading a vector written with 
> sizeof(off_t) == 8 when the libs use sizeof(off_t) == 4, without 
> rebuilding topology.

The biggest problem is when the compiler doesn't provide a 64-bit
integral type (off_t doesn't necessarily have to be 64 bits).

> As you suggested, two 32-bit reads can be done, using either the high
> word or the low word depending on the endianness of the host system.

The low word is always used. That might be the first word or the
second word, but it's always the low word.
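To illustrate, here is a sketch of decoding an 8-byte offset on a build whose offset type is only 32 bits wide. It assumes the byte order the file was written in is known (e.g. from a header flag); get_u32() and offset64_to_32() are hypothetical names. Whichever byte order the file uses, the low word is the one kept, and a nonzero high word means the offset cannot be represented.

```c
#include <stdint.h>

/* Decode a 32-bit word from 4 bytes stored in the given byte order. */
static uint32_t get_u32(const unsigned char *b, int big_endian)
{
    if (big_endian)
        return ((uint32_t) b[0] << 24) | ((uint32_t) b[1] << 16) |
               ((uint32_t) b[2] << 8) | (uint32_t) b[3];
    return ((uint32_t) b[3] << 24) | ((uint32_t) b[2] << 16) |
           ((uint32_t) b[1] << 8) | (uint32_t) b[0];
}

/* Hypothetical: read an 8-byte offset into a 32-bit value.
   Returns 0 on success, -1 if the offset needs more than 32 bits. */
int offset64_to_32(const unsigned char buf[8], int big_endian, uint32_t *out)
{
    /* The low word is the second word in a big-endian file and the
       first word in a little-endian file; it is always the one kept. */
    uint32_t low  = get_u32(big_endian ? buf + 4 : buf, big_endian);
    uint32_t high = get_u32(big_endian ? buf : buf + 4, big_endian);

    if (high != 0)
        return -1;          /* too large for a 32-bit offset */
    *out = low;
    return 0;
}
```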

> To read and write offsets, two new 
> functions are needed anyway in diglib/portable.c, something like 
> dig__fread_port_O() and dig__fwrite_port_O().

Yep.

> Type size mismatch and/or endianness mismatch is already handled by
> the current code for other types. In this particular case, reading an
> offset as two words seems less of a hassle than I first thought (reuse
> the count argument, as dig__fread_port_L() does when reading long).
> If a vector was written with sizeof(off_t) == 4 but the libs use
> sizeof(off_t) == 8, reading could be handled the same way as is
> currently done for long.
> When writing offsets, it would be easiest (also safest?) to always use
> the libs' own sizeof(off_t). There will be no mix of different offset
> sizes, because topo and cidx are currently rewritten from scratch
> whenever the vector is updated.
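A rough sketch of the writer/reader pair described above, using hypothetical write_offset()/read_offset() helpers rather than the real dig__fwrite_port_O()/dig__fread_port_O(), whose signatures were not settled in this thread. It assumes the offset size (4 or 8) and the byte order are recorded in the file header; offsets are held in a uint64_t, so a 4-byte offset is simply widened on read.

```c
#include <stdio.h>
#include <stdint.h>

/* Hypothetical: write `off` using `size` bytes (4 or 8), i.e. the
   writing library's own sizeof(off_t), in the requested byte order. */
int write_offset(FILE *fp, uint64_t off, int size, int big_endian)
{
    unsigned char b[8];

    if (size != 4 && size != 8)
        return -1;
    if (size == 4 && off > UINT32_MAX)
        return -1;                      /* would be truncated */
    for (int i = 0; i < size; i++) {
        /* i counts bytes from the least significant end */
        b[big_endian ? size - 1 - i : i] = (unsigned char) (off >> (8 * i));
    }
    return fwrite(b, 1, (size_t) size, fp) == (size_t) size ? 0 : -1;
}

/* Hypothetical: read an offset stored with `size` bytes (taken from the
   file header) into a 64-bit value; 4-byte offsets are zero-extended. */
int read_offset(FILE *fp, int size, int big_endian, uint64_t *out)
{
    unsigned char b[8];
    uint64_t v = 0;

    if (size != 4 && size != 8)
        return -1;
    if (fread(b, 1, (size_t) size, fp) != (size_t) size)
        return -1;
    for (int i = 0; i < size; i++) {
        /* accumulate starting from the most significant byte */
        v = (v << 8) | b[big_endian ? i : size - 1 - i];
    }
    *out = v;
    return 0;
}
```

Because topo and cidx are rewritten as a whole, a single size recorded in the header covers every offset in the file.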

It would be both easiest and safest. Although it would be preferable
to use 32 bits if that is known to be sufficient, I don't know whether
this is feasible.

I tried to do the same thing with the raster row offsets, but you
can't tell whether you need 32 or 64 bits until it's too late, so it
always uses 64 bits.
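A sketch of that size decision, assuming a writer either knows its largest offset before it has to record the offset size or, as with the raster row offsets, does not. choose_offset_size() is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical: choose how many bytes each stored offset uses.
   `known` says whether the largest offset is known before the size
   must be committed; if not (data still being written), only 8 bytes
   is guaranteed to be large enough. */
int choose_offset_size(int known, uint64_t max_offset)
{
    if (!known)
        return 8;
    return max_offset <= UINT32_MAX ? 4 : 8;
}
```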

-- 
Glynn Clements <glynn at gclements.plus.com>