[GRASS-dev] vector large file support

Markus Metz markus.metz.giswork at googlemail.com
Wed Feb 4 03:27:41 EST 2009


I tried to understand the GRASS wiki page on Large File Support; sorry for 
being a bit late with that!

Glynn Clements wrote:
> Markus Metz wrote:
>
>   
>> If the coor file size stored in the topo file is indeed needed to 
>> properly process the coor file, the respective variables must be 
>> something else than long in order to support coor files larger than 2 
>> GB, maybe long long? Same for all intermediate variables in the vector 
>> library storing coor file size.
>> Looking at limits.h, long can be like int or like long long (only true 
>> 64 bit systems). I use Linux 64bit with 32bit compatibility, here long 
>> is like int. Someone more familiar with type limits and type 
>> declarations on different systems please help!
>>     
>
> As you note, long will normally be the largest size which the CPU can
> handle natively, while long long (only available in C99 or as a gcc
> extension) can be expected to be 64 bits where it exists. FWIW, "int"
> can theoretically be 64 bits, but this is rare.
>
> The correct type to use for the size of a file is off_t, which can be
> made to be a 64-bit type by adding -D_FILE_OFFSET_BITS=64 to the
> compilation switches. This should only be done if $(USE_LARGEFILES) is
> non-empty (corresponding to --enable-largefile).
>
> However, that alone isn't sufficient, as you have to explicitly force
> offset calculations to be performed using off_t rather than int/long,
> e.g.:
>
> 	long idx, step;
> 	...
> 	off_t offset = (off_t) idx * step;
> or:
> 	off_t offset = idx * (off_t) step;
>
> Note that:
>
> 	off_t offset = idx * step;
> and:
> 	off_t offset = (off_t) (idx * step);
>
> won't work, as the result isn't up-cast until after it has been
> truncated.
>   
I think I understand. So, according to the GRASS wiki, the steps to enable 
large file support would be:

1) add
ifneq ($(USE_LARGEFILES),)
EXTRA_CFLAGS = -D_FILE_OFFSET_BITS=64
endif

to all relevant Makefiles

2) use off_t where appropriate, and take care with type casting. File 
offsets are used in many different places in the vector library, so it 
will take a bit of work to get off_t usage right.

3) solve the fseek/fseeko and ftell/ftello problem (fseek and ftell use 
long for the offset, so they cannot address positions beyond 2 GB where 
long is 32 bits). Get inspiration from libgis and LFS-safe modules? Or, as 
suggested in the GRASS wiki on LFS, add
extern off_t G_ftell(FILE *fp);
extern int G_fseek(FILE *stream, off_t offset, int whence);
for global use?

4) figure out if the coor file size really needs to be stored in coor and 
topo. The coor file size doesn't say a lot about the number of features, 
because coor can contain a high proportion of dead lines (a problem in 
itself, vector TODO). If it does not need to be stored in coor and topo, 
how does removing the coor file size info affect reading and writing of 
coor and topo? Are there hard-coded offsets for reading these files?

It would be great to have LFS support in the vector libs in GRASS 7! I am 
getting coor files > 2 GB more and more often with v.in.ogr and v.clean, 
and I suspect that modifying a coor file > 2 GB produces unusable results, 
even if the module does the work and does not complain. I can now modify 
a module so that it takes, say, a 1 GB coor file and works on it, e.g. 
does some cleaning; the coor file grows beyond 2 GB in the process, and at 
the end of the module only alive lines are written out, so the resulting 
coor file is again below 2 GB. But in between, some unnoticed errors may 
have occurred. This is no good.

Regards,

Markus


