[GRASS-dev] Re: [GRASS-user] Large vector files
Glynn Clements
glynn at gclements.plus.com
Mon Oct 9 06:48:06 EDT 2006
Brad Douglas wrote:
> > All the filesize, ftell, fseek calls don't need to be there and can
> > easily be #ifdef'd out if required. They are just there for the
> > (somewhat lame & inaccurate; but fast, lightweight, and non-critical)
> > guess at the total number of lines in the input file to pass to
> > G_percent().
> >
> > But G_percent() is most interesting when the processing will take a long
> > time, so it would be nice to have it there for large files.
> >
> >
> > This is the critical loop:
> > while( 0 != G_getl2(buff, BUFFSIZE-1, in_fd) ) { ... }
>
> And this is where the bulk of the problem lies. 'in_fd' is a
> misinformed name for the variable. It is not a file descriptor, but is
> in fact a 'FILE' struct.
>
> Many GRASS functions almost require using fopen()/fclose()/fseek(), etc.
> because of its dependence on the 'FILE' structure.
> open()/close()/lseek() are better bets if you don't like your data
> truncated, but requires POSIX.1.
We already require those functions in libgis and a lot of other
places.
OTOH, it would be nice to minimise the number of places where the
Unix I/O functions are used, with most modules restricted to the ANSI
stdio functions. Unfortunately, that makes LFS (large file support) a
pain.
It would all be so much easier if the Linux API/ABI was redefined to
use a 64-bit "long" even on 32-bit platforms; _FILE_OFFSET_BITS is a
really ugly hack.
> > HA:
> > besides that it's just fopen() and fclose() -- it is very simple really.
> > All the other scanning stuff is optional.
>
> I sent Glynn a few ideas I have. They would require that GRASS handles
> all file IO, so we'd have to add a G_open()/G_close(). Large files are
> quickly becoming a show-stopper.
It isn't open/close that's the problem; we need G_seek() and G_tell()
functions that always use off_t, using fseeko() and ftello() where
available, and fseek() and ftell() (with their 32-bit limitations)
where they aren't.
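A minimal sketch of what such wrappers might look like. HAVE_FSEEKO is a
hypothetical configure-time macro (defaulted to 1 here for the sake of the
example), and the bodies are only an illustration of the idea, not existing
libgis code. For off_t to actually be 64 bits wide on a 32-bit platform, the
build would still need something like -D_FILE_OFFSET_BITS=64.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L	/* exposes fseeko()/ftello() and off_t */
#endif

#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical macro; a real build would set it from a configure test. */
#ifndef HAVE_FSEEKO
#define HAVE_FSEEKO 1
#endif

/* Seek using off_t; returns 0 on success, -1 on failure. */
int G_seek(FILE *fp, off_t offset, int whence)
{
    assert(fp != NULL);
#if HAVE_FSEEKO
    return fseeko(fp, offset, whence);
#else
    /* Fallback: fseek() takes a long, so refuse offsets that don't fit. */
    long loff = (long)offset;

    if ((off_t)loff != offset) {
	errno = ERANGE;
	return -1;
    }
    return fseek(fp, loff, whence);
#endif
}

/* Report the current position as off_t; returns -1 on failure. */
off_t G_tell(FILE *fp)
{
    assert(fp != NULL);
#if HAVE_FSEEKO
    return ftello(fp);
#else
    /* Fallback: limited to the range of long. */
    return (off_t)ftell(fp);
#endif
}
```

Modules would then call G_seek(fp, offset, SEEK_SET) and G_tell(fp) in
place of fseek() and ftell(), and keep offsets in off_t variables rather
than long.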
BTW, note that the problems with large files aren't necessarily
limited to I/O. Modules which compute cell counts can overrun the
31-bit range (even with files <2GiB, by virtue of compression).
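To illustrate the cell-count issue: a 50000 x 50000 raster has 2.5e9
cells, which exceeds the 2^31 - 1 limit of a signed 32-bit integer even
though the row and column counts fit comfortably. A sketch, using
hypothetical helper names (cell_count, fits_in_int32) that are not part
of any GRASS library:

```c
#include <stdint.h>

/* Widen before multiplying so the product is computed in 64 bits. */
int64_t cell_count(int rows, int cols)
{
    return (int64_t)rows * (int64_t)cols;
}

/* Non-zero if the count could be stored in a signed 32-bit integer. */
int fits_in_int32(int64_t n)
{
    return n >= 0 && n <= INT32_MAX;
}
```

Computing rows * cols directly in int would overflow for the raster
above; casting one operand first avoids that.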
--
Glynn Clements <glynn at gclements.plus.com>