[GRASS-dev] Re: [GRASS-user] Large vector files

Glynn Clements glynn at gclements.plus.com
Mon Oct 9 06:48:06 EDT 2006


Brad Douglas wrote:

> > All the filesize, ftell, fseek calls don't need to be there and can 
> > easily be #ifdef'd out if required. They are just there for the
> > (somewhat lame & inaccurate; but fast, lightweight, and non-critical)
> > guess at the total number of lines in the input file to pass to
> > G_percent().
> > 
> > But G_percent() is most interesting when the processing will take a long
> > time, so it would be nice to have it there for large files.
> > 
> > 
> > This is the critical loop:
> >  while( 0 != G_getl2(buff, BUFFSIZE-1, in_fd) ) { ... }
> 
> And this is where the bulk of the problem lies.  'in_fd' is a
> misinformed name for the variable.  It is not a file descriptor, but is
> in fact a 'FILE' struct.
> 
> Many GRASS functions almost require using fopen()/fclose()/fseek(), etc.
> because of its dependence on the 'FILE' structure.
> open()/close()/lseek() are better bets if you don't like your data
> truncated, but requires POSIX.1.

We already require those functions in libgis and a lot of other
places.

OTOH, it would be nice to minimise the number of places where the
Unix I/O functions are used, with most modules restricted to the ANSI
stdio functions. Unfortunately, that makes LFS (large file support) a
pain.

It would all be so much easier if the Linux API/ABI was redefined to
use a 64-bit "long" even on 32-bit platforms; _FILE_OFFSET_BITS is a
really ugly hack.

> > HA:
> > besides that it's just fopen() and fclose() -- it is very simple really.
> > All the other scanning stuff is optional.
> 
> I sent Glynn a few ideas I have.  They would require that GRASS handles
> all file IO, so we'd have to add a G_open()/G_close().  Large files are
> quickly becoming a show-stopper.

It isn't open/close that's the problem; we need G_seek() and G_tell()
functions that always use off_t, using fseeko() and ftello() where
available, and fseek() and ftell() (with their 32-bit limitations)
where they aren't.
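
A minimal sketch of what such wrappers might look like. The names
sketch_seek() and sketch_tell() are hypothetical, and HAVE_FSEEKO is
assumed to be a macro set by the configure script; neither is an
existing GRASS interface. The fallback branch rejects offsets that do
not fit in a "long" rather than truncating them silently.

```c
/* Request a 64-bit off_t on 32-bit Linux; must precede all system headers. */
#define _FILE_OFFSET_BITS 64

#include <stdio.h>
#include <sys/types.h>
#include <limits.h>
#include <errno.h>

/* Hypothetical wrapper: always takes an off_t, whatever the platform offers.
 * HAVE_FSEEKO is assumed to come from configure. */
int sketch_seek(FILE *fp, off_t offset, int whence)
{
#ifdef HAVE_FSEEKO
    return fseeko(fp, offset, whence);
#else
    /* fseek() only takes a long: refuse offsets it cannot represent. */
    if (offset > (off_t)LONG_MAX || offset < (off_t)LONG_MIN) {
        errno = ERANGE;
        return -1;
    }
    return fseek(fp, (long)offset, whence);
#endif
}

/* Hypothetical wrapper: always returns an off_t (-1 on error). */
off_t sketch_tell(FILE *fp)
{
#ifdef HAVE_FSEEKO
    return ftello(fp);
#else
    return (off_t)ftell(fp);
#endif
}
```

Callers would then store positions in off_t variables only, so the
32-bit limitation stays confined to the fallback branch.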

BTW, note that the problems with large files aren't necessarily
limited to I/O. Modules which compute cell counts can overrun the
31-bit range (even with files <2GiB, by virtue of compression).
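
To illustrate the counting problem, here is a sketch under the
assumption that rows and columns are held in plain "int" variables and
that "int" is 32 bits wide. The helper names are hypothetical. A
50000 x 50000 raster has 2.5e9 cells, which is more than INT_MAX, so
the product has to be computed in a 64-bit type.

```c
#include <stdint.h>
#include <limits.h>

/* Hypothetical helper: widen before multiplying so the product
 * cannot overflow a 32-bit int. */
int64_t sketch_cell_count(int rows, int cols)
{
    return (int64_t)rows * (int64_t)cols;
}

/* Hypothetical helper: nonzero if the count is still safe to keep in an int. */
int sketch_count_fits_int(int64_t count)
{
    return count >= 0 && count <= (int64_t)INT_MAX;
}
```

Writing rows * cols without the cast would do the multiplication in
"int" and overflow before the result is ever assigned to a wider
variable.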

-- 
Glynn Clements <glynn at gclements.plus.com>



