[GRASS-dev] Re: [GRASS-user] Large vector files

Brad Douglas rez at touchofmadness.com
Sun Oct 8 21:14:34 EDT 2006


On Mon, 2006-10-09 at 13:01 +1300, Hamish wrote:
> HB:
> > > > I am always looking for feedback on how r.in.xyz goes with massive
> > > > input data. (>2gb? >4gb?)
> GC:
> > > r.in.xyz doesn't use LFS, so it will be limited to 2Gb on 32-bit
> > > systems (any system where "long" is 32 bits). As it uses ANSI stdio
> > > functions (including ftell/fseek), extending it to support large
> > > files would be non-trivial.
> HA:
> It's inherent in the purpose of the module that it be LFS compliant.
>
> I disagree with "extending it to support large files is non-trivial":

I disagree with your disagreement. :-)

> HA:
> All the filesize, ftell, fseek calls don't need to be there and can 
> easily be #ifdef'd out if required. They are just there for the
> (somewhat lame & inaccurate; but fast, lightweight, and non-critical)
> guess at the total number of lines in the input file to pass to
> G_percent().
> 
> But G_percent() is most interesting when the processing will take a long
> time, so it would be nice to have it there for large files.
> 
> 
> This is the critical loop:
>  while( 0 != G_getl2(buff, BUFFSIZE-1, in_fd) ) { ... }

And this is where the bulk of the problem lies.  'in_fd' is a
misleading name for the variable.  It is not a file descriptor, but is
in fact a pointer to a 'FILE' struct.

Many GRASS functions all but require using fopen()/fclose()/fseek(), etc.
because of their dependence on the 'FILE' structure.
open()/close()/lseek() are better bets if you don't like your data
truncated, but they require POSIX.1.
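
To illustrate the descriptor-based route, here is a minimal sketch, not
code from GRASS.  file_size_fd() is a hypothetical helper, and it
assumes a libc that honors the _FILE_OFFSET_BITS feature macro (glibc
does) so that off_t is 64 bits wide even on a 32-bit system.

```c
/* Must come before any system header so that off_t is 64 bits wide
 * even on 32-bit systems (assumption: the libc honors this macro). */
#define _FILE_OFFSET_BITS 64

#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper, not part of GRASS: returns the size of 'path'
 * in bytes, or -1 on error.  The result is an off_t rather than a
 * long, so it is not cut off at 2 GB when off_t is 64 bits. */
off_t file_size_fd(const char *path)
{
    off_t size;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return (off_t) -1;

    /* Seeking to the end returns the end-of-file offset, i.e. the size. */
    size = lseek(fd, 0, SEEK_END);
    close(fd);

    return size;
}
```

The catch is the one already mentioned: anything that wants a 'FILE *'
(G_getl2() included) cannot take the descriptor directly.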

> HA:
> besides that it's just fopen() and fclose() -- it is very simple really.
> All the other scanning stuff is optional.

I sent Glynn a few ideas I have.  They would require that GRASS handle
all file I/O, so we'd have to add a G_open()/G_close().  Large files are
quickly becoming a show-stopper.

> BD:
> > Attached is a quick patch to enable LFS.  It's "poorly" implemented
> > with fseeko/ftello, so I'm not sure if I should commit it.
> HA:
> Thanks Brad. The patch looks good to my untrained eye, my only query is
> if those calls should be conditionalized to USE_LARGEFILES, as fseeko()
> & co are not ANSI compliant:
> 
> ====snip====
> NOTES
>        These  functions  are found on SysV-like systems.  They are not
>        present in libc4, libc5, glibc 2.0 but  available  since  glibc
>        2.1.
> 
> CONFORMING TO
>        The fseeko and ftello functions conform to SUSv2.
> ============

This is why I didn't want to commit it.  Besides being non-standard,
they are poor implementations with plenty of caveats to go around.
Collect them all and trade them with your friends! ;-)
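
For reference, the conditionalization Hamish asks about could look
roughly like the sketch below.  This is not the attached patch.
USE_LARGEFILES is the macro name from his question; file_off_t,
file_seek(), file_tell() and stream_size() are hypothetical names made
up for the example.  Without USE_LARGEFILES it falls back to plain ANSI
fseek()/ftell() and keeps the 2 GB limit wherever "long" is 32 bits.

```c
#ifdef USE_LARGEFILES
/* Both must be defined before any system header is included. */
#define _FILE_OFFSET_BITS 64     /* 64-bit off_t on 32-bit systems */
#define _POSIX_C_SOURCE 200112L  /* exposes fseeko()/ftello() */
#endif

#include <stdio.h>
#include <sys/types.h>

#ifdef USE_LARGEFILES
typedef off_t file_off_t;
#define file_seek(fp, off, whence) fseeko((fp), (off), (whence))
#define file_tell(fp)              ftello(fp)
#else
typedef long file_off_t;
#define file_seek(fp, off, whence) fseek((fp), (off), (whence))
#define file_tell(fp)              ftell(fp)
#endif

/* Hypothetical helper: size in bytes of an open stream, or -1 on
 * error.  The current position is restored before returning. */
file_off_t stream_size(FILE *fp)
{
    file_off_t here, size;

    here = file_tell(fp);
    if (here < 0)
        return -1;

    if (file_seek(fp, 0, SEEK_END) != 0)
        return -1;

    size = file_tell(fp);

    if (file_seek(fp, here, SEEK_SET) != 0)
        return -1;

    return size;
}
```

A size obtained this way is only needed for the G_percent() guess, so a
failure here could simply disable the progress display.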


-- 
Brad Douglas <rez touchofmadness com>                    KB8UYR/6
Address: 37.493,-121.924 / WGS84    National Map Corps #TNMC-3785



