[GRASS-dev] Re: [GRASS-user] Large vector files
Brad Douglas
rez at touchofmadness.com
Sun Oct 8 21:14:34 EDT 2006
On Mon, 2006-10-09 at 13:01 +1300, Hamish wrote:
> HB:
> > > > I am always looking for feedback on how r.in.xyz goes with massive
> > > > input data. (>2gb? >4gb?)
> GC:
> > > r.in.xyz doesn't use LFS, so it will be limited to 2Gb on 32-bit
> > > systems (any system where "long" is 32 bits). As it uses ANSI stdio
> > > functions (including ftell/fseek), extending it to support large
> > > files would be non-trivial.
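To make the arithmetic behind the 2Gb figure concrete: ftell() returns, and fseek() takes, a `long`. Where `long` is 32 bits, the largest offset either call can express is INT32_MAX = 2147483647 bytes, one byte short of 2 GiB. Below is a minimal sketch. The helper and macro names are hypothetical, and it only models the limit rather than calling stdio:

```c
#include <stdint.h>

/* On systems with a 32-bit 'long' (e.g. ILP32), ftell()/fseek() can only
 * express offsets up to INT32_MAX = 2147483647 bytes (2 GiB - 1). */
#define LONG32_MAX_OFFSET ((uint64_t)INT32_MAX)

/* Hypothetical helper: returns 1 if a byte offset could be reported by
 * ftell() where 'long' is 32 bits, 0 if it would not fit. */
int fits_in_32bit_long(uint64_t offset)
{
    return offset <= LONG32_MAX_OFFSET;
}
```

A 3 GB input therefore has offsets past 2147483647 that a 32-bit `long` cannot hold. POSIX specifies that ftell() fails with EOVERFLOW when the current offset cannot be represented in a `long`.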
> HA:
> It's inherent in the purpose of the module that it be LFS compliant.
>
> I disagree with "extending it to support large files is non-trivial":
I disagree with your disagreement. :-)
> HA:
> None of the filesize, ftell, fseek calls need to be there; they can
> easily be #ifdef'd out if required. They are just there for the
> (somewhat lame & inaccurate; but fast, lightweight, and non-critical)
> guess at the total number of lines in the input file to pass to
> G_percent().
>
> But G_percent() is most interesting when the processing will take a long
> time, so it would be nice to have it there for large files.
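One way to keep a progress indicator without any ftell()/fseek() calls is to measure progress in bytes consumed rather than in guessed line counts. This is a swapped-in technique, not what r.in.xyz does. The sketch below makes several assumptions. It takes the total from POSIX stat(), which is not ANSI. It relies on glibc-style `_FILE_OFFSET_BITS=64` so that `st_size` and fopen() work past 2 GiB on 32-bit systems. It prints to stderr as a hypothetical stand-in for G_percent(). It assumes text input without NUL bytes. The function names are hypothetical.

```c
#define _FILE_OFFSET_BITS 64   /* LFS on 32-bit glibc; must precede includes */

#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Percentage of 'total' bytes consumed so far, clamped to 0..100. */
int bytes_percent(unsigned long long done, unsigned long long total)
{
    if (total == 0 || done >= total)
        return 100;
    return (int)(done * 100 / total);
}

/* Hypothetical sketch: read 'path' line by line and print progress to
 * stderr every 10% (standing in for G_percent()). The total comes from
 * stat(); progress is the sum of bytes returned by fgets(), so no
 * ftell()/fseek() call is made. Returns the number of lines, or -1 if
 * the file cannot be opened. Assumes text input without NUL bytes. */
long long scan_with_progress(const char *path)
{
    char buff[4096];
    struct stat st;
    unsigned long long done = 0, total;
    long long lines = 0;
    int at_line_start = 1, last_decile = -1;
    FILE *fp;

    if (stat(path, &st) != 0 || (fp = fopen(path, "r")) == NULL)
        return -1;
    total = (unsigned long long)st.st_size;

    while (fgets(buff, sizeof buff, fp) != NULL) {
        size_t len = strlen(buff);
        int pct;

        if (at_line_start)
            lines++;              /* this chunk begins a new line */
        at_line_start = (len > 0 && buff[len - 1] == '\n');
        done += len;

        pct = bytes_percent(done, total);
        if (pct / 10 > last_decile) {
            last_decile = pct / 10;
            fprintf(stderr, "%3d%%\n", pct);
        }
    }
    fclose(fp);
    return lines;
}
```

Lines longer than the buffer arrive in several fgets() chunks. The line counter is incremented only at the first chunk of each line, so the count stays correct.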
>
>
> This is the critical loop:
> while( 0 != G_getl2(buff, BUFFSIZE-1, in_fd) ) { ... }
And this is where the bulk of the problem lies. 'in_fd' is a misleading
name for the variable: it is not a file descriptor but a 'FILE *', i.e.
a pointer to a stdio 'FILE' structure.
Many GRASS functions all but require fopen()/fclose()/fseek(), etc.,
because they depend on the 'FILE' structure.
open()/close()/lseek() are better bets if you don't want your data
truncated, but they require POSIX.1.
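For comparison, the descriptor-based calls work in `off_t` rather than `long`. Below is a minimal sketch. It assumes a POSIX system and glibc-style Large File Support via `_FILE_OFFSET_BITS=64`, which widens `off_t` to 64 bits on 32-bit builds. The function name is hypothetical. Note that this alone does not help helpers such as G_getl2(), which still expect a 'FILE *'.

```c
#define _FILE_OFFSET_BITS 64   /* 64-bit off_t on 32-bit glibc; before includes */

#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: size of 'path' in bytes via open()/lseek(), or -1
 * on error. Because lseek() returns off_t, offsets past 2 GiB are
 * representable when off_t is 64 bits. */
off_t file_size_fd(const char *path)
{
    int fd = open(path, O_RDONLY);
    off_t size;

    if (fd < 0)
        return -1;
    size = lseek(fd, 0, SEEK_END);   /* (off_t)-1 if lseek() fails */
    close(fd);
    return size;
}
```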
> HA:
> besides that it's just fopen() and fclose() -- it is very simple really.
> All the other scanning stuff is optional.
I sent Glynn a few ideas. They would require that GRASS handle all file
I/O, so we'd have to add G_open()/G_close(). Large files are quickly
becoming a show-stopper.
> BD:
> > Attached is a quick patch to enable LFS. It's "poorly" implemented
> > with fseeko/ftello, so I'm not sure if I should commit it.
> HA:
> Thanks Brad. The patch looks good to my untrained eye; my only query is
> whether those calls should be made conditional on USE_LARGEFILES, as
> fseeko() & co. are not ANSI compliant:
>
> ====snip====
> NOTES
> These functions are found on SysV-like systems. They are not
> present in libc4, libc5, glibc 2.0 but available since glibc
> 2.1.
>
> CONFORMING TO
> The fseeko and ftello functions conform to SUSv2.
> ============
This is why I didn't want to commit it. Besides being non-standard,
they are poor implementations with plenty of caveats to go around.
Collect them all and trade them with your friends! ;-)
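Hamish's suggestion of making the calls conditional could be sketched as a small compatibility layer. The non-standard fseeko()/ftello() would be used only when USE_LARGEFILES is defined, with plain ANSI fseek()/ftell() otherwise. This assumes the build defines USE_LARGEFILES only on platforms that provide fseeko()/ftello(), and defines `_FILE_OFFSET_BITS=64` there so that `off_t` is 64 bits. The type, macro, and function names are hypothetical. The sketch does not remove the caveats of fseeko()/ftello() themselves.

```c
#include <stdio.h>

#ifdef USE_LARGEFILES
/* SUSv2/POSIX path: off_t offsets (64-bit when _FILE_OFFSET_BITS=64). */
#include <sys/types.h>
typedef off_t file_off_t;
#define file_seek fseeko
#define file_tell ftello
#else
/* Plain ANSI C path: 'long' offsets, limited to 2 GiB where long is 32 bits. */
typedef long file_off_t;
#define file_seek fseek
#define file_tell ftell
#endif

/* Hypothetical helper: size of an open stream in bytes, restoring the
 * current position afterwards. Returns -1 on error. */
file_off_t stream_size(FILE *fp)
{
    file_off_t here = file_tell(fp), end;

    if (here < 0 || file_seek(fp, 0, SEEK_END) != 0)
        return -1;
    end = file_tell(fp);
    if (file_seek(fp, here, SEEK_SET) != 0)
        return -1;
    return end;
}
```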
--
Brad Douglas <rez touchofmadness com> KB8UYR/6
Address: 37.493,-121.924 / WGS84 National Map Corps #TNMC-3785