[GRASS-dev] Re: [GRASS-user] Large vector files
Hamish
hamish_nospam at yahoo.com
Sun Oct 8 20:01:12 EDT 2006
[moved to the devel list]
HB:
> > > I am always looking for feedback on how r.in.xyz goes with massive
> > > input data. (>2gb? >4gb?)
GC:
> > r.in.xyz doesn't use LFS, so it will be limited to 2Gb on 32-bit
> > systems (any system where "long" is 32 bits). As it uses ANSI stdio
> > functions (including ftell/fseek), extending it to support large
> > files would be non-trivial.
Handling massive input is the whole purpose of the module, so it really
ought to be LFS compliant. I disagree that extending it to support large
files would be non-trivial: none of the file size, ftell(), or fseek()
calls need to be there, and they can easily be #ifdef'd out if required.
They are only there for the (somewhat lame & inaccurate; but fast,
lightweight, and non-critical) guess at the total number of lines in the
input file, which is passed to G_percent().
But G_percent() is most interesting when the processing will take a long
time, so it would be nice to have it there for large files.
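For anyone who hasn't looked at the code, the guess described above amounts
to roughly the following. This is a sketch from memory, not the actual
r.in.xyz source: estimate_line_count() is a hypothetical name, and it
assumes every line is about as long as the first one.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not the r.in.xyz source): guess the number of
 * lines in a file as file size / length of the first line.
 * Returns -1 when no guess can be made. */
long estimate_line_count(FILE *fp)
{
    char buff[1024];
    long filesize, first_len;

    assert(fp != NULL);

    /* ftell() returns a long, so where long is 32 bits it cannot
     * report an offset past 2GB; this is the LFS problem. */
    if (fseek(fp, 0L, SEEK_END) != 0)
        return -1;
    filesize = ftell(fp);
    rewind(fp);
    if (filesize <= 0)
        return -1;

    /* Sample the first line and assume the rest are similar. */
    if (fgets(buff, sizeof(buff), fp) == NULL)
        return -1;
    first_len = (long) strlen(buff);
    rewind(fp);
    if (first_len == 0)
        return -1;

    return filesize / first_len;
}
```

The result is only ever used to scale the progress display, so a failed or
wrong guess costs nothing but a misleading percentage.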
This is the critical loop:
  while (0 != G_getl2(buff, BUFFSIZE-1, in_fd)) { ... }
Besides that it's just fopen() and fclose() -- it is very simple really.
All the other scanning stuff is optional.
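To show how little is involved, here is a stand-alone sketch of a loop with
the same shape. It is an illustration only: fgets() stands in for G_getl2()
so it builds without GRASS, count_xyz_lines() is a hypothetical name, and
whitespace-separated x y z columns are an assumption made for the example.

```c
#include <stdio.h>

#define BUFFSIZE 1024

/* Stand-in for the module's read loop: stream the file line by line
 * and count the lines that parse as three numbers. Nothing here seeks,
 * so the loop itself does not care how large the file is. */
unsigned long count_xyz_lines(FILE *in_fd)
{
    char buff[BUFFSIZE];
    double x, y, z;
    unsigned long npoints = 0;

    while (fgets(buff, BUFFSIZE, in_fd) != NULL) {
        /* Skip anything that is not "x y z" (comments, blank lines). */
        if (sscanf(buff, "%lf %lf %lf", &x, &y, &z) == 3)
            npoints++;
    }
    return npoints;
}
```

Since the read is purely sequential, the only offset-sensitive parts left
are opening the file and the optional size guess.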
BD:
> Attached is a quick patch to enable LFS. It's "poorly" implemented
> with fseeko/ftello, so I'm not sure if I should commit it.
Thanks Brad. The patch looks good to my untrained eye. My only query is
whether those calls should be made conditional on USE_LARGEFILES, as
fseeko() & co are not ANSI compliant:
====snip====
NOTES
These functions are found on SysV-like systems. They are not
present in libc4, libc5, glibc 2.0 but available since glibc
2.1.
CONFORMING TO
The fseeko and ftello functions conform to SUSv2.
============
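Something along these lines is what I have in mind for the conditional
version. It is an untested sketch, not Brad's patch: it assumes
USE_LARGEFILES is the macro defined when LFS is enabled at build time, and
file_off_t, file_seek(), file_tell(), and file_size() are names made up for
the example.

```c
#include <stdio.h>

#ifdef USE_LARGEFILES
/* SUSv2 path. off_t is only 64 bits on 32-bit systems when the build
 * also defines _FILE_OFFSET_BITS=64, and strict ANSI compiler modes
 * may hide the fseeko()/ftello() prototypes. */
#include <sys/types.h>
typedef off_t file_off_t;
#define file_seek(fp, off, whence) fseeko((fp), (off), (whence))
#define file_tell(fp) ftello(fp)
#else
/* Plain ANSI fallback, limited by the width of long. */
typedef long file_off_t;
#define file_seek(fp, off, whence) fseek((fp), (off), (whence))
#define file_tell(fp) ftell(fp)
#endif

/* Return the size of an open file in bytes, or -1 on failure.
 * The stream is left positioned at the start. */
file_off_t file_size(FILE *fp)
{
    file_off_t size;

    if (file_seek(fp, 0, SEEK_END) != 0)
        return -1;
    size = file_tell(fp);
    if (file_seek(fp, 0, SEEK_SET) != 0)
        return -1;
    return size;
}
```

That way systems without fseeko()/ftello() still build, and just keep the
old 2GB limit on the size guess.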
I don't have:
- a 64 bit machine
- a dataset that large
- a [funded] research project that needs it
- any real experience with LFS
so it is as it is, and I welcome improvements from anybody with
something from the above list.
Hamish