[GRASS-dev] vector large file support
glynn at gclements.plus.com
Sun Feb 8 02:44:35 EST 2009
Markus Metz wrote:
> Do I understand correctly that fseeko and ftello are only needed on 32-bit
> systems that want -D_FILE_OFFSET_BITS=64? fseek, e.g., takes a long offset,
> which is 64 bits on my 64-bit Linux; I guess that's why I can write coor
> files > 2GB with the current vector libs.
Yes. There's no point in using them unless off_t is larger than long
(i.e. 64-bit off_t versus 32-bit long).
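A minimal sketch of seek/tell wrappers along these lines follows. The names sketch_fseek() and sketch_ftell() are hypothetical and only illustrate the idea behind the proposed G_fseek()/G_ftell(); GRASS's real functions may differ. It assumes HAVE_FSEEKO is defined by the build system (Autoconf's AC_FUNC_FSEEKO defines that macro) when fseeko()/ftello() exist, and otherwise falls back to fseek(), refusing offsets that do not fit in a long.

```c
/* Sketch only: hypothetical helpers, not GRASS's actual G_fseek()/G_ftell(). */
#define _FILE_OFFSET_BITS 64    /* must come before any system header */
#define _POSIX_C_SOURCE 200112L /* exposes fseeko()/ftello() on POSIX */
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <sys/types.h>

/* Seek to a position that may lie beyond LONG_MAX. Returns 0 or -1. */
int sketch_fseek(FILE *fp, off_t offset, int whence)
{
#ifdef HAVE_FSEEKO
    /* fseeko() takes an off_t, which is 64 bits with _FILE_OFFSET_BITS=64
     * even on 32-bit platforms where long is only 32 bits. */
    return fseeko(fp, offset, whence);
#else
    /* Fallback: fseek() takes a long, so reject offsets that don't fit. */
    if (offset > LONG_MAX || offset < LONG_MIN) {
        errno = EOVERFLOW;
        return -1;
    }
    return fseek(fp, (long)offset, whence);
#endif
}

/* Current stream position as an off_t, or -1 on error. */
off_t sketch_ftell(FILE *fp)
{
#ifdef HAVE_FSEEKO
    return ftello(fp);
#else
    return (off_t)ftell(fp);
#endif
}
```

On a 64-bit Linux system where long and off_t are both 64 bits, either branch can reach offsets beyond 2GiB; the distinction only matters where long is 32 bits.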
> > It's not worth using "raw" I/O just to avoid this issue. Apart from
> > anything else, there's a potentially huge performance hit, as the
> > vector library tends to use many small read/write operations. Using
> > low-level I/O requires a system call for each operation, while the
> > stdio interface will coalesce these, reading/writing whole blocks.
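To illustrate the access pattern described above, here is a small sketch (the function names are hypothetical) that writes and reads coordinates one double at a time through stdio. Each call is tiny, but stdio normally serves it from a user-space buffer, so the underlying read()/write() system calls happen per buffer rather than per value. Doing the same with raw read()/write() would cost one system call per value.

```c
#include <stdio.h>

/* Writes n coordinates one double at a time, mimicking many small writes.
 * Each fwrite() normally just copies into the stream's buffer; the C
 * library calls write() only when that buffer fills or on fflush(). */
int write_coords(FILE *fp, const double *coords, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (fwrite(&coords[i], sizeof coords[i], 1, fp) != 1)
            return -1;
    }
    return fflush(fp) == 0 ? 0 : -1;
}

/* Reads n doubles back the same way; fread() refills its buffer with
 * large read() calls and serves the small requests from memory. */
int read_coords(FILE *fp, double *coords, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (fread(&coords[i], sizeof coords[i], 1, fp) != 1)
            return -1;
    }
    return 0;
}
```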
> Interesting and good to know. So we do need G_fseek() and G_ftell()
Yes. Those would be useful regardless of anything related to the
vector library.
> >> The problem I see is that offset values are stored in topo and cidx
> >> (e.g. the topo file knows that line i is in the coor file at offset o).
> >> So if the topo file was written with 64-bit off_t but the current
> >> compiled library uses 32-bit off_t, can this 32-bit library somehow get
> >> these 64-bit offset values out of the topo file?
> > In the worst case, it can just perform 2 32-bit reads, and check that
> > the high word is zero and the low word is positive.
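The "two 32-bit reads" fallback can be sketched as follows, assuming the two words have already been read and decoded into host order. The helper name offset_from_words() is hypothetical. "Low word is positive" is taken here to mean non-negative when viewed as a signed 32-bit value.

```c
#include <stdint.h>

/* Hypothetical helper for a build with 32-bit off_t: accept a 64-bit
 * offset stored as (high, low) words only if the high word is zero and
 * the low word is non-negative as a signed 32-bit value.
 * Returns 0 and stores the offset in *out, or -1 if it won't fit. */
int offset_from_words(uint32_t high, uint32_t low, long *out)
{
    if (high != 0 || low > (uint32_t)INT32_MAX)
        return -1; /* offset needs large file support */
    *out = (long)low;
    return 0;
}
```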
> Uff. Some more safety checks in the code. From a coding perspective it's
> easier just to request a topology rebuild. Annoying for the user though.
> OTOH, since that coor file size check is done before anything is read from the
> coor file, the libs could say something like "Sorry, that vector is too
> big for you. Please recompile GRASS with LFS" (more friendly phrasing
> needed). Also potentially annoying.
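The "too big for this build" check could look roughly like this. It is a sketch under two assumptions: the coor file size has been read from the topo header as an unsigned 64-bit value (the current header field cannot hold more than 2GB), and off_t is a signed two's-complement type of 4 or 8 bytes. coor_size_supported() and max_off_t() are hypothetical names.

```c
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>

/* Largest value this build's off_t can hold (assumes a signed 4- or
 * 8-byte off_t, as on common platforms). */
static uint64_t max_off_t(void)
{
    return sizeof(off_t) >= 8 ? (uint64_t)INT64_MAX : (uint64_t)INT32_MAX;
}

/* Hypothetical check: coor_size is the size recorded in the topo header.
 * Returns 1 if this build can address the whole coor file, else 0. */
int coor_size_supported(uint64_t coor_size)
{
    if (coor_size > max_off_t()) {
        fprintf(stderr,
                "The coor file (%llu bytes) is too large for this build; "
                "GRASS needs to be compiled with large file support.\n",
                (unsigned long long)coor_size);
        return 0;
    }
    return 1;
}
```

Because the check uses only the header value, it can run before any attempt to open the coor file.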
Right. But if you have a >=2GiB coor file with a 32-bit off_t, the OS
will refuse to open() the coor file regardless of any checks GRASS
performs.
> But if the coor file size check is
> passed (<= 2GB), the high word must always be zero; otherwise it would
> refer to an offset beyond EOF. You could just use the low word value.
> Would you have to swap high word and low word if the byte order of the
> vector is different from the byte order of the current system? That can
> happen when e.g. a whole GRASS location is copied to another system. I
> think not because the vector libs use their own fixed byte order. I
> would really just request a topology rebuild to avoid all this hassle.
Bear in mind that a GRASS database may be on a networked file system,
and accessed by both 32- and 64-bit systems, and by both big- and
little-endian systems.
Also, the user shouldn't need write permission in order to read a map.
Or, rather, don't assume that the user has write permission for a map
which they are reading.
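On the byte-order question: if offsets are decoded from bytes using the file's declared byte order rather than the host's, the same code works on big- and little-endian machines sharing one database, and whether the high and low words are "swapped" follows from the file's order alone. A minimal, generic sketch (not the vector library's actual portable-format code; decode_u32() and decode_u64() are hypothetical names):

```c
#include <stdint.h>

/* Decode 4 bytes stored in the file's byte order. Building the value with
 * shifts gives the same result on any host, so no host-specific swap. */
uint32_t decode_u32(const unsigned char *b, int file_is_big_endian)
{
    if (file_is_big_endian)
        return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
               ((uint32_t)b[2] << 8) | (uint32_t)b[3];
    return ((uint32_t)b[3] << 24) | ((uint32_t)b[2] << 16) |
           ((uint32_t)b[1] << 8) | (uint32_t)b[0];
}

/* Decode an 8-byte offset: a big-endian file stores the high word first,
 * a little-endian file stores the low word first. */
uint64_t decode_u64(const unsigned char *b, int file_is_big_endian)
{
    uint32_t first = decode_u32(b, file_is_big_endian);
    uint32_t second = decode_u32(b + 4, file_is_big_endian);
    return file_is_big_endian
        ? ((uint64_t)first << 32) | second
        : ((uint64_t)second << 32) | first;
}
```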
> > If the topo file contains any offsets which exceed the 2GiB range,
> > then the coor file will be larger than 2GiB. If you aren't using
> > _FILE_OFFSET_BITS=64, open()ing the coor file will likely fail.
> Opening the coor file is not even attempted with the current code in
> this situation, because the coor file size stored in the topo header
> cannot be larger than 2GB and this size is used for a safety check before
> opening the coor file. Actually, I don't know what would happen on a
> 32-bit system. If new vector libs are compiled without LFS, does a
> 32-bit system have a chance to find out that the coor file is too large?
> To be precise, when calling stat(path, &stat_buf), what would be the
> maximum possible value of stat_buf.st_size in 32-bit? Likely LONG_MAX.
Effectively; using stat() on a file >=2GiB fails with:

    EOVERFLOW
        (stat()) path refers to a file whose size cannot be represented
        in the type off_t. This can occur when an application
        compiled on a 32-bit platform without -D_FILE_OFFSET_BITS=64
        calls stat() on a file whose size exceeds (1<<31)-1 bytes.
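So a build without LFS can detect an oversized coor file by checking for EOVERFLOW rather than reading a bogus size. A minimal sketch; coor_file_size() is a hypothetical helper, not a GRASS function.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Returns the size of the file at path, or -1 on failure (errno kept). */
off_t coor_file_size(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0) {
        int err = errno;
        if (err == EOVERFLOW)
            /* Typically: file is >= 2GiB but this build's off_t is 32 bits. */
            fprintf(stderr, "%s: too large for this build "
                    "(compile with -D_FILE_OFFSET_BITS=64)\n", path);
        else
            fprintf(stderr, "%s: %s\n", path, strerror(err));
        errno = err;
        return -1;
    }
    return st.st_size;
}
```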
Glynn Clements <glynn at gclements.plus.com>