[GRASS5] thoughts about runlength encoding

Glynn Clements glynn.clements at virgin.net
Thu Aug 12 10:52:08 EDT 2004


Glynn Clements wrote:

> BTW, if you're interested in the format of raster files, you should
> probably look at the {get,put}_row2.c files which I posted recently
> (in the "Raster lib and CELL files > 2GB" thread). Hopefully, these
> should be somewhat easier to read than the original versions.

I've now committed these to CVS.

> In the long run, I'm hoping to completely re-write the raster I/O
> code. I'm not planning to support RLE compression (other than to allow
> old files to be converted to the new format), but to use zlib for both
> integer and FP formats.
> 
> However, a complete re-write is a long way off. In the mean time, I'm
> considering implementing some of the less radical changes as an
> intermediate measure. Primarily, I intend to add support for 64-bit
> offsets on 32-bit platforms (so that raster files aren't limited to
> 2Gb). I'm also thinking about supporting the use of zlib for integer
> maps, as well as the possibility of eliminating the null file.

I've also committed fixes to use off_t instead of long throughout the
raster I/O code. If you compile with -D_FILE_OFFSET_BITS=64, you
should be able to have raster maps which are larger than 2GB (I've
only tested this briefly).

The code which reads the row pointers can handle both 32-bit and
64-bit offsets regardless of whether -D_FILE_OFFSET_BITS=64 was used.
However, if you have a file which actually exceeds 2GB, you can't
read it with a version of GRASS which wasn't built with
-D_FILE_OFFSET_BITS=64.
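
As a rough illustration of width-independent reading (a sketch, not
the code committed to CVS), the function below decodes offsets of a
given byte width into a 64-bit integer and rejects any value the local
offset type could not hold. The big-endian layout, the function name
decode_row_ptrs() and the max_bits parameter are assumptions made for
the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Decode `count` offsets, each `nbytes` wide and stored big-endian
 * (an assumed layout), from `buf` into `out`.
 *
 * `max_bits` is the number of value bits in the local offset type:
 * 31 for a 32-bit signed off_t, 63 for a 64-bit one.
 *
 * Returns 0 on success, -1 if the arguments are invalid or an offset
 * is too large for the local offset type.
 */
int decode_row_ptrs(const unsigned char *buf, size_t nbytes, size_t count,
                    int64_t *out, int max_bits)
{
    size_t i, j;

    assert(buf != NULL && out != NULL);

    if (nbytes < 1 || nbytes > 8 || max_bits < 1 || max_bits > 63)
        return -1;

    for (i = 0; i < count; i++) {
        uint64_t v = 0;

        /* Accumulate the bytes, most significant first. */
        for (j = 0; j < nbytes; j++)
            v = (v << 8) | buf[i * nbytes + j];

        /* Reject offsets the local offset type cannot represent. */
        if ((v >> max_bits) != 0)
            return -1;

        out[i] = (int64_t)v;
    }

    return 0;
}
```

With max_bits set to 31, 4-byte offsets decode normally, while an
8-byte offset beyond 2GB is reported as an error rather than being
truncated.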

Also, the code which writes the row pointers will only write 64-bit
offsets when necessary (i.e. if the file is larger than 4GB), so using
-D_FILE_OFFSET_BITS=64 shouldn't introduce any incompatibilities with
previous versions.
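
The "only when necessary" rule can be sketched as follows. This is an
illustration under assumptions, not the committed code: the names
row_ptr_width() and encode_offset() are hypothetical, the offsets are
assumed to be stored big-endian, and the threshold is taken to be the
largest value that fits in an unsigned 32-bit integer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Choose the narrowest offset width (in bytes) that can represent
 * every offset in a file of `file_size` bytes: 4 if the size fits in
 * an unsigned 32-bit value, otherwise 8.
 */
size_t row_ptr_width(uint64_t file_size)
{
    return (file_size > UINT32_MAX) ? 8 : 4;
}

/*
 * Store `offset` into `out` as `nbytes` bytes, most significant byte
 * first (assumed layout). `out` must have room for `nbytes` bytes.
 */
void encode_offset(uint64_t offset, size_t nbytes, unsigned char *out)
{
    size_t j;

    assert(out != NULL && nbytes >= 1 && nbytes <= 8);

    for (j = 0; j < nbytes; j++)
        out[j] = (unsigned char)(offset >> (8 * (nbytes - 1 - j)));
}
```

Because small files still get 4-byte offsets, a file written by a
64-bit-offset build stays readable by a build without the switch.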

The only tricky issue is pre-3.0 compressed files (indicated by
cellhd.format being negative). These have the row pointers in the
native format (in terms of sizeof(long) and endianness) of the system
which wrote the file, with no indication as to exactly which format is
used.

E.g. if a pre-3.0 compressed file was written on a little-endian
system where sizeof(long) == 4, it can only be read on a little-endian
system where sizeof(off_t) == 4 (i.e. using -D_FILE_OFFSET_BITS=64
will prevent such files from being read).
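
To see why the width matters, here is a small sketch (the helper
read_le() is hypothetical and decodes bytes explicitly so that the
result doesn't depend on the host) which interprets the same bytes as
either 4-byte or 8-byte little-endian values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Interpret the first `width` bytes of `buf` as an unsigned
 * little-endian integer (least significant byte first).
 */
uint64_t read_le(const unsigned char *buf, size_t width)
{
    uint64_t v = 0;
    size_t j;

    assert(buf != NULL && width >= 1 && width <= 8);

    /* Walk from the most significant byte down to the least. */
    for (j = width; j > 0; j--)
        v = (v << 8) | buf[j - 1];

    return v;
}
```

If two row pointers, 16 and 32, were written as 4-byte little-endian
values, reading 4 bytes at a time recovers them, whereas reading 8
bytes at once merges both into a single meaningless offset. Nothing in
the file says which interpretation is the right one.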

OTOH, I would guess that pre-3.0 compressed files are almost
non-existent these days, so this is unlikely to be an issue.

Note that the changes only affect the raster I/O code. Other files
(e.g. temporary files created directly by a program) will typically
still be limited to 32 bits (most of the code which I've seen not only
uses "long" to hold offsets, but actually calculates offsets using
"int" arithmetic, and so is limited to 32 bits even on systems with a
64-bit "long" type).

In that regard, adding -D_FILE_OFFSET_BITS=64 globally (e.g. by adding
it to CFLAGS when running the configure script) may be risky, as such
programs will open their temporary files using the 64-bit API, but
their offset calculations will silently wrap once an offset exceeds
2GB.

Without that switch, open() will simply refuse to open files which are
larger than 2GB, and write() will fail if it would result in the
file's size exceeding 2GB.
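
The wrapping of offsets calculated with 32-bit arithmetic can be
sketched as below. The function names are hypothetical, and the 32-bit
case is modelled with an explicit truncation to uint32_t, because
overflowing a signed "int" is undefined behaviour in C and can't be
demonstrated portably.

```c
#include <stdint.h>

/*
 * Offset of a row as 32-bit arithmetic would produce it: the true
 * product reduced modulo 2^32 (modelled here by truncation).
 */
uint32_t row_offset_32(uint32_t row, uint32_t ncols, uint32_t cell_size)
{
    return (uint32_t)((uint64_t)row * ncols * cell_size);
}

/*
 * The same offset computed in 64-bit arithmetic, which is what a
 * program needs to do before passing the result to a 64-bit seek.
 */
int64_t row_offset_64(int64_t row, int64_t ncols, int64_t cell_size)
{
    return row * ncols * cell_size;
}
```

For row 70000 of a map with 20000 columns and 4-byte cells, the true
offset is 5600000000, but the 32-bit calculation yields 1305032704, so
the program would seek to the wrong place without any error being
reported.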

-- 
Glynn Clements <glynn.clements at virgin.net>



