[GRASS-dev] configure: testing arch endian

Mon May 8 18:26:20 EDT 2006

Hamish wrote:

> > r.in.mat and r.out.mat are littered with "sizeof(long) == 4"
> > assumptions.
> 
> The MATv4 file format specifies this (IIRC, it's been a while now & I'd
> have to look it up), at least for the header but I think for the arrays
> as well. The file header specifies which endianness the data that
> follows was written in, it allows either.

You're getting confused. sizeof(long) is whatever the compiler says it
is; the MATv4 format has no say in the matter.

If you mean that the fields are supposed to be 4-byte integers, that's
a different matter. In that case, the code needs to use 4-byte
integers, not "long".
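
As a rough sketch of what "use 4-byte integers" could look like, assuming a C99 compiler that provides <stdint.h>. The struct and its field names are hypothetical, chosen only for illustration; they are not taken from r.in.mat, r.out.mat, or the MATv4 documentation.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of five 4-byte fields. The names are
 * illustrative only, not the actual r.in.mat/r.out.mat layout. */
struct header_fields {
	int32_t type;
	int32_t rows;
	int32_t cols;
	int32_t imag_flag;
	int32_t name_len;
};

/* Bytes the five fields occupy in the file. This depends only on
 * int32_t, so it is the same whether sizeof(long) is 4 or 8. */
size_t header_file_size(void)
{
	/* int32_t has exactly 32 bits, i.e. 4 bytes when a byte is 8 bits */
	assert(CHAR_BIT == 8);
	return 5 * sizeof(int32_t);
}
```

Note that the struct is only a container for the values; it still should not be passed to fwrite() or fread() directly, since that would bring back the dependency on the host's byte order and padding.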

> > Also, AFAICT, r.out.mat always writes the output in the
> > system's byte-order,
> 
> as specified by the format, byte order used is recorded in the header,

But there's no requirement that it's the same as the system's
byte-order, right?

> > and r.in.mat just assumes that the file is in the system's byte-order
> > (it checks, but doesn't do anything in the event of a mismatch).
> 
> In the event of a mismatch it triggers a warning that this is "TODO" and
> the rest will likely not succeed. I prefer that to a G_fatal_error(), it
> encourages help with debugging. It is likely that more of those warnings
> are needed for other endian/64bit permutations. The situation is also
> mentioned in the r.in.mat help page. I'd rather invest the time fixing
> the problem vs. going to great lengths to add more elaborate tests to
> provide "sorry," messages.

Right. So explicit [de]serialisation will prevent all of those
problems while eliminating the need to check the system's byte order.

> > Both of those programs need to be substantially re-written.
> 
> I welcome help. Bitwise operations are not my forte.

Converting an integer to 4 bytes, little-endian:

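	void serialise_int32_le(unsigned char *buf, long x)
	{
		/* shift an unsigned copy: >> on a negative long is implementation-defined */
		unsigned long u = (unsigned long) x;
		int i;
		for (i = 0; i < 4; i++)
			buf[i] = (u >> i*8) & 0xFF;
	}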

Converting an integer to 4 bytes, big-endian:

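	void serialise_int32_be(unsigned char *buf, long x)
	{
		/* shift an unsigned copy: >> on a negative long is implementation-defined */
		unsigned long u = (unsigned long) x;
		int i;
		for (i = 0; i < 4; i++)
			buf[3-i] = (u >> i*8) & 0xFF;
	}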

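Converting 4 bytes, little-endian, to an integer:

	long deserialise_int32_le(const unsigned char *buf)
	{
		unsigned long u = 0;
		int i;
		for (i = 0; i < 4; i++)
			u |= (unsigned long) buf[i] << i*8;
		/* sign-extend from bit 31 so negative values survive a 64-bit long */
		if (u & 0x80000000UL)
			return -(long) (0xFFFFFFFFUL - u) - 1;
		return (long) u;
	}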

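Converting 4 bytes, big-endian, to an integer:

	long deserialise_int32_be(const unsigned char *buf)
	{
		unsigned long u = 0;
		int i;
		for (i = 0; i < 4; i++)
			u |= (unsigned long) buf[3-i] << i*8;
		/* sign-extend from bit 31 so negative values survive a 64-bit long */
		if (u & 0x80000000UL)
			return -(long) (0xFFFFFFFFUL - u) - 1;
		return (long) u;
	}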

These work regardless of whether the system is big- or little-endian
or whether x is 32 or 64 bits.
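
For comparison, here is a sketch of the same technique written against C99 fixed-width types, with a helper for recovering negative values. It assumes <stdint.h> is available; the names put_u32_le, put_u32_be, get_u32_le, get_u32_be and u32_to_i32 are made up for this example and do not exist in libgis.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store v at buf[0..3], least significant byte first. */
void put_u32_le(unsigned char *buf, uint32_t v)
{
	int i;
	assert(buf != NULL);
	for (i = 0; i < 4; i++)
		buf[i] = (unsigned char) ((v >> (i * 8)) & 0xFF);
}

/* Store v at buf[0..3], most significant byte first. */
void put_u32_be(unsigned char *buf, uint32_t v)
{
	int i;
	assert(buf != NULL);
	for (i = 0; i < 4; i++)
		buf[3 - i] = (unsigned char) ((v >> (i * 8)) & 0xFF);
}

/* Read 4 bytes, least significant byte first. */
uint32_t get_u32_le(const unsigned char *buf)
{
	uint32_t v = 0;
	int i;
	for (i = 0; i < 4; i++)
		v |= (uint32_t) buf[i] << (i * 8);
	return v;
}

/* Read 4 bytes, most significant byte first. */
uint32_t get_u32_be(const unsigned char *buf)
{
	uint32_t v = 0;
	int i;
	for (i = 0; i < 4; i++)
		v |= (uint32_t) buf[3 - i] << (i * 8);
	return v;
}

/* Interpret a 32-bit two's complement pattern as a signed value,
 * avoiding the implementation-defined unsigned-to-signed cast. */
int32_t u32_to_i32(uint32_t u)
{
	if (u & 0x80000000u)
		return -(int32_t) (0xFFFFFFFFu - u) - 1;
	return (int32_t) u;
}
```

None of these functions inspects the host's byte order. The bytes in buf are determined entirely by the value and by which function was called.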

> [r.out.bin]
> > If you change the semantics of that flag so that the absence of the -s
> > switch means little-endian while the presence of the flag means
> > big-endian, r.out.bin doesn't need to know the host's byte order, and
> > a given r.out.bin command achieves the same result (file in big-endian
> > format or file in little-endian format) regardless of the system's
> > byte order.
> 
> mmph. don't make the confusion worse; change the flag's letter to
> something else and loudly warn -s is superseded. Are you advocating that
> the default mode should not write out in the native byte order?!

That's correct. The system's byte order is irrelevant for file
formats.

> Is that really "expected behavior"? I would think it preferable to have -b and
> -l flags to force big or little if you want them, otherwise go native.

-b and -l make sense, but the default should be one or other
regardless of the CPU type. The system's byte order is irrelevant for
file formats.

> (but -b is taken for BIL, "-l" should be avoided as it looks the same as
> "-1" in some fonts, and -e,-E has issues for mingw32 people...?)
> 
> > As for backwards compatibility, making the default byte order (no -s
> > flag) little-endian means that anyone using x86 (i.e. most users) will
> > be unaffected.
> 
> that's pretty crap for the non x86 crowd (who are the ones most likely
> to need that flag in the first place).

Probably not. Most of the DEMs I've come across (e.g. ETOPO30) are in
big-endian format, so it's the x86 users who need -s.

> > IOW, I've yet to come across a situation which actually has a
> > legitimate reason to know the system's byte order.
> 
> my feelings are: seamless at the user end is good. ambiguity is bad.

Exactly.

A specific command should achieve a specific result without the user
first having to run "uname -m" then find out whether SPARC is big- or
little-endian so that they know whether or not to use -s.

There should be one option for creating little-endian files and
another for big-endian files. The system's byte order is irrelevant.
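
A sketch of how an output routine might take the file's byte order as an explicit choice, so that the host never enters into it. The enum, the function names and the 16-bit cell size are assumptions made for illustration; this is not r.out.bin's actual code or its actual flags.

```c
#include <assert.h>
#include <stddef.h>

/* Byte order of the *file*, chosen by the user, never derived from the host. */
enum byte_order { ORDER_LITTLE, ORDER_BIG };

/* Encode the low 16 bits of value into buf[0..1] in the requested order. */
void put_int16(unsigned char *buf, int value, enum byte_order order)
{
	/* conversion to unsigned is well-defined for negative values */
	unsigned int u = (unsigned int) value & 0xFFFFu;
	unsigned char lo = (unsigned char) (u & 0xFF);
	unsigned char hi = (unsigned char) (u >> 8);

	assert(buf != NULL);
	if (order == ORDER_BIG) {
		buf[0] = hi;
		buf[1] = lo;
	} else {
		buf[0] = lo;
		buf[1] = hi;
	}
}

/* Encode n cells into out, which must hold 2*n bytes.
 * Returns the number of bytes produced. */
size_t encode_row_int16(unsigned char *out, const int *cells, size_t n,
			enum byte_order order)
{
	size_t i;
	for (i = 0; i < n; i++)
		put_int16(out + 2 * i, cells[i], order);
	return 2 * n;
}
```

With something along these lines, the command-line option maps directly onto the enum value, and the same command produces the same file on x86 and on SPARC.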

> > BTW, when it comes to floating-point values, the situation isn't as
> > simple as big- or little-endian. On some systems, FP values may use a
> > different byte order to integers, or double-precision FP values may
> > have the 32-bit halves in a different order than the order of bytes
> > within a word.
> 
> fun fun fun

Yep. Which is presumably why libgis uses XDR for FP rasters.

-- 
Glynn Clements <glynn at gclements.plus.com>