[GRASS-dev] Re: [GRASS GIS] #73: r.out.gdal tiff output does not work

Glynn Clements glynn at gclements.plus.com
Mon Apr 20 21:23:50 EDT 2009


Markus Metz wrote:

> >> OK, but I have tested r.out.gdal with type = GDT_Int32 and nodata = 
> >> 0x80000000. Same result: NULL cells become -2147483648 but the nodata 
> >> value in the metadata says 2147483648 (gdalinfo on the export output),
> >
> > Yes. I know. Running "r.out.gdal ... nodata=0x80000000" is
> > misinterpreting what I was saying, as r.out.gdal parses nodata= as a
> > floating-point value.
> >
> > I'm talking about using (GDT_Int32)0x80000000 (i.e. -2^31) 
> 
> I think I slowly begin to understand. You suggest to use 
> (GDT_Int32)0x80000000 for both the GDAL metadata info and the GDAL 
> raster data although GDAL metadata nodata is double?

Yes; for the latter, the value would be:

	(double) (GDT_Int32) 0x80000000

IOW, -2147483648.0.

Note that this isn't the same as atof("0x80000000"), which would be
positive.

> When using a default nodata value, what I want to get in the exported 
> map is identical values for the metadata nodata value and the value 
> assigned to NULL cells. GDAL metadata nodata value is always double 
> (GDALSetRasterNoDataValue()), the value assigned to NULL cells is of the 
> selected GDAL datatype. So 0x80000000 for GDT_Int32 will be 2^31 in 
> metadata and -2^31 in the data if passed as such to the corresponding 
> GDAL functions, (double ) 0x80000000 for metadata info and (GDT_Int32) 
> 0x80000000 for raster data.

That depends upon how you interpret "0x80000000".

	(double) (GDT_Int32) 0x80000000

is equal to -2147483648.0, while 

	atof("0x80000000")

is equal to +2147483648.0.

GDT_Int32 itself cannot represent +2^31.

When writing CELL data as GDT_Int32, the default null value should be
(GDT_Int32) 0x80000000, as that's the only value which cannot occur in
the source data. The GDAL no-data value should be that value cast to a
double, i.e.:

	(double) (GDT_Int32) 0x80000000
or:
	atof("-0x80000000")
or:
	-2147483648.0

or any other permutation which produces the same "double" value.

> > or
> > (GDT_UInt32)0x80000000 (i.e. +2^31) as appropriate for the destination
> > type.
> 
> I was assuming that not only CELL, but also FCELL and DCELL maps may be 
> exported as GDT_UInt32. Then 2^31 can occur in the raster to be 
> exported. Actually, I was assuming that any GRASS raster type may be 
> exported as any GDAL data type.

Right. But exporting FCELL or DCELL as GDT_UInt32 is inherently lossy. 
For DCELL, every possible GDT_UInt32 value may occur in the source
data, so there isn't any "ideal" default no-data value.

> >>>> For r.out.gdal, there was discussion about not to override user options 
> >>>> and instead issue an error or a warning (going for error now) that the 
> >>>> nodata value is out of range.
> >>>>     
> >>> This isn't about overriding user options, but interpreting them
> >>> correctly.
> >> Close to "the module knows better" which is not desirable according to 
> >> Dylan and Moritz.
> >
> > No, the opposite. Writing code which does what the user asks for
> > rather than doing something else which is easier to code.
> 
> What if the user asks for a nodata value to be out of range, i.e. really 
> wants it to be out of range, leading to a mismatch between the nodata 
> value in the metadata and the value assigned to NULL cells in the GDAL 
> export? I understood that this was requested to be possible.

Hmm. To allow that, you can't cast to the source or destination type,
so the user will have to understand the conversion issues even if they
never use an out-of-range value. Ultimately, its either usability or
flexibility.

> >> If r.out.gdal should support out of range nodata values, and the nodata 
> >> value is read with atof(), 2^31 should stay 2^31and not be cast to 
> >> -2^31. The GUI description of the nodata option says that it is of type 
> >> float (actually it's a double), i.e. suggesting that any value that is 
> >> representable as floating point is ok.
> >
> > That's a limitation of the parser. If the input is CELL, then the
> > nodata value which the input data uses isn't any kind of float. It
> > might be more correct to have separate int/float nodata options.
> >
> > OTOH, GDAL's no-data value is always a double, even for integer data.
> 
> I would rather stick to GDAL's method of handling nodata options and try 
> to mimic that in r.out.gdal.

I don't have any preference, although I suspect that users will
regularly get bitten by the conversion issues.

> >>> OTOH, reading as FP allows you to specify +2^31 as the nodata value
> >>> when exporting CELL as UInt32. Maybe it would be better to read as
> >>> double then cast to the destination type.
> >> Hmm, then it would be no longer possible to have an out-of-range nodata 
> >> value in the metadata. I would support that solution, but it was 
> >> explicitly requested to obey all user options when the -f flag is used, 
> >> also writing the nodata value as it is to the metadata. It is then up to 
> >> the user to figure out what happened to the NULL cells, assuming that 
> >> the user knows what happens when using the -f flag.
> >
> > I can see some point to that, but may be confusing for the user, who
> > may assume that they're talking about the data's no-data value when
> > they're actually specifying GDAL's no-data value.
> 
> That's why there are warnings if there will be a mismatch between the 
> GDAL metadata nodata value and the GDAL raster nodata value (out of 
> range test). And having this mismatch is only possible with the -f flag, 
> when forcing r.out.gdal to use exactly the given parameters and options. 
> The GDAL API documentation says "To clear the nodata value, just set it 
> with an "out of range" value." This is currently possible with the -f 
> flag, but r.out.gdal then does not give any information on what value is 
> assigned to NULL cells, only what nodata value is written to the metadata.

Right. There's only one nodata= option, and it's performing a dual
function: what to store for null cells and what to set GDAL's no-data
value to. If you want to allow a mismatch, you really need two
options.

> >>> Even so, 0x80000000 is still the ideal value for exporting CELL data
> >>> to either UInt32 or Int32. You just have to pick the correct
> >>> interpretation of it (unsigned int or signed int respectively).
> >>>   
> >> The current as well as previous policy of r.out.gdal is to not to 
> >> interpret the nodata value. It is written as is (currently double) to 
> >> metadata, same like e.g. gdal_translate -a_nodata behaves.
> >
> > It doesn't have any choice about interpreting it. nodataopt->answer is
> > a char*, while GDALSetRasterNoDataValue() expects a double. There are
> > several ways to convert a char* to a double; atof() may be the
> > simplest to implement, but is it the most correct?
> 
> No idea. If you can't think if a more correct way, that is it I'm 
> afraid. You're definitively more of an expert on that than I am.

I don't have any particualar opinion on what's correct behaviour for
the module (choosing the easiest solution to code isn't a correct
design methodology, but that's a separate issue).

> >> and 0 as default nodata value for UInt32?
> >
> > No; 0 is a valid data value. Use +2147483648, as that will never occur
> > in the input data. Or even any value >= 2147483648.
> 
> Values >= 2147483648 may well occur in the input data if they are FCELL 
> or DCELL.

ANYTHING may well occur in the input data if it's DCELL.

> >> Getting tired of this UInt32 example, but again: 
> >> (unsigned int) 0x80000000 is 2^31, yes?
> >
> > Yes.
> >
> >> Then this value would be right 
> >> in the middle of the range of UInt32, and can cause data loss when 
> >> exporting a FCELL raster as UInt32 .
> >
> > In that case, use (double) 0xFFFFFFFFU, which will never exist in CELL
> > or FCELL data.
> 
> (double) 0xFFFFFFFFU = 4294967295 = 2^32 - 1 can not exist in FCELL or 
> DCELL data?

It can exist in DCELL. double is a superset of int; any 32-bit integer
(signed or unsigned) can be exactly represented as a double.

But float (i.e. FCELL) only has a 24-bit mantissa, so it can't exactly
represent every 32-bit integer. Specifically, once an integer exceeds
24 bits, it will be rounded to a multiple of 2, then of 4, 8 etc as
the number of bits increases.

> > Maybe the default should depend upon the input type?
> IMHO, the output type should be considered as well when selecting a 
> default nodata value. As above, what I want to get is a GDAL metadata 
> nodata value that is identical to the value assigned to NULL cells for 
> the given GDAL datatype.
> 
> I suggest to
> 
> ignore the GRASS raster type, use the actual GRASS raster data range. If 
> the selected GDAL datatype can not hold the range present in the GRASS 
> raster (raster_min < type_min || raster_max > type_max), print helpful 
> information and abort export, force proceed with -f flag. (Already 
> implemented)

I don't see the point in treating range limits differently to
precision limits. Either the output type is a superset of the input
type (lossless export) or it isn't (lossy export).

> Exporting FCELL or DCELL as some integer type is up to the user, only 
> range is checked, no precision warning. (Status quo)
> 
> custom nodata value:
> make sure the GDAL metadata nodata value is identical to the value 
> assigned to NULL cells. Abort export on mismatch, force proceed with -f 
> flag. (Already implemented)
> 
> use the actual GRASS raster data range to determine a default nodata 
> value for the given GDAL datatype:
> if (type_min < raster_min)
> nodata = type_min;
> else if (type_max > raster_max)
> nodata = type_max;
> else {
> /* presence of NULL cells yet unknown, just warn */
> G_warning(_("Could not determine a safe default nodata value, using 
> type_min"));
> nodata = type_min;
> }
> For GDT_Int32, type_min would be -2147483648 = (GDT_Int32) 0x80000000
> Not implemented, currently using type_min for integer types and NaN for 
> floating point types. For floating point types (GDT_Float32 and 
> GDT_Float64), maybe stick to NaN or use nodata = raster_min - 1 ?

When exporting to FP types, the default should probably be NaN (if
supported), regardless of whether the source is integer or FP.

Exporting CELL to GDT_Float32 should print a warning if the range of
the input data exceeds +/- 2^24, as it cannot be stored exactly.

-- 
Glynn Clements <glynn at gclements.plus.com>


More information about the grass-dev mailing list