[GRASS-dev] Re: [GRASS GIS] #1161: g.region and r.info decimel issue when using grass python libs

Thu Sep 30 18:19:12 EDT 2010

#1161: g.region and r.info decimel issue when using grass python libs
-------------------------+--------------------------------------------------
  Reporter:  isaacullah  |       Owner:  grass-dev@…              
      Type:  defect      |      Status:  closed                   
  Priority:  normal      |   Milestone:  6.4.1                    
 Component:  Python      |     Version:  6.4.0                    
Resolution:  invalid     |    Keywords:                           
  Platform:  All         |         Cpu:  All                      
-------------------------+--------------------------------------------------

Comment(by glynn):

 Replying to [comment:3 cmbarton]:

 > > These functions simply parse the (decimal) output from g.region and
 r.info. Python has printf-like formatting operations if you wish to use
 them.
 >
 > Actually this is not what seems to be happening.

 Yes it is.

 > g.region and r.info produce single precision values, as expected.

 g.region and r.info produce '''decimal''' values using
 G_format_{northing,easting}, which uses either %.15g or %.8f (except for
 lat/lon, which uses DMS). Both of these are better than IEEE single
 precision, which has between 6 and 7 decimal digits. %.15g uses 15
 significant decimal digits (trailing zeros after the decimal point are
 omitted, as is the decimal point itself if it is not required); %.8f uses
 as many digits as are required before the decimal point and a further 8
 digits after it.
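
 As a rough illustration of the two format strings (plain Python, not the
 GRASS C code; the helper name format_both and the sample values are made
 up for this sketch):

```python
def format_both(value):
    """Return (value formatted with %.15g, value formatted with %.8f)."""
    # %.15g: 15 significant digits, trailing zeros and a bare point dropped.
    # %.8f: all integer digits plus exactly 8 digits after the point.
    return ("%.15g" % value, "%.8f" % value)

# Illustrative coordinates only; not taken from any real region.
for v in (4928010.0, 609000.5, 0.1):
    g15, f8 = format_both(v)
    print("%-12s %s" % (g15, f8))
```

 Python's % operator follows the same printf conventions as C here, which
 is why it can stand in for the C formatting.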

 > But the python library functions do not seem to be getting values from
 these--

 The Python functions are wrappers around "g.region -g" and "r.info
 -rgstmpud", which parse the output into a dictionary, with the strings
 parsed using float(), int() or float_or_dms() as appropriate.
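
 A minimal sketch of that kind of wrapper, assuming key=value output. The
 helper names, the sample text and the INT_KEYS tuple are invented for the
 example and are not the library's actual code:

```python
def parse_key_val_sketch(text, sep="="):
    """Split 'key=value' lines into a dict of strings (hypothetical helper)."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, value = line.partition(sep)
        result[key.strip()] = value.strip()
    return result

# Made-up text shaped like key=value region output; values are illustrative.
SAMPLE = """\
n=4928010
s=4913700
w=589980
e=609000
nsres=30
ewres=30
rows=477
cols=634
"""

INT_KEYS = ("rows", "cols")  # assumption: which keys hold integers

def region_from_text(text):
    """Convert the string values to int or float, as a wrapper might."""
    raw = parse_key_val_sketch(text)
    return {k: (int(v) if k in INT_KEYS else float(v)) for k, v in raw.items()}

region = region_from_text(SAMPLE)
print(region["n"], type(region["n"]).__name__)
print(region["rows"], type(region["rows"]).__name__)
```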

 > or are doing something strange with the values after the fact

 Yes; if by "strange" you mean converting them to (double precision) binary
 floating point values (which is a lossy operation; 10^-n^ (for n >= 1)
 isn't exactly representable in binary).
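
 This can be checked directly with the standard decimal and fractions
 modules, which expose the exact value that a float actually stores (the
 helper name is_exact_power_of_ten is made up for this sketch):

```python
from decimal import Decimal
from fractions import Fraction

def is_exact_power_of_ten(n):
    """True if float('1e-n') stores exactly 10**-n."""
    stored = float("1e-%d" % n)
    return Fraction(stored) == Fraction(1, 10 ** n)

for n in range(1, 6):
    stored = float("1e-%d" % n)
    # Decimal(float) shows every digit of the stored binary value.
    print(n, is_exact_power_of_ten(n), Decimal(stored))
```

 The denominator of a binary float is always a power of two, while 10^n
 contains a factor of 5, so none of these can be stored exactly.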

 OTOH, that isn't all that strange, given that the values started out as
 floating point before g.region/r.info converted them to decimal (which
 itself may be lossy; %.15g isn't quite enough for double precision, which
 has slightly better than 15 decimal digits of precision).
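
 A short check of the %.15g point, using a value chosen because it needs
 17 significant digits to survive a round trip (round_trips is a
 hypothetical helper, not a GRASS function):

```python
def round_trips(value, fmt):
    """True if formatting with fmt and parsing back yields the same float."""
    return float(fmt % value) == value

x = 0.1 + 0.2  # a double that is not the one nearest to 0.3

print("%.15g" % x, round_trips(x, "%.15g"))
print("%.17g" % x, round_trips(x, "%.17g"))
```

 Seventeen significant digits are always enough to identify a double
 uniquely; fifteen are enough only for some values.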

 > --in order to come up with double precision values. The result is that
 the values in the dictionary produced by grass.region() and
 grass.raster_info() are *different* from the values that come from
 g.region or r.info. Therein lies the problem.

 The values which come from g.region or r.info are '''strings''', each
 comprising a decimal representation of a number. Most of the things which
 you might want to do with that information will expect numbers rather than
 strings, so the Python functions convert them to numbers automatically.

 We could use Python's "decimal" package, although that doesn't work with
 everything, still doesn't necessarily give you the original value, and
 serves no purpose other than to work around bugs in scripts which expect
 to be able to perform floating-point comparisons using "==" or (worse
 still) string comparison. But if someone is making that kind of mistake,
 they will have far bigger problems.
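
 For scripts that do need to compare regions, a tolerance-based comparison
 is the usual alternative to "==". A sketch, assuming region dictionaries
 keyed by n, s, e and w; the helper name, the tolerance and the bounds are
 all illustrative:

```python
import math

def regions_match(r1, r2, tol=1e-6):
    """Compare region bounds with an absolute tolerance (hypothetical helper)."""
    keys = ("n", "s", "e", "w")
    return all(math.isclose(r1[k], r2[k], rel_tol=0.0, abs_tol=tol)
               for k in keys)

# Illustrative bounds; the second region differs by about 10 microns,
# assuming map units are metres.
region_a = {"n": 4928010.1, "s": 4913700.1, "e": 609000.1, "w": 589980.1}
region_b = {k: v + 1e-5 for k, v in region_a.items()}

print(region_a == region_b)                         # exact comparison
print(regions_match(region_a, region_b, tol=1e-3))  # tolerance of 1 mm
```

 The right tolerance depends on the data; a small fraction of the cell
 size is one reasonable choice.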

 If you really need the exact output from g.region/r.info, use
 grass.parse_command() (which will parse key/value output into a dictionary
 but will leave the values as strings). But don't expect other commands to
 return identical strings for the same information; there is no one
 "correct" format string for coordinates.

 > A region set using g.region is different from a region set using
 grass.region(). The difference is not much

 In the example given, it's around 10 microns. I'm not convinced that there
 is a single set of geospatial data in existence which genuinely has that
 accuracy.

 > but it is enough to cause problems if you are comparing regions in a
 boolean way

 Which is a bug, and not one which will be solved by any changes to the
 Python library. Any program which parses the output from g.region or
 r.info will have exactly the same issues.

 > or trying to overlay maps created with a setting in g.region and maps
 created with a setting from grass.region().

 Even on the largest map, the differences are nowhere near half a cell,
 which is what would be required to move the sample point into the next
 cell.
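
 To make the half-cell argument concrete, here is a sketch that samples at
 cell centres and shifts the region's western edge. The function names,
 the 30-unit resolution and the edge coordinate are made up, and map units
 are assumed to be metres:

```python
import math

def column_of(x, west, ewres):
    """Column index of easting x in a grid starting at west."""
    return int(math.floor((x - west) / ewres))

def column_after_shift(col, shift, west=589980.0, ewres=30.0):
    """Column hit by the centre of cell col when the grid moves by shift."""
    center = west + (col + 0.5) * ewres
    return column_of(center, west + shift, ewres)

tiny = 1e-5  # about 10 microns
for col in (0, 1, 317, 633):
    # A 10 micron shift in either direction leaves the column unchanged.
    print(col, column_after_shift(col, tiny), column_after_shift(col, -tiny))

# Only a shift of more than half a cell moves the sample into a neighbour.
print(column_after_shift(317, 15.0 + tiny))
```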

 > My only guess is that somehow grass.region() is populating its
 dictionary via a swig/ctype call instead of just parsing g.region.

 It's just parsing the output from "g.region -g" via Python's built-in
 float() function:

 http://trac.osgeo.org/grass/browser/grass/trunk/lib/python/core.py#L525

 http://trac.osgeo.org/grass/browser/grass/trunk/lib/python/core.py#L485

 > If this guess is wrong, then something else is happening to the values
 after they are generated by g.region and before they go into the python
 dictionary.

 The only "something else" is that grass.region() parses the decimal string
 to a float, and "print" converts it back to a decimal string. Both of
 these operations are lossy. But then just about anything which you do with
 a floating-point value is lossy, including parsing the values from the
 WIND/cellhd file in the first place.

 Parsing a decimal string to a floating-point value is inherently lossy.
 Converting a floating-point value to a decimal isn't inherently lossy but
 in practice you invariably use far fewer digits than are required for an
 exact representation, as the exact representation requires roughly 3 times
 as many digits as are necessary for a unique representation.
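
 The digit counts can be inspected with the decimal module. This sketch
 compares the number of significant digits in the exact expansion of a
 double with the 17 that always suffice for a unique representation
 (exact_digits is a hypothetical helper; 0.1 is just a convenient example):

```python
from decimal import Decimal

def exact_digits(value):
    """Number of significant digits in the exact decimal form of a float."""
    return len(Decimal(value).as_tuple().digits)

x = 0.1
print(Decimal(x))                 # exact expansion of the stored value
print("%.17g" % x)                # 17 significant digits, enough to be unique
print(exact_digits(x), exact_digits(x) / 17.0)
```

 For 0.1 the exact form needs 55 significant digits against 17 for a
 unique one, which is where the factor of roughly 3 shows up; other values
 give different counts.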

-- 
Ticket URL: <http://trac.osgeo.org/grass/ticket/1161#comment:4>
GRASS GIS <http://grass.osgeo.org>