[GRASS-dev] Re: [GRASS GIS] #1161: g.region and r.info decimel
issue when using grass python libs
GRASS GIS
trac at osgeo.org
Thu Sep 30 18:19:12 EDT 2010
#1161: g.region and r.info decimel issue when using grass python libs
-------------------------+--------------------------------------------------
Reporter: isaacullah | Owner: grass-dev@…
Type: defect | Status: closed
Priority: normal | Milestone: 6.4.1
Component: Python | Version: 6.4.0
Resolution: invalid | Keywords:
Platform: All | Cpu: All
-------------------------+--------------------------------------------------
Comment(by glynn):
Replying to [comment:3 cmbarton]:
> > These functions simply parse the (decimal) output from g.region and
r.info. Python has printf-like formatting operations if you wish to use
them.
>
> Actually this is not what seems to be happening.
Yes it is.
> g.region and r.info produce single precision values, as expected.
g.region and r.info produce '''decimal''' values using
G_format_{northing,easting}, which uses either %.15g or %.8f (except for
lat/lon, which uses DMS). Both of these are better than IEEE single-
precision which has between 6 and 7 decimal digits. %.15g uses 15 decimal
digits (trailing zeros after the decimal point are omitted, as is the
decimal point itself if it is not required); %.8f uses as many digits as
are required before the decimal point and a further 8 digits after it.
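
To illustrate the two format strings, here is a small sketch using Python's printf-style operator, which follows the same conventions as C's printf for these conversions. The sample values are made up for illustration and do not come from the ticket.

```python
# Illustrative values only (assumptions, not taken from the ticket).
whole = 4928010.0
small = 0.000123456789012345

# %.15g: up to 15 significant digits; trailing zeros and an unneeded
# decimal point are dropped.
whole_g = "%.15g" % whole
small_g = "%.15g" % small

# %.8f: every digit before the decimal point, then exactly 8 after it.
whole_f = "%.8f" % whole
small_f = "%.8f" % small

print(whole_g, whole_f)
print(small_g, small_f)
```

Note that %.8f keeps very few significant digits for small magnitudes, while %.15g keeps the same number of significant digits regardless of magnitude.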
> But the python library functions do not seem to be getting values from
these--
The Python functions are wrappers around "g.region -g" and "r.info
-rgstmpud", which parse the output into a dictionary, with the strings
parsed using float(), int() or float_or_dms() as appropriate.
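
As a rough sketch of that parse-then-convert step (not the actual library code), the following works on a hard-coded sample instead of calling GRASS. The sample text, the numbers in it, and the helper names split_key_values() and convert_region() are all hypothetical.

```python
# Hypothetical key=value text in the style of shell-script output;
# the numbers are invented for illustration.
sample_output = """n=4928010.5
s=4913700.25
w=589980.1
e=609000.7
nsres=30.1
ewres=30.2
rows=475
cols=630
"""

def split_key_values(text):
    """Split key=value lines into a dict, leaving every value a string."""
    result = {}
    for line in text.splitlines():
        if "=" in line:
            key, value = line.split("=", 1)
            result[key.strip()] = value.strip()
    return result

def convert_region(raw):
    """Convert the strings to numbers: int for counts, float otherwise."""
    int_keys = ("rows", "cols")
    return dict(
        (k, int(v) if k in int_keys else float(v)) for k, v in raw.items()
    )

raw = split_key_values(sample_output)   # values are still strings
region = convert_region(raw)            # values are now numbers

print(type(raw["n"]).__name__, raw["n"])
print(type(region["n"]).__name__, region["n"])
```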
> or are doing something strange with the values after the fact
Yes; if by "strange" you mean converting them to (double precision) binary
floating point values (which is a lossy operation; 10^-n^ (for n >= 1)
isn't exactly representable in binary).
OTOH, that isn't all that strange, given that the values started out as
floating point before g.region/r.info converted them to decimal (which
itself may be lossy; %.15g isn't quite enough for double precision, which
has slightly better than 15 decimal digits of precision).
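
A short sketch of the point about %.15g: formatting a double with 15 significant digits and parsing it back does not always return the same double, whereas 17 significant digits always does for IEEE double precision. The value used is just a convenient example.

```python
x = 0.1 + 0.2            # a double that is not the nearest double to 0.3

s15 = "%.15g" % x        # 15 significant digits
s17 = "%.17g" % x        # 17 significant digits

# Parse each string back and compare with the original double.
print(s15, float(s15) == x)
print(s17, float(s17) == x)
```

The first round trip lands on a neighbouring double; the second recovers x.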
> --in order to come up with double precision values. The result is that
the values in the dictionary produced by grass.region() and
grass.raster_info() are *different* from the values that come from
g.region or r.info. Therein lies the problem.
The values which come from g.region or r.info are '''strings''', each
comprising a decimal representation of a number. Most of the things which
you might want to do with that information will expect numbers rather than
strings, so the Python functions convert them to numbers automatically.
We could use Python's "decimal" package, although that doesn't work with
everything, still doesn't necessarily give you the original value, and
serves no purpose other than to work around bugs in scripts which expect
to be able to perform floating-point comparisons using "==" or (worse
still) string comparison. But if someone is making that kind of mistake,
they will have far bigger problems.
If you really need the exact output from g.region/r.info, use
grass.parse_command() (which will parse key/value output into a dictionary
but will leave the values as strings). But don't expect other commands to
return identical strings for the same information; there is no one
"correct" format string for coordinates.
> A region set using g.region is different from a region set using
grass.region(). The difference is not much
In the example given, it's around 10 microns. I'm not convinced that there
is a single set of geospatial data in existence which genuinely has that
accuracy.
> but it is enough to cause problems if you are comparing regions in a
boolean way
Which is a bug, and not one which will be solved by any changes to the
Python library. Any program which parses the output from g.region or
r.info will have exactly the same issues.
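
For scripts that do need to decide whether two regions are "the same", one possible approach is to compare with a tolerance instead of "==". This is only a sketch: the helper regions_match(), the 1 mm tolerance, and the sample bounds are assumptions, and a sensible tolerance depends on the data and units.

```python
def regions_match(a, b, tol=1e-3):
    """True if all four bounds agree to within tol (in map units)."""
    return all(abs(a[k] - b[k]) <= tol for k in ("n", "s", "e", "w"))

# Invented bounds, assumed to be in meters.
region_a = {"n": 4928010.0, "s": 4913700.0, "e": 609000.0, "w": 589980.0}

# Same region with the northern edge off by about 10 microns.
region_b = dict(region_a, n=region_a["n"] + 1e-5)

print(region_a == region_b)               # exact comparison
print(regions_match(region_a, region_b))  # comparison with tolerance
```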
> or trying to overlay maps created with a setting in g.region and maps
created with a setting from grass.region().
Even on the largest map, the differences are nowhere near half a cell,
which is what would be required to move the sample point into the next
cell.
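
The half-cell argument can be checked with a little arithmetic. The sketch below uses invented numbers (30 m cells, 630 columns, a western edge at 589980) and a hypothetical helper column_of(). It moves the western edge by 10 microns and by 0.6 of a cell, then counts how many cell centers end up in a different column.

```python
import math

def column_of(x, west, ewres):
    """Index of the cell column that contains easting x."""
    return int(math.floor((x - west) / ewres))

# Invented region parameters, assumed to be in meters.
west = 589980.0
ewres = 30.0
cols = 630

# Sample points at the cell centers of the original grid.
centers = [west + (col + 0.5) * ewres for col in range(cols)]

def moved_count(shifted_west):
    """How many centers fall in a different column after the shift."""
    return sum(
        1 for c in centers
        if column_of(c, shifted_west, ewres) != column_of(c, west, ewres)
    )

tiny = moved_count(west + 1e-5)          # edge moved by about 10 microns
large = moved_count(west - 0.6 * ewres)  # edge moved by more than half a cell

print(tiny, large)
```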
> My only guess is that somehow grass.region() is populating its
dictionary via a swig/ctype call instead of just parsing g.region.
It's just parsing the output from "g.region -g" via Python's built-in
float() function:
http://trac.osgeo.org/grass/browser/grass/trunk/lib/python/core.py#L525
http://trac.osgeo.org/grass/browser/grass/trunk/lib/python/core.py#L485
> If this guess is wrong, then something else is happening to the values
after they are generated by g.region and before they go into the python
dictionary.
The only "something else" is that grass.region() parses the decimal string to
a float, and "print" converts it back to a decimal string. Both of these
operations are lossy. But then just about anything which you do with a
floating-point value is lossy, including parsing the values from the
WIND/cellhd file in the first place.
Parsing a decimal string to a floating-point value is inherently lossy.
Converting a floating-point value to a decimal isn't inherently lossy but
in practice you invariably use far fewer digits than are required for an
exact representation, as the exact representation requires roughly 3 times
as many digits as are necessary for a unique representation.
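
To make the digit counts concrete, here is a sketch using the standard "decimal" module, which can show the exact value of a double. The choice of 0.1 is only an example; other values will have different exact expansions.

```python
from decimal import Decimal

x = 0.1

# Decimal(float) converts exactly, so this is the true value of the double.
exact = str(Decimal(x))

# 17 significant digits are enough to identify any double uniquely.
unique = "%.17g" % x

exact_digits = len(exact.split(".")[1])
unique_digits = len(unique.split(".")[1])

print(exact)
print(unique)
print(exact_digits, unique_digits)
```

Both strings parse back to the same double; the exact one simply carries far more digits than are needed to tell that double apart from its neighbours.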
--
Ticket URL: <http://trac.osgeo.org/grass/ticket/1161#comment:4>
GRASS GIS <http://grass.osgeo.org>