[Gdal-dev] OGR returns wrong floating values for shapefiles (and integer as real, in error)

Tue Oct 24 07:55:03 EDT 2006

On Tue, 24 Oct 2006, Maciej Sieczka wrote:

> Roger Bivand wrote:
> > On Tue, 24 Oct 2006, Maciej Sieczka wrote:
> 
> >> ogrinfo (and other apps based on OGR, eg. OpenEV, QGIS) returns wrong
> >> floating point values querying my shapefiles. Eg.:
> >>
> >> $ ogrinfo -al -q streams.shp
> >>
> >> Layer name: streams
> >> OGRFeature(streams):523
> >>   CAT (Real) =         484
> >>   LCAT (Real) =          73
> >>   Z (Real) =      101.583309999999997
> >>   Z_BREACH (Real) =      101.583309999999997
> >>   Z_BREACH1 (Real) =      100.583309999999997
> >>   LENGTH (Real) =        2.036246000000000
> >>   LINESTRING (598549.144524969975464 5677309.376777020283043,598550.0
> >> 5677311.224603090435266)
> >>
> >>
> >> After opening the dbf in oocalc 2.03, I can see the values should
> >> respectively be:
> >>
> >> CAT	  484
> >> LCAT	  73
> >> Z	  101.58331
> >> Z_BREACH  101.58331
> >> Z_BREACH1 100.58331
> >> LENGTH	  2.036246
> >>
> >> Why the spurious "09999999997" in case of Z, Z_BREACH, Z_BREACH1 in
> >> OGR? Note that, interestingly, LENGTH is OK though.
> 
> > Not spurious, just two different decimal "views" of the same underlying 
> > floating-point value, see e.g. David Goldberg (1991), What Every Computer 
> > Scientist Should Know About Floating-Point Arithmetic, ACM Computing 
> > Surveys, 23/1, 5-48, also available via 
> > http://docs.sun.com/source/806-3568/ncg_goldberg.html.
> 
> <snip>
> 
> > so your first issue is simply that floating point numbers have fuzz,
> > which the application may shave or not.
> 
> Why doesn't the OGR "shave the fuzz" then? That would make sense I
> think - data easier to read and parse, less for the OGR to print, the
> user not puzzled. If it shouldn't - then why?

I can't see the printf() in the source, but it is an arbitrary formatting 
decision, I guess influenced by the field width.
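
To illustrate how the same stored value gives both "views", here is a small 
Python sketch. It assumes a fixed 15-decimal format comparable to what 
ogrinfo appears to print for these N,24,15 fields; that format is my guess 
from the output above, not something I have confirmed in the OGR source. 
The function names are only for illustration.

```python
# Values as the spreadsheet shows them (taken from the message above).
shown = ["101.58331", "100.58331", "2.036246"]

def fixed15(text):
    """Parse a decimal string to a double and print it with 15 decimals
    (assumed to mimic the ogrinfo output; not taken from OGR itself)."""
    return "%.15f" % float(text)

def shaved(text):
    """Parse the same string and print the shortest decimal text that
    still reads back as the identical double."""
    return repr(float(text))

for s in shown:
    print(s, "->", fixed15(s), "|", shaved(s))
```

Both strings name the same double, so neither is "wrong"; the application 
simply chooses how many digits to show. LENGTH looks clean only because the 
digits after 2.036246 happen to be zeros at 15 decimals.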

> 
> >> Moreover, CAT and LCAT are not Real numbers. They are integer. Why
> >> reported as real?
> 
> > That will depend on the functions reading the underlying DBF; I see either 
> > all reals or a mixture in several shapefiles. It may be that when the 
> > "integer" precision is non-zero, the field is taken as real?
> 
> > In R:
> > 
> > library(rgdal)
> > ogrInfo("streams", "streams")$iteminfo
> > 
> > says:
> > 
> > $name
> > [1] "CAT"       "LCAT"      "Z"         "Z_BREACH"  "Z_BREACH1" "LENGTH"   
> > 
> > $precision
> > [1] 2 2 2 2 2 2
> 
> Hmm, this looks wrong (?) - my CAT and LCAT *do* have a zero precision.
> So I'd expect them to be treated as integer. And the other 4 columns
> have 15 decimal places precision, not 2. Open the dbf in a spreadsheet
> to see that:
> 
> CAT,N,11,0
> LCAT,N,11,0
> Z,N,24,15
> Z_BREACH,N,24,15
> Z_BREACH1,N,24,15
> LENGTH,N,24,15
> 
> > $length
> > [1] 11 11 24 24 24 24
> 
> This seems correct.

But it isn't what OGR is seeing. And since the DBF header is binary, we 
can't see directly what is in the file (I can see something with od, but I 
can't interpret enough of it to make sense). The file does though contain 
verbatim fuzz:

0000340  \r                                       4   8   4            
0000360                           7   3                       1   0   1
0000400   .   5   8   3   3   0   9   9   9   9   9   9   9   9   9   7
0000420                       1   0   1   .   5   8   3   3   0   9   9
0000440   9   9   9   9   9   9   9   7                       1   0   0
0000460   .   5   8   3   3   0   9   9   9   9   9   9   9   9   9   7
0000500                               2   .   0   3   6   2   4   6   0
0000520   0   0   0   0   0   0   0   0
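
The \r at octal offset 340 (decimal 224) is consistent with the usual dBASE 
layout: a 32-byte file header followed by six 32-byte field descriptors and 
a 0x0D terminator. Assuming that layout, the declared width and decimal 
count of each field can be read directly, as in the Python sketch below. 
read_dbf_fields and build_header are hypothetical names of mine, and 
build_header only fabricates a test header from the field definitions 
Maciek listed; it is not a real DBF writer.

```python
import struct

# Field definitions as listed in the quoted message: name, type, width, decimals.
SPEC = [("CAT", "N", 11, 0), ("LCAT", "N", 11, 0), ("Z", "N", 24, 15),
        ("Z_BREACH", "N", 24, 15), ("Z_BREACH1", "N", 24, 15),
        ("LENGTH", "N", 24, 15)]

def read_dbf_fields(data):
    """Return (name, type, width, decimals) for each field descriptor,
    assuming a dBASE III style header."""
    header_len = struct.unpack_from("<H", data, 8)[0]  # bytes 8-9: header size
    fields = []
    pos = 32                                           # descriptors follow the file header
    while pos < header_len and data[pos] != 0x0D:      # 0x0D ends the descriptor list
        name = data[pos:pos + 11].split(b"\x00", 1)[0].decode("ascii")
        ftype = chr(data[pos + 11])
        width = data[pos + 16]
        decimals = data[pos + 17]
        fields.append((name, ftype, width, decimals))
        pos += 32
    return fields

def build_header(spec):
    """Hypothetical helper: build a minimal header so the reader can be tried
    without the original file."""
    header_len = 32 + 32 * len(spec) + 1
    record_len = 1 + sum(width for _, _, width, _ in spec)
    out = bytearray(struct.pack("<B3BIHH20x", 0x03, 106, 10, 24,
                                0, header_len, record_len))
    for name, ftype, width, decimals in spec:
        out += struct.pack("<11sc4xBB14x", name.encode("ascii"),
                           ftype.encode("ascii"), width, decimals)
    out += b"\r"
    return bytes(out)

header = build_header(SPEC)
for field in read_dbf_fields(header):
    print(field)
```

On the real file one would pass the bytes of the .dbf (presumably 
streams.dbf) to read_dbf_fields instead of the fabricated header, and 
compare the decimals column with what OGR and the spreadsheet report.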

Roger

> 
> Maciek
> 

-- 
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Helleveien 30, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no