[PROJ] Is 16-bit quantization of values / (sub-)millimetric error in grids (sometimes) acceptable ?

Even Rouault even.rouault at spatialys.com
Sun Dec 8 07:10:08 PST 2019


Hi,

I'm experimenting with encoding grids with 16-bit integer values, with an 
offset and scale, instead of using IEEE 32-bit floats. There's some connection 
with my previous thread about accuracies of NTv2...

Let's take for example egm08_25.gtx. It is 149.3 MB large.
If converting it into an IEEE 32-bit float deflate-compressed (with floating-
point predictor) tiled GeoTIFF, it becomes 80.5 MB large (the compression
method and floating-point predictor are fully lossless: there is bit-to-bit
equivalence of the elevation values with the original .gtx file).
Now, observing that the range of values is [-107, 86], I can remap it to
[0, 65535] with an offset of -107 and a scale of (86 - (-107)) / 65535 ~= 0.0029.
The resulting deflate-compressed (with integer predictor) tiled GeoTIFF is now
23.1 MB large!
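
For the record, the remapping is nothing more than the following rough NumPy
sketch; the random array is just a stand-in for the actual elevation values,
which would really be read from egm08_25.gtx (e.g. with GDAL's Python bindings):

import numpy as np

# Stand-in for the real grid: in practice the values would be read from
# egm08_25.gtx, e.g. with GDAL's Python bindings.
values = np.random.uniform(-107.0, 86.0, (1024, 1024)).astype(np.float32)

offset = float(values.min())                        # ~ -107 for EGM2008
scale = (float(values.max()) - offset) / 65535.0    # ~ 0.0029 m per step

# float32 -> uint16
quantized = np.round((values - offset) / scale).astype(np.uint16)

# uint16 -> float again, i.e. what a consumer of the quantized grid sees
unscaled = quantized * scale + offset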

Looking at the difference between the unscaled quantized values and the
original ones, the error is in the [-1.5 mm, 1.5 mm] range (which is expected,
being half of the scale value), with a mean value of 4.5e-6 metre (so
essentially centered on zero) and a standard deviation of 0.85 mm.
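
Those figures can be checked directly from the two arrays of the sketch above
(the standard deviation of a uniform quantization error being scale / sqrt(12),
i.e. ~0.85 mm here):

error = unscaled - values
print(np.abs(error).max())   # <= scale / 2, i.e. ~1.5 mm
print(error.mean())          # ~ 0: the error is centered
print(error.std())           # ~ scale / sqrt(12), i.e. ~0.85 mm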

After that experiment, I found this interesting page of the GeographicLib
documentation:
https://geographiclib.sourceforge.io/html/geoid.html
which compares the errors introduced by the gridding itself and by the
interpolation methods (the EGM model is originally a continuous model), with
and without quantization. One conclusion is "If, instead, Geoid were to use
data files without such quantization artifacts, the overall error would be
reduced but only modestly". Actually, with the bilinear interpolation we use,
the max and RMS errors with and without quantization are the same... So it
seems perfectly valid to use such quantized products, at least for EGM2008,
right?

Now looking at horizontal grids, let's consider Australia's
GDA94_GDA2020_conformal.gsb. It is 83 MB large (half of this size is due to the
error channels, which are set to a dummy value of -1...)
Converting it to a compressed tiled Float32 TIFF (without those useless error
channels) brings it down to 4.5 MB.
And as a quantized uint16 compressed TIFF, down to 1.4 MB (yes, almost 60
times smaller than the original .gsb file). The maximum scale factor is 1.5e-7
arcsecond, hence a maximum error of 2.3 micrometres... I'm pretty sure we're
several orders of magnitude below the accuracy of the original model, right?
In EPSG this transformation is reported to have an accuracy of 5 cm.
The fact that we get such a small scale factor is due to GDA94 -> GDA2020
conformal being mostly a uniform shift of ~1.8 m, and to the grid being
documented as "Gives identical results to Helmert transformation GDA94 to
GDA2020 (1)".
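
(The arcsecond-to-metre conversions here and below are just the usual rule of
thumb that one arcsecond of latitude is roughly 30.9 m; a quick back-of-the-
envelope check, using the GRS80 semi-major axis and ignoring flattening:)

import math

ARCSEC_IN_M = 6378137.0 * math.pi / (180.0 * 3600.0)  # ~ 30.9 m per arcsecond

scale_arcsec = 1.5e-7                      # quantization step of the uint16 grid
print(scale_arcsec / 2.0 * ARCSEC_IN_M)    # ~ 2.3e-6 m, i.e. ~2.3 micrometres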

If we look at France's ntf_r93.gsb, which has shifts of an amplitude up to
130 m, the maximum error introduced by the quantization is 0.6 mm. I would tend
to think this is also acceptable (given that this particular file is small,
compression gains are quite negligible, but this is mostly to see whether such
a mechanism can be generalized). What puzzles me is that in
https://geodesie.ign.fr/contenu/fichiers/documentation/algorithmes/notice/NT111_V1_HARMEL_TransfoNTF-RGF93_FormatGrilleNTV2.pdf
where they compare the NTv2 approach against their native 3D geocentric
correction approach, they underline in red a sample point where the difference
between the 2 models is 1.2 mm, as if it really had some importance. For that
test point, using the quantized approach would increase this difference to
1.3 mm. But the accuracy reported in the grid at that point is 1.6e-3
arc-second (which is, by the way, the minimum value for the latitude error of
the product), i.e. 5 cm, so it seems to me that discussing millimetric errors
doesn't make sense.
In EPSG this transformation is reported to have an accuracy of 1 metre (which
is consistent with the mean value of the latitude shift error).

Now, let's look at the freshly introduced BWTA2017.gsb file. It is 392 MB large.
As a Float32 compressed GeoTIFF: 73 MB (5.4x compression ratio)
As an Int16 compressed GeoTIFF: 26 MB (15x compression ratio)
Maximum error added by quantization for the latitude shift: 0.25 mm
Minimum error value advertised for the latitude shift: 1.61e-5 arc-second (not
completely sure about the units...), i.e. 0.5 mm
Mean error value advertised for the latitude shift: 6.33e-5 arc-second, i.e. 1.9 mm
Interestingly, when looking at the ASCII version of the grid, the values of the
shifts are given with a precision of 1e-6 arc-second, that is 0.03 mm!

For the Canadian NTv2_0.gsb, on the first subgrid, the quantization error is
0.9 mm for the latitude shift. The advertised error for latitude is in the
[0, 13.35 m] range (the 0.000 value is really surprising; it is reached on only
a couple of points), with a mean of 0.27 m and a standard deviation of 0.48 m.
In EPSG this transformation is reported to have an accuracy of 1.5 metres.

~~~~~~~~~~~~~

So, TL;DR, is it safe (and worthwhile) to generate quantized products, to
reduce the size of our grids by a factor of about 2 to 3 compared to
unquantized products, when the maximum error added by the quantization is
~1 mm or less? Or will data producers consider that we damage the quality of
products they carefully crafted? Do some users need millimetric /
sub-millimetric accuracy?

Or do we need to condition quantization on a criterion, or a combination of
criteria (a tentative sketch of such a rule follows the list below), like:
- a maximum absolute error that quantization introduces (1 mm? 0.1 mm?)
- a maximum value for the ratio of the maximum absolute error introduced by
quantization to the minimum advertised error value (when known). For the
BWTA2017 product, this ratio is 0.5. For ntf_r93.gsb, 0.012. For NTv2_0.gsb,
it cannot be computed since the minimum advertised error value is 0...
- or, as a variant of the above, a maximum value for the ratio of the maximum
absolute error introduced by quantization to the mean advertised error value
(when known). For the BWTA2017 product, this ratio is 0.13. For ntf_r93.gsb,
5.5e-4. For NTv2_0.gsb, 3.3e-3
- and perhaps consider quantization only for products above a given size (still
larger than 10 MB after lossless compression?)
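
To make that a bit more concrete, here is a purely hypothetical sketch of how
such a combined rule could look (the thresholds are placeholders derived from
the numbers above, not a proposal):

def quantization_acceptable(max_quant_err_m, min_advertised_err_m=None,
                            mean_advertised_err_m=None, lossless_size_mb=None):
    # Only bother for grids that remain large after lossless compression.
    if lossless_size_mb is not None and lossless_size_mb < 10:
        return False
    # Absolute ceiling on the error introduced by quantization: 1 mm.
    if max_quant_err_m > 1e-3:
        return False
    # Quantization error should stay well below the advertised accuracy.
    if min_advertised_err_m and max_quant_err_m / min_advertised_err_m > 0.5:
        return False
    if mean_advertised_err_m and max_quant_err_m / mean_advertised_err_m > 0.2:
        return False
    return True

# With the figures quoted above:
print(quantization_acceptable(0.25e-3, 0.5e-3, 1.9e-3, 73))  # BWTA2017: True
print(quantization_acceptable(0.6e-3, 0.05, 1.0, 4.5))       # ntf_r93: False (file too small)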

~~~~~~~~~~~~~~~~

Jochem mentioned in the previous thread that the Netherlands grids have an
accuracy of 1 mm. I'm really intrigued by what that means. Does it mean that
the position of the control points used to build the grid is known at that
accuracy, both in the source and target systems, and that when using bilinear
interpolation with the grid, one remains within that accuracy? Actually, both
the rdtrans2008.gsb and rdtrans2018.gsb grids report an accuracy of 1 mm, but
when comparing the positions corrected by those 2 grids, I get differences
above 1 mm.

echo "6 53 0" | cct -d 9 +proj=hgridshift +grids=./rdtrans2008.gsb
   5.999476046    52.998912757

echo "6 53 0" | cct -d 9 +proj=hgridshift +grids=./rdtrans2018.gsb
   5.999476020    52.998912753

echo "52.998912757 5.999476046 52.998912753 5.999476020" | geod -I 
+ellps=GRS80
-104d18'21.49"	75d41'38.51"	0.002

That's a difference of 2 mm. I get that difference on a few other "random" 
points.

If applying the quantization to rdtrans2018.gsb, we'd add an additional
maximum error of 0.6 mm. The grid being 1.5 MB uncompressed, and 284 KB as a
losslessly compressed TIFF, quantization isn't really worth considering there
(it would only reduce the file size to 78 KB).

Even

-- 
Spatialys - Geospatial professional services
http://www.spatialys.com

