[PROJ] Preferred grid format for transformations?

Even Rouault even.rouault at spatialys.com
Mon Nov 25 14:44:58 PST 2019


On Monday 25 November 2019 23:16:14 CET Martin Desruisseaux wrote:
> On 25/11/2019 at 23:04, Even Rouault wrote:
> > (...snip...), a ".nc" being netCDF 3 would work with a hand written
> > netCDF v3 reader, but suddenly people would want to use another ".nc"
> > that happens to be netCDF v4/ HDF5 and that wouldn't work.
> 
> But could this problem be mitigated by an explicit error message of the
> kind "The use of this datum shift grid file requires PROJ to be compiled
> with option +NetCDF4"? (the NetCDF version can be identified by magic
> numbers).

That would indeed be an obvious thing to do, but users would still be stuck.
Anyway, a hand-written reader would probably be useless, as I would expect
most binary packagers to consider building against the netcdf library if PROJ
could be built against it.
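
To illustrate the magic-number check discussed above, here is a minimal
sketch in Python. It is not PROJ code: the function names and the
`have_netcdf4` flag are hypothetical. It only looks at the start of the file,
whereas HDF5 also permits its signature to appear after a user block.

```python
# Hypothetical helpers for illustration only; not part of PROJ or libnetcdf.

# Fourth byte of a classic netCDF file ("CDF" + version byte).
NETCDF_CLASSIC_VERSIONS = {
    1: "netCDF classic (CDF-1)",
    2: "netCDF 64-bit offset (CDF-2)",
    5: "netCDF 64-bit data (CDF-5)",
}

# 8-byte HDF5 signature; netCDF v4 files are HDF5 files.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def sniff_grid_format(header: bytes) -> str:
    """Classify a file from its first 8 bytes."""
    if header[:8] == HDF5_SIGNATURE:
        # Simplification: only offset 0 is checked here.
        return "HDF5 (possibly netCDF v4)"
    if len(header) >= 4 and header[:3] == b"CDF":
        return NETCDF_CLASSIC_VERSIONS.get(header[3], "unknown")
    return "unknown"


def check_readable(header: bytes, have_netcdf4: bool) -> None:
    """Raise an explicit error when this (assumed) build cannot read the file."""
    kind = sniff_grid_format(header)
    if kind.startswith("HDF5") and not have_netcdf4:
        raise RuntimeError(
            "This datum shift grid file requires a build with netCDF v4 support"
        )
```

Such a check gives a clear diagnostic, but as noted above it does not by
itself make the file usable.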

However, let me try to summarize the major cons of a netCDF approach that I
can anticipate from the perspective of PROJ:

- the C netCDF API, be it v3 or v4, does not have a pluggable I/O layer into
which we could plug our own, so it cannot be used to access remote grids.
The libhdf5 API has such a layer (used by GDAL for example). So that would
restrict us to netCDF v4/HDF5 files only, read through libhdf5. Otherwise
netCDF v3 would have to be handled by a hand-written reader. Not an exciting
prospect.

- libnetcdf and/or libhdf5 are not security tested, at least as far as I know.
Personal past experience revealed that this is not just a theoretical problem,
and crashes or worse happen on corrupted/hostile files. Upstream didn't really
seem interested in addressing them at the time I tried to bring that to their
attention (which I can understand: being one of the libtiff co-maintainers, I
know this is an unsexy and sometimes challenging job). This is a serious
problem given that we plan to access grids stored on HTTP. Even if, in the
default configuration, we control what is accessed, I don't think it would be
a great service to expose our users to a potential risk.

- HDF5 is indeed much more powerful than GeoTIFF. We would have to restrict
the profile even more severely than I proposed to do with GeoTIFF, to avoid
having to deal with crazy formulations. Actually, with my GDAL developer hat
on, this very flexibility of HDF5 is a serious problem because data producers
tend to follow their own personal inspiration on how to structure the data,
and in particular interoperability of georeferencing encodings is close to nil.
Try to open a random gridded .hdf5 file with gdalinfo, and in most cases
you'll get no SRS or geotransformation matrix.

- the cloud-friendliness of HDF5 is unknown to me. By contrast, I know that
COG is a heavily used technology.
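
To make the "pluggable I/O layer" point from the first item concrete, here is
a rough sketch in Python of what such a layer looks like from the reader's
side. Everything in it is hypothetical (the `RangeReader` protocol, both
reader classes and `read_tiff_byte_order`); it is not the libhdf5, libtiff or
PROJ API. The format reader only ever asks for "size bytes at offset", so the
same code can work on a local file or on HTTP range requests.

```python
import urllib.request
from typing import Protocol


class RangeReader(Protocol):
    """Hypothetical I/O interface: the only thing a format reader may call."""

    def read_at(self, offset: int, size: int) -> bytes: ...


class LocalFileReader:
    """Serves byte ranges from a file on disk."""

    def __init__(self, path: str) -> None:
        self._path = path

    def read_at(self, offset: int, size: int) -> bytes:
        with open(self._path, "rb") as f:
            f.seek(offset)
            return f.read(size)


class HttpRangeReader:
    """Serves byte ranges from a URL using the HTTP Range header."""

    def __init__(self, url: str) -> None:
        self._url = url

    def read_at(self, offset: int, size: int) -> bytes:
        # Range end is inclusive. A server that ignores Range would return
        # the whole body; a real implementation would have to check for that.
        headers = {"Range": f"bytes={offset}-{offset + size - 1}"}
        request = urllib.request.Request(self._url, headers=headers)
        with urllib.request.urlopen(request) as response:
            return response.read()


def read_tiff_byte_order(reader: RangeReader) -> str:
    """Format-level code: identical whatever the reader is backed by."""
    magic = reader.read_at(0, 4)
    if magic == b"II*\x00":
        return "little-endian"
    if magic == b"MM\x00*":
        return "big-endian"
    raise ValueError("not a classic TIFF file")
```

A library without such a hook opens files itself from a path, which is why it
cannot be pointed at a remote grid this way.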


For 3D (time, long, lat) indexing, I'd assume the number of values along the
time axis not to be too large (it would be great to have some feedback from
Brian, Chris or other potential data producers on that). In that case, one
TIFF IFD per time slice should be able to address that.
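
As an illustration of how one IFD per time slice could be used, here is a
small sketch in Python. It rests on assumptions that are not part of any
proposal above: that each IFD carries an epoch in some metadata, that the
epochs are sorted, that values are linearly interpolated between the two
bracketing slices, and that times outside the covered range are clamped. The
function name is hypothetical.

```python
from bisect import bisect_right


def bracket_time_slices(epochs, t):
    """Return (lower_ifd, upper_ifd, weight) for time t.

    epochs: ascending list with one epoch per IFD (assumed metadata).
    weight: fraction of the way from the lower slice to the upper one.
    """
    if not epochs:
        raise ValueError("no time slices")
    # Assumption: clamp to the first/last slice outside the covered range.
    if t <= epochs[0]:
        return 0, 0, 0.0
    if t >= epochs[-1]:
        last = len(epochs) - 1
        return last, last, 0.0
    upper = bisect_right(epochs, t)  # first slice strictly after t
    lower = upper - 1
    weight = (t - epochs[lower]) / (epochs[upper] - epochs[lower])
    return lower, upper, weight


def interpolate(value_lower, value_upper, weight):
    """Linear blend of the values read from the two bracketing IFDs."""
    return value_lower + weight * (value_upper - value_lower)
```

With a small number of slices, only the two bracketing IFDs need to be read
for a given time.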


Even


-- 
Spatialys - Geospatial professional services
http://www.spatialys.com

