[PROJ] Switch to proj-datumgrid-geotiff for PROJ 7 ?

Greg Troxel gdt at lexort.com
Mon Jan 13 08:10:06 PST 2020


Even Rouault <even.rouault at spatialys.com> writes:

> This is one of the last major topics linked to RFC4: should we switch to 
> GeoTIFF grids as the prime format for the grids we deliver alongside the 
> software? While there is no strong technical requirement to do so, it could be 
> the right moment to do it, or we might prefer postponing that to PROJ 8.
>
> Currently RFC4 implementation continues to refer to .gtx / .gsb files in the 
> grid_alternatives database table. When network access is allowed and needed, 
> the code patches the extension to access a .tif file on the CDN.

I think it's really important for the RFC4 implementation to end up
being an alternative mechanism to get the exact same bits that could
have been packaged.  As a test, this means that someone who has
installed the entire set of released grid files should see no downsides
and no different outcomes compared to someone who used RFC4 instead.

I also think it's important that people not choosing to use RFC4 not be
second-class in their proj experience in any way.

So if the RFC4 world is tif (which is not objectionable; it seems
obvious we are talking about a container format for the same bits --
but tell me if I got that wrong), then it really seems simpler to just
have that as the format.

> It would probably be cleaner to avoid this patching and having 
> grid_alternatives directly point to .tif files. This would also enable us to 
> put https://github.com/osgeo/proj-datumgrid in a pure maintenance state and 
> just make https://github.com/osgeo/proj-datumgrid-geotiff the new home for 
> grids (currently the latter has to be resynchronized with the former)

You say "maintenance", but would there be new releases of packages
derived from proj-datumgrid, for example for the benefit of proj5/6
users that have not upgraded?  Or do you mean "it will just sit there as
an archive"?

> Currently:
> proj-datumgrid: total size: 703 MB as 5 .zip and 1.5 GB uncompressed
> proj-datumgrid-geotiff: total size: 486 MB

Do you expect the sizes for the same data to be different?  It seems
obvious that every file in the old directory needs to be transformed to
tif and put in the new one -- but again I may be missing something.

> One potential issue with switching to GeoTIFF files is for people having 
> pipeline strings using the current grid names in them. But similarly to what 
> is currently done for remote access, for local access, if we fail to find a 
> .gtx/.gsb file, we could just retry with the .tif extension.

Basically, you are proposing that a grid name be considered not to have
an extension, and that seems sensible.
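
To make sure I understand the proposal, here is a minimal sketch of the
lookup as I read it, in Python rather than in PROJ's own code.  The
function name resolve_grid and the search order are my assumptions, not
anything PROJ does today: try the name as written in every directory,
and only then retry with .tif.

```python
from pathlib import Path
from typing import Optional, Sequence


def resolve_grid(name: str, search_dirs: Sequence[Path]) -> Optional[Path]:
    """Return the first existing file for a grid name, or None.

    Hypothetical helper: the name as written is tried first; if it ends
    in .gtx or .gsb and is not found anywhere, the same base name with a
    .tif extension is tried next.
    """
    requested = Path(name)
    candidates = [requested.name]
    if requested.suffix.lower() in (".gtx", ".gsb"):
        candidates.append(requested.with_suffix(".tif").name)
    for candidate in candidates:
        for directory in search_dirs:
            path = Path(directory) / candidate
            if path.is_file():
                return path
    return None
```

With this ordering, an existing pipeline string naming a .gtx file keeps
working whether the directory holds the old file, the new .tif, or both.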

> If we switch to proj-datumgrid-geotiff, would we still split the content in 
> several archives (and if so, according to which logic...) ? Having a single 
> archive would be for sure the simplest solution. Some feedback from packagers 
> would be welcome.

A good question.  With the release of the grid archives associated with
6.3.0, the sizes have really grown.  I tend to think that not having
grids installed is a bug (given that proj will get different answers, as
opposed to throwing a missing grid file exception).  So, the proj6
package I am working on has all of the released grids.  But, as grids
get bigger (which seems expected over time), this becomes less
reasonable, and eventually I would think there would be separate
packages.

It would be good if all packaging systems that have multiple packages for
grids had the same split and more or less the same names.  So that means
that the proj project should define the split, as it does now.

So if anything, I would think the repo should be split up into more
archives.  The current regions seem sensible, and then there perhaps is
another axis of normal things vs. esoteric things.   Right now I can't
articulate that and I am not sure that makes sense.

So for now, I would advise not changing the archive split plan, until we
have a good basis for believing that some other plan is good.

> A bit linked to the above, if we switch to proj-datumgrid-geotiff, in the 
> grid_alternatives table, instead of having each grid entry pointing to a 
> package (like "proj-datumgrid-europe", which itself points to a URL of a .zip 
> archive), we could just point to the exact file on the CDN rather than the 
> archive.

A caution about "CDN".  There is nothing magic, just a way to make it
more efficient for people to download things.  I think we are really
just talking about a collection of files on a web server (that happens
to be a distributed system with spiffy caching).

It seems sensible for the database of what grid files are needed to have
a single namespace, and for one to be able to get them either via
download or by unpacking an archive (bit for bit identical).
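
The "bit for bit identical" property is easy to check mechanically.  A
minimal sketch, assuming one has the same grid from both routes on disk;
the function names sha256_of and same_bits are made up for illustration:

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large grids need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_bits(packaged: Path, downloaded: Path) -> bool:
    """True if the packaged and downloaded copies have identical content."""
    return sha256_of(packaged) == sha256_of(downloaded)
```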

> If we adopt proj-datumgrid-geotiff, I'd also be willing to add a 
> proj-datumgrid-download utility that would download, in the user-writable 
> directory (or system directory), files from the CDN based on criteria such as 
> bounding box, producer name, country of the producer, etc...

As opposed to them being downloaded because someone asked for a
transform that chose them?  It seems really clear to me that these sorts
of asked-for operations are entirely necessary for the whole system to
make sense.  Surely there would be download by name.  It would be really
nice if one could ask "show me the transform pipelines that this request
would invoke", get from that "this is the list of grids you need and
don't have for one or all", and then be able to get them.
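
A rough sketch of the report I have in mind, in Python.  Everything here
is hypothetical: the input is assumed to be a mapping from a pipeline
label to the grid file names it needs, and missing_grids is a made-up
name, not an existing PROJ facility.

```python
from pathlib import Path
from typing import Dict, List, Sequence


def missing_grids(
    pipelines: Dict[str, List[str]], grid_dirs: Sequence[Path]
) -> Dict[str, List[str]]:
    """For each candidate pipeline, list the grids not found locally."""
    report: Dict[str, List[str]] = {}
    for label, grids in pipelines.items():
        report[label] = [
            grid
            for grid in grids
            if not any((Path(d) / grid).is_file() for d in grid_dirs)
        ]
    return report
```

An empty list for a pipeline means it can run as is; the union of the
other lists is what a download-by-name step would need to fetch.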

So what if someone has not enabled RFC4, and asks for a transform that
would use a grid if it were there?  Instead of downloading dynamically,
what happens?  I have always wanted that to throw an exception, unless
the user has disabled something, but I know I'm way over on the
"consistent outputs" side of things.
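
To be concrete about the behavior I am arguing for, here is a sketch in
Python.  It is not how PROJ selects operations; the names
MissingGridError, choose_pipeline and allow_fallback are mine, and the
candidates are assumed to be ordered best first.

```python
from typing import List, Sequence, Set, Tuple


class MissingGridError(RuntimeError):
    """Raised when a needed grid is not installed."""


def choose_pipeline(
    candidates: Sequence[Tuple[str, List[str]]],
    installed: Set[str],
    allow_fallback: bool = False,
) -> str:
    """Pick a pipeline label, failing loudly by default.

    The best candidate is used if all of its grids are installed.
    Otherwise an exception is raised, unless the caller has explicitly
    opted in to falling back to a less accurate pipeline.
    """
    if not candidates:
        raise ValueError("no candidate pipelines")
    best_label, best_grids = candidates[0]
    missing = [grid for grid in best_grids if grid not in installed]
    if not missing:
        return best_label
    if not allow_fallback:
        raise MissingGridError(
            f"{best_label} needs missing grids: {', '.join(missing)}"
        )
    for label, grids in candidates[1:]:
        if all(grid in installed for grid in grids):
            return label
    raise MissingGridError("no candidate pipeline has all of its grids installed")
```

The point of the default is that the same request gives the same answer
on every machine, or an error, rather than silently differing by what
happens to be installed.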




Overall, this makes me think that 7 is being rushed, or that the RFC4
stuff is being rushed into 7.  I don't really understand what's driving
a fixed release date for a major branch, and I don't understand why we
would hold to that fixed date instead of getting the RFC4 stuff done and
out for beta and seeing how it goes.

But, there is virtually zero chance of me packaging 7.0, as opposed to
some later 7.x version.

