[PROJ] RFC4: last chance to comment before motion
Even Rouault
even.rouault at spatialys.com
Mon Jan 6 06:59:16 PST 2020
On Monday 6 January 2020 09:08:31 CET, Greg Troxel wrote:
> Even Rouault <even.rouault at spatialys.com> writes:
> > Regarding how files are managed on the CDN, the idea is that a given grid
> > identified by a filename is only updated if it contains errors (in its
> > data
> > or metadata). Which is different from releasing an improved version of a
> > model, for example the successive generations of the USA, Australia or New
> > Zealand geoid models that have each their own filename.
>
> I am really uncomfortable with a single name mapping to variable data.
> This seems very much like replacing a software tarball for
> foo-1.2.tar.gz with a different version.
Data producers themselves do not adhere to super strict versioning. See the
warning notice at the top of https://www.ngs.noaa.gov/GEOID/GEOID18/ : the wrong
files have simply been replaced with corrected files.
Another example: I recently realized that the NAD83 HARN grids for American
Samoa were completely wrong: there was an error in the sign of the latitude in
the extent of the file, so past grids were completely unusable.
> What is driving the need to not change the name in some way? It would
> seem that some sort of micro-version could be implemented for changes.
The proj.db, which is attached to a given release of PROJ, references for
example a file "foo.tif". Which behaviour would we want? That it corresponds
to the version of "foo.tif" at the time the PROJ release was made (possibly
with errors corrected later), or to the latest version of foo.tif? Currently
it is the latter option which is implemented.
If we want super strict versioning, that would mean that we would never
change a file that has been released, and if we need to do fixes, we would
then append a v2, v3 etc. to the filename, *and* update proj.db to point to
the latest version. Do we really need/want that?
As you say later, one advantage of that approach is that once a file is
downloaded, we'd know it never needs to be refreshed.
A downside is that it requires some changes (nothing dramatic) to the
creation process of static snapshots: we won't be able to just create a zip
from the content of the git repository, but we'd need some logic to only put
in the zip the latest version of each file. There's no point in creating a new
archive with obsolete versions of the files: it would be like offering an
option to emulate past bugs of the code. You may download an older version of
the package if you want to replicate old bugs, but there's generally no point
in providing a way to replicate the past wrong behaviour in the new release,
or it comes at a greater maintenance cost.
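
To illustrate the kind of logic the snapshot script would need, here is a
rough Python sketch. It assumes a purely hypothetical naming convention where
"foo.tif" is the first release and fixes are published as "foo_v2.tif",
"foo_v3.tif", etc.; nothing of the sort has been decided, and the function
names are made up for the example.

```python
import re

# Hypothetical convention: "foo.tif" is version 1, "foo_v2.tif" is version 2...
_VERSIONED = re.compile(r"^(?P<stem>.+?)_v(?P<version>\d+)(?P<ext>\.[^.]+)$")


def split_version(filename):
    """Return (unversioned name, version number) for a grid filename."""
    m = _VERSIONED.match(filename)
    if m:
        return m.group("stem") + m.group("ext"), int(m.group("version"))
    # No "_vN" suffix: treat the file as the first version.
    return filename, 1


def latest_versions(filenames):
    """Keep only the highest-numbered version of each grid."""
    best = {}
    for name in filenames:
        base, version = split_version(name)
        if base not in best or version > best[base][0]:
            best[base] = (version, name)
    return sorted(name for _, name in best.values())
```

The snapshot script would then add to the zip only the names returned by
latest_versions() for the content of the git repository. Versions are compared
as integers, so that a v10 sorts after a v9.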
> There's also the issue of formats changing. Right now grids are
> released more or less with proj, and there is a notion that proj works
> with those grids. But proj released this week might not work with grids
> that are released in 3 years. That's fine - that's not a reasonable
> expectation.
If in the future we decided to adopt HDF5 or whatever, we could just leave the
old .tif files on the CDN and things would keep working fine.
> Also, the directory of available files could have a hash, allowing
> validation of downloaded/stored files.
Yes, we could publish .md5 sidecar files for each file on the CDN if that
helps.
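
On the client side, the check could look like the following Python sketch. It
assumes the sidecar uses the usual md5sum layout (hex digest first, optionally
followed by the filename), which is my assumption and not something that has
been specified; the function names are hypothetical too.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_sidecar(path, sidecar_path):
    """Compare a downloaded file with the digest stored in its .md5 sidecar."""
    with open(sidecar_path, "r", encoding="ascii") as f:
        # Assumed layout: "<hex digest>  <filename>"; only the digest is used.
        expected = f.read().split()[0].lower()
    return md5_of_file(path) == expected
```

Note that MD5 is only good for detecting corrupted or outdated downloads, not
deliberate tampering.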
--
Spatialys - Geospatial professional services
http://www.spatialys.com