[PROJ] RFC4: last chance to comment before motion

Even Rouault even.rouault at spatialys.com
Mon Jan 6 06:59:16 PST 2020


On lundi 6 janvier 2020 09:08:31 CET Greg Troxel wrote:
> Even Rouault <even.rouault at spatialys.com> writes:
> > Regarding how files are managed on the CDN, the idea is that a given grid
> > identified by a filename is only updated if it contains errors (in its data
> > or metadata). Which is different from releasing an improved version of a
> > model, for example the successive generations of the USA, Australia or New
> > Zealand geoid models that have each their own filename.
> 
> I am really uncomfortable with a single name mapping to variable data.
> This seems very much like replacing a software tarball for
> foo-1.2.tar.gz with a different version.

Data producers themselves do not adhere to super strict versioning. See the 
warning notice at the top of https://www.ngs.noaa.gov/GEOID/GEOID18/ : the 
wrong files have simply been replaced with corrected files.

Another example: recently I realized that the NAD83 HARN grids for American 
Samoa were completely wrong: there was a sign error on the latitude in the 
extent of the file, so past grids were completely unusable.

> What is driving the need to not change the name in some way?   It would
> seem that some sort of micro-version could be implemented for changes.

The proj.db, which is attached to a given release of PROJ, references for 
example a file "foo.tif". Which behaviour do we want? That it corresponds 
to the version of "foo.tif" at the time the PROJ release was made (possibly 
containing errors that were only corrected later), or to the latest version 
of foo.tif? Currently it is the latter option that is implemented.
If we want super strict versioning, that would mean that we never change a 
file once it has been released, and if we need to make fixes, we append a 
v2, v3, etc. to the filename, *and* update proj.db to point to the latest 
version. Do we really need/want that?
As you say later, one advantage of that approach is that once a file is 
downloaded, we'd know we never need to refresh it.
A downside is that it requires some changes (nothing dramatic) to the 
creation process of static snapshots: we won't be able to just create a zip 
from the content of the git repository, but we'd need some logic to put only 
the latest version of each file in the zip. There's no point in creating a 
new archive with obsolete versions of the files: it would be like offering an 
option to emulate past bugs of the code. You may download an older version of 
the package if you want to replicate old bugs, but there's generally no point 
in providing a way to replicate the past wrong behaviour in a new release, or 
it comes at a greater maintenance cost.
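
To give an idea of the "only put the latest version in the zip" logic, here 
is a minimal sketch. It assumes a purely hypothetical naming convention where 
"foo.tif" is version 1 and fixes are published as "foo_v2.tif", "foo_v3.tif", 
etc. The suffix pattern and the function names are mine for illustration, 
nothing of this exists in the repository today.

```python
import re
from pathlib import PurePosixPath

# Hypothetical convention (an assumption, not an existing PROJ rule):
# "foo.tif" is version 1, "foo_v2.tif" is version 2, and so on.
_VERSION_RE = re.compile(r"^(?P<base>.+?)_v(?P<num>[0-9]+)$")


def split_version(filename):
    """Return (unversioned name, version number) for a grid filename."""
    path = PurePosixPath(filename)
    match = _VERSION_RE.match(path.stem)
    if match:
        return match.group("base") + path.suffix, int(match.group("num"))
    # No "_vN" suffix: treat the file as the first version.
    return path.name, 1


def latest_versions(filenames):
    """Keep only the highest-numbered version of each grid."""
    best = {}  # unversioned name -> (version, original filename)
    for name in filenames:
        key, version = split_version(name)
        if key not in best or version > best[key][0]:
            best[key] = (version, name)
    return sorted(name for _, name in best.values())
```

The snapshot script would then zip only the names returned by 
latest_versions() instead of the whole content of the git checkout.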

> There's also the issue of formats changing.  Right now grids are
> released more or less with proj, and there is a notion that proj works
> with those grids. But proj released this week might not work with grids
> that are released in 3 years.  That's fine - that's not a reasonable
> expectation.

If in the future we decided to adopt HDF5 or whatever, we could just leave the 
old .tif files on the CDN and things would just work fine.

> Also, the directory of available files could have a hash, allowing
> validation of downloaded/stored files.

Yes, we could publish .md5 sidecar files for each file on the CDN if that 
would help.
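
On the client side, the check could look like the following sketch. It 
assumes the sidecar is a small text file whose first whitespace-separated 
token is the hexadecimal MD5 digest (the same layout the md5sum tool writes). 
That layout and the function names are assumptions for illustration, the 
actual content of such files has not been decided.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_sidecar(grid_path, sidecar_path):
    """Return True if the grid's MD5 equals the digest stored in the sidecar.

    Assumed sidecar layout: first token on the first line is the hex digest.
    An empty sidecar raises IndexError rather than being treated as a match.
    """
    with open(sidecar_path, "r", encoding="ascii") as f:
        expected = f.read().split()[0].lower()
    return md5_of_file(grid_path) == expected
```

A mismatch would mean either a corrupted download or that the file on the CDN 
has been replaced since the local copy was fetched.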

-- 
Spatialys - Geospatial professional services
http://www.spatialys.com

