[PROJ] RFC4: last chance to comment before motion

Greg Troxel gdt at lexort.com
Mon Jan 6 07:07:56 PST 2020


Even Rouault <even.rouault at spatialys.com> writes:

> On lundi 6 janvier 2020 09:08:31 CET Greg Troxel wrote:
>> Even Rouault <even.rouault at spatialys.com> writes:
>> > Regarding how files are managed on the CDN, the idea is that a given grid
>> > identified by a filename is only updated if it contains errors (in its
>> > data or metadata). Which is different from releasing an improved version
>> > of a model, for example the successive generations of the USA, Australia
>> > or New Zealand geoid models that each have their own filename.
>> 
>> I am really uncomfortable with a single name mapping to variable data.
>> This seems very much like replacing a software tarball for
>> foo-1.2.tar.gz with a different version.
>
> Data producers themselves do not adhere to super strict versioning. See the 
> warning notice at top of https://www.ngs.noaa.gov/GEOID/GEOID18/ : the wrong 
> files have just been replaced with corrected files.
>
> Other example: recently I realized that the NAD83 HARN grids for American 
> Samoa were completely wrong: there was an error on the sign of the latitude in 
> the extent of the file, so past grids were completely unusable.

I think their replacing files without versions is a bug (they are
geodesists, not software engineers), but I realize that's fighting city hall.

Even in the HARN case, I think it's good for people to be able to know
that "foo-1.0" is broken and "foo-1.0.1" is what they should use,
instead of "I have foo-1.0; is that the ok version or not?"

>> What is driving the need to not change the name in some way?   It would
>> seem that some sort of micro-version could be implemented for changes.
>
> The proj.db that is attached to a given release of PROJ references, for 
> example, a file "foo.tif". Which behaviour would we want? That it corresponds 
> to the version of "foo.tif" at the time the PROJ release was made (possibly 
> with errors corrected later), or to the latest version of foo.tif? Currently 
> it's the latter option that is implemented.
> If we want super strict versioning, that would mean that we would never 
> change a file that has been released, and if we need to do fixes, we would 
> then append a v2, v3 etc. to the filename, *and* update proj.db to point to 
> the latest version. Do we really need/want that?
> As you say later, one advantage of that approach is that once a file is 
> downloaded, we'd know we don't need to possibly refresh it.
> A downside is that it requires a bit of change (but nothing dramatic) in the 
> creation process of static snapshots, as we won't be able to just create a zip 
> from the content of the git repository, but we'd need some logic to only put 
> in the zip the latest version of each file (there's no point in creating a new 
> archive with obsolete versions of the files.

I would think the old files could be "git rm"ed, and the files built
from them for distribution could remain.  This is much like the old proj
release sources still being on the download area, even though the
sources have long been changed.
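
For what it's worth, the "only put the latest version in the zip" logic
mentioned above does not seem like much code. A minimal sketch, assuming a
purely hypothetical naming scheme where fixes are published as foo_v2.tif,
foo_v3.tif, and so on, and an unsuffixed foo.tif counts as version 1 (none
of these names come from the RFC):

```python
import re

# Hypothetical naming scheme: "<stem>_v<N><ext>", e.g. "foo_v2.tif".
_VERSIONED = re.compile(r"^(?P<stem>.+?)_v(?P<ver>\d+)(?P<ext>\.[^.]+)$")


def split_version(filename):
    """Return (base name without version suffix, integer version)."""
    m = _VERSIONED.match(filename)
    if m:
        return m.group("stem") + m.group("ext"), int(m.group("ver"))
    # No suffix: treat the file as the first version.
    return filename, 1


def latest_versions(filenames):
    """Keep only the highest-versioned file for each base name."""
    best = {}
    for name in filenames:
        base, ver = split_version(name)
        if base not in best or ver > best[base][0]:
            best[base] = (ver, name)
    return sorted(name for _, name in best.values())
```

The list returned by latest_versions() would be what goes into a snapshot
archive; the older files stay available wherever they were first published.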

> it would be like proposing an option to 
> emulate past bugs of the code. You may download an older version of the 
> package if you want to replicate old bugs, but there's generally no point, or 
> it comes at greater maintenance cost, in providing a way to replicate the 
> past wrong behaviour in the new release)

I see using an older file as exactly like using an older proj version.
We have the old versions of proj-datumgrids on the download site.   My
point really is that just because there is an additional method to
obtain the bits, the basic semantics shouldn't change.

>> There's also the issue of formats changing.  Right now grids are
>> released more or less with proj, and there is a notion that proj works
>> with those grids. But proj released this week might not work with grids
>> that are released in 3 years.  That's fine - that's not a reasonable
>> expectation.
>
> If in the future we decided to adopt HDF5 or whatever, we could just leave the 
> old .tif files on the CDN and things would just work fine.

Sure.  I meant that there is an issue if an older proj then starts to
download files that are newer.  But I see your point that the files are
referenced from proj.db, so this won't happen.

>> Also, the directory of available files could have a hash, allowing
>> validation of downloaded/stored files.
>
> Yes, we could publish .md5 sidecar files for each file on the CDN if that may 
> help.
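
That would help.  As a rough sketch of the check a client or packager could
then do, assuming the sidecar uses the usual md5sum layout (hex digest first,
optionally followed by the filename); the function names and file layout here
are made up for illustration and are not anything PROJ provides:

```python
import hashlib


def file_md5(path, chunk_size=65536):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_sidecar(text):
    """Extract the digest from sidecar text (assumed: digest is first token)."""
    return text.split()[0].lower()


def matches_sidecar(grid_path, sidecar_path):
    """Return True if the stored grid matches the published digest."""
    with open(sidecar_path, "r", encoding="ascii") as f:
        expected = parse_sidecar(f.read())
    return file_md5(grid_path) == expected
```

Note that a digest only says whether the bits I have match what is currently
published; it does not by itself say which revision of the file I have.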


I am probably a minority opinion here, but I am coming at this from a
packaging and a configuration management viewpoint, which is probably
not the norm.  Thanks for listening.

