[PROJ] PROJ grid files CDN

Greg Troxel gdt at lexort.com
Fri Sep 13 11:28:52 PDT 2019


Howard Butler <howard at hobu.co> writes:

>> On Sep 13, 2019, at 12:45 PM, Greg Troxel <gdt at lexort.com> wrote:
>> 
>>  I really don't understand the notion of incorrect results from
>>  missing shift files as other than a bug.
>
> That's why I scare quoted "incorrect", and Even's reply covers it
> thoroughly. If we were to error in the face of missing grids, the only
> thing a user without system access can do is to fully replicate
> PROJ_LIB somewhere locally and then copy in their own grids. It is
> massively inconvenient and easy to screw up.

I would think people could enable some PROJ_APPROXIMATE_OK environment
variable to proceed without grids.  Perhaps with a GUI helper.
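
A minimal sketch of what such an opt-in could look like, assuming the
default is to fail when a grid is missing.  PROJ_APPROXIMATE_OK is only
the name floated above, not an existing PROJ setting, and find_grid and
MissingGridError are hypothetical helpers invented for illustration:

```python
import os


class MissingGridError(Exception):
    """Raised when a needed grid is absent and the user has not opted in."""


def approximate_ok(environ=None):
    """True only if the hypothetical PROJ_APPROXIMATE_OK variable is truthy."""
    env = os.environ if environ is None else environ
    value = env.get("PROJ_APPROXIMATE_OK", "")
    return value.strip().lower() in ("1", "yes", "true", "on")


def find_grid(name, search_dirs, environ=None):
    """Return the path to a grid file, or None if missing and opted in."""
    for directory in search_dirs:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    if approximate_ok(environ):
        return None  # caller proceeds without the shift, knowingly
    raise MissingGridError(
        f"grid {name!r} not found; set PROJ_APPROXIMATE_OK=1 to proceed without it"
    )
```

The point of the design is that the approximate path is something the
user asks for, rather than something that happens silently.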

>>  One issue with grids seems to be licensing.  I have not investigated
>> deeply, but it seems some have terms that make it difficult to
>> distribute, and some are non-Free (being an issue for Debian,
>> perhaps, and in pkgsrc requiring a non-Free license tag, sort of the
>> same thing).  Are all of the grids you are talking about able to be
>> distributed (verbatim) via no-cost internet downloads?
>
> I propose we wouldn't distribute anything via CDN that wouldn't meet
> Debian's notion of "free", and I would think that a distribution
> approach like this, if especially convenient, would encourage some of
> the licensing laggards of grids to follow along.

So what are you going to do about grids that don't allow redistribution?
Some scheme to help people download from the original place?

What about grids that do not allow redistribution of modified copies?  I
would think there's a lot of that.  (The EPSG database seems non-Free,
but we aren't really dealing with that either.)

>> As part of a paid-for CDROM that aggregates many things?
>
> If you were to carefully inspect some of the paid-for CDROMs being
> distributed today, you're likely going to find these grid files
> regardless of the specific licensing language on some specific grids.

What people do contrary to terms is not really relevant.  I meant to
inquire about the actual terms, not whether other people routinely
violate them.  But if you are limiting to data that meets DFSG, that's
not relevant.  And, the CDROM thing has the same issue regardless of any
over-the-net fetching scheme.

>>  I don't follow the release cadence point at all.  Yes, proj has
>>  releases, grid shifts have releases, and these all make their way
>>  into packaging systems, which then have releases.  Generally people
>>  want to run more recent versions, except for some people that choose
>>  to run old software (which they call LTS).
>
> You practically never want old grid data, unless you're trying to
> replicate something that happened in the past. This also brings up the
> problem of versioning of the grids, which is not handled currently.

That's also true about old buggy software; one should not want that
either but people do.

But, I'd say that there is a larger problem, which is versions of grid
data.  It seems blindingly obvious that any data that is released should
be versioned, for all the usual reasons of knowing what you have,
knowing if you have to get new data, and recording an identifier for
what you used, for repeating calculations, understanding what happened,
etc.

I would suggest fixing the lack-of-versioning bug first, before getting
into any autofetch stuff.  Surely with a CDN you want files with unique
names that never change anyway.
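
One way to get names that never change, sketched under assumptions: put
both a release version and a short content hash into the file name, so
that any change to the bytes yields a new name.  The naming pattern and
the versioned_name helper are hypothetical, not an existing PROJ
convention:

```python
import hashlib


def versioned_name(stem, version, data, ext):
    """Build an immutable file name from a version label and a content hash.

    The pattern <stem>-<version>-<hash>.<ext> is an assumption made for
    this sketch; only the idea (a new name whenever content changes) matters.
    """
    digest = hashlib.sha256(data).hexdigest()[:12]
    return f"{stem}-{version}-{digest}.{ext}"
```

A name built this way doubles as the identifier to record alongside
results, so a calculation can later be repeated with the same data.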

>>  Your point about packaging systems not having grid shift packages
>>  due to size is fair; it's not clear how to deal with that.
>
> Let the packagers continue to bulk copy the redistributable grids into
> packages and place them in PROJ_LIB. Fully unzipped, it's ~650 MB and
> growing. They can snap their packages and versions from the CDN.

I really don't understand the emphasis on CDN.  The big issues here are
having versions and a naming scheme, a standard for how files unpack,
portable representations (not varying based on CPU type, word size, and
endianness), and having a central place to get files from.
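
To illustrate what a portable representation means in practice, here is
a toy sketch that fixes byte order and field widths explicitly.  The
layout, the GRD0 magic, and the pack_grid/unpack_grid helpers are all
invented for illustration and do not describe any real grid format:

```python
import struct

# "<" forces little-endian byte order with standard sizes and no padding,
# so the bytes are the same regardless of CPU type or word size.
HEADER = struct.Struct("<4sIdd")  # magic, cell count, origin, spacing
MAGIC = b"GRD0"  # hypothetical file signature


def pack_grid(origin, spacing, shifts):
    """Serialize a 1-D list of shift values as 32-bit little-endian floats."""
    header = HEADER.pack(MAGIC, len(shifts), origin, spacing)
    body = struct.pack(f"<{len(shifts)}f", *shifts)
    return header + body


def unpack_grid(blob):
    """Inverse of pack_grid; rejects data without the expected signature."""
    magic, count, origin, spacing = HEADER.unpack_from(blob, 0)
    if magic != MAGIC:
        raise ValueError("not a grid file in this toy format")
    shifts = list(struct.unpack_from(f"<{count}f", blob, HEADER.size))
    return origin, spacing, shifts
```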

Then, a CDN solves the problem of not being able to handle or pay for
the bandwidth to that one place by caching.  But it's just an
optimization for fetching and does not bear on the actual hard problems,
as I see it.

>> So a user running proj would not be able to write to the system proj
>> directory.  Do you envision the files going into some per-user
>> directory in their homedir?
>
> Implementation detail, but yes, something like that.

OK - so multiple users would have multiple copies.  I'm not saying
that's terrible, but it bears discussing and thinking about.
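
A sketch of how a per-user directory might sit alongside the system
one, so that fetched files land somewhere the user can write.  The
per-user location (an XDG-style data directory) and the
grid_search_path helper are assumptions for illustration; treating
PROJ_LIB as a list of directories is also an assumption of this sketch:

```python
import os


def grid_search_path(environ=None, home=None):
    """Return (directories to search in order, per-user writable directory)."""
    env = os.environ if environ is None else environ
    home = os.path.expanduser("~") if home is None else home
    # System directories come first; they are typically read-only for users.
    system_dirs = [d for d in env.get("PROJ_LIB", "").split(os.pathsep) if d]
    # Hypothetical per-user location where downloaded grids would be stored.
    data_home = env.get("XDG_DATA_HOME") or os.path.join(home, ".local", "share")
    user_dir = os.path.join(data_home, "proj")
    return system_dirs + [user_dir], user_dir
```

With this arrangement each user who fetches a grid ends up with a
private copy, which is the duplication noted above.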

