[PROJ] RFC4: last chance to comment before motion

Even Rouault even.rouault at spatialys.com
Mon Jan 6 05:30:45 PST 2020


On lundi 6 janvier 2020 11:07:41 CET Nyall Dawson wrote:
> On Mon, 6 Jan 2020 at 11:05, Greg Troxel <gdt at lexort.com> wrote:
> > Nyall Dawson <nyall.dawson at gmail.com> writes:
> > > On Sat, 4 Jan 2020 at 07:32, Even Rouault <even.rouault at spatialys.com> wrote:
> > >> > And possibly, just do a
> > >> > post-install, first run background task which bulk downloads the
> > >> > complete set of grids from the CDN so that they're available
> > >> > already...
> > >> 
> > >> This brute force approach can work (unless we reach gigabytes of
> > >> geodetic adjustment data :-))
> > > 
> > > The other issue would be keeping track of updates to the grids. I'm
> > > guessing there's no public api for determining the grid changes
> > > available?
> > 
> > I am behind in reading everything, but I would hope that within the proj
> > world the downloaded grids have version numbers and are essentially just
> > a caching distributed filesystem to get parts of the files which could
> > have been downloaded.  So this should be the same as "does your
> > installation have proj-datumgrid-north-america-1.3.tar.gz"?
> 
> Right - but you'd still need an api for querying the latest grid
> version (or you'd have to implement this part yourself)

Regarding how files are managed on the CDN, the idea is that a given grid,
identified by its filename, is only updated if it contains errors (in its data
or metadata). That is different from releasing an improved version of a
model, for example the successive generations of the USA, Australia or New
Zealand geoid models, each of which has its own filename.

The local cache of grid chunks stores the value of a few HTTP headers (file
size, Last-Modified, ETag) in a table, as well as the timestamp of the last
time it checked them. When the current timestamp is > last_checked_timestamp
+ TTL value (where the TTL value defaults to one day), the cache queries one
chunk from the CDN again to check whether the value of those HTTP headers has
changed. If it has, the cache discards all cached chunks of that file, so
they are retrieved again from the CDN.
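The revalidation decision described above can be sketched as follows. The
struct and function names are illustrative only, not PROJ's actual
internals:

```c
#include <string.h>
#include <time.h>

/* Hypothetical record mirroring the cache table described above:
 * file size, Last-Modified and ETag headers, plus the timestamp of
 * the last validation against the CDN. */
typedef struct {
    long long file_size;
    char last_modified[64];
    char etag[64];
    time_t last_checked;
} cache_props;

/* Returns 1 if the cached entry is stale (now > last_checked + ttl),
 * meaning one chunk should be re-fetched from the CDN so that its
 * headers can be compared against the cached ones. */
int needs_revalidation(const cache_props *p, time_t now, time_t ttl)
{
    return now > p->last_checked + ttl;
}

/* Returns 1 if the freshly fetched headers differ from the cached ones,
 * in which case all cached chunks of the file must be discarded. */
int headers_changed(const cache_props *cached, const cache_props *fresh)
{
    return cached->file_size != fresh->file_size ||
           strcmp(cached->last_modified, fresh->last_modified) != 0 ||
           strcmp(cached->etag, fresh->etag) != 0;
}
```

With a one-day TTL (86400 s), a file checked an hour ago triggers no network
access at all, which is the point of storing last_checked in the table.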

There is no public API that exposes that logic for people who would want to
download whole files.

Would the following function be useful?

/** Download a file in the PROJ user-writable directory.
 *
 * The file will only be downloaded if it does not exist yet in the
 * user-writable directory, or if it is determined that a more recent
 * version exists. To determine if a more recent version exists, PROJ will
 * use the "downloaded_files" table of its grid cache database.
 * Consequently, files manually placed in the user-writable
 * directory without using this function would be considered
 * non-existent/obsolete and would be unconditionally downloaded again.
 *
 * This function can only be used if networking is enabled, and either
 * the default curl network API or a custom one has been installed.
 *
 * @param ignore_ttl_setting If set to FALSE, PROJ will only check the
 *                           recentness of an already downloaded file if
 *                           the delay between the last time it was
 *                           verified and the current time exceeds the TTL
 *                           setting. This can save network accesses.
 *                           If set to TRUE, PROJ will unconditionally
 *                           check the recentness of the file with the server.
 * @return TRUE if the download was successful (or not needed)
 */

int proj_download_file(
  PJ_CONTEXT* ctx,
  const char* url_or_filename,
  int ignore_ttl_setting,
  int (*progress_cbk)(PJ_CONTEXT*, double pct, void* user_data),
  void* user_data);
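If the function were adopted, a caller might use it roughly like this. The
stub implementation below stands in for the real download logic purely so
the sketch is self-contained, and the grid filename is a placeholder:

```c
#include <stdio.h>

/* Stand-in for the PROJ context type, only so this sketch compiles alone. */
typedef struct PJ_CONTEXT PJ_CONTEXT;

/* Stub of the proposed function: a real implementation would consult the
 * "downloaded_files" table and the CDN; here we just report progress. */
static int proj_download_file(
    PJ_CONTEXT* ctx,
    const char* url_or_filename,
    int ignore_ttl_setting,
    int (*progress_cbk)(PJ_CONTEXT*, double pct, void* user_data),
    void* user_data)
{
    (void)url_or_filename; (void)ignore_ttl_setting;
    /* Simulate a few progress notifications; a callback returning 0
     * would presumably cancel the download. */
    for (double pct = 0.25; pct <= 1.0; pct += 0.25) {
        if (progress_cbk && !progress_cbk(ctx, pct, user_data))
            return 0; /* cancelled */
    }
    return 1; /* success (or download not needed) */
}

static int on_progress(PJ_CONTEXT* ctx, double pct, void* user_data)
{
    (void)ctx;
    *(double*)user_data = pct; /* remember the last reported percentage */
    return 1;                  /* non-zero: keep downloading */
}

int download_grid_example(void)
{
    double last_pct = 0.0;
    /* Placeholder grid name; ignore_ttl_setting = 0 respects the TTL. */
    return proj_download_file(NULL, "example_grid.tif",
                              0, on_progress, &last_pct);
}
```

The callback/user_data pair follows the usual C convention of threading an
opaque pointer through, so a GUI could update a progress bar or cancel.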

That said, I can anticipate issues on Windows in the situation where a PROJ 
pipeline has a grid opened from the PROJ user-writable directory, someone
calls proj_download_file() on that file, and it is determined that we have
to update it. We would not be able to replace the file, because it would
already be open. So we would probably need some logic to use a different
local filename for the most recent version, store in the database the most
recent local filename and the deprecated one(s), and do cleanup when we can
actually delete files... (Reading a bit on the subject, it appears the
FILE_SHARE_DELETE flag of the Win32 CreateFile() API wouldn't even solve the
issue, as it allows deleting an opened file, but not creating a file with
the same name while the old version is still open, contrary to POSIX
unlink().)
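The versioned-filename workaround could look like the helper below. The
naming scheme (".vN" suffix) and the function itself are hypothetical, just
one way to let an updated copy be written while the old one is still open:

```c
#include <stdio.h>

/* Hypothetical helper: derive a versioned on-disk name, e.g.
 * "example_grid.tif" -> "example_grid.tif.v2". The database would map
 * the canonical name to the current versioned name, and deprecated
 * versions would be unlinked once no process holds them open. */
void versioned_local_name(const char* canonical, int version,
                          char* out, size_t out_size)
{
    if (version <= 1)
        snprintf(out, out_size, "%s", canonical);  /* first download */
    else
        snprintf(out, out_size, "%s.v%d", canonical, version);
}
```

Lookups would then go through the database rather than the canonical
filename, which is what makes the deferred cleanup safe.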

-- 
Spatialys - Geospatial professional services
http://www.spatialys.com
