[gdal-dev] Passing open options along dataset name in a string ?

Sean Gillies sean at mapbox.com
Tue Nov 3 14:20:49 PST 2020


Even,

On Mon, Nov 2, 2020 at 1:16 PM Even Rouault <even.rouault at spatialys.com> wrote:

> Sean,
>
> > We already have a way of passing "open" options for vsicurl:
> > https://gdal.org/user/virtual_file_systems.html#vsicurl-http-https-ftp-files-random-access.
> > What about reusing that conceptual framework and syntax?
> >
> > For example:
> >
> > "foo.csv?AUTODETECT_TYPE=YES&KEEP_GEOM_COLUMNS=NO"
>
> I actually considered that, but realized that things would get messy if
> you want to use that vsicurl syntax and open options...
>
> You would then have strings like
>
> /vsicurl?max_retry=5&url=http://example.com/foo.csv&AUTODETECT_TYPE=YES&KEEP_GEOM_COLUMNS=NO
>
> and the GDALOpen() logic would have to figure out what is the /vsicurl
> part and the open option part.
>
> Or we would have to URL-escape the
> "/vsicurl?max_retry=5&url=http://example.com/foo.csv" part
> to avoid using '?' and '&', like:
>
> /vsicurl%3Fmax_retry=5%26url=http://example.com/foo.csv?AUTODETECT_TYPE=YES&KEEP_GEOM_COLUMNS=NO
>
>
> Another issue is we have connection strings like
> "WFS:http://example.com/wfs?SERVICE=WFS&VERSION=2.0.0" (or actually
> just the "/vsicurl?max_retry=5&url=http://example.com/foo.csv" string
> mentioned above).
> GDALOpen() would then mis-interpret this as dataset name =
> "WFS:http://example.com/wfs"
> with open options SERVICE=WFS and VERSION=2.0.0
>

I see.
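
To make the ambiguity concrete for myself, here is a minimal Python sketch of the naive rule "everything after the first '?' is open options". The function name naive_split is hypothetical and this is not how GDALOpen() behaves; it only illustrates why the query-string syntax collides with existing connection strings.

```python
def naive_split(name):
    """Hypothetical rule: split a dataset name at the first '?' and
    treat the remainder as '&'-separated KEY=VALUE open options.
    This is an illustration only, not GDAL behaviour."""
    base, sep, query = name.partition("?")
    options = {}
    if sep:
        for item in query.split("&"):
            key, eq, value = item.partition("=")
            if eq:  # ignore fragments that are not KEY=VALUE
                options[key] = value
    return base, options


names = [
    # The simple case works as intended.
    "foo.csv?AUTODETECT_TYPE=YES&KEEP_GEOM_COLUMNS=NO",
    # A WFS connection string: SERVICE and VERSION belong to the URL,
    # but the naive rule turns them into open options.
    "WFS:http://example.com/wfs?SERVICE=WFS&VERSION=2.0.0",
    # /vsicurl options and driver open options end up in one bag.
    "/vsicurl?max_retry=5&url=http://example.com/foo.csv"
    "&AUTODETECT_TYPE=YES&KEEP_GEOM_COLUMNS=NO",
]

for name in names:
    print(naive_split(name))
```

In the second and third cases the split cannot tell which keys belong to the URL or to /vsicurl and which are driver open options, which is the problem you describe.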

I wish our data formats were more standard and less slippery and didn't
need these open options. But it's true that some files are very different
without the proper combination of opening options and there's a benefit to
helping applications use the right combination.

I'm not a fan of the mix of JSON and not-JSON elements in the syntax you
proposed. I think a good solution for naming datasets and including all the
driver options and vsi options looks more like a URN [1] and I think we
should write a GDAL RFC to standardize it. I also think that we should get
some people outside of GDAL involved, like folks from the Dask community,
who might share some lessons learned from writing fsspec [2].

The URN or GDN version might look something like the thing below, using ?+
and ?= [3] to identify vsi and driver option sections

gdn:curl:csv:example.com/foo.csv?a=1&b=2?+max_retry=5?=autodetect_type=yes&keep_geom_columns=no
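
As a rough sketch of how such a name could be taken apart, assuming the two segments after "gdn" name the vsi handler and the driver, that "?+" introduces the vsi options and that "?=" introduces the driver options (none of this is specified anywhere yet, and parse_gdn is a hypothetical helper):

```python
from urllib.parse import parse_qsl


def parse_gdn(name):
    """Split a hypothetical 'gdn:<vsi>:<driver>:<resource>' name into parts.

    Assumed layout (not a standard):
      gdn:<vsi>:<driver>:<resource>[?+<vsi options>][?=<driver options>]
    """
    parts = name.split(":", 3)
    if len(parts) != 4 or parts[0] != "gdn":
        raise ValueError("not a gdn name: %r" % name)
    _, vsi, driver, rest = parts

    # '?=' comes last, so peel off the driver options first,
    # then the vsi options introduced by '?+'.
    rest, _, driver_part = rest.partition("?=")
    resource, _, vsi_part = rest.partition("?+")

    return {
        "vsi": vsi,
        "driver": driver,
        # A plain '?' query stays with the resource itself.
        "resource": resource,
        "vsi_options": dict(parse_qsl(vsi_part)),
        "driver_options": dict(parse_qsl(driver_part)),
    }


parsed = parse_gdn(
    "gdn:curl:csv:example.com/foo.csv?a=1&b=2"
    "?+max_retry=5?=autodetect_type=yes&keep_geom_columns=no"
)
print(parsed)
```

The point of the distinct "?+" and "?=" markers is that the resource's own "?a=1&b=2" query string survives untouched, so the three kinds of parameters never get mixed.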

Bringing a little more order to how we name and address datasets was on my
todo list at the start of the year, but then 2020 went into a spiral. I
don't think rasterio's "zip+s3" etc. approach is the best. We should start
from scratch and come up with something excellent and expressive and
broadly supported in GDAL, QGIS, rasterio, GeoPandas, GeoTrellis etc.

[1] https://en.wikipedia.org/wiki/Uniform_Resource_Name
[2] https://filesystem-spec.readthedocs.io/en/latest/index.html
[3] https://tools.ietf.org/html/rfc8141#section-2.3

-- 
Sean Gillies