<div dir="ltr"><div dir="ltr">Even,<br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Nov 2, 2020 at 1:16 PM Even Rouault <<a href="mailto:even.rouault@spatialys.com">even.rouault@spatialys.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Sean,<br>
<br>
> We already have a way of passing "open" options for vsicurl:<br>
> <a href="https://gdal.org/user/virtual_file_systems.html#vsicurl-http-https-ftp-files-random-access" rel="noreferrer" target="_blank">https://gdal.org/user/virtual_file_systems.html#vsicurl-http-https-ftp-files-random-access</a>.<br>
> What about reusing that conceptual framework and syntax?<br>
> <br>
> For example:<br>
> <br>
> "foo.csv?AUTODETECT_TYPE=YES&KEEP_GEOM_COLUMNS=NO"<br>
<br>
I actually considered that, but realized that things would get messy if you wanted<br>
to combine that /vsicurl? syntax with open options...<br>
<br>
You would then have strings like<br>
<br>
/vsicurl?max_retry=5&url=<a href="http://example.com/foo.csv&AUTODETECT_TYPE=YES&KEEP_GEOM_COLUMNS=NO" rel="noreferrer" target="_blank">http://example.com/foo.csv&AUTODETECT_TYPE=YES&KEEP_GEOM_COLUMNS=NO</a><br>
<br>
and the GDALOpen() logic would have to figure out which part is the /vsicurl part and which is the open options part.<br>
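<br>
To make the ambiguity concrete, here is a minimal Python sketch of a naive parser that treats everything after the first '?' as '&'-separated KEY=VALUE open options. The naive_split() helper is hypothetical and is not how GDALOpen() works:<br>
<pre>
```python
# Hypothetical naive parser, NOT GDAL code: it splits a dataset name at the
# first '?' and treats the rest as '&'-separated KEY=VALUE open options.
from typing import Dict, Tuple
from urllib.parse import parse_qsl


def naive_split(name: str) -> Tuple[str, Dict[str, str]]:
    """Split 'path?K1=V1&K2=V2' into ('path', {'K1': 'V1', 'K2': 'V2'})."""
    path, sep, query = name.partition("?")
    if not sep:
        return name, {}
    return path, dict(parse_qsl(query))


# A plain file name with open options parses as intended.
print(naive_split("foo.csv?AUTODETECT_TYPE=YES&KEEP_GEOM_COLUMNS=NO"))
# -> ('foo.csv', {'AUTODETECT_TYPE': 'YES', 'KEEP_GEOM_COLUMNS': 'NO'})

# Combined with the /vsicurl? syntax, the dataset name collapses to
# "/vsicurl" and the vsicurl parameters are mixed with the open options.
name = ("/vsicurl?max_retry=5&url=http://example.com/foo.csv"
        "&AUTODETECT_TYPE=YES&KEEP_GEOM_COLUMNS=NO")
print(naive_split(name))
# -> ('/vsicurl', {'max_retry': '5', 'url': 'http://example.com/foo.csv',
#     'AUTODETECT_TYPE': 'YES', 'KEEP_GEOM_COLUMNS': 'NO'})
```
</pre>
The vsicurl parameters (max_retry, url) and the CSV open options end up in one dictionary, so a parser would need outside knowledge to tell them apart.<br>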
<br>
Or we would have to URL-escape the "/vsicurl?max_retry=5&url=<a href="http://example.com/foo.csv" rel="noreferrer" target="_blank">http://example.com/foo.csv</a>" part<br>
to avoid using '?' and '&', like:<br>
<br>
/vsicurl%3Fmax_retry=5%26url=<a href="http://example.com/foo.csv?AUTODETECT_TYPE=YES&KEEP_GEOM_COLUMNS=NO" rel="noreferrer" target="_blank">http://example.com/foo.csv?AUTODETECT_TYPE=YES&KEEP_GEOM_COLUMNS=NO</a><br>
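<br>
A minimal Python sketch of that escaping, assuming only the dataset part is percent-encoded (with '/', ':' and '=' left readable, as in the example) so that the first literal '?' unambiguously starts the open options. The encode_name() and decode_name() helpers are hypothetical, not GDAL functions:<br>
<pre>
```python
# Hypothetical encode/decode helpers, NOT GDAL code.
from typing import Dict, Tuple
from urllib.parse import parse_qsl, quote, unquote


def encode_name(dataset: str, options: Dict[str, str]) -> str:
    # Keep '/', ':' and '=' readable; '?', '&' and '%' itself get escaped.
    escaped = quote(dataset, safe="/:=")
    # Option keys and values are assumed to contain no '&' or '='.
    query = "&".join(f"{key}={value}" for key, value in options.items())
    return f"{escaped}?{query}" if query else escaped


def decode_name(name: str) -> Tuple[str, Dict[str, str]]:
    # The escaped dataset part has no literal '?', so partition() is safe.
    escaped, _, query = name.partition("?")
    return unquote(escaped), dict(parse_qsl(query))


vsicurl = "/vsicurl?max_retry=5&url=http://example.com/foo.csv"
name = encode_name(vsicurl, {"AUTODETECT_TYPE": "YES", "KEEP_GEOM_COLUMNS": "NO"})
print(name)
# -> /vsicurl%3Fmax_retry=5%26url=http://example.com/foo.csv?AUTODETECT_TYPE=YES&KEEP_GEOM_COLUMNS=NO
print(decode_name(name))
# -> ('/vsicurl?max_retry=5&url=http://example.com/foo.csv',
#     {'AUTODETECT_TYPE': 'YES', 'KEEP_GEOM_COLUMNS': 'NO'})
```
</pre>
The round trip works, but only if every caller escapes the dataset part consistently; an unescaped name would again be split at its first '?'.<br>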
<br>
<br>
Another issue is we have connection strings like "WFS:<a href="http://example.com/wfs?SERVICE=WFS&VERSION=2.0.0" rel="noreferrer" target="_blank">http://example.com/wfs?SERVICE=WFS&VERSION=2.0.0</a>" (or actually<br>
just the "/vsicurl?max_retry=5&url=<a href="http://example.com/foo.csv" rel="noreferrer" target="_blank">http://example.com/foo.csv</a>" string mentioned above).<br>
GDALOpen() would then misinterpret this as dataset name = "WFS:<a href="http://example.com/wfs" rel="noreferrer" target="_blank">http://example.com/wfs</a>"<br>
with open options SERVICE=WFS and VERSION=2.0.0.<br></blockquote><div><br></div><div>I see.</div><div><br></div><div>I wish our data formats were more standard and less slippery, and didn't need these open options. But it's true that some files are interpreted very differently without the proper combination of open options, and there's a benefit to helping applications use the right combination.</div><div><br></div><div>I'm not a fan of the mix of JSON and non-JSON elements in the syntax you proposed. I think a good solution for naming datasets, including all the driver options and VSI options, looks more like a URN [1], and I think we should write a GDAL RFC to standardize it. I also think that we should get some people outside of GDAL involved, like folks from the Dask community, who might share some lessons learned from writing fsspec [2].</div><div><br></div><div>The URN or GDN version might look something like the example below, using the ?+ and ?= delimiters [3] to identify the VSI option and driver option sections:</div><div><br></div><div>gdn:curl:csv:<a href="http://example.com/foo.csv?a=1&b=2?+max_retry=5?=autodetect_type=yes&keep_geom_columns=no">example.com/foo.csv?a=1&b=2?+max_retry=5?=autodetect_type=yes&keep_geom_columns=no</a></div><div><br></div><div>Bringing a little more order to how we name and address datasets was on my to-do list at the start of the year, but then 2020 went into a spiral. I don't think rasterio's "zip+s3"-style approach is the best. 
We should start from scratch and come up with something excellent and expressive and broadly supported in GDAL, QGIS, rasterio, GeoPandas, GeoTrellis etc.</div><div><br></div><div>[1] <a href="https://en.wikipedia.org/wiki/Uniform_Resource_Name">https://en.wikipedia.org/wiki/Uniform_Resource_Name</a></div><div>[2] <a href="https://filesystem-spec.readthedocs.io/en/latest/index.html">https://filesystem-spec.readthedocs.io/en/latest/index.html</a></div><div>[3] <a href="https://tools.ietf.org/html/rfc8141#section-2.3">https://tools.ietf.org/html/rfc8141#section-2.3</a></div><div><br></div></div>-- <br><div dir="ltr" class="gmail_signature"><div dir="ltr">Sean Gillies</div></div></div>