<div dir="ltr">Hi Even,<br><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Oct 10, 2017 at 4:02 AM, Even Rouault <span dir="ltr"><<a href="mailto:even.rouault@spatialys.com" target="_blank">even.rouault@spatialys.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><u></u>
<div style="font-family:monospace;font-size:9pt;font-weight:400;font-style:normal">
<p style="margin:0px;text-indent:0px">Hi Sean,</p><span class="gmail-">
<p style="margin:0px;text-indent:0px"> </p>
<p style="margin:0px;text-indent:0px">> </p>
<p style="margin:0px;text-indent:0px">> It's written in</p>
<p style="margin:0px;text-indent:0px">> <a href="http://gdal.org/gdal_virtual_file_systems.html#gdal_virtual_file_systems_vsi" target="_blank">http://gdal.org/gdal_virtual_<wbr>file_systems.html#gdal_<wbr>virtual_file_systems_vsi</a></p>
<p style="margin:0px;text-indent:0px">> curl</p>
</span><p style="margin:0px;text-indent:0px">> > Starting with GDAL 2.3, options can be passed in the filename with the</p><span class="gmail-">
<p style="margin:0px;text-indent:0px">> </p>
<p style="margin:0px;text-indent:0px">> following syntax: /vsicurl/option1=val1[,<wbr>optionN=valN]*,url=http://...</p>
<p style="margin:0px;text-indent:0px">> </p>
<p style="margin:0px;text-indent:0px">> I'd like to discuss the design decisions that are being made here before</p>
<p style="margin:0px;text-indent:0px">> this gets out into the world.</p>
<p style="margin:0px;text-indent:0px">> </p>
<p style="margin:0px;text-indent:0px">> I'm uncomfortable with the way configuration is spread between environment</p>
<p style="margin:0px;text-indent:0px">> variables, config options that surface in the API,</p>
<p style="margin:0px;text-indent:0px"> </p>
</span><p style="margin:0px;text-indent:0px">Just a clarification: GDAL only reads configuration options with CPLGetConfigOption(key). Those can be set implicitly through environment variables of the same name, or explicitly with CPLSetConfigOption(key, value).</p><span class="gmail-">
<p style="margin:0px;text-indent:0px"> </p>
<p style="margin:0px;text-indent:0px">> and also in identifiers.</p>
<p style="margin:0px;text-indent:0px">> I don't think it's a great idea to expand the amount of configuration</p>
<p style="margin:0px;text-indent:0px">> in dataset identifiers. It's redundant, the syntax is complicated, </p>
<p style="margin:0px;text-indent:0px"> </p>
</span><p style="margin:0px;text-indent:0px">Frank answered on the main motivations.</p></div></blockquote><div><br></div><div>Yes, I understand that adding syntax tied to new core GDAL functionality can turn already-deployed software into full-fledged cloud data consumers. For cloud data providers and customers this is a big win.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div style="font-family:monospace;font-size:9pt;font-weight:400;font-style:normal"><span class="gmail-">
<p style="margin:0px;text-indent:0px"> </p>
<p style="margin:0px;text-indent:0px">> and it</p>
<p style="margin:0px;text-indent:0px">> dilutes the network effects of reusing identifiers in our applications.</p>
<p style="margin:0px;text-indent:0px"> </p>
</span><p style="margin:0px;text-indent:0px">Didn't understand what you meant with the above sentence.</p></div></blockquote><div><br></div><div>I mean that having multiple names for datasets in our domain, <a href="https://example.com/foo.tif">https://example.com/foo.tif</a> vs /vsicurl/<a href="https://example.com/foo.tif">https://example.com/foo.tif</a> vs /vsicurl/option1=val,url=<a href="https://example.com/foo.tif">https://example.com/foo.tif</a>, dilutes the power of the names and potentially reduces the network effects we could get by using fewer names. This is an abstract concern, however, and I don't want it to distract from talking about the design decisions.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div style="font-family:monospace;font-size:9pt;font-weight:400;font-style:normal"><span class="gmail-">
<p style="margin:0px;text-indent:0px"> </p>
<p style="margin:0px;text-indent:0px">> </p>
<p style="margin:0px;text-indent:0px">> Are there specific advantages to this</p>
<p style="margin:0px;text-indent:0px">> </p>
<p style="margin:0px;text-indent:0px">>   ogrinfo -so /vsicurl/max_retry=10,url=<a href="https://example.com/poly.shp" target="_blank">http<wbr>s://example.com/poly.shp</a></p>
<p style="margin:0px;text-indent:0px">> </p>
<p style="margin:0px;text-indent:0px">> that we can't also have with a curl-style</p>
<p style="margin:0px;text-indent:0px">> </p>
<p style="margin:0px;text-indent:0px">>   ogrinfo -so --max-retry=10 /vsicurl/<a href="https://example.com/poly.shp" target="_blank">https://example.com/<wbr>poly.shp</a></p>
<p style="margin:0px;text-indent:0px">> </p>
<p style="margin:0px;text-indent:0px">> or, better yet, in my opinion</p>
<p style="margin:0px;text-indent:0px">> </p>
<p style="margin:0px;text-indent:0px">>   ogrinfo -so --max-retry=10 <a href="https://example.com/poly.shp" target="_blank">https://example.com/poly.shp</a></p>
<p style="margin:0px;text-indent:0px">> </p>
<p style="margin:0px;text-indent:0px">> on the command line?</p>
<p style="margin:0px;text-indent:0px"> </p>
</span><p style="margin:0px;text-indent:0px">One issue with your proposal is that it would require ogrinfo (or any utility) to go from the highest level abstraction layers of GDAL to the lowest ones.</p>
<p style="margin:0px;text-indent:0px"> </p>
<p style="margin:0px;text-indent:0px">When ogrinfo is provided</p>
<p style="margin:0px;text-indent:0px">"/vsicurl/max_retry=10,url=<a href="https://example.com/poly.shp" target="_blank">htt<wbr>ps://example.com/poly.shp</a>",</p>
<p style="margin:0px;text-indent:0px">this is just a string used as a dataset name</p>
<p style="margin:0px;text-indent:0px"> </p>
<p style="margin:0px;text-indent:0px">It happily feeds it into GDALOpenEx(), which in turn proposes it sequentially to all drivers.</p>
<p style="margin:0px;text-indent:0px"> </p>
<p style="margin:0px;text-indent:0px">The shapefile driver tries this string with VSIFOpenL(), which in turn iterates over all virtual file systems. The /vsicurl/ VFS recognizes it and manages to open the file. The shapefile driver can then read the first few bytes from it and recognize that they are the header of a shapefile, etc.</p>
<p style="margin:0px;text-indent:0px"> </p>
<p style="margin:0px;text-indent:0px">So in the current design neither the utility, nor GDALOpenEx(), nor the drivers themselves really make sense of that string. This is quite a strength at the architectural level. It also makes it possible to pass such a string in a VRT file, for example.</p></div></blockquote><div><br></div><div>Is this the future of open and creation options? Do you imagine this extended to, say, block size, compression, number of threads? An RFC discussing the scope of this, and the level of abstraction at which it is implemented, might be warranted. I'd be happy to participate.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div style="font-family:monospace;font-size:9pt;font-weight:400;font-style:normal">
<p style="margin:0px;text-indent:0px"> </p>
<p style="margin:0px;text-indent:0px">Regarding the direct use of http:// and https://, I also find it a bit unfortunate that we can't use them directly, with the vsicurl machinery used implicitly. It turns out that historically we have had the HTTP driver, which triggers on such dataset names (ingesting the whole file into /vsimem/ and proposing it in turn to other drivers). There are also a few other drivers (DODS, etc.) that trigger on such names.</p>
<p style="margin:0px;text-indent:0px"> </p>
<p style="margin:0px;text-indent:0px">Even</p></div></blockquote><div><br></div><div>On the other hand, <a href="https://example.com/foo.tif">https://example.com/foo.tif</a> identifies only a single resource, whereas /vsicurl/url=<a href="https://example.com/foo.tif">https://example.com/foo.tif</a> can identify a GeoTIFF along with all of its sidecars. I presume that the new GDAL cloud utilities like gdal_cp.py take care of the auxiliary files, yes?</div><div><br></div><div>My final concern about the virtual file opening options is the syntax. These /vsicurl/option1=val1[,optionN=valN]*,url=<a href="http://example.com/foo.tif">http://example.com/foo.tif</a> identifiers (or filenames or whatever we call them) may spread from GDAL into the wider geospatial programming domain. Speaking from my experience with Rasterio, open source Python GIS developers expect the /vsi* filenames to "just work" in all software. Can we consider using a more standard syntax? 
One that has parsers already deployed everywhere?</div><div><br></div><div>For example, /vsicurl?option1=foo&option2=bar&url=<a href="https://example.com/foo.tif">https://example.com/foo.tif</a> can be parsed by standard URL parsers such as Python's.</div><div><br></div><div>>>> from urllib.parse import urlparse, parse_qs</div><div><div>>>> urlparse('/vsicurl?option1=foo&option2=bar&url=<a href="https://example.com/foo.tif">https://example.com/foo.tif</a>')</div><div>ParseResult(scheme='', netloc='', path='/vsicurl', params='', query='option1=foo&option2=bar&url=<a href="https://example.com/foo.tif">https://example.com/foo.tif</a>', fragment='')</div><div>>>> parse_qs(_.query)</div><div>{'option1': ['foo'], 'option2': ['bar'], 'url': ['<a href="https://example.com/foo.tif">https://example.com/foo.tif</a>']}</div></div><div><br></div><div>That syntax gives the /vsi* filenames the form of a "reflector" URL such as we see in Google searches (for example: <a href="https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&ved=0ahUKEwjC6e7hvevWAhXmjFQKHWsHDyMQFggmMAA&url=http%3A%2F%2Fwww.gdal.org%2F&usg=AOvVaw3fbRv5TusYwkXgz2Acf2kt">https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&ved=0ahUKEwjC6e7hvevWAhXmjFQKHWsHDyMQFggmMAA&url=http%3A%2F%2Fwww.gdal.org%2F&usg=AOvVaw3fbRv5TusYwkXgz2Acf2kt</a>), and there are abundant tools and a body of knowledge about how to parse and work with these.</div><div><br></div></div>-- <br><div class="gmail_signature"><div dir="ltr">Sean Gillies</div></div>
</div></div>
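<div>The query-string form suggested in the message above could be built and parsed with nothing but the Python standard library. What follows is only a minimal sketch under stated assumptions: the /vsicurl?...&url=... form is a proposal, not a syntax GDAL implements, and the helper names build_vsi_filename and parse_vsi_filename are hypothetical. It also assumes the target URL is percent-encoded, as in the Google reflector example, so that a URL carrying its own query string survives the round trip.</div>

```python
from urllib.parse import parse_qs, urlencode, urlparse


def build_vsi_filename(prefix, url, **options):
    """Build a hypothetical query-string style /vsi* filename.

    This form is the one proposed in the thread, not something GDAL
    is known to accept.
    """
    # Options first, target URL last. urlencode percent-encodes every
    # value, so '&' or '?' inside the target URL cannot be confused
    # with the separators of the outer filename.
    params = dict(options, url=url)
    return f"/{prefix}?{urlencode(params)}"


def parse_vsi_filename(filename):
    """Split such a filename into (prefix, url, options)."""
    parts = urlparse(filename)
    # parse_qs returns a list per key; keep the first value of each.
    query = {key: values[0] for key, values in parse_qs(parts.query).items()}
    url = query.pop("url")
    return parts.path.lstrip("/"), url, query


# The target URL deliberately contains its own query string.
name = build_vsi_filename(
    "vsicurl", "https://example.com/foo.tif?token=a&b=c", max_retry=10
)
print(name)
print(parse_vsi_filename(name))
```

<div>Option values come back as strings, so a consumer would still need to convert something like max_retry to an integer itself.</div>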