[gdal-dev] Open option for vectors in the cloud

Björn Harrtell bjorn.harrtell at gmail.com
Tue Nov 5 12:28:15 PST 2019


Improvements have hit master. I suspect some bottlenecks remain, but I
currently lack the tests/means to investigate further in the short term, so
feedback would be appreciated.

On Thu, 31 Oct 2019 at 02:17, Björn Harrtell <bjorn.harrtell at gmail.com> wrote:

> Thanks for trying out accessing FlatGeobuf via http.
>
> For the record, I've been somewhat aware of this particular efficiency
> problem and I aim to improve it when I can get to it, because this is a use
> case where I definitely want FlatGeobuf to take first place. :)
>
> /Björn
>
> On Thu, 24 Oct 2019 at 20:05, Even Rouault <
> even.rouault at spatialys.com> wrote:
>
>> On Thursday 24 October 2019 17:42:23 CEST Rahkonen Jukka (MML) wrote:
>> > Hi,
>> >
>> > I was experimenting with accessing some vector files through http (the
>> > same data as FlatGeobuf, GeoPackage, and shapefile). The file size in
>> > each format was about 850 MB and the data consisted of about 240000
>> > linestrings. I made an ogrinfo request with a spatial filter that
>> > selects one feature and checked the number of http requests and the
>> > amount of requested data.
>> >
>> > FlatGeobuf
>> > 19 http requests
>> > 33046509 bytes read
>>
>> Looking at the debug log, FlatGeobuf currently loads the whole
>> index-of-features array ("Reading feature offsets index"), which accounts
>> for 32.7 MB of the above 33 MB. This could probably be avoided by only
>> loading the offsets of the selected features. The shapefile driver had the
>> same issue a few years ago, and this was fixed by initializing the offset
>> array to zeroes and loading the offsets on demand when needed.
>>
>> > If somebody
>> > really finds a use case for reading vector data from the web it seems
>> > obvious that being able to cache and re-use the spatial index
>> > would be very beneficial. I can imagine that with shapefile it would
>> > mean downloading the .qix file, with GeoPackage reading the contents of
>> > the rtree index table, and with FlatGeobuf probably extracting the
>> > static packed Hilbert R-tree index.
>>
>> A general caching logic in /vsicurl/ would be preferable (although the
>> download of the 'data' part of files might potentially evict the indexes,
>> having a dedicated logic in each driver to tell which files / regions of
>> the files should be cached would be a bit annoying). Basically, doing a
>> HEAD request on the file to get its last update date and keeping a local
>> cache of downloaded pieces would be a more general solution.
>>
>> Even
>>
>> --
>> Spatialys - Geospatial professional services
>> http://www.spatialys.com
>>
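
For illustration, the on-demand offset loading Even describes for the
shapefile driver could be sketched roughly as below. This is not the GDAL or
FlatGeobuf code. The names CountingReader and LazyOffsetIndex are hypothetical,
the "remote file" is an in-memory byte string standing in for HTTP range
requests, and the layout of one little-endian uint64 offset per feature is an
assumption made for the example.

```python
import struct
from array import array

OFFSET_SIZE = 8  # assumed layout: one little-endian uint64 per feature


class CountingReader:
    """Hypothetical stand-in for an HTTP range reader; counts traffic."""

    def __init__(self, data: bytes):
        self._data = data
        self.requests = 0
        self.bytes_read = 0

    def read_range(self, start: int, length: int) -> bytes:
        self.requests += 1
        self.bytes_read += length
        return self._data[start:start + length]


class LazyOffsetIndex:
    """Feature offsets fetched on first use instead of all up front."""

    def __init__(self, reader, index_start: int, feature_count: int):
        self._reader = reader
        self._index_start = index_start
        # Offsets start as zeroes; a separate flag array records which
        # entries were loaded, so a real offset of 0 is not ambiguous.
        self._offsets = array("Q", [0]) * feature_count
        self._loaded = bytearray(feature_count)

    def offset_of(self, feature_id: int) -> int:
        if not self._loaded[feature_id]:
            position = self._index_start + feature_id * OFFSET_SIZE
            raw = self._reader.read_range(position, OFFSET_SIZE)
            self._offsets[feature_id] = struct.unpack("<Q", raw)[0]
            self._loaded[feature_id] = 1
        return self._offsets[feature_id]


def build_demo(feature_count: int):
    """Build a fake offsets array where feature i sits at byte i * 100."""
    blob = b"".join(struct.pack("<Q", i * 100) for i in range(feature_count))
    reader = CountingReader(blob)
    return reader, LazyOffsetIndex(reader, 0, feature_count)
```

With this sketch, looking up one selected feature reads OFFSET_SIZE bytes of
the index rather than the whole array, and a repeated lookup of the same
feature causes no further request.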
>
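
The general caching idea (check the file's last update date, keep a local
cache of downloaded pieces) might look something like the following sketch.
It does not reflect how /vsicurl/ is implemented. FakeRemote and
ValidatingRangeCache are hypothetical names, and the remote is simulated in
memory: head() plays the role of a HEAD request returning a last-modified
stamp, and get_range() plays the role of a ranged GET.

```python
class FakeRemote:
    """Hypothetical in-memory stand-in for a file served over http."""

    def __init__(self, data: bytes, last_modified: str):
        self.data = data
        self.last_modified = last_modified
        self.head_requests = 0
        self.range_requests = 0

    def head(self) -> str:
        self.head_requests += 1
        return self.last_modified

    def get_range(self, start: int, length: int) -> bytes:
        self.range_requests += 1
        return self.data[start:start + length]


class ValidatingRangeCache:
    """Caches downloaded pieces; drops them when the file has changed."""

    def __init__(self, remote):
        self._remote = remote
        self._stamp = None
        self._chunks = {}

    def revalidate(self) -> None:
        # One HEAD-like call, e.g. when the dataset is opened.
        stamp = self._remote.head()
        if stamp != self._stamp:
            self._chunks.clear()
            self._stamp = stamp

    def read(self, start: int, length: int) -> bytes:
        key = (start, length)
        if key not in self._chunks:
            self._chunks[key] = self._remote.get_range(start, length)
        return self._chunks[key]
```

A real cache would also need a size limit and an eviction policy, which is
where the concern about 'data' reads evicting the index pieces comes in; the
sketch leaves that out.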